All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] Spectre v2 updates
@ 2018-02-10 23:39 David Woodhouse
  2018-02-10 23:39 ` [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist David Woodhouse
                   ` (8 more replies)
  0 siblings, 9 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-10 23:39 UTC (permalink / raw)
  To: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

Using retpoline ensures the kernel is safe because it doesn't contain
any indirect branches, but firmware still can — and we make calls into
firmware at runtime. Where the IBRS microcode support is available, use
that before calling into firmware.

While doing that, I noticed that we were calling C functions without
telling the compiler about the call-clobbered registers. Stop that.

This also contains the always_inline fix for the performance problem
introduced by retpoline in KVM code, and fixes some other issues with
the per-vCPU KVM handling for the SPEC_CTRL MSR.

Finally, update the microcode blacklist to reflect the latest
information from Intel.

v2: Drop IBRS_ALL patch for the time being
    Add KVM MSR fixes (karahmed)
    Update microcode blacklist



David Woodhouse (4):
  x86/speculation: Update Speculation Control microcode blacklist
  Revert "x86/speculation: Simplify
    indirect_branch_prediction_barrier()"
  KVM: x86: Reduce retpoline performance impact in
    slot_handle_level_range()
  x86/speculation: Use IBRS if available before calling into firmware

KarimAllah Ahmed (2):
  X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
  KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR
    bitmap

 arch/x86/include/asm/apm.h           |  6 ++++++
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/efi.h           | 17 +++++++++++++++--
 arch/x86/include/asm/nospec-branch.h | 32 ++++++++++++++++++++++++++++----
 arch/x86/include/asm/processor.h     |  3 ---
 arch/x86/kernel/cpu/bugs.c           | 18 +++++++++++-------
 arch/x86/kernel/cpu/intel.c          |  4 ----
 arch/x86/kvm/mmu.c                   | 10 +++++-----
 arch/x86/kvm/vmx.c                   |  7 ++++---
 drivers/watchdog/hpwdt.c             |  3 +++
 10 files changed, 73 insertions(+), 28 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
@ 2018-02-10 23:39 ` David Woodhouse
  2018-02-11 12:08   ` [tip:x86/pti] " tip-bot for David Woodhouse
                     ` (2 more replies)
  2018-02-10 23:39 ` [PATCH v2 2/6] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" David Woodhouse
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-10 23:39 UTC (permalink / raw)
  To: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
too. We blacklisted the latter purely because it was present with all
the other problematic ones in the 2018-01-08 release, but now it's
explicitly listed as OK.

We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
that appeared in one version of the blacklist and then reverted to
0x80 again. We can change it if 0x84 is actually announced to be safe.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/cpu/intel.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 319bf98..f73b814 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -123,8 +123,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
 	{ INTEL_FAM6_KABYLAKE_MOBILE,	0x09,	0x84 },
 	{ INTEL_FAM6_SKYLAKE_X,		0x03,	0x0100013e },
 	{ INTEL_FAM6_SKYLAKE_X,		0x04,	0x0200003c },
-	{ INTEL_FAM6_SKYLAKE_MOBILE,	0x03,	0xc2 },
-	{ INTEL_FAM6_SKYLAKE_DESKTOP,	0x03,	0xc2 },
 	{ INTEL_FAM6_BROADWELL_CORE,	0x04,	0x28 },
 	{ INTEL_FAM6_BROADWELL_GT3E,	0x01,	0x1b },
 	{ INTEL_FAM6_BROADWELL_XEON_D,	0x02,	0x14 },
@@ -136,8 +134,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
 	{ INTEL_FAM6_HASWELL_X,		0x02,	0x3b },
 	{ INTEL_FAM6_HASWELL_X,		0x04,	0x10 },
 	{ INTEL_FAM6_IVYBRIDGE_X,	0x04,	0x42a },
-	/* Updated in the 20180108 release; blacklist until we know otherwise */
-	{ INTEL_FAM6_ATOM_GEMINI_LAKE,	0x01,	0x22 },
 	/* Observed in the wild */
 	{ INTEL_FAM6_SANDYBRIDGE_X,	0x06,	0x61b },
 	{ INTEL_FAM6_SANDYBRIDGE_X,	0x07,	0x712 },
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 2/6] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
  2018-02-10 23:39 ` [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist David Woodhouse
@ 2018-02-10 23:39 ` David Woodhouse
  2018-02-11 12:09   ` [tip:x86/pti] " tip-bot for David Woodhouse
  2018-02-13  8:58   ` tip-bot for David Woodhouse
  2018-02-10 23:39 ` [PATCH v2 3/6] KVM: x86: Reduce retpoline performance impact in slot_handle_level_range() David Woodhouse
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-10 23:39 UTC (permalink / raw)
  To: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

This reverts commit 64e16720ea0879f8ab4547e3b9758936d483909b.

We cannot call C functions like that, without marking all the
call-clobbered registers as, well, clobbered. We might have got away
with it for now because the __ibp_barrier() function was *fairly*
unlikely to actually use any other registers. But no. Just no.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/nospec-branch.h | 13 +++++++++----
 arch/x86/include/asm/processor.h     |  3 ---
 arch/x86/kernel/cpu/bugs.c           |  6 ------
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4d57894..300cc15 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
 
 static inline void indirect_branch_prediction_barrier(void)
 {
-	alternative_input("",
-			  "call __ibp_barrier",
-			  X86_FEATURE_USE_IBPB,
-			  ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
+	asm volatile(ALTERNATIVE("",
+				 "movl %[msr], %%ecx\n\t"
+				 "movl %[val], %%eax\n\t"
+				 "movl $0, %%edx\n\t"
+				 "wrmsr",
+				 X86_FEATURE_USE_IBPB)
+		     : : [msr] "i" (MSR_IA32_PRED_CMD),
+			 [val] "i" (PRED_CMD_IBPB)
+		     : "eax", "ecx", "edx", "memory");
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 513f960..99799fb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -969,7 +969,4 @@ bool xen_set_default_idle(void);
 
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
-
-void __ibp_barrier(void);
-
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71949bf..61152aa 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -337,9 +337,3 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
 		       spectre_v2_module_string());
 }
 #endif
-
-void __ibp_barrier(void)
-{
-	__wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
-}
-EXPORT_SYMBOL_GPL(__ibp_barrier);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 3/6] KVM: x86: Reduce retpoline performance impact in slot_handle_level_range()
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
  2018-02-10 23:39 ` [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist David Woodhouse
  2018-02-10 23:39 ` [PATCH v2 2/6] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" David Woodhouse
@ 2018-02-10 23:39 ` David Woodhouse
  2018-02-11 12:09   ` [tip:x86/pti] KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods tip-bot for David Woodhouse
  2018-02-13  8:58   ` tip-bot for David Woodhouse
  2018-02-10 23:39 ` [PATCH v2 4/6] X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs David Woodhouse
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-10 23:39 UTC (permalink / raw)
  To: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

With retpoline, tight loops of "call this function for every XXX" are
very much pessimised by taking a prediction miss *every* time. This one
is by far the biggest contributor to the guest launch time with retpoline.

By marking the iterator slot_handle_…() functions always_inline, we can
ensure that the indirect function call can be optimised away into a
direct call and it actually generates slightly smaller code because
some of the other conditionals can get optimised away too.

Performance is now pretty close to what we see with nospectre_v2 on
the command line.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Tested-by: Filippo Sironi <sironi@amazon.de>
---
 arch/x86/kvm/mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2b8eb4d..cc83bdc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5058,7 +5058,7 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
 typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
 
 /* The caller should hold mmu-lock before calling this function. */
-static bool
+static __always_inline bool
 slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			slot_level_handler fn, int start_level, int end_level,
 			gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
@@ -5088,7 +5088,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	return flush;
 }
 
-static bool
+static __always_inline bool
 slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		  slot_level_handler fn, int start_level, int end_level,
 		  bool lock_flush_tlb)
@@ -5099,7 +5099,7 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		      slot_level_handler fn, bool lock_flush_tlb)
 {
@@ -5107,7 +5107,7 @@ slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			slot_level_handler fn, bool lock_flush_tlb)
 {
@@ -5115,7 +5115,7 @@ slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		 slot_level_handler fn, bool lock_flush_tlb)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 4/6] X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
                   ` (2 preceding siblings ...)
  2018-02-10 23:39 ` [PATCH v2 3/6] KVM: x86: Reduce retpoline performance impact in slot_handle_level_range() David Woodhouse
@ 2018-02-10 23:39 ` David Woodhouse
  2018-02-11 12:10   ` [tip:x86/pti] " tip-bot for KarimAllah Ahmed
  2018-02-13  8:59   ` tip-bot for KarimAllah Ahmed
  2018-02-10 23:39 ` [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap David Woodhouse
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-10 23:39 UTC (permalink / raw)
  To: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

From: KarimAllah Ahmed <karahmed@amazon.de>

These two variables should check whether SPEC_CTRL and PRED_CMD are
supposed to be passed through to L2 guests or not. While
msr_write_intercepted_l01 would return 'true' if it is not passed through.

So just invert the result of msr_write_intercepted_l01 to implement the
correct semantics.

Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bee4c49..599179b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10219,8 +10219,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	 *    updated to reflect this when L1 (or its L2s) actually write to
 	 *    the MSR.
 	 */
-	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
-	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
+	bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+	bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
 
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
 	    !pred_cmd && !spec_ctrl)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
                   ` (3 preceding siblings ...)
  2018-02-10 23:39 ` [PATCH v2 4/6] X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs David Woodhouse
@ 2018-02-10 23:39 ` David Woodhouse
  2018-02-11 10:19   ` Ingo Molnar
                     ` (2 more replies)
  2018-02-10 23:39 ` [PATCH v2 6/6] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-10 23:39 UTC (permalink / raw)
  To: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

From: KarimAllah Ahmed <karahmed@amazon.de>

We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
MSR accesses or create a valid L02 MSR bitmap and use that. This decision
has to be made every time we evaluate whether we are going to generate the
L02 MSR bitmap.

Before commit 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
this was probably OK since the decision was always identical. This is no
longer the case now since the MSR bitmap might actually change once we
decide to not intercept SPEC_CTRL and PRED_CMD.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 599179b..91e3539 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10130,7 +10130,8 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
 	if (cpu_has_vmx_msr_bitmap() &&
 	    nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS) &&
 	    nested_vmx_merge_msr_bitmap(vcpu, vmcs12))
-		;
+		vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
+			      CPU_BASED_USE_MSR_BITMAPS);
 	else
 		vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
 				CPU_BASED_USE_MSR_BITMAPS);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 6/6] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
                   ` (4 preceding siblings ...)
  2018-02-10 23:39 ` [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap David Woodhouse
@ 2018-02-10 23:39 ` David Woodhouse
  2018-02-11 11:46   ` Ingo Molnar
  2018-02-11 10:41 ` [PATCH v2 0/6] Spectre v2 updates Ingo Molnar
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 74+ messages in thread
From: David Woodhouse @ 2018-02-10 23:39 UTC (permalink / raw)
  To: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/apm.h           |  6 ++++++
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/efi.h           | 17 +++++++++++++++--
 arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++---------
 arch/x86/kernel/cpu/bugs.c           | 12 +++++++++++-
 drivers/watchdog/hpwdt.c             |  3 +++
 6 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h
index 4d4015d..4483616 100644
--- a/arch/x86/include/asm/apm.h
+++ b/arch/x86/include/asm/apm.h
@@ -7,6 +7,8 @@
 #ifndef _ASM_X86_MACH_DEFAULT_APM_H
 #define _ASM_X86_MACH_DEFAULT_APM_H
 
+#include <asm/spec_ctrl.h>
+
 #ifdef APM_ZERO_SEGS
 #	define APM_DO_ZERO_SEGS \
 		"pushl %%ds\n\t" \
@@ -32,6 +34,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
 	 * N.B. We do NOT need a cld after the BIOS call
 	 * because we always save and restore the flags.
 	 */
+	firmware_restrict_branch_speculation_start();
 	__asm__ __volatile__(APM_DO_ZERO_SEGS
 		"pushl %%edi\n\t"
 		"pushl %%ebp\n\t"
@@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
 		  "=S" (*esi)
 		: "a" (func), "b" (ebx_in), "c" (ecx_in)
 		: "memory", "cc");
+	firmware_restrict_branch_speculation_end();
 }
 
 static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
@@ -56,6 +60,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
 	 * N.B. We do NOT need a cld after the BIOS call
 	 * because we always save and restore the flags.
 	 */
+	firmware_restrict_branch_speculation_start();
 	__asm__ __volatile__(APM_DO_ZERO_SEGS
 		"pushl %%edi\n\t"
 		"pushl %%ebp\n\t"
@@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
 		  "=S" (si)
 		: "a" (func), "b" (ebx_in), "c" (ecx_in)
 		: "memory", "cc");
+	firmware_restrict_branch_speculation_end();
 	return error;
 }
 
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 73b5fff..66c1434 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
 #define X86_FEATURE_RSB_CTXSW		( 7*32+19) /* "" Fill RSB on context switches */
 
 #define X86_FEATURE_USE_IBPB		( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
+#define X86_FEATURE_USE_IBRS_FW		( 7*32+22) /* "" Use IBRS during runtime firmware calls */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb..a399c1e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -6,6 +6,7 @@
 #include <asm/pgtable.h>
 #include <asm/processor-flags.h>
 #include <asm/tlb.h>
+#include <asm/nospec-branch.h>
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
@@ -36,8 +37,18 @@
 
 extern asmlinkage unsigned long efi_call_phys(void *, ...);
 
-#define arch_efi_call_virt_setup()	kernel_fpu_begin()
-#define arch_efi_call_virt_teardown()	kernel_fpu_end()
+#define arch_efi_call_virt_setup()					\
+({									\
+	kernel_fpu_begin();						\
+	firmware_restrict_branch_speculation_start();			\
+})
+
+#define arch_efi_call_virt_teardown()					\
+({									\
+	firmware_restrict_branch_speculation_end();			\
+	kernel_fpu_end();						\
+})
+
 
 /*
  * Wrap all the virtual calls in a way that forces the parameters on the stack.
@@ -73,6 +84,7 @@ struct efi_scratch {
 	efi_sync_low_kernel_mappings();					\
 	preempt_disable();						\
 	__kernel_fpu_begin();						\
+	firmware_restrict_branch_speculation_start();			\
 									\
 	if (efi_scratch.use_pgd) {					\
 		efi_scratch.prev_cr3 = __read_cr3();			\
@@ -91,6 +103,7 @@ struct efi_scratch {
 		__flush_tlb_all();					\
 	}								\
 									\
+	firmware_restrict_branch_speculation_end();			\
 	__kernel_fpu_end();						\
 	preempt_enable();						\
 })
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..788c4da 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -162,17 +162,36 @@ static inline void vmexit_fill_RSB(void)
 #endif
 }
 
+#define alternative_msr_write(_msr, _val, _feature)		\
+	asm volatile(ALTERNATIVE("",				\
+				 "movl %[msr], %%ecx\n\t"	\
+				 "movl %[val], %%eax\n\t"	\
+				 "movl $0, %%edx\n\t"		\
+				 "wrmsr",			\
+				 _feature)			\
+		     : : [msr] "i" (_msr), [val] "i" (_val)	\
+		     : "eax", "ecx", "edx", "memory")
+
 static inline void indirect_branch_prediction_barrier(void)
 {
-	asm volatile(ALTERNATIVE("",
-				 "movl %[msr], %%ecx\n\t"
-				 "movl %[val], %%eax\n\t"
-				 "movl $0, %%edx\n\t"
-				 "wrmsr",
-				 X86_FEATURE_USE_IBPB)
-		     : : [msr] "i" (MSR_IA32_PRED_CMD),
-			 [val] "i" (PRED_CMD_IBPB)
-		     : "eax", "ecx", "edx", "memory");
+	alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
+			      X86_FEATURE_USE_IBPB);
+}
+
+/*
+ * With retpoline, we must use IBRS to restrict branch prediction
+ * before calling into firmware.
+ */
+static inline void firmware_restrict_branch_speculation_start(void)
+{
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
+			      X86_FEATURE_USE_IBRS_FW);
+}
+
+static inline void firmware_restrict_branch_speculation_end(void)
+{
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+			      X86_FEATURE_USE_IBRS_FW);
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 61152aa..6f6d763 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -303,6 +303,15 @@ static void __init spectre_v2_select_mitigation(void)
 		setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
 		pr_info("Enabling Indirect Branch Prediction Barrier\n");
 	}
+
+	/*
+	 * Retpoline means the kernel is safe because it has no indirect
+	 * branches. But firmware isn't, so use IBRS to protect that.
+	 */
+	if (boot_cpu_has(X86_FEATURE_IBRS)) {
+		setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
+		pr_info("Enabling Restricted Speculation for firmware calls\n");
+	}
 }
 
 #undef pr_fmt
@@ -332,8 +341,9 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
 		return sprintf(buf, "Not affected\n");
 
-	return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
+	return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
 		       boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
+		       boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
 		       spectre_v2_module_string());
 }
 #endif
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 67fbe35..bab3721 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -38,6 +38,7 @@
 #endif /* CONFIG_HPWDT_NMI_DECODING */
 #include <asm/nmi.h>
 #include <asm/frame.h>
+#include <asm/nospec-branch.h>
 
 #define HPWDT_VERSION			"1.4.0"
 #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
@@ -486,11 +487,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
 	if (!hpwdt_nmi_decoding)
 		return NMI_DONE;
 
+	firmware_restrict_branch_speculation_start();
 	spin_lock_irqsave(&rom_lock, rom_pl);
 	if (!die_nmi_called && !is_icru && !is_uefi)
 		asminline_call(&cmn_regs, cru_rom_addr);
 	die_nmi_called = 1;
 	spin_unlock_irqrestore(&rom_lock, rom_pl);
+	firmware_restrict_branch_speculation_end();
 
 	if (allow_kdump)
 		hpwdt_stop();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap
  2018-02-10 23:39 ` [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap David Woodhouse
@ 2018-02-11 10:19   ` Ingo Molnar
       [not found]     ` <1518345844.3677.365.camel@amazon.co.uk>
  2018-02-11 12:10   ` [tip:x86/pti] " tip-bot for KarimAllah Ahmed
  2018-02-13  8:59   ` tip-bot for KarimAllah Ahmed
  2 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-11 10:19 UTC (permalink / raw)
  To: David Woodhouse
  Cc: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen


* David Woodhouse <dwmw@amazon.co.uk> wrote:

> From: KarimAllah Ahmed <karahmed@amazon.de>
> 
> We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
> MSR accesses or create a valid L02 MSR bitmap and use that. This decision
> has to be made every time we evaluate whether we are going to generate the
> L02 MSR bitmap.
> 
> Before commit 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
> this was probably OK since the decision was always identical. This is no
> longer the case now since the MSR bitmap might actually change once we
> decide to not intercept SPEC_CTRL and PRED_CMD.

Note, I fixed the changelog to refer to the correct upstream SHA1, which is:

  d28b387fb74d: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 0/6] Spectre v2 updates
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
                   ` (5 preceding siblings ...)
  2018-02-10 23:39 ` [PATCH v2 6/6] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
@ 2018-02-11 10:41 ` Ingo Molnar
  2018-02-11 15:19 ` [PATCH v2.1] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
  2018-02-12  8:27 ` [PATCH v2 0/6] Spectre v2 updates Paolo Bonzini
  8 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-11 10:41 UTC (permalink / raw)
  To: David Woodhouse, Paolo Bonzini, Radim Krčmář
  Cc: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen


Paolo, Radim,

* David Woodhouse <dwmw@amazon.co.uk> wrote:

> David Woodhouse (4):
>   KVM: x86: Reduce retpoline performance impact in slot_handle_level_range()
> KarimAllah Ahmed (2):
>   X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
>   KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap

Similarly to the previous Spectre patches I've applied these three KVM patches to 
tip:x86/pti too, to keep them all in a single backportable group of commits. They 
all look correct to me and solve real problems, and there's no conflict with 
current upstream KVM code.

Let me know if that's OK to you or if you'd like to see any changes to them.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap
       [not found]     ` <1518345844.3677.365.camel@amazon.co.uk>
@ 2018-02-11 10:55       ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-11 10:55 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: kvm, linux-kernel, peterz, jmattson, Raslan, KarimAllah,
	arjan.van.de.ven, torvalds, tglx, x86, pbonzini, bp, Sironi,
	Filippo, dave.hansen, rkrcmar


* Woodhouse, David <dwmw@amazon.co.uk> wrote:

> On Sun, 2018-02-11 at 11:19 +0100, Ingo Molnar wrote:
> > Note, I fixed the changelog to refer to the correct upstream SHA1,
> > which is:
> > 
> >   d28b387fb74d: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
> 
> Thanks for catching that.
> 
> Wouldn't it be nice if 'git rebase --interactive tip/x86/pti' had done
> that *for* me? :)

Yeah, but given that the commit title changed as well:

   086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
   d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")

I'd rather not have tooling guess about such things.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 6/6] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-10 23:39 ` [PATCH v2 6/6] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
@ 2018-02-11 11:46   ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-11 11:46 UTC (permalink / raw)
  To: David Woodhouse
  Cc: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen


* David Woodhouse <dwmw@amazon.co.uk> wrote:

> Retpoline means the kernel is safe because it has no indirect branches.
> But firmware isn't, so use IBRS for firmware calls if it's available.

Ok, this approach looks good to me in principle, but:

> --- a/arch/x86/include/asm/apm.h
> +++ b/arch/x86/include/asm/apm.h
> @@ -7,6 +7,8 @@
>  #ifndef _ASM_X86_MACH_DEFAULT_APM_H
>  #define _ASM_X86_MACH_DEFAULT_APM_H
>  
> +#include <asm/spec_ctrl.h>

I cannot see this header file upstream anywhere, nor in any other patch in my mbox 
- is there some dependency that has to be applied first?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [tip:x86/pti] x86/speculation: Update Speculation Control microcode blacklist
  2018-02-10 23:39 ` [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist David Woodhouse
@ 2018-02-11 12:08   ` tip-bot for David Woodhouse
  2018-02-12  9:50   ` [PATCH v2 1/6] " Darren Kenny
  2018-02-12 14:16   ` David Woodhouse
  2 siblings, 0 replies; 74+ messages in thread
From: tip-bot for David Woodhouse @ 2018-02-11 12:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, jpoimboe, gregkh, bp, dan.j.williams, hpa, dave.hansen,
	dwmw2, arjan, mingo, torvalds, dwmw, luto, tglx, linux-kernel

Commit-ID:  1751342095f0d2b36fa8114d8e12c5688c455ac4
Gitweb:     https://git.kernel.org/tip/1751342095f0d2b36fa8114d8e12c5688c455ac4
Author:     David Woodhouse <dwmw@amazon.co.uk>
AuthorDate: Sat, 10 Feb 2018 23:39:22 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 11:24:15 +0100

x86/speculation: Update Speculation Control microcode blacklist

Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
too. We blacklisted the latter purely because it was present with all
the other problematic ones in the 2018-01-08 release, but now it's
explicitly listed as OK.

We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
that appeared in one version of the blacklist and then reverted to
0x80 again. We can change it if 0x84 is actually announced to be safe.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: sironi@amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-2-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/intel.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 319bf98..f73b814 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -123,8 +123,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
 	{ INTEL_FAM6_KABYLAKE_MOBILE,	0x09,	0x84 },
 	{ INTEL_FAM6_SKYLAKE_X,		0x03,	0x0100013e },
 	{ INTEL_FAM6_SKYLAKE_X,		0x04,	0x0200003c },
-	{ INTEL_FAM6_SKYLAKE_MOBILE,	0x03,	0xc2 },
-	{ INTEL_FAM6_SKYLAKE_DESKTOP,	0x03,	0xc2 },
 	{ INTEL_FAM6_BROADWELL_CORE,	0x04,	0x28 },
 	{ INTEL_FAM6_BROADWELL_GT3E,	0x01,	0x1b },
 	{ INTEL_FAM6_BROADWELL_XEON_D,	0x02,	0x14 },
@@ -136,8 +134,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
 	{ INTEL_FAM6_HASWELL_X,		0x02,	0x3b },
 	{ INTEL_FAM6_HASWELL_X,		0x04,	0x10 },
 	{ INTEL_FAM6_IVYBRIDGE_X,	0x04,	0x42a },
-	/* Updated in the 20180108 release; blacklist until we know otherwise */
-	{ INTEL_FAM6_ATOM_GEMINI_LAKE,	0x01,	0x22 },
 	/* Observed in the wild */
 	{ INTEL_FAM6_SANDYBRIDGE_X,	0x06,	0x61b },
 	{ INTEL_FAM6_SANDYBRIDGE_X,	0x07,	0x712 },

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [tip:x86/pti] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
  2018-02-10 23:39 ` [PATCH v2 2/6] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" David Woodhouse
@ 2018-02-11 12:09   ` tip-bot for David Woodhouse
  2018-02-13  8:58   ` tip-bot for David Woodhouse
  1 sibling, 0 replies; 74+ messages in thread
From: tip-bot for David Woodhouse @ 2018-02-11 12:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dwmw, arjan, torvalds, gregkh, bp, hpa, jpoimboe, peterz,
	dave.hansen, luto, dwmw2, dan.j.williams, linux-kernel, tglx,
	mingo

Commit-ID:  930ce1a7a55bc0eb8917f453ee22f1b6d67df5cd
Gitweb:     https://git.kernel.org/tip/930ce1a7a55bc0eb8917f453ee22f1b6d67df5cd
Author:     David Woodhouse <dwmw@amazon.co.uk>
AuthorDate: Sat, 10 Feb 2018 23:39:23 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 11:24:15 +0100

Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"

This reverts commit 64e16720ea0879f8ab4547e3b9758936d483909b.

We cannot call C functions like that, without marking all the
call-clobbered registers as, well, clobbered. We might have got away
with it for now because the __ibp_barrier() function was *fairly*
unlikely to actually use any other registers. But no. Just no.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: sironi@amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/nospec-branch.h | 13 +++++++++----
 arch/x86/include/asm/processor.h     |  3 ---
 arch/x86/kernel/cpu/bugs.c           |  6 ------
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4d57894..300cc15 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
 
 static inline void indirect_branch_prediction_barrier(void)
 {
-	alternative_input("",
-			  "call __ibp_barrier",
-			  X86_FEATURE_USE_IBPB,
-			  ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
+	asm volatile(ALTERNATIVE("",
+				 "movl %[msr], %%ecx\n\t"
+				 "movl %[val], %%eax\n\t"
+				 "movl $0, %%edx\n\t"
+				 "wrmsr",
+				 X86_FEATURE_USE_IBPB)
+		     : : [msr] "i" (MSR_IA32_PRED_CMD),
+			 [val] "i" (PRED_CMD_IBPB)
+		     : "eax", "ecx", "edx", "memory");
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 513f960..99799fb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -969,7 +969,4 @@ bool xen_set_default_idle(void);
 
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
-
-void __ibp_barrier(void);
-
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71949bf..61152aa 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -337,9 +337,3 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
 		       spectre_v2_module_string());
 }
 #endif
-
-void __ibp_barrier(void)
-{
-	__wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
-}
-EXPORT_SYMBOL_GPL(__ibp_barrier);

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [tip:x86/pti] KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods
  2018-02-10 23:39 ` [PATCH v2 3/6] KVM: x86: Reduce retpoline performance impact in slot_handle_level_range() David Woodhouse
@ 2018-02-11 12:09   ` tip-bot for David Woodhouse
  2018-02-13  8:58   ` tip-bot for David Woodhouse
  1 sibling, 0 replies; 74+ messages in thread
From: tip-bot for David Woodhouse @ 2018-02-11 12:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: gregkh, dwmw, hpa, peterz, luto, arjan, dwmw2, sironi, jpoimboe,
	torvalds, dave.hansen, tglx, mingo, linux-kernel, dan.j.williams,
	bp

Commit-ID:  33f1e899478efb7c77b2b833e7edee1203a24a48
Gitweb:     https://git.kernel.org/tip/33f1e899478efb7c77b2b833e7edee1203a24a48
Author:     David Woodhouse <dwmw@amazon.co.uk>
AuthorDate: Sat, 10 Feb 2018 23:39:24 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 11:24:15 +0100

KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods

With retpoline, tight loops of "call this function for every XXX" are
very much pessimised by taking a prediction miss *every* time. This one
is by far the biggest contributor to the guest launch time with retpoline.

By marking the iterator slot_handle_…() functions always_inline, we can
ensure that the indirect function call can be optimised away into a
direct call and it actually generates slightly smaller code because
some of the other conditionals can get optimised away too.

Performance is now pretty close to what we see with nospectre_v2 on
the command line.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/1518305967-31356-4-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kvm/mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2b8eb4d..cc83bdc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5058,7 +5058,7 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
 typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
 
 /* The caller should hold mmu-lock before calling this function. */
-static bool
+static __always_inline bool
 slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			slot_level_handler fn, int start_level, int end_level,
 			gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
@@ -5088,7 +5088,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	return flush;
 }
 
-static bool
+static __always_inline bool
 slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		  slot_level_handler fn, int start_level, int end_level,
 		  bool lock_flush_tlb)
@@ -5099,7 +5099,7 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		      slot_level_handler fn, bool lock_flush_tlb)
 {
@@ -5107,7 +5107,7 @@ slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			slot_level_handler fn, bool lock_flush_tlb)
 {
@@ -5115,7 +5115,7 @@ slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		 slot_level_handler fn, bool lock_flush_tlb)
 {

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [tip:x86/pti] X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
  2018-02-10 23:39 ` [PATCH v2 4/6] X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs David Woodhouse
@ 2018-02-11 12:10   ` tip-bot for KarimAllah Ahmed
  2018-02-13  8:59   ` tip-bot for KarimAllah Ahmed
  1 sibling, 0 replies; 74+ messages in thread
From: tip-bot for KarimAllah Ahmed @ 2018-02-11 12:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, karahmed, dwmw, dave.hansen, torvalds, arjan, tglx,
	bp, pbonzini, dan.j.williams, mingo, hpa, jmattson, rkrcmar,
	dwmw2, gregkh, luto, jpoimboe, peterz

Commit-ID:  fb5b90b795c76e9c10c520fcdb7fe0d7b8334833
Gitweb:     https://git.kernel.org/tip/fb5b90b795c76e9c10c520fcdb7fe0d7b8334833
Author:     KarimAllah Ahmed <karahmed@amazon.de>
AuthorDate: Sat, 10 Feb 2018 23:39:25 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 11:24:16 +0100

X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs

These two variables should check whether SPEC_CTRL and PRED_CMD are
supposed to be passed through to L2 guests or not. While
msr_write_intercepted_l01 would return 'true' if it is not passed through.

So just invert the result of msr_write_intercepted_l01 to implement the
correct semantics.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: kvm@vger.kernel.org
Cc: sironi@amazon.de
Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Link: http://lkml.kernel.org/r/1518305967-31356-5-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bee4c49..599179b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10219,8 +10219,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	 *    updated to reflect this when L1 (or its L2s) actually write to
 	 *    the MSR.
 	 */
-	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
-	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
+	bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+	bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
 
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
 	    !pred_cmd && !spec_ctrl)

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [tip:x86/pti] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap
  2018-02-10 23:39 ` [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap David Woodhouse
  2018-02-11 10:19   ` Ingo Molnar
@ 2018-02-11 12:10   ` tip-bot for KarimAllah Ahmed
  2018-02-13  8:59   ` tip-bot for KarimAllah Ahmed
  2 siblings, 0 replies; 74+ messages in thread
From: tip-bot for KarimAllah Ahmed @ 2018-02-11 12:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, arjan, dave.hansen, linux-kernel, peterz, dwmw, rkrcmar,
	torvalds, gregkh, mingo, hpa, jpoimboe, tglx, dan.j.williams, bp,
	dwmw2, karahmed, pbonzini

Commit-ID:  ff37dc0cd96c266c7700386b7ba48abc32a91b1f
Gitweb:     https://git.kernel.org/tip/ff37dc0cd96c266c7700386b7ba48abc32a91b1f
Author:     KarimAllah Ahmed <karahmed@amazon.de>
AuthorDate: Sat, 10 Feb 2018 23:39:26 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 11:24:16 +0100

KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap

We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
MSR accesses or create a valid L02 MSR bitmap and use that. This decision
has to be made every time we evaluate whether we are going to generate the
L02 MSR bitmap.

Before commit:

  d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")

... this was probably OK since the decision was always identical.

This is no longer the case now since the MSR bitmap might actually
change once we decide to not intercept SPEC_CTRL and PRED_CMD.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: kvm@vger.kernel.org
Cc: sironi@amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-6-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 599179b..91e3539 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10130,7 +10130,8 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
 	if (cpu_has_vmx_msr_bitmap() &&
 	    nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS) &&
 	    nested_vmx_merge_msr_bitmap(vcpu, vmcs12))
-		;
+		vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
+			      CPU_BASED_USE_MSR_BITMAPS);
 	else
 		vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
 				CPU_BASED_USE_MSR_BITMAPS);

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2.1] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
                   ` (6 preceding siblings ...)
  2018-02-11 10:41 ` [PATCH v2 0/6] Spectre v2 updates Ingo Molnar
@ 2018-02-11 15:19 ` David Woodhouse
  2018-02-11 18:50   ` [PATCH] x86/speculation: Clean up various Spectre related details Ingo Molnar
  2018-02-11 19:19   ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware tip-bot for David Woodhouse
  2018-02-12  8:27 ` [PATCH v2 0/6] Spectre v2 updates Paolo Bonzini
  8 siblings, 2 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-11 15:19 UTC (permalink / raw)
  To: x86, mingo, linux-kernel

Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
Helps to include the right header file.

 arch/x86/include/asm/apm.h           |  6 ++++++
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/efi.h           | 17 +++++++++++++++--
 arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++---------
 arch/x86/kernel/cpu/bugs.c           | 12 +++++++++++-
 drivers/watchdog/hpwdt.c             |  3 +++
 6 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h
index 4d4015d..c356098 100644
--- a/arch/x86/include/asm/apm.h
+++ b/arch/x86/include/asm/apm.h
@@ -7,6 +7,8 @@
 #ifndef _ASM_X86_MACH_DEFAULT_APM_H
 #define _ASM_X86_MACH_DEFAULT_APM_H
 
+#include <asm/nospec-branch.h>
+
 #ifdef APM_ZERO_SEGS
 #	define APM_DO_ZERO_SEGS \
 		"pushl %%ds\n\t" \
@@ -32,6 +34,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
 	 * N.B. We do NOT need a cld after the BIOS call
 	 * because we always save and restore the flags.
 	 */
+	firmware_restrict_branch_speculation_start();
 	__asm__ __volatile__(APM_DO_ZERO_SEGS
 		"pushl %%edi\n\t"
 		"pushl %%ebp\n\t"
@@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
 		  "=S" (*esi)
 		: "a" (func), "b" (ebx_in), "c" (ecx_in)
 		: "memory", "cc");
+	firmware_restrict_branch_speculation_end();
 }
 
 static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
@@ -56,6 +60,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
 	 * N.B. We do NOT need a cld after the BIOS call
 	 * because we always save and restore the flags.
 	 */
+	firmware_restrict_branch_speculation_start();
 	__asm__ __volatile__(APM_DO_ZERO_SEGS
 		"pushl %%edi\n\t"
 		"pushl %%ebp\n\t"
@@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
 		  "=S" (si)
 		: "a" (func), "b" (ebx_in), "c" (ecx_in)
 		: "memory", "cc");
+	firmware_restrict_branch_speculation_end();
 	return error;
 }
 
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 73b5fff..66c1434 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
 #define X86_FEATURE_RSB_CTXSW		( 7*32+19) /* "" Fill RSB on context switches */
 
 #define X86_FEATURE_USE_IBPB		( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
+#define X86_FEATURE_USE_IBRS_FW		( 7*32+22) /* "" Use IBRS during runtime firmware calls */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb..a399c1e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -6,6 +6,7 @@
 #include <asm/pgtable.h>
 #include <asm/processor-flags.h>
 #include <asm/tlb.h>
+#include <asm/nospec-branch.h>
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
@@ -36,8 +37,18 @@
 
 extern asmlinkage unsigned long efi_call_phys(void *, ...);
 
-#define arch_efi_call_virt_setup()	kernel_fpu_begin()
-#define arch_efi_call_virt_teardown()	kernel_fpu_end()
+#define arch_efi_call_virt_setup()					\
+({									\
+	kernel_fpu_begin();						\
+	firmware_restrict_branch_speculation_start();			\
+})
+
+#define arch_efi_call_virt_teardown()					\
+({									\
+	firmware_restrict_branch_speculation_end();			\
+	kernel_fpu_end();						\
+})
+
 
 /*
  * Wrap all the virtual calls in a way that forces the parameters on the stack.
@@ -73,6 +84,7 @@ struct efi_scratch {
 	efi_sync_low_kernel_mappings();					\
 	preempt_disable();						\
 	__kernel_fpu_begin();						\
+	firmware_restrict_branch_speculation_start();			\
 									\
 	if (efi_scratch.use_pgd) {					\
 		efi_scratch.prev_cr3 = __read_cr3();			\
@@ -91,6 +103,7 @@ struct efi_scratch {
 		__flush_tlb_all();					\
 	}								\
 									\
+	firmware_restrict_branch_speculation_end();			\
 	__kernel_fpu_end();						\
 	preempt_enable();						\
 })
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..788c4da 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -162,17 +162,36 @@ static inline void vmexit_fill_RSB(void)
 #endif
 }
 
+#define alternative_msr_write(_msr, _val, _feature)		\
+	asm volatile(ALTERNATIVE("",				\
+				 "movl %[msr], %%ecx\n\t"	\
+				 "movl %[val], %%eax\n\t"	\
+				 "movl $0, %%edx\n\t"		\
+				 "wrmsr",			\
+				 _feature)			\
+		     : : [msr] "i" (_msr), [val] "i" (_val)	\
+		     : "eax", "ecx", "edx", "memory")
+
 static inline void indirect_branch_prediction_barrier(void)
 {
-	asm volatile(ALTERNATIVE("",
-				 "movl %[msr], %%ecx\n\t"
-				 "movl %[val], %%eax\n\t"
-				 "movl $0, %%edx\n\t"
-				 "wrmsr",
-				 X86_FEATURE_USE_IBPB)
-		     : : [msr] "i" (MSR_IA32_PRED_CMD),
-			 [val] "i" (PRED_CMD_IBPB)
-		     : "eax", "ecx", "edx", "memory");
+	alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
+			      X86_FEATURE_USE_IBPB);
+}
+
+/*
+ * With retpoline, we must use IBRS to restrict branch prediction
+ * before calling into firmware.
+ */
+static inline void firmware_restrict_branch_speculation_start(void)
+{
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
+			      X86_FEATURE_USE_IBRS_FW);
+}
+
+static inline void firmware_restrict_branch_speculation_end(void)
+{
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+			      X86_FEATURE_USE_IBRS_FW);
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 61152aa..6f6d763 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -303,6 +303,15 @@ static void __init spectre_v2_select_mitigation(void)
 		setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
 		pr_info("Enabling Indirect Branch Prediction Barrier\n");
 	}
+
+	/*
+	 * Retpoline means the kernel is safe because it has no indirect
+	 * branches. But firmware isn't, so use IBRS to protect that.
+	 */
+	if (boot_cpu_has(X86_FEATURE_IBRS)) {
+		setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
+		pr_info("Enabling Restricted Speculation for firmware calls\n");
+	}
 }
 
 #undef pr_fmt
@@ -332,8 +341,9 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
 		return sprintf(buf, "Not affected\n");
 
-	return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
+	return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
 		       boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
+		       boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
 		       spectre_v2_module_string());
 }
 #endif
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 67fbe35..bab3721 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -38,6 +38,7 @@
 #endif /* CONFIG_HPWDT_NMI_DECODING */
 #include <asm/nmi.h>
 #include <asm/frame.h>
+#include <asm/nospec-branch.h>
 
 #define HPWDT_VERSION			"1.4.0"
 #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
@@ -486,11 +487,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
 	if (!hpwdt_nmi_decoding)
 		return NMI_DONE;
 
+	firmware_restrict_branch_speculation_start();
 	spin_lock_irqsave(&rom_lock, rom_pl);
 	if (!die_nmi_called && !is_icru && !is_uefi)
 		asminline_call(&cmn_regs, cru_rom_addr);
 	die_nmi_called = 1;
 	spin_unlock_irqrestore(&rom_lock, rom_pl);
+	firmware_restrict_branch_speculation_end();
 
 	if (allow_kdump)
 		hpwdt_stop();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH] x86/speculation: Clean up various Spectre related details
  2018-02-11 15:19 ` [PATCH v2.1] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
@ 2018-02-11 18:50   ` Ingo Molnar
  2018-02-11 19:25     ` David Woodhouse
  2018-02-11 19:19   ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware tip-bot for David Woodhouse
  1 sibling, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-11 18:50 UTC (permalink / raw)
  To: David Woodhouse
  Cc: x86, linux-kernel, tglx, karahmed, sironi, kvm, torvalds,
	pbonzini, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen


* David Woodhouse <dwmw@amazon.co.uk> wrote:

> +	/*
> +	 * Retpoline means the kernel is safe because it has no indirect
> +	 * branches. But firmware isn't, so use IBRS to protect that.
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_IBRS)) {
> +		setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
> +		pr_info("Enabling Restricted Speculation for firmware calls\n");
> +	}

I have changed this text to say:

		pr_info("Spectre mitigation: Restricting branch speculation (enabling IBRS) for firmware calls\n");

In fact while at it I found and improved a few other details as well, such as:

 	 * Retpoline means the kernel is safe because it has no indirect
-	 * branches. But firmware isn't, so use IBRS to protect that.
+	 * branches. But we don't know whether the firmware is safe, so
+	 * use IBRS to protect against that:

most Spectre related messages are now harmonized:

arch/x86/kernel/cpu/bugs.c:             pr_info("Spectre mitigation: Filling RSB on context switch\n");
arch/x86/kernel/cpu/bugs.c:             pr_info("Spectre mitigation: Enabling Indirect Branch Prediction Barrier (IBPB)\n");
arch/x86/kernel/cpu/bugs.c:             pr_info("Spectre mitigation: Restricting branch speculation (enabling IBRS) for firmware calls\n");

Find the full patch below.

Thanks,

	Ingo

=========================>
>From 82c2b2f29691143a05181333f387e786646aa28b Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Sat, 10 Feb 2018 11:51:57 +0100
Subject: [PATCH] x86/speculation: Clean up various Spectre related details

Harmonize all the Spectre messages so that a:

    dmesg | grep -i spectre

... gives us most Spectre related kernel boot messages.

Also fix a few other details:

 - clarify a comment about firmware speculation control

 - s/KPTI/PTI

 - remove various line-breaks that made the code uglier

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/bugs.c | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6f6d763225c8..eff45477fcca 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -162,8 +162,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
 		return SPECTRE_V2_CMD_NONE;
 	else {
-		ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
-					  sizeof(arg));
+		ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg));
 		if (ret < 0)
 			return SPECTRE_V2_CMD_AUTO;
 
@@ -175,8 +174,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 		}
 
 		if (i >= ARRAY_SIZE(mitigation_options)) {
-			pr_err("unknown option (%s). Switching to AUTO select\n",
-			       mitigation_options[i].option);
+			pr_err("unknown option (%s). Switching to AUTO select\n", mitigation_options[i].option);
 			return SPECTRE_V2_CMD_AUTO;
 		}
 	}
@@ -185,8 +183,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	     cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
 	     cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
 	    !IS_ENABLED(CONFIG_RETPOLINE)) {
-		pr_err("%s selected but not compiled in. Switching to AUTO select\n",
-		       mitigation_options[i].option);
+		pr_err("%s selected but not compiled in. Switching to AUTO select\n", mitigation_options[i].option);
 		return SPECTRE_V2_CMD_AUTO;
 	}
 
@@ -256,14 +253,14 @@ static void __init spectre_v2_select_mitigation(void)
 			goto retpoline_auto;
 		break;
 	}
-	pr_err("kernel not compiled with retpoline; no mitigation available!");
+	pr_err("Spectre mitigation: kernel not compiled with retpoline; no mitigation available!");
 	return;
 
 retpoline_auto:
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
 	retpoline_amd:
 		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
-			pr_err("LFENCE not serializing. Switching to generic retpoline\n");
+			pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
 			goto retpoline_generic;
 		}
 		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
@@ -281,7 +278,7 @@ static void __init spectre_v2_select_mitigation(void)
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
 	/*
-	 * If neither SMEP or KPTI are available, there is a risk of
+	 * If neither SMEP or PTI are available, there is a risk of
 	 * hitting userspace addresses in the RSB after a context switch
 	 * from a shallow call stack to a deeper one. To prevent this fill
 	 * the entire RSB, even when using IBRS.
@@ -295,30 +292,30 @@ static void __init spectre_v2_select_mitigation(void)
 	if ((!boot_cpu_has(X86_FEATURE_PTI) &&
 	     !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
 		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-		pr_info("Filling RSB on context switch\n");
+		pr_info("Spectre mitigation: Filling RSB on context switch\n");
 	}
 
 	/* Initialize Indirect Branch Prediction Barrier if supported */
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
 		setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
-		pr_info("Enabling Indirect Branch Prediction Barrier\n");
+		pr_info("Spectre mitigation: Enabling Indirect Branch Prediction Barrier (IBPB)\n");
 	}
 
 	/*
 	 * Retpoline means the kernel is safe because it has no indirect
-	 * branches. But firmware isn't, so use IBRS to protect that.
+	 * branches. But we don't know whether the firmware is safe, so
+	 * use IBRS to protect against that:
 	 */
 	if (boot_cpu_has(X86_FEATURE_IBRS)) {
 		setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
-		pr_info("Enabling Restricted Speculation for firmware calls\n");
+		pr_info("Spectre mitigation: Restricting branch speculation (enabling IBRS) for firmware calls\n");
 	}
 }
 
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
-ssize_t cpu_show_meltdown(struct device *dev,
-			  struct device_attribute *attr, char *buf)
+ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
 		return sprintf(buf, "Not affected\n");
@@ -327,16 +324,14 @@ ssize_t cpu_show_meltdown(struct device *dev,
 	return sprintf(buf, "Vulnerable\n");
 }
 
-ssize_t cpu_show_spectre_v1(struct device *dev,
-			    struct device_attribute *attr, char *buf)
+ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
 		return sprintf(buf, "Not affected\n");
 	return sprintf(buf, "Mitigation: __user pointer sanitization\n");
 }
 
-ssize_t cpu_show_spectre_v2(struct device *dev,
-			    struct device_attribute *attr, char *buf)
+ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
 		return sprintf(buf, "Not affected\n");

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-11 15:19 ` [PATCH v2.1] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
  2018-02-11 18:50   ` [PATCH] x86/speculation: Clean up various Spectre related details Ingo Molnar
@ 2018-02-11 19:19   ` tip-bot for David Woodhouse
  2018-02-12  5:59     ` afzal mohammed
                       ` (2 more replies)
  1 sibling, 3 replies; 74+ messages in thread
From: tip-bot for David Woodhouse @ 2018-02-11 19:19 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, hpa, tglx, linux-kernel, dwmw, torvalds, peterz

Commit-ID:  670c3e8da87fa4046a55077b1409cf250865a203
Gitweb:     https://git.kernel.org/tip/670c3e8da87fa4046a55077b1409cf250865a203
Author:     David Woodhouse <dwmw@amazon.co.uk>
AuthorDate: Sun, 11 Feb 2018 15:19:19 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 19:44:46 +0100

x86/speculation: Use IBRS if available before calling into firmware

Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1518362359-1005-1-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/apm.h           |  6 ++++++
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/efi.h           | 17 +++++++++++++++--
 arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++---------
 arch/x86/kernel/cpu/bugs.c           | 12 +++++++++++-
 drivers/watchdog/hpwdt.c             |  3 +++
 6 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h
index 4d4015d..c356098 100644
--- a/arch/x86/include/asm/apm.h
+++ b/arch/x86/include/asm/apm.h
@@ -7,6 +7,8 @@
 #ifndef _ASM_X86_MACH_DEFAULT_APM_H
 #define _ASM_X86_MACH_DEFAULT_APM_H
 
+#include <asm/nospec-branch.h>
+
 #ifdef APM_ZERO_SEGS
 #	define APM_DO_ZERO_SEGS \
 		"pushl %%ds\n\t" \
@@ -32,6 +34,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
 	 * N.B. We do NOT need a cld after the BIOS call
 	 * because we always save and restore the flags.
 	 */
+	firmware_restrict_branch_speculation_start();
 	__asm__ __volatile__(APM_DO_ZERO_SEGS
 		"pushl %%edi\n\t"
 		"pushl %%ebp\n\t"
@@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
 		  "=S" (*esi)
 		: "a" (func), "b" (ebx_in), "c" (ecx_in)
 		: "memory", "cc");
+	firmware_restrict_branch_speculation_end();
 }
 
 static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
@@ -56,6 +60,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
 	 * N.B. We do NOT need a cld after the BIOS call
 	 * because we always save and restore the flags.
 	 */
+	firmware_restrict_branch_speculation_start();
 	__asm__ __volatile__(APM_DO_ZERO_SEGS
 		"pushl %%edi\n\t"
 		"pushl %%ebp\n\t"
@@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
 		  "=S" (si)
 		: "a" (func), "b" (ebx_in), "c" (ecx_in)
 		: "memory", "cc");
+	firmware_restrict_branch_speculation_end();
 	return error;
 }
 
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 73b5fff..66c1434 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
 #define X86_FEATURE_RSB_CTXSW		( 7*32+19) /* "" Fill RSB on context switches */
 
 #define X86_FEATURE_USE_IBPB		( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
+#define X86_FEATURE_USE_IBRS_FW		( 7*32+22) /* "" Use IBRS during runtime firmware calls */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb..a399c1e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -6,6 +6,7 @@
 #include <asm/pgtable.h>
 #include <asm/processor-flags.h>
 #include <asm/tlb.h>
+#include <asm/nospec-branch.h>
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
@@ -36,8 +37,18 @@
 
 extern asmlinkage unsigned long efi_call_phys(void *, ...);
 
-#define arch_efi_call_virt_setup()	kernel_fpu_begin()
-#define arch_efi_call_virt_teardown()	kernel_fpu_end()
+#define arch_efi_call_virt_setup()					\
+({									\
+	kernel_fpu_begin();						\
+	firmware_restrict_branch_speculation_start();			\
+})
+
+#define arch_efi_call_virt_teardown()					\
+({									\
+	firmware_restrict_branch_speculation_end();			\
+	kernel_fpu_end();						\
+})
+
 
 /*
  * Wrap all the virtual calls in a way that forces the parameters on the stack.
@@ -73,6 +84,7 @@ struct efi_scratch {
 	efi_sync_low_kernel_mappings();					\
 	preempt_disable();						\
 	__kernel_fpu_begin();						\
+	firmware_restrict_branch_speculation_start();			\
 									\
 	if (efi_scratch.use_pgd) {					\
 		efi_scratch.prev_cr3 = __read_cr3();			\
@@ -91,6 +103,7 @@ struct efi_scratch {
 		__flush_tlb_all();					\
 	}								\
 									\
+	firmware_restrict_branch_speculation_end();			\
 	__kernel_fpu_end();						\
 	preempt_enable();						\
 })
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..788c4da 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -162,17 +162,36 @@ static inline void vmexit_fill_RSB(void)
 #endif
 }
 
+#define alternative_msr_write(_msr, _val, _feature)		\
+	asm volatile(ALTERNATIVE("",				\
+				 "movl %[msr], %%ecx\n\t"	\
+				 "movl %[val], %%eax\n\t"	\
+				 "movl $0, %%edx\n\t"		\
+				 "wrmsr",			\
+				 _feature)			\
+		     : : [msr] "i" (_msr), [val] "i" (_val)	\
+		     : "eax", "ecx", "edx", "memory")
+
 static inline void indirect_branch_prediction_barrier(void)
 {
-	asm volatile(ALTERNATIVE("",
-				 "movl %[msr], %%ecx\n\t"
-				 "movl %[val], %%eax\n\t"
-				 "movl $0, %%edx\n\t"
-				 "wrmsr",
-				 X86_FEATURE_USE_IBPB)
-		     : : [msr] "i" (MSR_IA32_PRED_CMD),
-			 [val] "i" (PRED_CMD_IBPB)
-		     : "eax", "ecx", "edx", "memory");
+	alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
+			      X86_FEATURE_USE_IBPB);
+}
+
+/*
+ * With retpoline, we must use IBRS to restrict branch prediction
+ * before calling into firmware.
+ */
+static inline void firmware_restrict_branch_speculation_start(void)
+{
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
+			      X86_FEATURE_USE_IBRS_FW);
+}
+
+static inline void firmware_restrict_branch_speculation_end(void)
+{
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+			      X86_FEATURE_USE_IBRS_FW);
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 61152aa..6f6d763 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -303,6 +303,15 @@ retpoline_auto:
 		setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
 		pr_info("Enabling Indirect Branch Prediction Barrier\n");
 	}
+
+	/*
+	 * Retpoline means the kernel is safe because it has no indirect
+	 * branches. But firmware isn't, so use IBRS to protect that.
+	 */
+	if (boot_cpu_has(X86_FEATURE_IBRS)) {
+		setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
+		pr_info("Enabling Restricted Speculation for firmware calls\n");
+	}
 }
 
 #undef pr_fmt
@@ -332,8 +341,9 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
 		return sprintf(buf, "Not affected\n");
 
-	return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
+	return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
 		       boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
+		       boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
 		       spectre_v2_module_string());
 }
 #endif
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 67fbe35..bab3721 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -38,6 +38,7 @@
 #endif /* CONFIG_HPWDT_NMI_DECODING */
 #include <asm/nmi.h>
 #include <asm/frame.h>
+#include <asm/nospec-branch.h>
 
 #define HPWDT_VERSION			"1.4.0"
 #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
@@ -486,11 +487,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
 	if (!hpwdt_nmi_decoding)
 		return NMI_DONE;
 
+	firmware_restrict_branch_speculation_start();
 	spin_lock_irqsave(&rom_lock, rom_pl);
 	if (!die_nmi_called && !is_icru && !is_uefi)
 		asminline_call(&cmn_regs, cru_rom_addr);
 	die_nmi_called = 1;
 	spin_unlock_irqrestore(&rom_lock, rom_pl);
+	firmware_restrict_branch_speculation_end();
 
 	if (allow_kdump)
 		hpwdt_stop();

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH] x86/speculation: Clean up various Spectre related details
  2018-02-11 18:50   ` [PATCH] x86/speculation: Clean up various Spectre related details Ingo Molnar
@ 2018-02-11 19:25     ` David Woodhouse
  2018-02-11 19:43       ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: David Woodhouse @ 2018-02-11 19:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, tglx, karahmed, sironi, kvm, torvalds,
	pbonzini, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

[-- Attachment #1: Type: text/plain, Size: 1660 bytes --]



On Sun, 2018-02-11 at 19:50 +0100, Ingo Molnar wrote:
> 
> From 82c2b2f29691143a05181333f387e786646aa28b Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@kernel.org>
> Date: Sat, 10 Feb 2018 11:51:57 +0100
> Subject: [PATCH] x86/speculation: Clean up various Spectre related details
> 
> Harmonize all the Spectre messages so that a:
> 
>     dmesg | grep -i spectre
> 
> ... gives us most Spectre related kernel boot messages.
> 
> Also fix a few other details:
> 
>  - clarify a comment about firmware speculation control
> 
>  - s/KPTI/PTI
> 
>  - remove various line-breaks that made the code uglier
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>



Acked-by: David Woodhouse <dwmw@amazon.co.uk>

with a couple of comments:


-        * If neither SMEP or KPTI are available, there is a risk of
+        * If neither SMEP or PTI are available, there is a risk of

Make that 'neither SMEP nor PTI' while you're at it though please;
that's bugged me a couple of times in passing.

And should these say 'Spectre v2' not just 'Spectre'?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] x86/speculation: Clean up various Spectre related details
  2018-02-11 19:25     ` David Woodhouse
@ 2018-02-11 19:43       ` Ingo Molnar
  2018-02-12 15:30         ` David Woodhouse
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-11 19:43 UTC (permalink / raw)
  To: David Woodhouse
  Cc: x86, linux-kernel, tglx, karahmed, sironi, kvm, torvalds,
	pbonzini, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen


* David Woodhouse <dwmw2@infradead.org> wrote:

> 
> 
> On Sun, 2018-02-11 at 19:50 +0100, Ingo Molnar wrote:
> > 
> > From 82c2b2f29691143a05181333f387e786646aa28b Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar <mingo@kernel.org>
> > Date: Sat, 10 Feb 2018 11:51:57 +0100
> > Subject: [PATCH] x86/speculation: Clean up various Spectre related details
> > 
> > Harmonize all the Spectre messages so that a:
> > 
> >     dmesg | grep -i spectre
> > 
> > ... gives us most Spectre related kernel boot messages.
> > 
> > Also fix a few other details:
> > 
> >  - clarify a comment about firmware speculation control
> > 
> >  - s/KPTI/PTI
> > 
> >  - remove various line-breaks that made the code uglier
> > 
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Arjan van de Ven <arjan@linux.intel.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: David Woodhouse <dwmw2@infradead.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> 
> 
> Acked-by: David Woodhouse <dwmw@amazon.co.uk>

Thanks, added.

> with a couple of comments:
> 
> 
> -        * If neither SMEP or KPTI are available, there is a risk of
> +        * If neither SMEP or PTI are available, there is a risk of
> 
> Make that 'neither SMEP nor PTI' while you're at it though please;
> that's bugged me a couple of times in passing.

Ok, fixed that too.

> 
> And should these say 'Spectre v2' not just 'Spectre'?

Yeah, you are probably right, but I didn't want to make the messages too specific 
- do we really know that this is the end of Spectre-style speculation holes?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-11 19:19   ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware tip-bot for David Woodhouse
@ 2018-02-12  5:59     ` afzal mohammed
  2018-02-12 16:30       ` David Woodhouse
  2018-02-12 10:22     ` Ingo Molnar
  2018-02-16 18:44     ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware Tim Chen
  2 siblings, 1 reply; 74+ messages in thread
From: afzal mohammed @ 2018-02-12  5:59 UTC (permalink / raw)
  To: mingo, hpa, tglx, torvalds, linux-kernel, dwmw, peterz; +Cc: linux-tip-commits

Hi,

On Sun, Feb 11, 2018 at 11:19:10AM -0800, tip-bot for David Woodhouse wrote:

> x86/speculation: Use IBRS if available before calling into firmware
> 
> Retpoline means the kernel is safe because it has no indirect branches.
> But firmware isn't, so use IBRS for firmware calls if it's available.

afaui, so only retpoline means still mitigation not enough.

Also David W has mentioned [1] that even with retpoline, IBPB is also
required (except Sky Lake).

If IBPB & IBRS is not supported by ucode, shouldn't the below indicate
some thing on the lines of Mitigation not enough ?

> -	return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
> +	return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
>  		       boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
> +		       boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
>  		       spectre_v2_module_string());

On 4.16-rc1, w/ GCC 7.3.0,

/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline

Here for the user (at least for me), it is not clear whether the
mitigation is enough. In the present system (Ivy Bridge), as ucode
update is not available, IBPB is not printed along with
"spectre_v2:Mitigation", so unless i am missing something, till then
this system should be considered vulnerable, but for a user not
familiar with details of the issue, it cannot be deduced.

Perhaps an additional status field [OKAY,PARTIAL] to Mitigation in
sysfs might be helpful. All these changes are in the air for me, this
is from a user perspective, sorry if my feedback seems idiotic.

afzal


[1] lkml.kernel.org/r/1516638426.9521.20.camel@infradead.org

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 0/6] Spectre v2 updates
  2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
                   ` (7 preceding siblings ...)
  2018-02-11 15:19 ` [PATCH v2.1] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
@ 2018-02-12  8:27 ` Paolo Bonzini
  2018-02-13  7:59   ` Ingo Molnar
  8 siblings, 1 reply; 74+ messages in thread
From: Paolo Bonzini @ 2018-02-12  8:27 UTC (permalink / raw)
  To: David Woodhouse, tglx, karahmed, sironi, x86, kvm, torvalds,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

On 11/02/2018 00:39, David Woodhouse wrote:
> Using retpoline ensures the kernel is safe because it doesn't contain
> any indirect branches, but firmware still can — and we make calls into
> firmware at runtime. Where the IBRS microcode support is available, use
> that before calling into firmware.
> 
> While doing that, I noticed that we were calling C functions without
> telling the compiler about the call-clobbered registers. Stop that.
> 
> This also contains the always_inline fix for the performance problem
> introduced by retpoline in KVM code, and fixes some other issues with
> the per-vCPU KVM handling for the SPEC_CTRL MSR.
> 
> Finally, update the microcode blacklist to reflect the latest
> information from Intel.
> 
> v2: Drop IBRS_ALL patch for the time being
>     Add KVM MSR fixes (karahmed)
>     Update microcode blacklist
> 
> 
> 
> David Woodhouse (4):
>   x86/speculation: Update Speculation Control microcode blacklist
>   Revert "x86/speculation: Simplify
>     indirect_branch_prediction_barrier()"
>   KVM: x86: Reduce retpoline performance impact in
>     slot_handle_level_range()
>   x86/speculation: Use IBRS if available before calling into firmware
> 
> KarimAllah Ahmed (2):
>   X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
>   KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR
>     bitmap
> 
>  arch/x86/include/asm/apm.h           |  6 ++++++
>  arch/x86/include/asm/cpufeatures.h   |  1 +
>  arch/x86/include/asm/efi.h           | 17 +++++++++++++++--
>  arch/x86/include/asm/nospec-branch.h | 32 ++++++++++++++++++++++++++++----
>  arch/x86/include/asm/processor.h     |  3 ---
>  arch/x86/kernel/cpu/bugs.c           | 18 +++++++++++-------
>  arch/x86/kernel/cpu/intel.c          |  4 ----
>  arch/x86/kvm/mmu.c                   | 10 +++++-----
>  arch/x86/kvm/vmx.c                   |  7 ++++---
>  drivers/watchdog/hpwdt.c             |  3 +++
>  10 files changed, 73 insertions(+), 28 deletions(-)
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist
  2018-02-10 23:39 ` [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist David Woodhouse
  2018-02-11 12:08   ` [tip:x86/pti] " tip-bot for David Woodhouse
@ 2018-02-12  9:50   ` Darren Kenny
  2018-02-12 14:16   ` David Woodhouse
  2 siblings, 0 replies; 74+ messages in thread
From: Darren Kenny @ 2018-02-12  9:50 UTC (permalink / raw)
  To: David Woodhouse
  Cc: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

On Sat, Feb 10, 2018 at 11:39:22PM +0000, David Woodhouse wrote:
>Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
>and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
>too. We blacklisted the latter purely because it was present with all
>the other problematic ones in the 2018-01-08 release, but now it's
>explicitly listed as OK.
>
>We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
>that appeared in one version of the blacklist and then reverted to
>0x80 again. We can change it if 0x84 is actually announced to be safe.
>
>Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>

>---
> arch/x86/kernel/cpu/intel.c | 4 ----
> 1 file changed, 4 deletions(-)
>
>diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
>index 319bf98..f73b814 100644
>--- a/arch/x86/kernel/cpu/intel.c
>+++ b/arch/x86/kernel/cpu/intel.c
>@@ -123,8 +123,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
> 	{ INTEL_FAM6_KABYLAKE_MOBILE,	0x09,	0x84 },
> 	{ INTEL_FAM6_SKYLAKE_X,		0x03,	0x0100013e },
> 	{ INTEL_FAM6_SKYLAKE_X,		0x04,	0x0200003c },
>-	{ INTEL_FAM6_SKYLAKE_MOBILE,	0x03,	0xc2 },
>-	{ INTEL_FAM6_SKYLAKE_DESKTOP,	0x03,	0xc2 },
> 	{ INTEL_FAM6_BROADWELL_CORE,	0x04,	0x28 },
> 	{ INTEL_FAM6_BROADWELL_GT3E,	0x01,	0x1b },
> 	{ INTEL_FAM6_BROADWELL_XEON_D,	0x02,	0x14 },
>@@ -136,8 +134,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
> 	{ INTEL_FAM6_HASWELL_X,		0x02,	0x3b },
> 	{ INTEL_FAM6_HASWELL_X,		0x04,	0x10 },
> 	{ INTEL_FAM6_IVYBRIDGE_X,	0x04,	0x42a },
>-	/* Updated in the 20180108 release; blacklist until we know otherwise */
>-	{ INTEL_FAM6_ATOM_GEMINI_LAKE,	0x01,	0x22 },
> 	/* Observed in the wild */
> 	{ INTEL_FAM6_SANDYBRIDGE_X,	0x06,	0x61b },
> 	{ INTEL_FAM6_SANDYBRIDGE_X,	0x07,	0x712 },
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-11 19:19   ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware tip-bot for David Woodhouse
  2018-02-12  5:59     ` afzal mohammed
@ 2018-02-12 10:22     ` Ingo Molnar
  2018-02-12 11:50       ` Peter Zijlstra
  2018-02-12 16:13       ` Dave Hansen
  2018-02-16 18:44     ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware Tim Chen
  2 siblings, 2 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-12 10:22 UTC (permalink / raw)
  To: hpa, tglx, torvalds, linux-kernel, dwmw, peterz
  Cc: linux-tip-commits, Dave Hansen, Borislav Petkov, H. Peter Anvin,
	Arjan van de Ven


* tip-bot for David Woodhouse <tipbot@zytor.com> wrote:

> Commit-ID:  670c3e8da87fa4046a55077b1409cf250865a203
> Gitweb:     https://git.kernel.org/tip/670c3e8da87fa4046a55077b1409cf250865a203
> Author:     David Woodhouse <dwmw@amazon.co.uk>
> AuthorDate: Sun, 11 Feb 2018 15:19:19 +0000
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Sun, 11 Feb 2018 19:44:46 +0100
> 
> x86/speculation: Use IBRS if available before calling into firmware
> 
> Retpoline means the kernel is safe because it has no indirect branches.
> But firmware isn't, so use IBRS for firmware calls if it's available.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: http://lkml.kernel.org/r/1518362359-1005-1-git-send-email-dwmw@amazon.co.uk
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/include/asm/apm.h           |  6 ++++++
>  arch/x86/include/asm/cpufeatures.h   |  1 +
>  arch/x86/include/asm/efi.h           | 17 +++++++++++++++--
>  arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++---------
>  arch/x86/kernel/cpu/bugs.c           | 12 +++++++++++-
>  drivers/watchdog/hpwdt.c             |  3 +++
>  6 files changed, 64 insertions(+), 12 deletions(-)

> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h

> +/*
> + * With retpoline, we must use IBRS to restrict branch prediction
> + * before calling into firmware.
> + */
> +static inline void firmware_restrict_branch_speculation_start(void)
> +{
> +	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> +			      X86_FEATURE_USE_IBRS_FW);
> +}
> +
> +static inline void firmware_restrict_branch_speculation_end(void)
> +{
> +	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> +			      X86_FEATURE_USE_IBRS_FW);

BTW., there's a detail that only occurred to me today, this enabling/disabling 
sequence is not NMI safe, and it might be called from NMI context:

> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -38,6 +38,7 @@
>  #endif /* CONFIG_HPWDT_NMI_DECODING */
>  #include <asm/nmi.h>
>  #include <asm/frame.h>
> +#include <asm/nospec-branch.h>
>  
>  #define HPWDT_VERSION			"1.4.0"
>  #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
> @@ -486,11 +487,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
>  	if (!hpwdt_nmi_decoding)
>  		return NMI_DONE;
>  
> +	firmware_restrict_branch_speculation_start();
>  	spin_lock_irqsave(&rom_lock, rom_pl);
>  	if (!die_nmi_called && !is_icru && !is_uefi)
>  		asminline_call(&cmn_regs, cru_rom_addr);
>  	die_nmi_called = 1;
>  	spin_unlock_irqrestore(&rom_lock, rom_pl);
> +	firmware_restrict_branch_speculation_end();
>  
>  	if (allow_kdump)
>  		hpwdt_stop();

But ... this is a (comparatively rare) hardware-watchdog tick function, and the 
chance of racing with say an EFI call seems minuscule. The race will result in an 
EFI call being performed with speculation enabled, sporadically.

Is this something we should worry about?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12 10:22     ` Ingo Molnar
@ 2018-02-12 11:50       ` Peter Zijlstra
  2018-02-12 12:27         ` David Woodhouse
  2018-02-12 12:28         ` Peter Zijlstra
  2018-02-12 16:13       ` Dave Hansen
  1 sibling, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-12 11:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: hpa, tglx, torvalds, linux-kernel, dwmw, linux-tip-commits,
	Dave Hansen, Borislav Petkov, Arjan van de Ven

On Mon, Feb 12, 2018 at 11:22:11AM +0100, Ingo Molnar wrote:
> > +static inline void firmware_restrict_branch_speculation_start(void)
> > +{
> > +	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > +			      X86_FEATURE_USE_IBRS_FW);
> > +}
> > +
> > +static inline void firmware_restrict_branch_speculation_end(void)
> > +{
> > +	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > +			      X86_FEATURE_USE_IBRS_FW);
> 
> BTW., there's a detail that only occurred to me today, this enabling/disabling 
> sequence is not NMI safe, and it might be called from NMI context:

Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
idea.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12 11:50       ` Peter Zijlstra
@ 2018-02-12 12:27         ` David Woodhouse
  2018-02-12 13:06           ` Peter Zijlstra
  2018-02-13  7:58           ` Ingo Molnar
  2018-02-12 12:28         ` Peter Zijlstra
  1 sibling, 2 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-12 12:27 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Mingarelli
  Cc: hpa, tglx, torvalds, linux-kernel, linux-tip-commits,
	Dave Hansen, Borislav Petkov, Arjan van de Ven

[-- Attachment #1: Type: text/plain, Size: 1658 bytes --]

On Mon, 2018-02-12 at 12:50 +0100, Peter Zijlstra wrote:
> On Mon, Feb 12, 2018 at 11:22:11AM +0100, Ingo Molnar wrote:
> > > +static inline void firmware_restrict_branch_speculation_start(void)
> > > +{
> > > +   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > > +                         X86_FEATURE_USE_IBRS_FW);
> > > +}
> > > +
> > > +static inline void firmware_restrict_branch_speculation_end(void)
> > > +{
> > > +   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > > +                         X86_FEATURE_USE_IBRS_FW);
> > 
> > BTW., there's a detail that only occurred to me today, this enabling/disabling 
> > sequence is not NMI safe, and it might be called from NMI context:
> 
> Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
> idea.

And spin_lock_irqsave() too. Which is probably why I missed the fact
that this was being called in NMI context.

Yay for HP and their persistent attempts to "value subtract" in their
firmware offerings.

I'm tempted to drop that part of the patch and declare that if you're
using this driver, the potential for stray branch prediction when you
call into the firmware from the NMI handler is the *least* of your
problems.

I *will* go back over the other parts of the patch and audit them for
preempt safety though; there could potentially be a similar issue
there. I think I put them close enough to the actual firmware calls
that if we aren't already preempt-safe then we were screwed anyway, but
*maybe* there's merit in making the macros explicitly bump the preempt
count anyway.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12 11:50       ` Peter Zijlstra
  2018-02-12 12:27         ` David Woodhouse
@ 2018-02-12 12:28         ` Peter Zijlstra
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-12 12:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: hpa, tglx, torvalds, linux-kernel, dwmw, linux-tip-commits,
	Dave Hansen, Borislav Petkov, Arjan van de Ven

On Mon, Feb 12, 2018 at 12:50:02PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 12, 2018 at 11:22:11AM +0100, Ingo Molnar wrote:
> > > +static inline void firmware_restrict_branch_speculation_start(void)
> > > +{
> > > +	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > > +			      X86_FEATURE_USE_IBRS_FW);
> > > +}
> > > +
> > > +static inline void firmware_restrict_branch_speculation_end(void)
> > > +{
> > > +	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > > +			      X86_FEATURE_USE_IBRS_FW);
> > 
> > BTW., there's a detail that only occurred to me today, this enabling/disabling 
> > sequence is not NMI safe, and it might be called from NMI context:
> 
> Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
> idea.

Argh, its that stupid watchdog driver again.. Not only does it call
firmware, it also uses (!raw) spinlock.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12 12:27         ` David Woodhouse
@ 2018-02-12 13:06           ` Peter Zijlstra
  2018-02-13  7:58           ` Ingo Molnar
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-12 13:06 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Ingo Molnar, Thomas Mingarelli, hpa, tglx, torvalds,
	linux-kernel, linux-tip-commits, Dave Hansen, Borislav Petkov,
	Arjan van de Ven

On Mon, Feb 12, 2018 at 12:27:19PM +0000, David Woodhouse wrote:
> On Mon, 2018-02-12 at 12:50 +0100, Peter Zijlstra wrote:

> > Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
> > idea.
> 
> And spin_lock_irqsave() too. Which is probably why I missed the fact
> that this was being called in NMI context.
> 
> Yay for HP and their persistent attempts to "value subtract" in their
> firmware offerings.
> 
> I'm tempted to drop that part of the patch and declare that if you're
> using this driver, the potential for stray branch prediction when you
> call into the firmware from the NMI handler is the *least* of your
> problems.

We should probably mark it BROKEN or something, or move it into staging.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist
  2018-02-10 23:39 ` [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist David Woodhouse
  2018-02-11 12:08   ` [tip:x86/pti] " tip-bot for David Woodhouse
  2018-02-12  9:50   ` [PATCH v2 1/6] " Darren Kenny
@ 2018-02-12 14:16   ` David Woodhouse
  2018-02-12 14:32     ` Thomas Gleixner
  2 siblings, 1 reply; 74+ messages in thread
From: David Woodhouse @ 2018-02-12 14:16 UTC (permalink / raw)
  To: tglx, karahmed, sironi, x86, kvm, torvalds, pbonzini,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

[-- Attachment #1: Type: text/plain, Size: 1424 bytes --]

On Sat, 2018-02-10 at 23:39 +0000, David Woodhouse wrote:
> 
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -123,8 +123,6 @@ static const struct sku_microcode
> spectre_bad_microcodes[] = {
>         { INTEL_FAM6_KABYLAKE_MOBILE,   0x09,   0x84 },
>         { INTEL_FAM6_SKYLAKE_X,         0x03,   0x0100013e },
>         { INTEL_FAM6_SKYLAKE_X,         0x04,   0x0200003c },
> -       { INTEL_FAM6_SKYLAKE_MOBILE,    0x03,   0xc2 },
> -       { INTEL_FAM6_SKYLAKE_DESKTOP,   0x03,   0xc2 },
>         { INTEL_FAM6_BROADWELL_CORE,    0x04,   0x28 },
>         { INTEL_FAM6_BROADWELL_GT3E,    0x01,   0x1b },
>         { INTEL_FAM6_BROADWELL_XEON_D,  0x02,   0x14 },

Arjan points out that the SKYLAKE_DESKTOP one there is premature. There
are *two* rows in Intel's table which match that CPUID (506E3).

Only *one* of them ("Skylake H/S") has cleared the 0xC2 microcode for
use, while the "Skylake E3" line still doesn't approve it. (But doesn't
explicitly list it in the "STOP deploying" column any more either,
which it probably should, and might have helped me notice.)

Ingo, Thomas: do you want to drop this patch which is already in
tip/x86/pti and have a new version with the SKYLAKE_DESKTOP no longer
removed? Or shall I send an incremental patch to add it back?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist
  2018-02-12 14:16   ` David Woodhouse
@ 2018-02-12 14:32     ` Thomas Gleixner
  0 siblings, 0 replies; 74+ messages in thread
From: Thomas Gleixner @ 2018-02-12 14:32 UTC (permalink / raw)
  To: David Woodhouse
  Cc: karahmed, sironi, x86, kvm, torvalds, pbonzini, linux-kernel, bp,
	peterz, jmattson, rkrcmar, arjan.van.de.ven, dave.hansen

[-- Attachment #1: Type: text/plain, Size: 1542 bytes --]

On Mon, 12 Feb 2018, David Woodhouse wrote:

> On Sat, 2018-02-10 at 23:39 +0000, David Woodhouse wrote:
> > 
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -123,8 +123,6 @@ static const struct sku_microcode
> > spectre_bad_microcodes[] = {
> >         { INTEL_FAM6_KABYLAKE_MOBILE,   0x09,   0x84 },
> >         { INTEL_FAM6_SKYLAKE_X,         0x03,   0x0100013e },
> >         { INTEL_FAM6_SKYLAKE_X,         0x04,   0x0200003c },
> > -       { INTEL_FAM6_SKYLAKE_MOBILE,    0x03,   0xc2 },
> > -       { INTEL_FAM6_SKYLAKE_DESKTOP,   0x03,   0xc2 },
> >         { INTEL_FAM6_BROADWELL_CORE,    0x04,   0x28 },
> >         { INTEL_FAM6_BROADWELL_GT3E,    0x01,   0x1b },
> >         { INTEL_FAM6_BROADWELL_XEON_D,  0x02,   0x14 },
> 
> Arjan points out that the SKYLAKE_DESKTOP one there is premature. There
> are *two* rows in Intel's table which match that CPUID (506E3).
> 
> Only *one* of them ("Skylake H/S") has cleared the 0xC2 microcode for
> use, while the "Skylake E3" line still doesn't approve it. (But doesn't
> explicitly list it in the "STOP deploying" column any more either,
> which it probably should, and might have helped me notice.)
> 
> Ingo, Thomas: do you want to drop this patch which is already in
> tip/x86/pti and have a new version with the SKYLAKE_DESKTOP no longer
> removed? Or shall I send an incremental patch to add it back?

Delta patch please.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] x86/speculation: Clean up various Spectre related details
  2018-02-11 19:43       ` Ingo Molnar
@ 2018-02-12 15:30         ` David Woodhouse
  2018-02-13  8:04           ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: David Woodhouse @ 2018-02-12 15:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, tglx, karahmed, sironi, kvm, torvalds,
	pbonzini, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen

[-- Attachment #1: Type: text/plain, Size: 652 bytes --]



On Sun, 2018-02-11 at 20:43 +0100, Ingo Molnar wrote:
> > And should these say 'Spectre v2' not just 'Spectre'?
> 
> Yeah, you are probably right, but I didn't want to make the messages too specific 
> - do we really know that this is the end of Spectre-style speculation holes?

Well... if a new problem is also remedied by use if IBRS/IBPB and
retpoline, I think we can happily call it a subclass of "Spectre v2".

And if it *isn't* addressed by those same things, then it's clearly
something different. Either way, these messages should be 'v2', no?

On the whole though, there are plenty of better things to be worrying
about :)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12 10:22     ` Ingo Molnar
  2018-02-12 11:50       ` Peter Zijlstra
@ 2018-02-12 16:13       ` Dave Hansen
  2018-02-12 16:58         ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Dave Hansen @ 2018-02-12 16:13 UTC (permalink / raw)
  To: Ingo Molnar, hpa, tglx, torvalds, linux-kernel, dwmw, peterz
  Cc: linux-tip-commits, Borislav Petkov, Arjan van de Ven

On 02/12/2018 02:22 AM, Ingo Molnar wrote:
>> +static inline void firmware_restrict_branch_speculation_end(void)
>> +{
>> +	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
>> +			      X86_FEATURE_USE_IBRS_FW);
> BTW., there's a detail that only occurred to me today, this enabling/disabling 
> sequence is not NMI safe, and it might be called from NMI context:

FWIW, Tim Chen and I talked about this a bunch.  We ended up just
saving/restoring the MSR verbatim in the NMI handler the same way we do
CR3, stashing it in a high general-purpose-register (r%12?).  That costs
a RDMSR (at least) and an WRMSR (which you can optimize out).  We have a
patch for that somewhere if anybody wants it.

We also came to the same conclusion that it's a rather challenging thing
to exploit these cases, especially when you consider that we can easily
do RSB stuffing on NMI exit just before going back to the firmware.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12  5:59     ` afzal mohammed
@ 2018-02-12 16:30       ` David Woodhouse
  0 siblings, 0 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-12 16:30 UTC (permalink / raw)
  To: afzal mohammed, mingo, hpa, tglx, torvalds, linux-kernel, peterz
  Cc: linux-tip-commits, Wieczorkiewicz, Pawel

[-- Attachment #1: Type: text/plain, Size: 3256 bytes --]



On Mon, 2018-02-12 at 11:29 +0530, afzal mohammed wrote:
> Hi,
> 
> On Sun, Feb 11, 2018 at 11:19:10AM -0800, tip-bot for David Woodhouse wrote:
> 
> > 
> > x86/speculation: Use IBRS if available before calling into firmware
> > 
> > Retpoline means the kernel is safe because it has no indirect branches.
> > But firmware isn't, so use IBRS for firmware calls if it's available.
>
> afaui, so only retpoline means still mitigation not enough.
> 
> Also David W has mentioned [1] that even with retpoline, IBPB is also
> required (except Sky Lake).

Retpoline is sufficient to protect the *kernel*, which is the biggest
target. (Except on Skylake, where IBRS is the only full mitigation and
people are still working trying to come up with a "good enough"
mitigation that isn't IBRS.)

On all CPUs, you need IBPB to protect userspace processes from each
other, although since it's slow we don't actually *do* that for every
context switch; only when switching to non-dumpable processes.

That IBPB requirement for protecting userspace is true even on the next
generation of CPUs with the "Enhanced IBRS" (IBRS_ALL) feature. It only
goes away in CPUs which are even *further* in the future, when Intel
manage to fix it completely in hardware. They haven't even documented
the feature bit they're going to advertise to indicate that fix yet!


> If IBPB & IBRS is not supported by ucode, shouldn't the below indicate
> some thing on the lines of Mitigation not enough ?
> 
> > 
> > -	return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
> > +	return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
> >  		       boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
> > +		       boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
> >  		       spectre_v2_module_string());
> On 4.16-rc1, w/ GCC 7.3.0,
> 
> /sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
> /sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
> /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline
> 
> Here for the user (at least for me), it is not clear whether the
> mitigation is enough. In the present system (Ivy Bridge), as ucode
> update is not available, IBPB is not printed along with
> "spectre_v2:Mitigation", so unless i am missing something, till then
> this system should be considered vulnerable, but for a user not
> familiar with details of the issue, it cannot be deduced.
> 
> Perhaps an additional status field [OKAY,PARTIAL] to Mitigation in
> sysfs might be helpful. All these changes are in the air for me, this
> is from a user perspective, sorry if my feedback seems idiotic.

Given that we only do it for non-dumpable processes, it's *always*
going to be only partial. (Although I think Thomas was looking at a
command line option to  make that happen on every context switch?)

And on Skylake the current plan is that strictly speaking it would also
be partial.

I understand the concern, but I'm not sure that there's much we can do
to improve it. If it says "Mitigation:" that's generally OK, and if it
says anything else, it's not.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12 16:13       ` Dave Hansen
@ 2018-02-12 16:58         ` Peter Zijlstra
  2018-02-13  7:55           ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-12 16:58 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ingo Molnar, hpa, tglx, torvalds, linux-kernel, dwmw,
	linux-tip-commits, Borislav Petkov, Arjan van de Ven

On Mon, Feb 12, 2018 at 08:13:31AM -0800, Dave Hansen wrote:
> On 02/12/2018 02:22 AM, Ingo Molnar wrote:
> >> +static inline void firmware_restrict_branch_speculation_end(void)
> >> +{
> >> +	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> >> +			      X86_FEATURE_USE_IBRS_FW);
> > BTW., there's a detail that only occurred to me today, this enabling/disabling 
> > sequence is not NMI safe, and it might be called from NMI context:
> 
> FWIW, Tim Chen and I talked about this a bunch.  We ended up just
> saving/restoring the MSR verbatim in the NMI handler the same way we do
> CR3, stashing it in a high general-purpose-register (r%12?).  That costs
> a RDMSR (at least) and an WRMSR (which you can optimize out).  We have a
> patch for that somewhere if anybody wants it.

I would really rather not do that on the NMI path.. And if we _have_ to,
please keep a software shadow of that MSR state, such that we can avoid
touching that MSR 99% of the time.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12 16:58         ` Peter Zijlstra
@ 2018-02-13  7:55           ` Ingo Molnar
  2018-02-14  1:49             ` Tim Chen
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-13  7:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, hpa, tglx, torvalds, linux-kernel, dwmw,
	linux-tip-commits, Borislav Petkov, Arjan van de Ven


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 12, 2018 at 08:13:31AM -0800, Dave Hansen wrote:
> > On 02/12/2018 02:22 AM, Ingo Molnar wrote:
> > >> +static inline void firmware_restrict_branch_speculation_end(void)
> > >> +{
> > >> +	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > >> +			      X86_FEATURE_USE_IBRS_FW);
> > > BTW., there's a detail that only occurred to me today, this enabling/disabling 
> > > sequence is not NMI safe, and it might be called from NMI context:
> > 
> > FWIW, Tim Chen and I talked about this a bunch.  We ended up just
> > saving/restoring the MSR verbatim in the NMI handler the same way we do
> > CR3, stashing it in a high general-purpose-register (r%12?).  That costs
> > a RDMSR (at least) and an WRMSR (which you can optimize out).  We have a
> > patch for that somewhere if anybody wants it.
> 
> I would really rather not do that on the NMI path.. And if we _have_ to,
> please keep a software shadow of that MSR state, such that we can avoid
> touching that MSR 99% of the time.

Yeah, I'd rather avoid doing firmware calls from NMI context altogether.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-12 12:27         ` David Woodhouse
  2018-02-12 13:06           ` Peter Zijlstra
@ 2018-02-13  7:58           ` Ingo Molnar
  1 sibling, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-13  7:58 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Peter Zijlstra, Thomas Mingarelli, hpa, tglx, torvalds,
	linux-kernel, linux-tip-commits, Dave Hansen, Borislav Petkov,
	Arjan van de Ven


* David Woodhouse <dwmw2@infradead.org> wrote:

> On Mon, 2018-02-12 at 12:50 +0100, Peter Zijlstra wrote:
> > On Mon, Feb 12, 2018 at 11:22:11AM +0100, Ingo Molnar wrote:
> > > > +static inline void firmware_restrict_branch_speculation_start(void)
> > > > +{
> > > > +   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > > > +                         X86_FEATURE_USE_IBRS_FW);
> > > > +}
> > > > +
> > > > +static inline void firmware_restrict_branch_speculation_end(void)
> > > > +{
> > > > +   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > > > +                         X86_FEATURE_USE_IBRS_FW);
> > > 
> > > BTW., there's a detail that only occurred to me today, this enabling/disabling 
> > > sequence is not NMI safe, and it might be called from NMI context:
> > 
> > Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
> > idea.
> 
> And spin_lock_irqsave() too. Which is probably why I missed the fact
> that this was being called in NMI context.
> 
> Yay for HP and their persistent attempts to "value subtract" in their
> firmware offerings.
> 
> I'm tempted to drop that part of the patch and declare that if you're
> using this driver, the potential for stray branch prediction when you
> call into the firmware from the NMI handler is the *least* of your
> problems.
> 
> I *will* go back over the other parts of the patch and audit them for
> preempt safety though; there could potentially be a similar issue
> there. I think I put them close enough to the actual firmware calls
> that if we aren't already preempt-safe then we were screwed anyway, but
> *maybe* there's merit in making the macros explicitly bump the preempt
> count anyway.

Ok, meanwhile I'm removing this patch from the x86/pti branch, and since the 
branch has to be rebased anyway, I'll merge these into a single patch:

85d8426e0720: x86/speculation: Correct Speculation Control microcode blacklist again
1751342095f0: x86/speculation: Update Speculation Control microcode blacklist

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 0/6] Spectre v2 updates
  2018-02-12  8:27 ` [PATCH v2 0/6] Spectre v2 updates Paolo Bonzini
@ 2018-02-13  7:59   ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-13  7:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Woodhouse, tglx, karahmed, sironi, x86, kvm, torvalds,
	linux-kernel, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen


* Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 11/02/2018 00:39, David Woodhouse wrote:
> > Using retpoline ensures the kernel is safe because it doesn't contain
> > any indirect branches, but firmware still can — and we make calls into
> > firmware at runtime. Where the IBRS microcode support is available, use
> > that before calling into firmware.
> > 
> > While doing that, I noticed that we were calling C functions without
> > telling the compiler about the call-clobbered registers. Stop that.
> > 
> > This also contains the always_inline fix for the performance problem
> > introduced by retpoline in KVM code, and fixes some other issues with
> > the per-vCPU KVM handling for the SPEC_CTRL MSR.
> > 
> > Finally, update the microcode blacklist to reflect the latest
> > information from Intel.
> > 
> > v2: Drop IBRS_ALL patch for the time being
> >     Add KVM MSR fixes (karahmed)
> >     Update microcode blacklist
> > 
> > 
> > 
> > David Woodhouse (4):
> >   x86/speculation: Update Speculation Control microcode blacklist
> >   Revert "x86/speculation: Simplify
> >     indirect_branch_prediction_barrier()"
> >   KVM: x86: Reduce retpoline performance impact in
> >     slot_handle_level_range()
> >   x86/speculation: Use IBRS if available before calling into firmware
> > 
> > KarimAllah Ahmed (2):
> >   X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
> >   KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR
> >     bitmap
> > 
> >  arch/x86/include/asm/apm.h           |  6 ++++++
> >  arch/x86/include/asm/cpufeatures.h   |  1 +
> >  arch/x86/include/asm/efi.h           | 17 +++++++++++++++--
> >  arch/x86/include/asm/nospec-branch.h | 32 ++++++++++++++++++++++++++++----
> >  arch/x86/include/asm/processor.h     |  3 ---
> >  arch/x86/kernel/cpu/bugs.c           | 18 +++++++++++-------
> >  arch/x86/kernel/cpu/intel.c          |  4 ----
> >  arch/x86/kvm/mmu.c                   | 10 +++++-----
> >  arch/x86/kvm/vmx.c                   |  7 ++++---
> >  drivers/watchdog/hpwdt.c             |  3 +++
> >  10 files changed, 73 insertions(+), 28 deletions(-)
> > 
> 
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks - I've added your Ack to the three KVM patches.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] x86/speculation: Clean up various Spectre related details
  2018-02-12 15:30         ` David Woodhouse
@ 2018-02-13  8:04           ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-13  8:04 UTC (permalink / raw)
  To: David Woodhouse
  Cc: x86, linux-kernel, tglx, karahmed, sironi, kvm, torvalds,
	pbonzini, bp, peterz, jmattson, rkrcmar, arjan.van.de.ven,
	dave.hansen


* David Woodhouse <dwmw2@infradead.org> wrote:

> On Sun, 2018-02-11 at 20:43 +0100, Ingo Molnar wrote:
> > > And should these say 'Spectre v2' not just 'Spectre'?
> > 
> > Yeah, you are probably right, but I didn't want to make the messages too specific 
> > - do we really know that this is the end of Spectre-style speculation holes?
> 
> Well... if a new problem is also remedied by use if IBRS/IBPB and
> retpoline, I think we can happily call it a subclass of "Spectre v2".
> 
> And if it *isn't* addressed by those same things, then it's clearly
> something different. Either way, these messages should be 'v2', no?

Ok, fair enough - I've changed it to v2 as you suggest:

-		pr_info("Filling RSB on context switch\n");
+		pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
-		pr_info("Enabling Indirect Branch Prediction Barrier\n");
+		pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n");

> On the whole though, there are plenty of better things to be worrying
> about :)

Sure - nevertheless I fixed these while they were still hot ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [tip:x86/pti] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
  2018-02-10 23:39 ` [PATCH v2 2/6] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" David Woodhouse
  2018-02-11 12:09   ` [tip:x86/pti] " tip-bot for David Woodhouse
@ 2018-02-13  8:58   ` tip-bot for David Woodhouse
  2018-02-13  9:41     ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: tip-bot for David Woodhouse @ 2018-02-13  8:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, hpa, linux-kernel, dwmw2, dan.j.williams, gregkh, luto,
	dwmw, jpoimboe, torvalds, dave.hansen, bp, peterz, tglx, arjan

Commit-ID:  f208820a321f9b23d77d7eed89945d862d62a3ed
Gitweb:     https://git.kernel.org/tip/f208820a321f9b23d77d7eed89945d862d62a3ed
Author:     David Woodhouse <dwmw@amazon.co.uk>
AuthorDate: Sat, 10 Feb 2018 23:39:23 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 13 Feb 2018 08:59:00 +0100

Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"

This reverts commit 64e16720ea0879f8ab4547e3b9758936d483909b.

We cannot call C functions like that, without marking all the
call-clobbered registers as, well, clobbered. We might have got away
with it for now because the __ibp_barrier() function was *fairly*
unlikely to actually use any other registers. But no. Just no.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: sironi@amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/nospec-branch.h | 13 +++++++++----
 arch/x86/include/asm/processor.h     |  3 ---
 arch/x86/kernel/cpu/bugs.c           |  6 ------
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4d57894..300cc15 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
 
 static inline void indirect_branch_prediction_barrier(void)
 {
-	alternative_input("",
-			  "call __ibp_barrier",
-			  X86_FEATURE_USE_IBPB,
-			  ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
+	asm volatile(ALTERNATIVE("",
+				 "movl %[msr], %%ecx\n\t"
+				 "movl %[val], %%eax\n\t"
+				 "movl $0, %%edx\n\t"
+				 "wrmsr",
+				 X86_FEATURE_USE_IBPB)
+		     : : [msr] "i" (MSR_IA32_PRED_CMD),
+			 [val] "i" (PRED_CMD_IBPB)
+		     : "eax", "ecx", "edx", "memory");
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 513f960..99799fb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -969,7 +969,4 @@ bool xen_set_default_idle(void);
 
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
-
-void __ibp_barrier(void);
-
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71949bf..61152aa 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -337,9 +337,3 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
 		       spectre_v2_module_string());
 }
 #endif
-
-void __ibp_barrier(void)
-{
-	__wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
-}
-EXPORT_SYMBOL_GPL(__ibp_barrier);

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [tip:x86/pti] KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods
  2018-02-10 23:39 ` [PATCH v2 3/6] KVM: x86: Reduce retpoline performance impact in slot_handle_level_range() David Woodhouse
  2018-02-11 12:09   ` [tip:x86/pti] KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods tip-bot for David Woodhouse
@ 2018-02-13  8:58   ` tip-bot for David Woodhouse
  1 sibling, 0 replies; 74+ messages in thread
From: tip-bot for David Woodhouse @ 2018-02-13  8:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, dwmw, dave.hansen, arjan, sironi, torvalds, dan.j.williams,
	peterz, pbonzini, linux-kernel, mingo, hpa, tglx, gregkh, luto,
	jpoimboe, dwmw2

Commit-ID:  928a4c39484281f8ca366f53a1db79330d058401
Gitweb:     https://git.kernel.org/tip/928a4c39484281f8ca366f53a1db79330d058401
Author:     David Woodhouse <dwmw@amazon.co.uk>
AuthorDate: Sat, 10 Feb 2018 23:39:24 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 13 Feb 2018 08:59:45 +0100

KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods

With retpoline, tight loops of "call this function for every XXX" are
very much pessimised by taking a prediction miss *every* time. This one
is by far the biggest contributor to the guest launch time with retpoline.

By marking the iterator slot_handle_…() functions always_inline, we can
ensure that the indirect function call can be optimised away into a
direct call and it actually generates slightly smaller code because
some of the other conditionals can get optimised away too.

Performance is now pretty close to what we see with nospectre_v2 on
the command line.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/1518305967-31356-4-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kvm/mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2b8eb4d..cc83bdc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5058,7 +5058,7 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
 typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
 
 /* The caller should hold mmu-lock before calling this function. */
-static bool
+static __always_inline bool
 slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			slot_level_handler fn, int start_level, int end_level,
 			gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
@@ -5088,7 +5088,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	return flush;
 }
 
-static bool
+static __always_inline bool
 slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		  slot_level_handler fn, int start_level, int end_level,
 		  bool lock_flush_tlb)
@@ -5099,7 +5099,7 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		      slot_level_handler fn, bool lock_flush_tlb)
 {
@@ -5107,7 +5107,7 @@ slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			slot_level_handler fn, bool lock_flush_tlb)
 {
@@ -5115,7 +5115,7 @@ slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
 }
 
-static bool
+static __always_inline bool
 slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		 slot_level_handler fn, bool lock_flush_tlb)
 {

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [tip:x86/pti] X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
  2018-02-10 23:39 ` [PATCH v2 4/6] X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs David Woodhouse
  2018-02-11 12:10   ` [tip:x86/pti] " tip-bot for KarimAllah Ahmed
@ 2018-02-13  8:59   ` tip-bot for KarimAllah Ahmed
  1 sibling, 0 replies; 74+ messages in thread
From: tip-bot for KarimAllah Ahmed @ 2018-02-13  8:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: rkrcmar, tglx, linux-kernel, pbonzini, peterz, arjan, mingo,
	dwmw2, luto, dan.j.williams, jmattson, karahmed, bp, dave.hansen,
	torvalds, dwmw, jpoimboe, hpa, gregkh

Commit-ID:  206587a9fb764d71f035dc7f6d3b6488f5d5b304
Gitweb:     https://git.kernel.org/tip/206587a9fb764d71f035dc7f6d3b6488f5d5b304
Author:     KarimAllah Ahmed <karahmed@amazon.de>
AuthorDate: Sat, 10 Feb 2018 23:39:25 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 13 Feb 2018 09:00:06 +0100

X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs

These two variables should check whether SPEC_CTRL and PRED_CMD are
supposed to be passed through to L2 guests or not. While
msr_write_intercepted_l01 would return 'true' if it is not passed through.

So just invert the result of msr_write_intercepted_l01 to implement the
correct semantics.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jim Mattson <jmattson@google.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: kvm@vger.kernel.org
Cc: sironi@amazon.de
Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Link: http://lkml.kernel.org/r/1518305967-31356-5-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bee4c49..599179b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10219,8 +10219,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	 *    updated to reflect this when L1 (or its L2s) actually write to
 	 *    the MSR.
 	 */
-	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
-	bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
+	bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+	bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
 
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
 	    !pred_cmd && !spec_ctrl)

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [tip:x86/pti] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap
  2018-02-10 23:39 ` [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap David Woodhouse
  2018-02-11 10:19   ` Ingo Molnar
  2018-02-11 12:10   ` [tip:x86/pti] " tip-bot for KarimAllah Ahmed
@ 2018-02-13  8:59   ` tip-bot for KarimAllah Ahmed
  2 siblings, 0 replies; 74+ messages in thread
From: tip-bot for KarimAllah Ahmed @ 2018-02-13  8:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, bp, dan.j.williams, dave.hansen, rkrcmar, pbonzini,
	mingo, luto, karahmed, peterz, arjan, hpa, linux-kernel, dwmw2,
	dwmw, gregkh, tglx, jpoimboe

Commit-ID:  3712caeb14dcb33fb4d5114f14c0beef10aca101
Gitweb:     https://git.kernel.org/tip/3712caeb14dcb33fb4d5114f14c0beef10aca101
Author:     KarimAllah Ahmed <karahmed@amazon.de>
AuthorDate: Sat, 10 Feb 2018 23:39:26 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 13 Feb 2018 09:00:17 +0100

KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap

We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
MSR accesses or create a valid L02 MSR bitmap and use that. This decision
has to be made every time we evaluate whether we are going to generate the
L02 MSR bitmap.

Before commit:

  d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")

... this was probably OK since the decision was always identical.

This is no longer the case now since the MSR bitmap might actually
change once we decide to not intercept SPEC_CTRL and PRED_CMD.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: kvm@vger.kernel.org
Cc: sironi@amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-6-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 599179b..91e3539 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10130,7 +10130,8 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
 	if (cpu_has_vmx_msr_bitmap() &&
 	    nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS) &&
 	    nested_vmx_merge_msr_bitmap(vcpu, vmcs12))
-		;
+		vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
+			      CPU_BASED_USE_MSR_BITMAPS);
 	else
 		vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
 				CPU_BASED_USE_MSR_BITMAPS);

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
  2018-02-13  8:58   ` tip-bot for David Woodhouse
@ 2018-02-13  9:41     ` Peter Zijlstra
  2018-02-13 11:28       ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-13  9:41 UTC (permalink / raw)
  To: torvalds, dave.hansen, bp, tglx, arjan, mingo, linux-kernel,
	dwmw2, hpa, gregkh, dan.j.williams, dwmw, luto, jpoimboe
  Cc: linux-tip-commits, Joe Konno


On Tue, Feb 13, 2018 at 12:58:21AM -0800, tip-bot for David Woodhouse wrote:

> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
>  
>  static inline void indirect_branch_prediction_barrier(void)
>  {
> +	asm volatile(ALTERNATIVE("",
> +				 "movl %[msr], %%ecx\n\t"
> +				 "movl %[val], %%eax\n\t"
> +				 "movl $0, %%edx\n\t"
> +				 "wrmsr",
> +				 X86_FEATURE_USE_IBPB)
> +		     : : [msr] "i" (MSR_IA32_PRED_CMD),
> +			 [val] "i" (PRED_CMD_IBPB)
> +		     : "eax", "ecx", "edx", "memory");
>  }

Joe Konno pointed out that we now need the below line too, because we're
using MSR_IA32_PRED_CMD in this header.

With the existing code that's not a problem per-se, but my objtool
retpoline annotation things did do stumble over this.

Do we want to fold it into the objtool annotation patch or have it
separate?

---
 arch/x86/include/asm/nospec-branch.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc159b4a0..76b058533e47 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -6,6 +6,7 @@
 #include <asm/alternative.h>
 #include <asm/alternative-asm.h>
 #include <asm/cpufeatures.h>
+#include <asm/msr-index.h>
 
 #ifdef __ASSEMBLY__
 

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
  2018-02-13  9:41     ` Peter Zijlstra
@ 2018-02-13 11:28       ` Ingo Molnar
  2018-02-13 13:28         ` Peter Zijlstra
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-13 11:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, dave.hansen, bp, tglx, arjan, linux-kernel, dwmw2, hpa,
	gregkh, dan.j.williams, dwmw, luto, jpoimboe, linux-tip-commits,
	Joe Konno


* Peter Zijlstra <peterz@infradead.org> wrote:

> 
> On Tue, Feb 13, 2018 at 12:58:21AM -0800, tip-bot for David Woodhouse wrote:
> 
> > --- a/arch/x86/include/asm/nospec-branch.h
> > +++ b/arch/x86/include/asm/nospec-branch.h
> > @@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
> >  
> >  static inline void indirect_branch_prediction_barrier(void)
> >  {
> > +	asm volatile(ALTERNATIVE("",
> > +				 "movl %[msr], %%ecx\n\t"
> > +				 "movl %[val], %%eax\n\t"
> > +				 "movl $0, %%edx\n\t"
> > +				 "wrmsr",
> > +				 X86_FEATURE_USE_IBPB)
> > +		     : : [msr] "i" (MSR_IA32_PRED_CMD),
> > +			 [val] "i" (PRED_CMD_IBPB)
> > +		     : "eax", "ecx", "edx", "memory");
> >  }
> 
> Joe Konno pointed out that we now need the below line too, because we're
> using MSR_IA32_PRED_CMD in this header.
> 
> With the existing code that's not a problem per-se, but my objtool
> retpoline annotation things did do stumble over this.
> 
> Do we want to fold it into the objtool annotation patch or have it
> separate?

Separate would be better, it makes sense and is one problem less to worry about?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
  2018-02-13 11:28       ` Ingo Molnar
@ 2018-02-13 13:28         ` Peter Zijlstra
  2018-02-13 13:38           ` Ingo Molnar
                             ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-13 13:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, dave.hansen, bp, tglx, arjan, linux-kernel, dwmw2, hpa,
	gregkh, dan.j.williams, dwmw, luto, jpoimboe, linux-tip-commits,
	Joe Konno

On Tue, Feb 13, 2018 at 12:28:38PM +0100, Ingo Molnar wrote:

> Separate would be better, it makes sense and is one problem less to worry about?

Something like so then? I'm not entirely sure which commit wants to fo
in Fixes, I picked the earlier one, but it could equally have been:

Fixes: f208820a321f ("Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"")

---
Subject: x86/speculation: Add msr-index.h
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 13 Feb 2018 10:41:32 +0100

Joe Konno reported a compile failure resulting from using an MSR
without inclusion of msr-index.h, and while the current code builds
fine (by accident) this needs fixing for future patches.

Cc: mingo@kernel.org
Cc: dwmw2@infradead.org
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: tglx@linutronix.de
Cc: arjan@linux.intel.com
Cc: dan.j.williams@intel.com
Cc: dave.hansen@linux.intel.com
Cc: jpoimboe@redhat.com
Cc: hpa@zytor.com
Cc: gregkh@linuxfoundation.org
Cc: dwmw@amazon.co.uk
Cc: luto@kernel.org
Fixes: 20ffa1caecca ("x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support")
Reported-by: Joe Konno <joe.konno@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/nospec-branch.h |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -6,6 +6,7 @@
 #include <asm/alternative.h>
 #include <asm/alternative-asm.h>
 #include <asm/cpufeatures.h>
+#include <asm/msr-index.h>
 
 #ifdef __ASSEMBLY__
 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
  2018-02-13 13:28         ` Peter Zijlstra
@ 2018-02-13 13:38           ` Ingo Molnar
  2018-02-13 15:26           ` [tip:x86/pti] x86/speculation: Add <asm/msr-index.h> dependency tip-bot for Peter Zijlstra
  2018-02-15  0:28           ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-13 13:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, dave.hansen, bp, tglx, arjan, linux-kernel, dwmw2, hpa,
	gregkh, dan.j.williams, dwmw, luto, jpoimboe, linux-tip-commits,
	Joe Konno


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Feb 13, 2018 at 12:28:38PM +0100, Ingo Molnar wrote:
> 
> > Separate would be better, it makes sense and is one problem less to worry about?
> 
> Something like so then?

Yeah, perfect!

> I'm not entirely sure which commit wants to fo in Fixes, I picked the earlier 
> one, but it could equally have been:
> 
> Fixes: f208820a321f ("Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"")

I'll leave the tag out (in case a rebase has to be done), as these commits are 
close enough to each other and are also in a single backporting block of commits.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [tip:x86/pti] x86/speculation: Add <asm/msr-index.h> dependency
  2018-02-13 13:28         ` Peter Zijlstra
  2018-02-13 13:38           ` Ingo Molnar
@ 2018-02-13 15:26           ` tip-bot for Peter Zijlstra
  2018-02-15  0:28           ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 74+ messages in thread
From: tip-bot for Peter Zijlstra @ 2018-02-13 15:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, linux-kernel, joe.konno, mingo, peterz, hpa, torvalds

Commit-ID:  30d97b098534d1d35b198531870356a29cc066c2
Gitweb:     https://git.kernel.org/tip/30d97b098534d1d35b198531870356a29cc066c2
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 13 Feb 2018 14:28:19 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 13 Feb 2018 14:39:34 +0100

x86/speculation: Add <asm/msr-index.h> dependency

Joe Konno reported a compile failure resulting from using an MSR
without inclusion of <asm/msr-index.h>, and while the current code builds
fine (by accident) this needs fixing for future patches.

Reported-by: Joe Konno <joe.konno@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan@linux.intel.com
Cc: bp@alien8.de
Cc: dan.j.williams@intel.com
Cc: dave.hansen@linux.intel.com
Cc: dwmw2@infradead.org
Cc: dwmw@amazon.co.uk
Cc: gregkh@linuxfoundation.org
Cc: hpa@zytor.com
Cc: jpoimboe@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: luto@kernel.org
Fixes: 20ffa1caecca ("x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support")
Link: http://lkml.kernel.org/r/20180213132819.GJ25201@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/nospec-branch.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..76b0585 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -6,6 +6,7 @@
 #include <asm/alternative.h>
 #include <asm/alternative-asm.h>
 #include <asm/cpufeatures.h>
+#include <asm/msr-index.h>
 
 #ifdef __ASSEMBLY__
 

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-13  7:55           ` Ingo Molnar
@ 2018-02-14  1:49             ` Tim Chen
  2018-02-14  8:56               ` Peter Zijlstra
  2018-02-14  9:31               ` [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context Ingo Molnar
  0 siblings, 2 replies; 74+ messages in thread
From: Tim Chen @ 2018-02-14  1:49 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Dave Hansen, hpa, tglx, torvalds, linux-kernel, dwmw,
	linux-tip-commits, Borislav Petkov, Arjan van de Ven

On 02/12/2018 11:55 PM, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Mon, Feb 12, 2018 at 08:13:31AM -0800, Dave Hansen wrote:
>>> On 02/12/2018 02:22 AM, Ingo Molnar wrote:
>>>>> +static inline void firmware_restrict_branch_speculation_end(void)
>>>>> +{
>>>>> +	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
>>>>> +			      X86_FEATURE_USE_IBRS_FW);
>>>> BTW., there's a detail that only occurred to me today, this enabling/disabling 
>>>> sequence is not NMI safe, and it might be called from NMI context:
>>>
>>> FWIW, Tim Chen and I talked about this a bunch.  We ended up just
>>> saving/restoring the MSR verbatim in the NMI handler the same way we do
>>> CR3, stashing it in a high general-purpose-register (r%12?).  That costs
>>> a RDMSR (at least) and an WRMSR (which you can optimize out).  We have a
>>> patch for that somewhere if anybody wants it.
>>
>> I would really rather not do that on the NMI path.. And if we _have_ to,
>> please keep a software shadow of that MSR state, such that we can avoid
>> touching that MSR 99% of the time.
> 
> Yeah, I'd rather avoid doing firmware calls from NMI context altogether.
> 
> Thanks,
> 
> 	Ingo
> 

Dave Hansen and I had some discussions on how to handle the nested NMI
and firmware calls.  We thought of using 
a per cpu counter to record the nesting call depth
and toggle IBRS appropriately for the depth 0->1 and 1->0
transition.  Will this change be sufficient?

Thanks.

Tim

commit 55546c27a006198630c57b900abcbd3baaabf63a
Author: Tim Chen <tim.c.chen@linux.intel.com>
Date:   Tue Feb 13 04:10:41 2018 -0800

    x86/firmware: Prevent IBRS from being turned off prematurely.
    
    Dave Woodhoue proposed to use IBRS to protect the firmware
    call path against Spectre exploit.  However, firmware path
    can go through NMI and we can get nested calls, causing
    unsafe firmware calls with missing IBRS as illustrated below:
    
    firmware_restrict_branch_speculation_start (set IBRS=1)
        NMI
            firmware_restrict_branch_speculation_start (set IBRS=1)
            firmware call
            firmware_restrict_branch_speculation_end (set IBRS=0)
        NMI return
    firmware call (with IBRS=0)  <---- unsafe call, premature IBRS disabling
    firmware_restrict_branch_speculation_end (set IBRS=0)
    
    This patch proposes using a per cpu counter to track the IBRS
    firmware call nesting depth, to ensure that we don't turn off
    IBRS prematurely before calling firmware.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 297d457..1e9c828 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -146,6 +146,8 @@ enum spectre_v2_mitigation {
 extern char __indirect_thunk_start[];
 extern char __indirect_thunk_end[];
 
+DECLARE_PER_CPU(int, spec_ctrl_ibrs_fw_depth);
+
 /*
  * On VMEXIT we must ensure that no RSB predictions learned in the guest
  * can be followed in the host, by overwriting the RSB completely. Both
@@ -186,14 +188,16 @@ static inline void indirect_branch_prediction_barrier(void)
  */
 static inline void firmware_restrict_branch_speculation_start(void)
 {
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
+	if (this_cpu_inc_return(spec_ctrl_ibrs_fw_depth) == 1)
+		alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
 			      X86_FEATURE_USE_IBRS_FW);
 }
 
 static inline void firmware_restrict_branch_speculation_end(void)
 {
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
-			      X86_FEATURE_USE_IBRS_FW);
+	if (this_cpu_dec_return(spec_ctrl_ibrs_fw_depth) == 0)
+		alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+				      X86_FEATURE_USE_IBRS_FW);
 }
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index c994dab..4ab13f0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -27,6 +27,8 @@
 #include <asm/intel-family.h>
 
 static void __init spectre_v2_select_mitigation(void);
+DEFINE_PER_CPU(int, spec_ctrl_ibrs_fw_depth);
+EXPORT_PER_CPU_SYMBOL(spec_ctrl_ibrs_fw_depth);
 
 void __init check_bugs(void)
 {

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-14  1:49             ` Tim Chen
@ 2018-02-14  8:56               ` Peter Zijlstra
  2018-02-14  8:57                 ` Peter Zijlstra
  2018-02-14 19:20                 ` Tim Chen
  2018-02-14  9:31               ` [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context Ingo Molnar
  1 sibling, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-14  8:56 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Dave Hansen, hpa, tglx, torvalds, linux-kernel,
	dwmw, linux-tip-commits, Borislav Petkov, Arjan van de Ven

On Tue, Feb 13, 2018 at 05:49:47PM -0800, Tim Chen wrote:

>  static inline void firmware_restrict_branch_speculation_start(void)
>  {
> +	if (this_cpu_inc_return(spec_ctrl_ibrs_fw_depth) == 1)
> +		alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
>  			      X86_FEATURE_USE_IBRS_FW);
>  }
>  
>  static inline void firmware_restrict_branch_speculation_end(void)
>  {
> +	if (this_cpu_dec_return(spec_ctrl_ibrs_fw_depth) == 0)
> +		alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> +				      X86_FEATURE_USE_IBRS_FW);
>  }


At the very least this must disable and re-enable preemption, such that
we guarantee we inc/dec the same counter. ISTR some firmware calls (EFI)
actually are preemptible so that wouldn't work.

Further, consider:

	this_cpu_inc_return()		// 0->1
	<NMI>
		this_cpu_inc_return()	// 1->2
		call_broken_arse_firmware()
		this_cpu_dec_return()	// 2->1
	</NMI>
	wrmsr(SPEC_CTRL, IBRS);

	/* from dodgy firmware crap */

	this_cpu_dec_return()		// 1->0
	wrmsr(SPEC_CTRL, 0);

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-14  8:56               ` Peter Zijlstra
@ 2018-02-14  8:57                 ` Peter Zijlstra
  2018-02-14 19:20                 ` Tim Chen
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-14  8:57 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Dave Hansen, hpa, tglx, torvalds, linux-kernel,
	dwmw, linux-tip-commits, Borislav Petkov, Arjan van de Ven

On Wed, Feb 14, 2018 at 09:56:14AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 13, 2018 at 05:49:47PM -0800, Tim Chen wrote:
> 
> >  static inline void firmware_restrict_branch_speculation_start(void)
> >  {
> > +	if (this_cpu_inc_return(spec_ctrl_ibrs_fw_depth) == 1)
> > +		alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> >  			      X86_FEATURE_USE_IBRS_FW);
> >  }
> >  
> >  static inline void firmware_restrict_branch_speculation_end(void)
> >  {
> > +	if (this_cpu_dec_return(spec_ctrl_ibrs_fw_depth) == 0)
> > +		alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > +				      X86_FEATURE_USE_IBRS_FW);
> >  }
> 
> 
> At the very least this must disable and re-enable preemption, such that
> we guarantee we inc/dec the same counter. ISTR some firmware calls (EFI)
> actually are preemptible so that wouldn't work.
> 
> Further, consider:
> 
> 	this_cpu_inc_return()		// 0->1
> 	<NMI>
> 		this_cpu_inc_return()	// 1->2
> 		call_broken_arse_firmware()
> 		this_cpu_dec_return()	// 2->1
> 	</NMI>
> 	wrmsr(SPEC_CTRL, IBRS);
> 
> 	/* from dodgy firmware crap */

s/from/more/

typing hard.

> 	this_cpu_dec_return()		// 1->0
> 	wrmsr(SPEC_CTRL, 0);

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-14  1:49             ` Tim Chen
  2018-02-14  8:56               ` Peter Zijlstra
@ 2018-02-14  9:31               ` Ingo Molnar
  2018-02-14  9:38                 ` Peter Zijlstra
  2018-02-14  9:44                 ` Borislav Petkov
  1 sibling, 2 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-14  9:31 UTC (permalink / raw)
  To: Tim Chen
  Cc: Peter Zijlstra, Dave Hansen, hpa, tglx, torvalds, linux-kernel,
	dwmw, linux-tip-commits, Borislav Petkov, Arjan van de Ven


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> Dave Hansen and I had some discussions on how to handle the nested NMI and 
> firmware calls.  We thought of using a per cpu counter to record the nesting 
> call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 transition.  
> Will this change be sufficient?

Yeah, so I think the first question should be: does the firmware call from NMI 
context make sense to begin with?

Because in this particular case it does not appear to be so: the reason for the 
BIOS/firmware call appears to be to determine how we nmi_panic() after receiving 
an NMI that no other NMI handler handled: with a passive-aggressive "I don't know" 
panic message or with a slightly more informative panic message.

That's not a real value-add to users - so we can avoid all these complications by 
applying the patch below:

 drivers/watchdog/hpwdt.c | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)

As a bonus the spinlock use can be removed as well.

Thanks,

	Ingo

====================>
>From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 14 Feb 2018 10:24:41 +0100
Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from
 NMI context

Taking a spinlock and calling into the firmware are problematic things to
do from NMI callbacks.

It also seems completely pointless in this particular case:

 - cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
   callback, but there's almost no use for it: we use it only to determine
   whether to panic with an 'unknown NMI' or a slightly more descriptive
   message.

 - cmn_regs and rom_lock is not used anywhere else, other than early detection
   of the firmware capability.

So remove rom_lock, make the cmn_regs call from the detection routine only,
and thus make the NMI callback a lot more robust.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 drivers/watchdog/hpwdt.c | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index f1f00dfc0e68..2544fe482fe3 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
 static unsigned int allow_kdump = 1;
 static unsigned int is_icru;
 static unsigned int is_uefi;
-static DEFINE_SPINLOCK(rom_lock);
 static void *cru_rom_addr;
-static struct cmn_registers cmn_regs;
 
 extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
 						unsigned long *pRomEntry);
@@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
 	unsigned long physical_bios_base = 0;
 	unsigned long physical_bios_offset = 0;
 	int retval = -ENODEV;
+	struct cmn_registers cmn_regs = { };
 
 	bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
 
@@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
  */
 static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
 {
-	unsigned long rom_pl;
-	static int die_nmi_called;
-
 	if (!hpwdt_nmi_decoding)
 		return NMI_DONE;
 
 	if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
 		return NMI_DONE;
 
-	spin_lock_irqsave(&rom_lock, rom_pl);
-	if (!die_nmi_called && !is_icru && !is_uefi)
-		asminline_call(&cmn_regs, cru_rom_addr);
-	die_nmi_called = 1;
-	spin_unlock_irqrestore(&rom_lock, rom_pl);
-
 	if (allow_kdump)
 		hpwdt_stop();
 
-	if (!is_icru && !is_uefi) {
-		if (cmn_regs.u1.ral == 0) {
-			nmi_panic(regs, "An NMI occurred, but unable to determine source.\n");
-			return NMI_HANDLED;
-		}
-	}
-	nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
-		"for the NMI is logged in any one of the following "
+	nmi_panic(regs,
+		"An NMI occurred. Depending on your system the reason "
+		"for the NMI might be logged in any one of the following "
 		"resources:\n"
 		"1. Integrated Management Log (IML)\n"
 		"2. OA Syslog\n"
@@ -744,13 +729,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
 				HPWDT_ARCH);
 			return retval;
 		}
-
-		/*
-		* We know this is the only CRU call we need to make so lets keep as
-		* few instructions as possible once the NMI comes in.
-		*/
-		cmn_regs.u1.rah = 0x0D;
-		cmn_regs.u1.ral = 0x02;
 	}
 
 	/*

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-14  9:31               ` [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context Ingo Molnar
@ 2018-02-14  9:38                 ` Peter Zijlstra
  2018-02-14 10:39                   ` Ingo Molnar
  2018-02-14  9:44                 ` Borislav Petkov
  1 sibling, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-14  9:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Chen, Dave Hansen, hpa, tglx, torvalds, linux-kernel, dwmw,
	linux-tip-commits, Borislav Petkov, Arjan van de Ven

On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> Because in this particular case it does not appear to be so: the reason for the 
> BIOS/firmware call appears to be to determine how we nmi_panic() after receiving 
> an NMI that no other NMI handler handled: with a passive-aggressive "I don't know" 
> panic message or with a slightly more informative panic message.

However much I like just ripping all that out, I think the ROM call
actually does that logging, or that is how I read things.

If you look at the original Changelog for that driver:

    Hp is providing a Hardware WatchDog Timer driver that will only work with the
    specific HW Timer located in the HP ProLiant iLO 2 ASIC. The iLO 2 HW Timer
    will generate a Non-maskable Interrupt (NMI) 9 seconds before physically
    resetting the server, by removing power, so that the event can be logged to
    the HP Integrated Management Log (IML), a Non-Volatile Random Access Memory
    (NVRAM). The logging of the event is performed using the HP ProLiant ROM via
    an Industry Standard access known as a BIOS Service Directory Entry.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-14  9:31               ` [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context Ingo Molnar
  2018-02-14  9:38                 ` Peter Zijlstra
@ 2018-02-14  9:44                 ` Borislav Petkov
  2018-02-14 18:13                   ` Jerry Hoemann
  1 sibling, 1 reply; 74+ messages in thread
From: Borislav Petkov @ 2018-02-14  9:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Chen, Peter Zijlstra, Dave Hansen, hpa, tglx, torvalds,
	linux-kernel, dwmw, linux-tip-commits, Arjan van de Ven,
	Jerry Hoemann

+ Jerry

On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> 
> * Tim Chen <tim.c.chen@linux.intel.com> wrote:
> 
> > Dave Hansen and I had some discussions on how to handle the nested NMI and 
> > firmware calls.  We thought of using a per cpu counter to record the nesting 
> > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 transition.  
> > Will this change be sufficient?
> 
> Yeah, so I think the first question should be: does the firmware call from NMI 
> context make sense to begin with?
> 
> Because in this particular case it does not appear to be so: the reason for the 
> BIOS/firmware call appears to be to determine how we nmi_panic() after receiving 
> an NMI that no other NMI handler handled: with a passive-aggressive "I don't know" 
> panic message or with a slightly more informative panic message.
> 
> That's not a real value-add to users - so we can avoid all these complications by 
> applying the patch below:
> 
>  drivers/watchdog/hpwdt.c | 30 ++++--------------------------
>  1 file changed, 4 insertions(+), 26 deletions(-)
> 
> As a bonus the spinlock use can be removed as well.
> 
> Thanks,
> 
> 	Ingo
> 
> ====================>
> From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@kernel.org>
> Date: Wed, 14 Feb 2018 10:24:41 +0100
> Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from
>  NMI context
> 
> Taking a spinlock and calling into the firmware are problematic things to
> do from NMI callbacks.
> 
> It also seems completely pointless in this particular case:
> 
>  - cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
>    callback, but there's almost no use for it: we use it only to determine
>    whether to panic with an 'unknown NMI' or a slightly more descriptive
>    message.
> 
>  - cmn_regs and rom_lock is not used anywhere else, other than early detection
>    of the firmware capability.
> 
> So remove rom_lock, make the cmn_regs call from the detection routine only,
> and thus make the NMI callback a lot more robust.
> 
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  drivers/watchdog/hpwdt.c | 30 ++++--------------------------
>  1 file changed, 4 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index f1f00dfc0e68..2544fe482fe3 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
>  static unsigned int allow_kdump = 1;
>  static unsigned int is_icru;
>  static unsigned int is_uefi;
> -static DEFINE_SPINLOCK(rom_lock);
>  static void *cru_rom_addr;
> -static struct cmn_registers cmn_regs;
>  
>  extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
>  						unsigned long *pRomEntry);
> @@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
>  	unsigned long physical_bios_base = 0;
>  	unsigned long physical_bios_offset = 0;
>  	int retval = -ENODEV;
> +	struct cmn_registers cmn_regs = { };
>  
>  	bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
>  
> @@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
>   */
>  static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
>  {
> -	unsigned long rom_pl;
> -	static int die_nmi_called;
> -
>  	if (!hpwdt_nmi_decoding)
>  		return NMI_DONE;
>  
>  	if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
>  		return NMI_DONE;
>  
> -	spin_lock_irqsave(&rom_lock, rom_pl);
> -	if (!die_nmi_called && !is_icru && !is_uefi)
> -		asminline_call(&cmn_regs, cru_rom_addr);
> -	die_nmi_called = 1;
> -	spin_unlock_irqrestore(&rom_lock, rom_pl);
> -
>  	if (allow_kdump)
>  		hpwdt_stop();
>  
> -	if (!is_icru && !is_uefi) {
> -		if (cmn_regs.u1.ral == 0) {
> -			nmi_panic(regs, "An NMI occurred, but unable to determine source.\n");
> -			return NMI_HANDLED;
> -		}
> -	}
> -	nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> -		"for the NMI is logged in any one of the following "
> +	nmi_panic(regs,
> +		"An NMI occurred. Depending on your system the reason "
> +		"for the NMI might be logged in any one of the following "
>  		"resources:\n"
>  		"1. Integrated Management Log (IML)\n"
>  		"2. OA Syslog\n"
> @@ -744,13 +729,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
>  				HPWDT_ARCH);
>  			return retval;
>  		}
> -
> -		/*
> -		* We know this is the only CRU call we need to make so lets keep as
> -		* few instructions as possible once the NMI comes in.
> -		*/
> -		cmn_regs.u1.rah = 0x0D;
> -		cmn_regs.u1.ral = 0x02;
>  	}
>  
>  	/*

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-14  9:38                 ` Peter Zijlstra
@ 2018-02-14 10:39                   ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-14 10:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tim Chen, Dave Hansen, hpa, tglx, torvalds, linux-kernel, dwmw,
	linux-tip-commits, Borislav Petkov, Arjan van de Ven


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> > Because in this particular case it does not appear to be so: the reason for the 
> > BIOS/firmware call appears to be to determine how we nmi_panic() after receiving 
> > an NMI that no other NMI handler handled: with a passive-aggressive "I don't know" 
> > panic message or with a slightly more informative panic message.
> 
> However much I like just ripping all that out, I think the ROM call
> actually does that logging, or that is how I read things.
> 
> If you look at the original Changelog for that driver:
> 
>     Hp is providing a Hardware WatchDog Timer driver that will only work with the
>     specific HW Timer located in the HP ProLiant iLO 2 ASIC. The iLO 2 HW Timer
>     will generate a Non-maskable Interrupt (NMI) 9 seconds before physically
>     resetting the server, by removing power, so that the event can be logged to
>     the HP Integrated Management Log (IML), a Non-Volatile Random Access Memory
>     (NVRAM). The logging of the event is performed using the HP ProLiant ROM via
>     an Industry Standard access known as a BIOS Service Directory Entry.

Ok, that appears to be the case, too bad.

But the good news: if this callback is executed only once per system lifetime then 
we don't actually have to perform *any* modification on this driver, right? The 
reason is that this callback will panic unconditionally after performing the BIOS 
call. The control flow to the panic is unconditional:

        spin_lock_irqsave(&rom_lock, rom_pl);
        if (!die_nmi_called && !is_icru && !is_uefi)
                asminline_call(&cmn_regs, cru_rom_addr);

	...

        if (!is_icru && !is_uefi) { 
                if (cmn_regs.u1.ral == 0) {
                        nmi_panic(regs, "An NMI occurred, but unable to determine source.\n");
			...

        nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
                "for the NMI is logged in any one of the following "
                "resources:\n"
                "1. Integrated Management Log (IML)\n"
                "2. OA Syslog\n"
                "3. OA Forward Progress Log\n"
                "4. iLO Event Log");


This callback does not get executed when we get perf NMIs, correct?

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-14  9:44                 ` Borislav Petkov
@ 2018-02-14 18:13                   ` Jerry Hoemann
  2018-02-14 23:17                     ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: Jerry Hoemann @ 2018-02-14 18:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Tim Chen, Peter Zijlstra, Dave Hansen, hpa,
	tglx, torvalds, linux-kernel, dwmw, linux-tip-commits,
	Arjan van de Ven, linux-watchdog


Ingo,

I have a patch set under review that brings hpwdt into compliance
with the watchdog core.

One of the changes removes the callback into firmware in hpwdt_pretimeout
and its associated spinlock.

https://lkml.org/lkml/2018/2/12/30

I will add you to the CC list of the next version of the set.

Jerry


On Wed, Feb 14, 2018 at 10:44:05AM +0100, Borislav Petkov wrote:
> + Jerry
> 
> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> > 
> > * Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > 
> > > Dave Hansen and I had some discussions on how to handle the nested NMI and 
> > > firmware calls.  We thought of using a per cpu counter to record the nesting 
> > > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 transition.  
> > > Will this change be sufficient?
> > 
> > Yeah, so I think the first question should be: does the firmware call from NMI 
> > context make sense to begin with?
> > 
> > Because in this particular case it does not appear to be so: the reason for the 
> > BIOS/firmware call appears to be to determine how we nmi_panic() after receiving 
> > an NMI that no other NMI handler handled: with a passive-aggressive "I don't know" 
> > panic message or with a slightly more informative panic message.
> > 
> > That's not a real value-add to users - so we can avoid all these complications by 
> > applying the patch below:
> > 
> >  drivers/watchdog/hpwdt.c | 30 ++++--------------------------
> >  1 file changed, 4 insertions(+), 26 deletions(-)
> > 
> > As a bonus the spinlock use can be removed as well.
> > 
> > Thanks,
> > 
> > 	Ingo
> > 
> > ====================>
> > From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar <mingo@kernel.org>
> > Date: Wed, 14 Feb 2018 10:24:41 +0100
> > Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from
> >  NMI context
> > 
> > Taking a spinlock and calling into the firmware are problematic things to
> > do from NMI callbacks.
> > 
> > It also seems completely pointless in this particular case:
> > 
> >  - cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
> >    callback, but there's almost no use for it: we use it only to determine
> >    whether to panic with an 'unknown NMI' or a slightly more descriptive
> >    message.
> > 
> >  - cmn_regs and rom_lock is not used anywhere else, other than early detection
> >    of the firmware capability.
> > 
> > So remove rom_lock, make the cmn_regs call from the detection routine only,
> > and thus make the NMI callback a lot more robust.
> > 
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > ---
> >  drivers/watchdog/hpwdt.c | 30 ++++--------------------------
> >  1 file changed, 4 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index f1f00dfc0e68..2544fe482fe3 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
> >  static unsigned int allow_kdump = 1;
> >  static unsigned int is_icru;
> >  static unsigned int is_uefi;
> > -static DEFINE_SPINLOCK(rom_lock);
> >  static void *cru_rom_addr;
> > -static struct cmn_registers cmn_regs;
> >  
> >  extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
> >  						unsigned long *pRomEntry);
> > @@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
> >  	unsigned long physical_bios_base = 0;
> >  	unsigned long physical_bios_offset = 0;
> >  	int retval = -ENODEV;
> > +	struct cmn_registers cmn_regs = { };
> >  
> >  	bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
> >  
> > @@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
> >   */
> >  static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
> >  {
> > -	unsigned long rom_pl;
> > -	static int die_nmi_called;
> > -
> >  	if (!hpwdt_nmi_decoding)
> >  		return NMI_DONE;
> >  
> >  	if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
> >  		return NMI_DONE;
> >  
> > -	spin_lock_irqsave(&rom_lock, rom_pl);
> > -	if (!die_nmi_called && !is_icru && !is_uefi)
> > -		asminline_call(&cmn_regs, cru_rom_addr);
> > -	die_nmi_called = 1;
> > -	spin_unlock_irqrestore(&rom_lock, rom_pl);
> > -
> >  	if (allow_kdump)
> >  		hpwdt_stop();
> >  
> > -	if (!is_icru && !is_uefi) {
> > -		if (cmn_regs.u1.ral == 0) {
> > -			nmi_panic(regs, "An NMI occurred, but unable to determine source.\n");
> > -			return NMI_HANDLED;
> > -		}
> > -	}
> > -	nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> > -		"for the NMI is logged in any one of the following "
> > +	nmi_panic(regs,
> > +		"An NMI occurred. Depending on your system the reason "
> > +		"for the NMI might be logged in any one of the following "
> >  		"resources:\n"
> >  		"1. Integrated Management Log (IML)\n"
> >  		"2. OA Syslog\n"
> > @@ -744,13 +729,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
> >  				HPWDT_ARCH);
> >  			return retval;
> >  		}
> > -
> > -		/*
> > -		* We know this is the only CRU call we need to make so lets keep as
> > -		* few instructions as possible once the NMI comes in.
> > -		*/
> > -		cmn_regs.u1.rah = 0x0D;
> > -		cmn_regs.u1.ral = 0x02;
> >  	}
> >  
> >  	/*
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-14  8:56               ` Peter Zijlstra
  2018-02-14  8:57                 ` Peter Zijlstra
@ 2018-02-14 19:20                 ` Tim Chen
  2018-02-14 23:19                   ` Ingo Molnar
  1 sibling, 1 reply; 74+ messages in thread
From: Tim Chen @ 2018-02-14 19:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Dave Hansen, hpa, tglx, torvalds, linux-kernel,
	dwmw, linux-tip-commits, Borislav Petkov, Arjan van de Ven

On 02/14/2018 12:56 AM, Peter Zijlstra wrote:

> 
> At the very least this must disable and re-enable preemption, such that
> we guarantee we inc/dec the same counter. ISTR some firmware calls (EFI)
> actually are preemptible so that wouldn't work.
> 
> Further, consider:
> 
> 	this_cpu_inc_return()		// 0->1
> 	<NMI>
> 		this_cpu_inc_return()	// 1->2
> 		call_broken_arse_firmware()
> 		this_cpu_dec_return()	// 2->1
> 	</NMI>
> 	wrmsr(SPEC_CTRL, IBRS);
> 
> 	/* from dodgy firmware crap */
> 
> 	this_cpu_dec_return()		// 1->0
> 	wrmsr(SPEC_CTRL, 0);
> 

How about the following patch.

Thanks.

Tim

---
>From a37d28622781acf2789dd63f2fdb57be733f15a4 Mon Sep 17 00:00:00 2001
From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Tue, 13 Feb 2018 04:10:41 -0800
Subject: [PATCH] x86/firmware: Prevent IBRS from being turned off prematurely.

Dave Woodhoue proposed using IBRS to protect the firmware
call path against Spectre exploit.  However, firmware path
can go through NMI and we can get nested calls, causing
unsafe firmware calls with missing IBRS as illustrated below:

	firmware_restrict_branch_speculation_start (set IBRS=1)
	NMI
	    firmware_restrict_branch_speculation_start (set IBRS=1)
	    firmware call
	    firmware_restrict_branch_speculation_end (set IBRS=0)
	NMI return
	firmware call (with IBRS=0)  <---- unsafe call, premature IBRS disabling
	firmware_restrict_branch_speculation_end (set IBRS=0)

This patch proposes using a per cpu counter to track the IBRS
firmware call nesting depth, to ensure that we don't turn off
IBRS prematurely before calling firmware.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 arch/x86/include/asm/nospec-branch.h | 10 ++++++++--
 arch/x86/kernel/cpu/bugs.c           |  2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 297d457..a8dd9ea 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -146,6 +146,8 @@ enum spectre_v2_mitigation {
 extern char __indirect_thunk_start[];
 extern char __indirect_thunk_end[];
 
+DECLARE_PER_CPU(int, spec_ctrl_ibrs_fw_depth);
+
 /*
  * On VMEXIT we must ensure that no RSB predictions learned in the guest
  * can be followed in the host, by overwriting the RSB completely. Both
@@ -186,14 +188,18 @@ static inline void indirect_branch_prediction_barrier(void)
  */
 static inline void firmware_restrict_branch_speculation_start(void)
 {
+	preempt_disable();
+	this_cpu_inc(spec_ctrl_ibrs_fw_depth);
 	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
 			      X86_FEATURE_USE_IBRS_FW);
 }
 
 static inline void firmware_restrict_branch_speculation_end(void)
 {
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
-			      X86_FEATURE_USE_IBRS_FW);
+	if (this_cpu_dec_return(spec_ctrl_ibrs_fw_depth) == 0)
+		alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+				      X86_FEATURE_USE_IBRS_FW);
+	preempt_enable();
 }
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index c994dab..4ab13f0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -27,6 +27,8 @@
 #include <asm/intel-family.h>
 
 static void __init spectre_v2_select_mitigation(void);
+DEFINE_PER_CPU(int, spec_ctrl_ibrs_fw_depth);
+EXPORT_PER_CPU_SYMBOL(spec_ctrl_ibrs_fw_depth);
 
 void __init check_bugs(void)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-14 18:13                   ` Jerry Hoemann
@ 2018-02-14 23:17                     ` Ingo Molnar
  2018-02-15 17:44                       ` Jerry Hoemann
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-14 23:17 UTC (permalink / raw)
  To: Jerry Hoemann
  Cc: Borislav Petkov, Tim Chen, Peter Zijlstra, Dave Hansen, hpa,
	tglx, torvalds, linux-kernel, dwmw, linux-tip-commits,
	Arjan van de Ven, linux-watchdog


* Jerry Hoemann <jerry.hoemann@hpe.com> wrote:

> 
> Ingo,
> 
> I have a patch set under review that brings hpwdt into compliance
> with the watchdog core.
> 
> One of the changes removes the callback into firmware in hpwdt_pretimeout
> and its associated spinlock.
> 
> https://lkml.org/lkml/2018/2/12/30

 drivers/watchdog/hpwdt.c | 490 +----------------------------------------------
 1 file changed, 8 insertions(+), 482 deletions(-)

Very nice, and this should solve all the NMI handling complications!

> I will add you to the CC list of the next version of the set.

Thanks!

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-14 19:20                 ` Tim Chen
@ 2018-02-14 23:19                   ` Ingo Molnar
  2018-02-15  2:01                     ` Tim Chen
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-14 23:19 UTC (permalink / raw)
  To: Tim Chen
  Cc: Peter Zijlstra, Dave Hansen, hpa, tglx, torvalds, linux-kernel,
	dwmw, linux-tip-commits, Borislav Petkov, Arjan van de Ven


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> On 02/14/2018 12:56 AM, Peter Zijlstra wrote:
> 
> > 
> > At the very least this must disable and re-enable preemption, such that
> > we guarantee we inc/dec the same counter. ISTR some firmware calls (EFI)
> > actually are preemptible so that wouldn't work.
> > 
> > Further, consider:
> > 
> > 	this_cpu_inc_return()		// 0->1
> > 	<NMI>
> > 		this_cpu_inc_return()	// 1->2
> > 		call_broken_arse_firmware()
> > 		this_cpu_dec_return()	// 2->1
> > 	</NMI>
> > 	wrmsr(SPEC_CTRL, IBRS);
> > 
> > 	/* from dodgy firmware crap */
> > 
> > 	this_cpu_dec_return()		// 1->0
> > 	wrmsr(SPEC_CTRL, 0);
> > 
> 
> How about the following patch.

These fragile complications of the interface should now be unnecessary, as the 
only driver that called firmware from NMI callbacks (hpwdt.c) is going to remove 
those firmware callbacks in the near future - solving the problem at the source.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [tip:x86/pti] x86/speculation: Add <asm/msr-index.h> dependency
  2018-02-13 13:28         ` Peter Zijlstra
  2018-02-13 13:38           ` Ingo Molnar
  2018-02-13 15:26           ` [tip:x86/pti] x86/speculation: Add <asm/msr-index.h> dependency tip-bot for Peter Zijlstra
@ 2018-02-15  0:28           ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 74+ messages in thread
From: tip-bot for Peter Zijlstra @ 2018-02-15  0:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, linux-kernel, tglx, hpa, peterz, mingo, joe.konno

Commit-ID:  ea00f301285ea2f07393678cd2b6057878320c9d
Gitweb:     https://git.kernel.org/tip/ea00f301285ea2f07393678cd2b6057878320c9d
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 13 Feb 2018 14:28:19 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 15 Feb 2018 01:15:51 +0100

x86/speculation: Add <asm/msr-index.h> dependency

Joe Konno reported a compile failure resulting from using an MSR
without inclusion of <asm/msr-index.h>, and while the current code builds
fine (by accident) this needs fixing for future patches.

Reported-by: Joe Konno <joe.konno@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan@linux.intel.com
Cc: bp@alien8.de
Cc: dan.j.williams@intel.com
Cc: dave.hansen@linux.intel.com
Cc: dwmw2@infradead.org
Cc: dwmw@amazon.co.uk
Cc: gregkh@linuxfoundation.org
Cc: hpa@zytor.com
Cc: jpoimboe@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: luto@kernel.org
Fixes: 20ffa1caecca ("x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support")
Link: http://lkml.kernel.org/r/20180213132819.GJ25201@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/nospec-branch.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..76b0585 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -6,6 +6,7 @@
 #include <asm/alternative.h>
 #include <asm/alternative-asm.h>
 #include <asm/cpufeatures.h>
+#include <asm/msr-index.h>
 
 #ifdef __ASSEMBLY__
 

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-14 23:19                   ` Ingo Molnar
@ 2018-02-15  2:01                     ` Tim Chen
  0 siblings, 0 replies; 74+ messages in thread
From: Tim Chen @ 2018-02-15  2:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Dave Hansen, hpa, tglx, torvalds, linux-kernel,
	dwmw, linux-tip-commits, Borislav Petkov, Arjan van de Ven

On 02/14/2018 03:19 PM, Ingo Molnar wrote:
> 
> * Tim Chen <tim.c.chen@linux.intel.com> wrote:
> 
>> On 02/14/2018 12:56 AM, Peter Zijlstra wrote:
>>
>>>
>>> At the very least this must disable and re-enable preemption, such that
>>> we guarantee we inc/dec the same counter. ISTR some firmware calls (EFI)
>>> actually are preemptible so that wouldn't work.
>>>
>>> Further, consider:
>>>
>>> 	this_cpu_inc_return()		// 0->1
>>> 	<NMI>
>>> 		this_cpu_inc_return()	// 1->2
>>> 		call_broken_arse_firmware()
>>> 		this_cpu_dec_return()	// 2->1
>>> 	</NMI>
>>> 	wrmsr(SPEC_CTRL, IBRS);
>>>
>>> 	/* from dodgy firmware crap */
>>>
>>> 	this_cpu_dec_return()		// 1->0
>>> 	wrmsr(SPEC_CTRL, 0);
>>>
>>
>> How about the following patch.
> 
> These fragile complications of the interface should now be unnecessary, as the 
> only driver that called firmware from NMI callbacks (hpwdt.c) is going to remove 
> those firmware callbacks in the near future - solving the problem at the source.
> 
> Thanks,
> 
> 	Ingo
> 

Sounds good. I sent this out before seeing the other mails on removing NMI callbacks
from hpwdt.c

Tim

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-14 23:17                     ` Ingo Molnar
@ 2018-02-15 17:44                       ` Jerry Hoemann
  2018-02-15 19:02                         ` Ingo Molnar
  2018-02-15 19:48                         ` Peter Zijlstra
  0 siblings, 2 replies; 74+ messages in thread
From: Jerry Hoemann @ 2018-02-15 17:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Tim Chen, Peter Zijlstra, Dave Hansen, hpa,
	tglx, torvalds, linux-kernel, dwmw, linux-tip-commits,
	Arjan van de Ven, linux-watchdog

On Thu, Feb 15, 2018 at 12:17:04AM +0100, Ingo Molnar wrote:
> 
> * Jerry Hoemann <jerry.hoemann@hpe.com> wrote:
> 
> > 
> > Ingo,
> > 
> > I have a patch set under review that brings hpwdt into compliance
> > with the watchdog core.
> > 
> > One of the changes removes the callback into firmware in hpwdt_pretimeout
> > and its associated spinlock.
> > 
> > https://lkml.org/lkml/2018/2/12/30
> 
>  drivers/watchdog/hpwdt.c | 490 +----------------------------------------------
>  1 file changed, 8 insertions(+), 482 deletions(-)
> 
> Very nice, and this should solve all the NMI handling complications!
> 
> > I will add you to the CC list of the next version of the set.
> 
> Thanks!
> 
> 	Ingo


Ingo,

Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
related to David Woodhouse patch set:

	https://lkml.org/lkml/2018/2/14/305  ?

Which I think are to mitigate performance issues resulting from the
changes to address the specter/meltdown?

Any there other changes to hpwdt needed related to this work?

Thanks

Jerry

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-15 17:44                       ` Jerry Hoemann
@ 2018-02-15 19:02                         ` Ingo Molnar
  2018-02-15 19:48                         ` Peter Zijlstra
  1 sibling, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-15 19:02 UTC (permalink / raw)
  To: Jerry Hoemann
  Cc: Borislav Petkov, Tim Chen, Peter Zijlstra, Dave Hansen, hpa,
	tglx, torvalds, linux-kernel, dwmw, linux-tip-commits,
	Arjan van de Ven, linux-watchdog


* Jerry Hoemann <jerry.hoemann@hpe.com> wrote:

> On Thu, Feb 15, 2018 at 12:17:04AM +0100, Ingo Molnar wrote:
> > 
> > * Jerry Hoemann <jerry.hoemann@hpe.com> wrote:
> > 
> > > 
> > > Ingo,
> > > 
> > > I have a patch set under review that brings hpwdt into compliance
> > > with the watchdog core.
> > > 
> > > One of the changes removes the callback into firmware in hpwdt_pretimeout
> > > and its associated spinlock.
> > > 
> > > https://lkml.org/lkml/2018/2/12/30
> > 
> >  drivers/watchdog/hpwdt.c | 490 +----------------------------------------------
> >  1 file changed, 8 insertions(+), 482 deletions(-)
> > 
> > Very nice, and this should solve all the NMI handling complications!
> > 
> > > I will add you to the CC list of the next version of the set.
> > 
> > Thanks!
> > 
> > 	Ingo
> 
> 
> Ingo,
> 
> Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
> related to David Woodhouse patch set:
> 
> 	https://lkml.org/lkml/2018/2/14/305  ?
> 
> Which I think are to mitigate performance issues resulting from the
> changes to address the specter/meltdown?

So the motivation was that we are trying to wrap BIOS/EFI calls into 
Spectre-disabling sections - and while doing that we realized that hpwdt calls the 
firmware from an NMI callback, which would have complicated the Spectre work..

But with that call removed from NMI context it's all perfect!

> Any there other changes to hpwdt needed related to this work?

No other changes needed I think!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context
  2018-02-15 17:44                       ` Jerry Hoemann
  2018-02-15 19:02                         ` Ingo Molnar
@ 2018-02-15 19:48                         ` Peter Zijlstra
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-15 19:48 UTC (permalink / raw)
  To: Jerry Hoemann
  Cc: Ingo Molnar, Borislav Petkov, Tim Chen, Dave Hansen, hpa, tglx,
	torvalds, linux-kernel, dwmw, linux-tip-commits,
	Arjan van de Ven, linux-watchdog

On Thu, Feb 15, 2018 at 10:44:44AM -0700, Jerry Hoemann wrote:
> Is your desire to remove of the firmware callback/spinlock in hpwdt_pretimeout
> related to David Woodhouse patch set:

That's the work that made us find this code, but no, even without that,
code like that is entirely dodgy. NMI code needs to be very careful and
firmware just isn't something I trust to get things right. Worse, its
not something we can fix.

And using spnilock from NMI context is just wrong, if anything it needs
be raw_spnilock but even then, yuck.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-11 19:19   ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware tip-bot for David Woodhouse
  2018-02-12  5:59     ` afzal mohammed
  2018-02-12 10:22     ` Ingo Molnar
@ 2018-02-16 18:44     ` Tim Chen
  2018-02-16 19:16       ` David Woodhouse
  2 siblings, 1 reply; 74+ messages in thread
From: Tim Chen @ 2018-02-16 18:44 UTC (permalink / raw)
  To: mingo, hpa, tglx, torvalds, linux-kernel, dwmw, peterz,
	linux-tip-commits

On 02/11/2018 11:19 AM, tip-bot for David Woodhouse wrote:
		\
>  })
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 300cc15..788c4da 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -162,17 +162,36 @@ static inline void vmexit_fill_RSB(void)
>  #endif
>  }
>  
> +#define alternative_msr_write(_msr, _val, _feature)		\
> +	asm volatile(ALTERNATIVE("",				\
> +				 "movl %[msr], %%ecx\n\t"	\
> +				 "movl %[val], %%eax\n\t"	\
> +				 "movl $0, %%edx\n\t"		\
> +				 "wrmsr",			\
> +				 _feature)			\
> +		     : : [msr] "i" (_msr), [val] "i" (_val)	\
> +		     : "eax", "ecx", "edx", "memory")
> +

I encountered hang on a machine but not others when using the above
macro.  It is probably an alignment thing with ALTERNATIVE as the problem went
away after I made the change below:

Tim

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 8f2ff74..0f65bd2 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -148,6 +148,7 @@ extern char __indirect_thunk_end[];
 
 #define alternative_msr_write(_msr, _val, _feature)            \
        asm volatile(ALTERNATIVE("",                            \
+                               ".align 16\n\t"                \
                                "movl %[msr], %%ecx\n\t"       \
                                "movl %[val], %%eax\n\t"       \
                                "movl $0, %%edx\n\t"           \

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-16 18:44     ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware Tim Chen
@ 2018-02-16 19:16       ` David Woodhouse
  2018-02-16 23:46         ` Tim Chen
  0 siblings, 1 reply; 74+ messages in thread
From: David Woodhouse @ 2018-02-16 19:16 UTC (permalink / raw)
  To: Tim Chen, mingo, hpa, tglx, torvalds, linux-kernel, peterz,
	linux-tip-commits

[-- Attachment #1: Type: text/plain, Size: 1594 bytes --]

On Fri, 2018-02-16 at 10:44 -0800, Tim Chen wrote:
> 
> I encountered hang on a machine but not others when using the above
> macro.  It is probably an alignment thing with ALTERNATIVE as the
> problem went
> away after I made the change below:
> 
> Tim
> 
> diff --git a/arch/x86/include/asm/nospec-branch.h
> b/arch/x86/include/asm/nospec-branch.h
> index 8f2ff74..0f65bd2 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -148,6 +148,7 @@ extern char __indirect_thunk_end[];
>  
>  #define alternative_msr_write(_msr, _val, _feature)            \
>         asm volatile(ALTERNATIVE("",                            \
> +                               ".align 16\n\t"                \
>                                 "movl %[msr], %%ecx\n\t"       \
>                                 "movl %[val], %%eax\n\t"       \
>                                 "movl $0, %%edx\n\t"           \

That's weird. Note that .align in an altinstr section isn't actually
going to do what you'd expect; the oldinstr and altinstr sections
aren't necessarily aligned the same, so however many NOPs it inserts
into the alternative, might be deliberately *misaligning* it in the
code that actually gets executed.

Are you sure you're not running a kernel where the alternatives code
would turn that alternative which *starts* with a NOP, into *all* NOPs?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-16 19:16       ` David Woodhouse
@ 2018-02-16 23:46         ` Tim Chen
  2018-02-17 10:26           ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: Tim Chen @ 2018-02-16 23:46 UTC (permalink / raw)
  To: David Woodhouse, mingo, hpa, tglx, torvalds, linux-kernel,
	peterz, linux-tip-commits

On 02/16/2018 11:16 AM, David Woodhouse wrote:
> On Fri, 2018-02-16 at 10:44 -0800, Tim Chen wrote:
>>
>> I encountered hang on a machine but not others when using the above
>> macro.  It is probably an alignment thing with ALTERNATIVE as the
>> problem went
>> away after I made the change below:
>>
>> Tim
>>
>> diff --git a/arch/x86/include/asm/nospec-branch.h
>> b/arch/x86/include/asm/nospec-branch.h
>> index 8f2ff74..0f65bd2 100644
>> --- a/arch/x86/include/asm/nospec-branch.h
>> +++ b/arch/x86/include/asm/nospec-branch.h
>> @@ -148,6 +148,7 @@ extern char __indirect_thunk_end[];
>>  
>>  #define alternative_msr_write(_msr, _val, _feature)            \
>>         asm volatile(ALTERNATIVE("",                            \
>> +                               ".align 16\n\t"                \
>>                                 "movl %[msr], %%ecx\n\t"       \
>>                                 "movl %[val], %%eax\n\t"       \
>>                                 "movl $0, %%edx\n\t"           \
> 
> That's weird. Note that .align in an altinstr section isn't actually
> going to do what you'd expect; the oldinstr and altinstr sections
> aren't necessarily aligned the same, so however many NOPs it inserts
> into the alternative, might be deliberately *misaligning* it in the
> code that actually gets executed.
> 
> Are you sure you're not running a kernel where the alternatives code
> would turn that alternative which *starts* with a NOP, into *all* NOPs?
> 

I rebuild the kernel again without the align. I'm no longer
seeing the issue again on that machine that had an issue earlier.  
So let's ignore this for now as I can't reproduce the problem.

It should be other issues causing the hang I saw earlier.

Thanks.

Tim

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-16 23:46         ` Tim Chen
@ 2018-02-17 10:26           ` Ingo Molnar
  2018-02-19  9:20             ` Peter Zijlstra
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-17 10:26 UTC (permalink / raw)
  To: Tim Chen, Peter Zijlstra
  Cc: David Woodhouse, hpa, tglx, torvalds, linux-kernel, peterz,
	linux-tip-commits, Borislav Petkov, Peter Zijlstra


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> On 02/16/2018 11:16 AM, David Woodhouse wrote:
> > On Fri, 2018-02-16 at 10:44 -0800, Tim Chen wrote:
> >>
> >> I encountered hang on a machine but not others when using the above
> >> macro.  It is probably an alignment thing with ALTERNATIVE as the
> >> problem went
> >> away after I made the change below:
> >>
> >> Tim
> >>
> >> diff --git a/arch/x86/include/asm/nospec-branch.h
> >> b/arch/x86/include/asm/nospec-branch.h
> >> index 8f2ff74..0f65bd2 100644
> >> --- a/arch/x86/include/asm/nospec-branch.h
> >> +++ b/arch/x86/include/asm/nospec-branch.h
> >> @@ -148,6 +148,7 @@ extern char __indirect_thunk_end[];
> >>  
> >>  #define alternative_msr_write(_msr, _val, _feature)            \
> >>         asm volatile(ALTERNATIVE("",                            \
> >> +                               ".align 16\n\t"                \
> >>                                 "movl %[msr], %%ecx\n\t"       \
> >>                                 "movl %[val], %%eax\n\t"       \
> >>                                 "movl $0, %%edx\n\t"           \
> > 
> > That's weird. Note that .align in an altinstr section isn't actually
> > going to do what you'd expect; the oldinstr and altinstr sections
> > aren't necessarily aligned the same, so however many NOPs it inserts
> > into the alternative, might be deliberately *misaligning* it in the
> > code that actually gets executed.
> > 
> > Are you sure you're not running a kernel where the alternatives code
> > would turn that alternative which *starts* with a NOP, into *all* NOPs?
> > 
> 
> I rebuild the kernel again without the align. I'm no longer
> seeing the issue again on that machine that had an issue earlier.  
> So let's ignore this for now as I can't reproduce the problem.
> 
> It should be other issues causing the hang I saw earlier.

Note that PeterZ was struggling with intermittent boot hangs yesterday as well, 
which hangs came and went during severeal (fruitless) bisection attempts. Then at 
a certain point the hangs went away altogether.

The symptoms for both his and your hangs are consistent with an alignment 
dependent bug.

My other guess is that it's perhaps somehow microcode related?

I'm not seeing any hangs myself, on various test systems.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-17 10:26           ` Ingo Molnar
@ 2018-02-19  9:20             ` Peter Zijlstra
  2018-02-19  9:29               ` David Woodhouse
  2018-02-19  9:36               ` Ingo Molnar
  0 siblings, 2 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-19  9:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Chen, David Woodhouse, hpa, tglx, torvalds, linux-kernel,
	linux-tip-commits, Borislav Petkov

On Sat, Feb 17, 2018 at 11:26:16AM +0100, Ingo Molnar wrote:
> Note that PeterZ was struggling with intermittent boot hangs yesterday as well, 
> which hangs came and went during severeal (fruitless) bisection attempts. Then at 
> a certain point the hangs went away altogether.
> 
> The symptoms for both his and your hangs are consistent with an alignment 
> dependent bug.

Mine would consistently hang right after

  "Freeing SMP alternatives memory: 44K"

At one point I bisected it to commit:

  a06cc94f3f8d ("x86/build: Drop superfluous ALIGN from the linker script")

But shortly thereafter I started having trouble reproducing, and now I
can run kernels that before would consistently fail to boot :/

> My other guess is that it's perhaps somehow microcode related?

I did not update or otherwise change packages while I was bisecting; the
machine is:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x428

Like I wrote on IRC; what _seems_ to have 'cured' things is clearing out
my /boot. The amount of kernels generated by the bisect was immense and
running 'update-grub' was taking _much_ longer than actually building a
new kernel.

What I have not tried is again generating and installing that many
kernels to see if that will make it go 'funny' again.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-19  9:20             ` Peter Zijlstra
@ 2018-02-19  9:29               ` David Woodhouse
  2018-02-19  9:39                 ` Ingo Molnar
  2018-02-19 10:08                 ` Peter Zijlstra
  2018-02-19  9:36               ` Ingo Molnar
  1 sibling, 2 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-19  9:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Tim Chen, hpa, tglx, torvalds, linux-kernel, linux-tip-commits,
	Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 683 bytes --]

On Mon, 2018-02-19 at 10:20 +0100, Peter Zijlstra wrote:
> 
> I did not update or otherwise change packages while I was bisecting; the
> machine is:
> 
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 62
> model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> stepping        : 4
> microcode       : 0x428

That's IVX with a microcode that doesn't *have* IBRS/IBPB. I don't
think there's a publicly available microcode that does; I assume you
didn't have one and build it into your kernel for early loading, and
thus you really weren't even using IBRS here? The code never even gets
patched in?

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-19  9:20             ` Peter Zijlstra
  2018-02-19  9:29               ` David Woodhouse
@ 2018-02-19  9:36               ` Ingo Molnar
  1 sibling, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2018-02-19  9:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tim Chen, David Woodhouse, hpa, tglx, torvalds, linux-kernel,
	linux-tip-commits, Borislav Petkov


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, Feb 17, 2018 at 11:26:16AM +0100, Ingo Molnar wrote:
> > Note that PeterZ was struggling with intermittent boot hangs yesterday as well, 
> > which hangs came and went during severeal (fruitless) bisection attempts. Then at 
> > a certain point the hangs went away altogether.
> > 
> > The symptoms for both his and your hangs are consistent with an alignment 
> > dependent bug.
> 
> Mine would consistently hang right after
> 
>   "Freeing SMP alternatives memory: 44K"
> 
> At one point I bisected it to commit:
> 
>   a06cc94f3f8d ("x86/build: Drop superfluous ALIGN from the linker script")

So, just to make sure this commit had no effect: I cannot really see anything 
wrong with that commit, it does a single substantial change, which is to remove 
this explicit alignment:

                . = ALIGN(8);
                TEXT_TEXT

which seems fine to me, since it expanded to:

                . = ALIGN(8);
                ALIGN_FUNCTION();                                       \
		...

which expanded to:

                . = ALIGN(8);
		. = ALIGN(8);
		...

... which duplication the commit removed.

... where all the relevant defitions of TEXT_TEXT and ALIGN_FUNCTION are 
unconditional and not overriden anywhere for arch/x86 builds.

I.e. the commit is a NOP AFAICS.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-19  9:29               ` David Woodhouse
@ 2018-02-19  9:39                 ` Ingo Molnar
  2018-02-19  9:44                   ` David Woodhouse
  2018-02-19 10:08                 ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2018-02-19  9:39 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Peter Zijlstra, Tim Chen, hpa, tglx, torvalds, linux-kernel,
	linux-tip-commits, Borislav Petkov


* David Woodhouse <dwmw2@infradead.org> wrote:

> On Mon, 2018-02-19 at 10:20 +0100, Peter Zijlstra wrote:
> > 
> > I did not update or otherwise change packages while I was bisecting; the
> > machine is:
> > 
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 62
> > model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> > stepping        : 4
> > microcode       : 0x428
> 
> That's IVX with a microcode that doesn't *have* IBRS/IBPB. I don't
> think there's a publicly available microcode that does; I assume you
> didn't have one and build it into your kernel for early loading, and
> thus you really weren't even using IBRS here? The code never even gets
> patched in?

Note that PeterZ's boot troubles only match the *symptoms* of the spurious 
failures reported by Tim Chen. Your commit wasn't bisected to.

I linked these two reports on the (remote) possibility that they might be related 
via some alignment dependent bug somewhere else in the x86 kernel - possibly 
completely unrelated to any IBRS/IBPB details.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-19  9:39                 ` Ingo Molnar
@ 2018-02-19  9:44                   ` David Woodhouse
  0 siblings, 0 replies; 74+ messages in thread
From: David Woodhouse @ 2018-02-19  9:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Tim Chen, hpa, tglx, torvalds, linux-kernel,
	linux-tip-commits, Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]



On Mon, 2018-02-19 at 10:39 +0100, Ingo Molnar wrote:
> * David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > 
> > On Mon, 2018-02-19 at 10:20 +0100, Peter Zijlstra wrote:
> > > 
> > > 
> > > I did not update or otherwise change packages while I was bisecting; the
> > > machine is:
> > > 
> > > vendor_id       : GenuineIntel
> > > cpu family      : 6
> > > model           : 62
> > > model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> > > stepping        : 4
> > > microcode       : 0x428
> > That's IVX with a microcode that doesn't *have* IBRS/IBPB. I don't
> > think there's a publicly available microcode that does; I assume you
> > didn't have one and build it into your kernel for early loading, and
> > thus you really weren't even using IBRS here? The code never even gets
> > patched in?
> Note that PeterZ's boot troubles only match the *symptoms* of the spurious 
> failures reported by Tim Chen. Your commit wasn't bisected to.
> 
> I linked these two reports on the (remote) possibility that they might be related 
> via some alignment dependent bug somewhere else in the x86 kernel - possibly 
> completely unrelated to any IBRS/IBPB details.

Understood. I was merely *accentuating* the "completely unrelated to
any IBRS/IBPB" bit, in a "la la la I'm ignoring this because I'm
working on other things" kind of a way... :)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
  2018-02-19  9:29               ` David Woodhouse
  2018-02-19  9:39                 ` Ingo Molnar
@ 2018-02-19 10:08                 ` Peter Zijlstra
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2018-02-19 10:08 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Ingo Molnar, Tim Chen, hpa, tglx, torvalds, linux-kernel,
	linux-tip-commits, Borislav Petkov

On Mon, Feb 19, 2018 at 09:29:12AM +0000, David Woodhouse wrote:
> On Mon, 2018-02-19 at 10:20 +0100, Peter Zijlstra wrote:
> > 
> > I did not update or otherwise change packages while I was bisecting; the
> > machine is:
> > 
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 62
> > model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> > stepping        : 4
> > microcode       : 0x428
> 
> That's IVX with a microcode that doesn't *have* IBRS/IBPB. I don't
> think there's a publicly available microcode that does; I assume you
> didn't have one and build it into your kernel for early loading, and
> thus you really weren't even using IBRS here? The code never even gets
> patched in?

I wasn't using IRBS afaik. I was bisceting tip/master before the
IBRS/firmware patches (not sure they're in now or not).

No fancy ucode setup, I think I have ucode image in the initrd, but that
would be same for all kernels.

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2018-02-19 10:08 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-10 23:39 [PATCH v2 0/6] Spectre v2 updates David Woodhouse
2018-02-10 23:39 ` [PATCH v2 1/6] x86/speculation: Update Speculation Control microcode blacklist David Woodhouse
2018-02-11 12:08   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-02-12  9:50   ` [PATCH v2 1/6] " Darren Kenny
2018-02-12 14:16   ` David Woodhouse
2018-02-12 14:32     ` Thomas Gleixner
2018-02-10 23:39 ` [PATCH v2 2/6] Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" David Woodhouse
2018-02-11 12:09   ` [tip:x86/pti] " tip-bot for David Woodhouse
2018-02-13  8:58   ` tip-bot for David Woodhouse
2018-02-13  9:41     ` Peter Zijlstra
2018-02-13 11:28       ` Ingo Molnar
2018-02-13 13:28         ` Peter Zijlstra
2018-02-13 13:38           ` Ingo Molnar
2018-02-13 15:26           ` [tip:x86/pti] x86/speculation: Add <asm/msr-index.h> dependency tip-bot for Peter Zijlstra
2018-02-15  0:28           ` tip-bot for Peter Zijlstra
2018-02-10 23:39 ` [PATCH v2 3/6] KVM: x86: Reduce retpoline performance impact in slot_handle_level_range() David Woodhouse
2018-02-11 12:09   ` [tip:x86/pti] KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods tip-bot for David Woodhouse
2018-02-13  8:58   ` tip-bot for David Woodhouse
2018-02-10 23:39 ` [PATCH v2 4/6] X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs David Woodhouse
2018-02-11 12:10   ` [tip:x86/pti] " tip-bot for KarimAllah Ahmed
2018-02-13  8:59   ` tip-bot for KarimAllah Ahmed
2018-02-10 23:39 ` [PATCH v2 5/6] KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap David Woodhouse
2018-02-11 10:19   ` Ingo Molnar
     [not found]     ` <1518345844.3677.365.camel@amazon.co.uk>
2018-02-11 10:55       ` Ingo Molnar
2018-02-11 12:10   ` [tip:x86/pti] " tip-bot for KarimAllah Ahmed
2018-02-13  8:59   ` tip-bot for KarimAllah Ahmed
2018-02-10 23:39 ` [PATCH v2 6/6] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
2018-02-11 11:46   ` Ingo Molnar
2018-02-11 10:41 ` [PATCH v2 0/6] Spectre v2 updates Ingo Molnar
2018-02-11 15:19 ` [PATCH v2.1] x86/speculation: Use IBRS if available before calling into firmware David Woodhouse
2018-02-11 18:50   ` [PATCH] x86/speculation: Clean up various Spectre related details Ingo Molnar
2018-02-11 19:25     ` David Woodhouse
2018-02-11 19:43       ` Ingo Molnar
2018-02-12 15:30         ` David Woodhouse
2018-02-13  8:04           ` Ingo Molnar
2018-02-11 19:19   ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware tip-bot for David Woodhouse
2018-02-12  5:59     ` afzal mohammed
2018-02-12 16:30       ` David Woodhouse
2018-02-12 10:22     ` Ingo Molnar
2018-02-12 11:50       ` Peter Zijlstra
2018-02-12 12:27         ` David Woodhouse
2018-02-12 13:06           ` Peter Zijlstra
2018-02-13  7:58           ` Ingo Molnar
2018-02-12 12:28         ` Peter Zijlstra
2018-02-12 16:13       ` Dave Hansen
2018-02-12 16:58         ` Peter Zijlstra
2018-02-13  7:55           ` Ingo Molnar
2018-02-14  1:49             ` Tim Chen
2018-02-14  8:56               ` Peter Zijlstra
2018-02-14  8:57                 ` Peter Zijlstra
2018-02-14 19:20                 ` Tim Chen
2018-02-14 23:19                   ` Ingo Molnar
2018-02-15  2:01                     ` Tim Chen
2018-02-14  9:31               ` [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from NMI context Ingo Molnar
2018-02-14  9:38                 ` Peter Zijlstra
2018-02-14 10:39                   ` Ingo Molnar
2018-02-14  9:44                 ` Borislav Petkov
2018-02-14 18:13                   ` Jerry Hoemann
2018-02-14 23:17                     ` Ingo Molnar
2018-02-15 17:44                       ` Jerry Hoemann
2018-02-15 19:02                         ` Ingo Molnar
2018-02-15 19:48                         ` Peter Zijlstra
2018-02-16 18:44     ` [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware Tim Chen
2018-02-16 19:16       ` David Woodhouse
2018-02-16 23:46         ` Tim Chen
2018-02-17 10:26           ` Ingo Molnar
2018-02-19  9:20             ` Peter Zijlstra
2018-02-19  9:29               ` David Woodhouse
2018-02-19  9:39                 ` Ingo Molnar
2018-02-19  9:44                   ` David Woodhouse
2018-02-19 10:08                 ` Peter Zijlstra
2018-02-19  9:36               ` Ingo Molnar
2018-02-12  8:27 ` [PATCH v2 0/6] Spectre v2 updates Paolo Bonzini
2018-02-13  7:59   ` Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.