* [MODERATED] [PATCH v2 0/8] MDSv2 8
@ 2018-12-10 17:53 Andi Kleen
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 1/8] MDSv2 4 Andi Kleen
                   ` (7 more replies)
  0 siblings, 8 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

Here's support for flushing CPU buffers on kernel exit or when going
idle, for group 4.  This prevents leaking any data in the non-SMT
case.

This uses either the VERW instruction, relying on the recently released
microcode updates, or software sequences on some platforms.

This mainly covers single-thread, not SMT (except for the idle case).

I lumped all the issues together under the Microarchitectural Data
Sampling (MDS) name because they need the same mitigations, and it
doesn't seem worth duplicating the sysfs files and bug entries.

Some notes:
- Against 4.20-rc5
- There's a (bogus) build time warning from objtool about unreachable code.
- No support for Xeon Phi so far.

Changes against previous version:
- Added software sequences
- Various fixes and cleanups.

Andi Kleen (8):
  x86/speculation/mds: Add basic bug infrastructure for MDS
  x86/speculation/mds: Clear CPU buffers on kernel exit for 64bit
  x86/speculation/mds: Clear CPU buffers on entering idle
  x86/speculation/mds: Add sysfs reporting
  x86/speculation/mds: Add software sequences for older CPUs.
  x86/speculation/mds: Force MDS_EXIT on paranoid exit for sw sequence
  x86/speculation/mds: Call software sequences on KVM entry
  x86/speculation/mds: Clear buffers on kernel exit for 32bit kernels

 .../ABI/testing/sysfs-devices-system-cpu      |   1 +
 .../admin-guide/kernel-parameters.txt         |  13 +++
 arch/x86/entry/calling.h                      |   2 +
 arch/x86/entry/entry_32.S                     |   5 +
 arch/x86/entry/entry_64.S                     |  16 +++
 arch/x86/entry/entry_64_compat.S              |   1 +
 arch/x86/include/asm/cpufeatures.h            |   4 +
 arch/x86/include/asm/irqflags.h               |   3 +
 arch/x86/include/asm/msr-index.h              |   1 +
 arch/x86/include/asm/mwait.h                  |   4 +
 arch/x86/include/asm/nospec-branch.h          |  14 +++
 arch/x86/kernel/cpu/bugs.c                    | 109 ++++++++++++++++++
 arch/x86/kernel/cpu/common.c                  |  14 +++
 arch/x86/kernel/kvm.c                         |   1 +
 arch/x86/kernel/process.c                     |   4 +
 arch/x86/kernel/smpboot.c                     |   1 +
 arch/x86/kvm/vmx.c                            |   8 ++
 arch/x86/lib/Makefile                         |   1 +
 arch/x86/lib/clear_cpu.S                      | 107 +++++++++++++++++
 drivers/base/cpu.c                            |   8 ++
 20 files changed, 317 insertions(+)
 create mode 100644 arch/x86/lib/clear_cpu.S

-- 
2.17.2


* [MODERATED] [PATCH v2 1/8] MDSv2 4
  2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
@ 2018-12-10 17:53 ` Andi Kleen
  2018-12-11 14:14   ` [MODERATED] " Paolo Bonzini
                     ` (2 more replies)
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 2/8] MDSv2 1 Andi Kleen
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

MDS is Microarchitectural Data Sampling, a side channel
attack on internal buffers in Intel CPUs. The issues all have
the same mitigations for single thread, so we lump them all
together as a single MDS issue.

This patch adds the basic infrastructure to detect if the current
CPU is affected by MDS, and if so, set the right BUG bits.

We also provide a command line option "mds_disable" to disable
any workarounds.
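
For example, the workarounds could be turned off with a boot entry like
this (hypothetical command line, shown only for illustration):

	linux /boot/vmlinuz-4.20-rc5 root=/dev/sda1 ro mds_disable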

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 arch/x86/include/asm/cpufeatures.h              |  2 ++
 arch/x86/include/asm/msr-index.h                |  1 +
 arch/x86/kernel/cpu/bugs.c                      | 10 ++++++++++
 arch/x86/kernel/cpu/common.c                    | 14 ++++++++++++++
 5 files changed, 30 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index aefd358a5ca3..48891572e825 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2341,6 +2341,9 @@
 			Format: <first>,<last>
 			Specifies range of consoles to be captured by the MDA.
 
+	mds_disable	[X86]
+			Disable workarounds for Micro-architectural Data Sampling.
+
 	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
 			Amount of memory to be used when the kernel is not able
 			to see the whole system memory or for test.
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 28c4a502b419..93fab3a1e046 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -342,6 +342,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_MB_CLEAR		(18*32+10) /* Flush state on VERW */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
@@ -379,5 +380,6 @@
 #define X86_BUG_SPECTRE_V2		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
 #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
 #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
+#define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index c8f73efb4ece..303064a9a0a9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -77,6 +77,7 @@
 						    * attack, so no Speculative Store Bypass
 						    * control required.
 						    */
+#define ARCH_CAP_MDS_NO			(1 << 5)   /* No Microarchitectural data sampling */
 
 #define MSR_IA32_FLUSH_CMD		0x0000010b
 #define L1D_FLUSH			(1 << 0)   /*
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 500278f5308e..5cac243849d8 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -35,6 +35,7 @@
 static void __init spectre_v2_select_mitigation(void);
 static void __init ssb_select_mitigation(void);
 static void __init l1tf_select_mitigation(void);
+static void __init mds_select_mitigation(void);
 
 /* The base value of the SPEC_CTRL MSR that always has to be preserved. */
 u64 x86_spec_ctrl_base;
@@ -99,6 +100,8 @@ void __init check_bugs(void)
 
 	l1tf_select_mitigation();
 
+	mds_select_mitigation();
+
 #ifdef CONFIG_X86_32
 	/*
 	 * Check whether we are able to run this kernel safely on SMP.
@@ -1041,6 +1044,13 @@ early_param("l1tf", l1tf_cmdline);
 
 #undef pr_fmt
 
+static void mds_select_mitigation(void)
+{
+	if (cmdline_find_option_bool(boot_command_line, "mds_disable") ||
+	    !boot_cpu_has_bug(X86_BUG_MDS))
+		setup_clear_cpu_cap(X86_FEATURE_MB_CLEAR);
+}
+
 #ifdef CONFIG_SYSFS
 
 #define L1TF_DEFAULT_MSG "Mitigation: PTE Inversion"
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ffb181f959d2..bebeb67015fc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -998,6 +998,14 @@ static const __initconst struct x86_cpu_id cpu_no_l1tf[] = {
 	{}
 };
 
+static const __initconst struct x86_cpu_id cpu_no_mds[] = {
+	/* in addition to cpu_no_speculation */
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT_X	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT_PLUS	},
+	{}
+};
+
 static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_cap = 0;
@@ -1019,6 +1027,12 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 	if (ia32_cap & ARCH_CAP_IBRS_ALL)
 		setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
 
+	if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+	    !x86_match_cpu(cpu_no_mds)) &&
+	    !(ia32_cap & ARCH_CAP_MDS_NO) &&
+	    !(ia32_cap & ARCH_CAP_RDCL_NO))
+		setup_force_cpu_bug(X86_BUG_MDS);
+
 	if (x86_match_cpu(cpu_no_meltdown))
 		return;
 
-- 
2.17.2


* [MODERATED] [PATCH v2 2/8] MDSv2 1
  2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 1/8] MDSv2 4 Andi Kleen
@ 2018-12-10 17:53 ` Andi Kleen
  2018-12-10 22:49   ` [MODERATED] " Jiri Kosina
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 3/8] MDSv2 5 Andi Kleen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

For MDS the CPU might leak previously touched data
from CPU-internal structures. Make sure to clear these structures
every time we exit the kernel. This prevents any leakage
between user processes or between kernel and user.

The flushing is provided by new microcode as a new side
effect of the otherwise unused VERW instruction. We
add VERW to all the kernel exit paths.

We don't need to do this for guests because the L1TF cache
flush will implicitly do the same flushing, and is
automatically selected on MDS affected systems.

This mitigation doesn't address Hyper-Threading.

So far this is for 64-bit only; 32-bit is not covered yet.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/calling.h             | 2 ++
 arch/x86/entry/entry_64.S            | 8 ++++++++
 arch/x86/entry/entry_64_compat.S     | 1 +
 arch/x86/include/asm/nospec-branch.h | 8 ++++++++
 4 files changed, 19 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 25e5a6bda8c3..4b07f97e3874 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -1,4 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/stringify.h>
+#include <asm/segment.h>
 #include <linux/jump_label.h>
 #include <asm/unwind_hints.h>
 #include <asm/cpufeatures.h>
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ce25d84023c0..0bb7bb3dc728 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -248,6 +248,9 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	 * perf profiles. Nothing jumps here.
 	 */
 syscall_return_via_sysret:
+
+	EXIT_MDS
+
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_REGS pop_rdi=0 skip_r11rcx=1
@@ -604,6 +607,8 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 	ud2
 1:
 #endif
+	EXIT_MDS
+
 	POP_REGS pop_rdi=0
 
 	/*
@@ -623,6 +628,7 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 	/* Push user RDI on the trampoline stack. */
 	pushq	(%rdi)
 
+
 	/*
 	 * We are on the trampoline stack.  All regs except RDI are live.
 	 * We can do future final exit work right here.
@@ -1616,6 +1622,8 @@ end_repeat_nmi:
 	movq	$-1, %rsi
 	call	do_nmi
 
+	EXIT_MDS
+
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
 
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 8eaf8952c408..8b5a11c6a32e 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -276,6 +276,7 @@ sysret32_from_system_call:
 	popq	%rdx			/* Skip pt_regs->cx */
 	popq	%rdx			/* pt_regs->dx */
 	popq	%rsi			/* pt_regs->si */
+	EXIT_MDS
 	popq	%rdi			/* pt_regs->di */
 
         /*
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 032b6009baab..f780f29e351f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -150,6 +150,14 @@
 #endif
 .endm
 
+.macro EXIT_MDS
+	/* Clear CPU buffers that could leak. Instruction must be in memory form. */
+	ALTERNATIVE_2 "", __stringify(push $__USER_DS ; verw (% _ASM_SP ) ; add $8, % _ASM_SP ),\
+		X86_FEATURE_MB_CLEAR, \
+		"call do_clear_cpu", \
+		X86_BUG_MDS_CLEAR_CPU
+.endm
+
 #else /* __ASSEMBLY__ */
 
 #define ANNOTATE_NOSPEC_ALTERNATIVE				\
-- 
2.17.2


* [MODERATED] [PATCH v2 3/8] MDSv2 5
  2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 1/8] MDSv2 4 Andi Kleen
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 2/8] MDSv2 1 Andi Kleen
@ 2018-12-10 17:53 ` Andi Kleen
  2018-12-10 23:00   ` [MODERATED] " Linus Torvalds
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 4/8] MDSv2 0 Andi Kleen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/speculation/mds: Clear CPU buffers on entering
 idle

When entering idle, the internal state of the current CPU might
become visible to the thread sibling because the CPU "frees" some
internal resources.

To ensure there is no MDS leakage, always clear the CPU state
before doing any idling. We only do this if SMT is enabled,
as otherwise there is no leakage possible.
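
To illustrate the constraint, a caller is expected to look roughly like
this (a sketch; clear_cpu_buffers_idle() is the helper added below):

	local_irq_disable();
	if (!need_resched()) {
		/* flush before the sibling can take over the resources */
		clear_cpu_buffers_idle();
		safe_halt();		/* sti; hlt - nothing in between */
	}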

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/irqflags.h      |  3 +++
 arch/x86/include/asm/mwait.h         |  4 ++++
 arch/x86/include/asm/nospec-branch.h |  5 +++++
 arch/x86/kernel/cpu/bugs.c           | 23 +++++++++++++++++++++++
 arch/x86/kernel/kvm.c                |  1 +
 arch/x86/kernel/process.c            |  4 ++++
 arch/x86/kernel/smpboot.c            |  1 +
 7 files changed, 41 insertions(+)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 058e40fed167..0b6560f0defb 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -3,6 +3,7 @@
 #define _X86_IRQFLAGS_H_
 
 #include <asm/processor-flags.h>
+#include <asm/nospec-branch.h>
 
 #ifndef __ASSEMBLY__
 
@@ -96,6 +97,7 @@ static inline notrace void arch_local_irq_enable(void)
  */
 static inline __cpuidle void arch_safe_halt(void)
 {
+	clear_cpu_buffers_idle();
 	native_safe_halt();
 }
 
@@ -105,6 +107,7 @@ static inline __cpuidle void arch_safe_halt(void)
  */
 static inline __cpuidle void halt(void)
 {
+	clear_cpu_buffers_idle();
 	native_halt();
 }
 
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 39a2fb29378a..36edad498d19 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -40,6 +40,8 @@ static inline void __monitorx(const void *eax, unsigned long ecx,
 
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
+	clear_cpu_buffers_idle();
+
 	/* "mwait %eax, %ecx;" */
 	asm volatile(".byte 0x0f, 0x01, 0xc9;"
 		     :: "a" (eax), "c" (ecx));
@@ -82,6 +84,8 @@ static inline void __mwaitx(unsigned long eax, unsigned long ebx,
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	trace_hardirqs_on();
+	clear_cpu_buffers_idle();
+
 	/* "mwait %eax, %ecx;" */
 	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
 		     :: "a" (eax), "c" (ecx));
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f780f29e351f..3a2843062dfb 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -9,6 +9,7 @@
 #include <asm/alternative-asm.h>
 #include <asm/cpufeatures.h>
 #include <asm/msr-index.h>
+#include <asm/segment.h>
 
 /*
  * Fill the CPU return stack buffer.
@@ -386,4 +387,8 @@ do {								\
 # endif
 #endif
 
+#ifndef __ASSEMBLY__
+void clear_cpu_buffers_idle(void);
+#endif
+
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5cac243849d8..8200b41b8db9 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1051,6 +1051,29 @@ static void mds_select_mitigation(void)
 		setup_clear_cpu_cap(X86_FEATURE_MB_CLEAR);
 }
 
+/*
+ * Clear CPU buffers before going idle, so that no state is leaked to SMT
+ * siblings taking over thread resources.
+ * Out of line to avoid include hell.
+ *
+ * Assumes that interrupts are disabled and only get reenabled
+ * before idle, otherwise the data from a racing interrupt might not
+ * get cleared. There are some callers that violate this,
+ * but only in very uncommon cases.
+ */
+void clear_cpu_buffers_idle(void)
+{
+	if (cpu_smt_control != CPU_SMT_ENABLED)
+		return;
+	/* Has to be the memory form, don't modify to use a register */
+	alternative_input("",
+		"pushq %[kernelds]; verw (%%rsp) ; addq $8,%%rsp \n",
+		X86_FEATURE_MB_CLEAR,
+		[kernelds] "i" (__KERNEL_DS));
+}
+
+EXPORT_SYMBOL(clear_cpu_buffers_idle);
+
 #ifdef CONFIG_SYSFS
 
 #define L1TF_DEFAULT_MSG "Mitigation: PTE Inversion"
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index ba4bfb7f6a36..357c66adc250 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -159,6 +159,7 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 			/*
 			 * We cannot reschedule. So halt.
 			 */
+			clear_cpu_buffers_idle();
 			native_safe_halt();
 			local_irq_disable();
 		}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 7d31192296a8..1eda5658ac55 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -586,6 +586,8 @@ void stop_this_cpu(void *dummy)
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
 
+	clear_cpu_buffers_idle();
+
 	/*
 	 * Use wbinvd on processors that support SME. This provides support
 	 * for performing a successful kexec when going from SME inactive
@@ -672,6 +674,8 @@ static __cpuidle void mwait_idle(void)
 			mb(); /* quirk */
 		}
 
+		clear_cpu_buffers_idle();
+
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
 		if (!need_resched())
 			__sti_mwait(0, 0);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index a9134d1910b9..96820e635491 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1662,6 +1662,7 @@ void hlt_play_dead(void)
 		wbinvd();
 
 	while (1) {
+		clear_cpu_buffers_idle();
 		native_halt();
 		/*
 		 * If NMI wants to wake up CPU0, start CPU0.
-- 
2.17.2


* [MODERATED] [PATCH v2 4/8] MDSv2 0
  2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
                   ` (2 preceding siblings ...)
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 3/8] MDSv2 5 Andi Kleen
@ 2018-12-10 17:53 ` Andi Kleen
  2018-12-12 21:45   ` [MODERATED] " Konrad Rzeszutek Wilk
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 5/8] MDSv2 7 Andi Kleen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/speculation/mds: Add sysfs reporting

Report the MDS mitigation state in the sysfs vulnerabilities directory.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 .../ABI/testing/sysfs-devices-system-cpu         |  1 +
 arch/x86/kernel/cpu/bugs.c                       | 16 ++++++++++++++++
 drivers/base/cpu.c                               |  8 ++++++++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 73318225a368..02b7bb711214 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -477,6 +477,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
 		/sys/devices/system/cpu/vulnerabilities/spectre_v2
 		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
 		/sys/devices/system/cpu/vulnerabilities/l1tf
+		/sys/devices/system/cpu/vulnerabilities/mds
 Date:		January 2018
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Information about CPU vulnerabilities
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 8200b41b8db9..accab3279068 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1175,6 +1175,16 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
 			return l1tf_show_state(buf);
 		break;
+
+	case X86_BUG_MDS:
+		/* Assumes Hypervisor exposed HT state to us if in guest */
+		if (boot_cpu_has(X86_FEATURE_MB_CLEAR)) {
+			if (cpu_smt_control != CPU_SMT_ENABLED)
+				return sprintf(buf, "Mitigation: MB_CLEAR\n");
+			return sprintf(buf, "Mitigation: MB_CLEAR, HT vulnerable\n");
+		}
+		return sprintf(buf, "Vulnerable\n");
+
 	default:
 		break;
 	}
@@ -1206,4 +1216,10 @@ ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *b
 {
 	return cpu_show_common(dev, attr, buf, X86_BUG_L1TF);
 }
+
+ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return cpu_show_common(dev, attr, buf, X86_BUG_MDS);
+}
+
 #endif
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index eb9443d5bae1..2fd6ca1021c2 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -546,11 +546,18 @@ ssize_t __weak cpu_show_l1tf(struct device *dev,
 	return sprintf(buf, "Not affected\n");
 }
 
+ssize_t __weak cpu_show_mds(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
 static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
 static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
+static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
@@ -558,6 +565,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_spectre_v2.attr,
 	&dev_attr_spec_store_bypass.attr,
 	&dev_attr_l1tf.attr,
+	&dev_attr_mds.attr,
 	NULL
 };
 
-- 
2.17.2


* [MODERATED] [PATCH v2 5/8] MDSv2 7
  2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
                   ` (3 preceding siblings ...)
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 4/8] MDSv2 0 Andi Kleen
@ 2018-12-10 17:53 ` Andi Kleen
  2018-12-11  0:33   ` [MODERATED] " Andrew Cooper
  2018-12-12 21:41   ` Konrad Rzeszutek Wilk
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 6/8] MDSv2 3 Andi Kleen
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

On older CPUs before Broadwell, clearing the CPU buffers with VERW is
not available, so we implement software sequences. These can then be
automatically patched in as needed.

Support the mitigation for Nehalem up to Broadwell. Broadwell strictly
doesn't need it because it should have the microcode update for VERW,
which is preferred.

We add command line options to force the two different sequences,
mainly for regression testing and for anyone who wants to test
on pre-Nehalem CPUs.
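
For instance, to compare the two paths on a VERW-capable machine
(illustrative usage of the options added here):

	mds=swclear	# force the generic software sequence
	mds=swclearhsw	# force the Haswell/Broadwell sequence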

Note to backporters: this patch requires eager FPU support.

Note there is no Xeon Phi support so far.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  10 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/kernel/cpu/bugs.c                    |  59 +++++++++-
 arch/x86/lib/Makefile                         |   1 +
 arch/x86/lib/clear_cpu.S                      | 107 ++++++++++++++++++
 5 files changed, 176 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/lib/clear_cpu.S

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 48891572e825..16e10f92f8dd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2344,6 +2344,16 @@
 	mds_disable	[X86]
 			Disable workarounds for Micro-architectural Data Sampling.
 
+	mds=swclear	[X86]
+			Force using software sequence for clearing data that
+			could be exploited by Micro-architectural Data Sampling.
+			Normally automatically enabled when needed.
+
+	mds=swclearhsw	[X86]
+			Use Haswell/Broadwell specific sequence for clearing
+			data that could be exploited by Micro-architectural Data
+			Sampling. Normally automatically enabled when needed.
+
 	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
 			Amount of memory to be used when the kernel is not able
 			to see the whole system memory or for test.
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 93fab3a1e046..110759334c88 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -381,5 +381,7 @@
 #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
 #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
 #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
+#define X86_BUG_MDS_CLEAR_CPU		X86_BUG(20) /* CPU needs call to clear_cpu on kernel exit/idle for MDS */
+#define X86_BUG_MDS_CLEAR_CPU_HSW	X86_BUG(21) /* CPU needs Haswell version of clear cpu */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index accab3279068..91619f7d90be 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -31,6 +31,8 @@
 #include <asm/intel-family.h>
 #include <asm/e820/api.h>
 #include <asm/hypervisor.h>
+#include <asm/hypervisor.h>
+#include <asm/cpu_device_id.h>
 
 static void __init spectre_v2_select_mitigation(void);
 static void __init ssb_select_mitigation(void);
@@ -1044,11 +1046,55 @@ early_param("l1tf", l1tf_cmdline);
 
 #undef pr_fmt
 
+static const __initconst struct x86_cpu_id cpu_mds_clear_cpu[] = {
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_NEHALEM	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_NEHALEM_G	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_NEHALEM_EP	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_NEHALEM_EX	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_WESTMERE	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_WESTMERE_EP	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_WESTMERE_EX	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_SANDYBRIDGE	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_SANDYBRIDGE_X },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_IVYBRIDGE	 },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_IVYBRIDGE_X	 },
+	{}
+};
+
+static const __initconst struct x86_cpu_id cpu_mds_clear_cpu_hsw[] = {
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_HASWELL_CORE	    },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_HASWELL_X	    },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_HASWELL_ULT	    },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_HASWELL_GT3E	    },
+
+	/* Have MB_CLEAR with microcode update, but list just in case: */
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_BROADWELL_CORE   },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_BROADWELL_GT3E   },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_BROADWELL_X	    },
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_BROADWELL_XEON_D },
+	{}
+};
+
 static void mds_select_mitigation(void)
 {
 	if (cmdline_find_option_bool(boot_command_line, "mds_disable") ||
-	    !boot_cpu_has_bug(X86_BUG_MDS))
+	    !boot_cpu_has_bug(X86_BUG_MDS)) {
 		setup_clear_cpu_cap(X86_FEATURE_MB_CLEAR);
+		setup_clear_cpu_cap(X86_BUG_MDS_CLEAR_CPU_HSW);
+		setup_clear_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
+		return;
+	}
+
+	if ((!boot_cpu_has(X86_FEATURE_MB_CLEAR) &&
+		x86_match_cpu(cpu_mds_clear_cpu)) ||
+		cmdline_find_option_bool(boot_command_line, "mds=swclear"))
+		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
+	if ((!boot_cpu_has(X86_FEATURE_MB_CLEAR) &&
+		x86_match_cpu(cpu_mds_clear_cpu_hsw)) ||
+		cmdline_find_option_bool(boot_command_line, "mds=swclearhsw")) {
+		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
+		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU_HSW);
+	}
 }
 
 /*
@@ -1066,9 +1112,11 @@ void clear_cpu_buffers_idle(void)
 	if (cpu_smt_control != CPU_SMT_ENABLED)
 		return;
 	/* Has to be the memory form, don't modify to use a register */
-	alternative_input("",
-		"pushq %[kernelds]; verw (%%rsp) ; addq $8,%%rsp \n",
+	alternative_input_2("",
+		"push %[kernelds]; verw (%%" _ASM_SP ") ; add $8,%%" _ASM_SP "\n",
 		X86_FEATURE_MB_CLEAR,
+		"call do_clear_cpu",
+		X86_BUG_MDS_CLEAR_CPU,
 		[kernelds] "i" (__KERNEL_DS));
 }
 
@@ -1183,6 +1231,11 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 				return sprintf(buf, "Mitigation: MB_CLEAR\n");
 			return sprintf(buf, "Mitigation: MB_CLEAR, HT vulnerable\n");
 		}
+		if (boot_cpu_has_bug(X86_BUG_MDS_CLEAR_CPU)) {
+			if (cpu_smt_control != CPU_SMT_ENABLED)
+				return sprintf(buf, "Mitigation: software buffer clearing\n");
+			return sprintf(buf, "Mitigation: software buffer clearing, HT vulnerable\n");
+		}
 		return sprintf(buf, "Vulnerable\n");
 
 	default:
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 25a972c61b0a..ce07225e53e1 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -28,6 +28,7 @@ lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 lib-$(CONFIG_FUNCTION_ERROR_INJECTION)	+= error-inject.o
 lib-$(CONFIG_RETPOLINE) += retpoline.o
+lib-y += clear_cpu.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
 
diff --git a/arch/x86/lib/clear_cpu.S b/arch/x86/lib/clear_cpu.S
new file mode 100644
index 000000000000..5af33baf5427
--- /dev/null
+++ b/arch/x86/lib/clear_cpu.S
@@ -0,0 +1,107 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeatures.h>
+
+/*
+ * Clear internal CPU buffers on kernel boundaries.
+ *
+ * These sequences are somewhat fragile, please don't add
+ * or change instructions in the middle of the areas marked with
+ * start/end.
+ *
+ * Interrupts and NMIs are dealt with by re-clearing. We clear parts
+ * of the kernel stack, which has other advantages too.
+ *
+ * Save all registers to make it easier to use for callers.
+ *
+ * This sequence is for Nehalem-IvyBridge. For Haswell we jump
+ * to hsw_clear_cpu.
+ *
+ * These functions need to be called on a full stack, as they may
+ * use up to 1.5k of stack. They should also be called with
+ * interrupts disabled. NMIs etc. are handled by letting every
+ * NMI do its own clear sequence.
+ */
+ENTRY(ivb_clear_cpu)
+GLOBAL(do_clear_cpu)
+	/*
+	 * obj[tf]ool complains about unreachable code here,
+	 * which appears to be spurious?!?
+	 */
+	ALTERNATIVE "", "jmp hsw_clear_cpu", X86_BUG_MDS_CLEAR_CPU_HSW
+	push %__ASM_REG(si)
+	push %__ASM_REG(di)
+	push %__ASM_REG(cx)
+	mov %_ASM_SP, %__ASM_REG(si)
+	sub  $2*16, %_ASM_SP
+	and  $-16,%_ASM_SP
+	movdqa %xmm0, (%_ASM_SP)
+	movdqa %xmm1, 1*16(%_ASM_SP)
+	sub  $672, %_ASM_SP
+	xorpd %xmm0,%xmm0
+	movdqa %xmm0, (%_ASM_SP)
+	movdqa %xmm0, 16(%_ASM_SP)
+	mov %_ASM_SP, %__ASM_REG(di)
+	/* Clear sequence start */
+	movdqu %xmm0,(%__ASM_REG(di))
+	lfence
+	orpd (%__ASM_REG(di)), %xmm0
+	orpd (%__ASM_REG(di)), %xmm1
+	mfence
+	movl $40, %ecx
+	add  $32, %__ASM_REG(di)
+1:	movntdq %xmm0, (%__ASM_REG(di))
+	add  $16, %__ASM_REG(di)
+	decl %ecx
+	jnz  1b
+	mfence
+	/* Clear sequence end */
+	add  $672, %_ASM_SP
+	movdqu (%_ASM_SP), %xmm0
+	movdqu 1*16(%_ASM_SP), %xmm1
+	mov  %__ASM_REG(si),%_ASM_SP
+	pop %__ASM_REG(cx)
+	pop %__ASM_REG(di)
+	pop %__ASM_REG(si)
+	ret
+END(ivb_clear_cpu)
+
+/*
+ * Version for Haswell/Broadwell.
+ */
+ENTRY(hsw_clear_cpu)
+	push %__ASM_REG(si)
+	push %__ASM_REG(di)
+	push %__ASM_REG(cx)
+	push %__ASM_REG(ax)
+	mov  %_ASM_SP, %__ASM_REG(ax)
+	sub  $16, %_ASM_SP
+	and  $-16,%_ASM_SP
+	movdqa %xmm0, (%_ASM_SP)
+	/* Clear sequence start */
+	xorpd %xmm0,%xmm0
+	sub  $1536,%_ASM_SP
+	mov  %_ASM_SP, %__ASM_REG(si)
+	mov  %__ASM_REG(si), %__ASM_REG(di)
+	movl $40,%ecx
+1:	movntdq %xmm0, (%__ASM_REG(di))
+	add  $16, %__ASM_REG(di)
+	decl %ecx
+	jnz  1b
+	mfence
+	mov  %__ASM_REG(si), %__ASM_REG(di)
+	mov $1536, %ecx
+	rep movsb
+	lfence
+	/* Clear sequence end */
+	add $1536,%_ASM_SP
+	movdqa (%_ASM_SP), %xmm0
+	mov %__ASM_REG(ax),%_ASM_SP
+	pop %__ASM_REG(ax)
+	pop %__ASM_REG(cx)
+	pop %__ASM_REG(di)
+	pop %__ASM_REG(si)
+	ret
+END(hsw_clear_cpu)
-- 
2.17.2


* [MODERATED] [PATCH v2 6/8] MDSv2 3
  2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
                   ` (4 preceding siblings ...)
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 5/8] MDSv2 7 Andi Kleen
@ 2018-12-10 17:53 ` Andi Kleen
  2018-12-11  0:37   ` [MODERATED] " Andrew Cooper
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 7/8] MDSv2 6 Andi Kleen
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 8/8] MDSv2 2 Andi Kleen
  7 siblings, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/speculation/mds: Force MDS_EXIT on paranoid exit
 for sw sequence

When we use a software sequence for clearing CPU buffers, an NMI
or similar interrupt could interrupt the clearing sequence.
In this case make sure we really flush by always doing the extra
clearing on paranoid interrupt exit.

This is only needed for the software sequence because VERW
is an instruction that cannot be interrupted.
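
The window being closed looks roughly like this (an illustrative
sketch, not actual code from this patch):

	call do_clear_cpu	# <- NMI lands here, refills the buffers
	...
	sysret			# would return with stale data still sampleable

The NMI's own paranoid exit re-running the clear flushes whatever the
NMI itself pulled into the buffers.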

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/entry_64.S | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0bb7bb3dc728..350216af4ff7 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1211,6 +1211,14 @@ ENTRY(paranoid_exit)
 	jmp	.Lparanoid_exit_restore
 .Lparanoid_exit_no_swapgs:
 	TRACE_IRQS_IRETQ_DEBUG
+	/*
+	 * Always do the MDS clear in case we're racing with an MDS clear
+	 * software sequence on kernel exit.
+	 * Only needed if MB_CLEAR is not available, because VERW is atomic.
+	 */
+	ALTERNATIVE "", "jmp 1f", X86_FEATURE_MB_CLEAR
+	EXIT_MDS
+1:
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
 .Lparanoid_exit_restore:
-- 
2.17.2


* [MODERATED] [PATCH v2 7/8] MDSv2 6
  2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
                   ` (5 preceding siblings ...)
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 6/8] MDSv2 3 Andi Kleen
@ 2018-12-10 17:53 ` Andi Kleen
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 8/8] MDSv2 2 Andi Kleen
  7 siblings, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

CPU buffers need to be cleared before entering a guest.
For VERW-based CPU buffer clearing we rely on the L1 cache flush for
L1TF doing it implicitly.

When using software sequences this is not done, so in this case we
need to call the software sequence explicitly.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/nospec-branch.h |  1 +
 arch/x86/kernel/cpu/bugs.c           | 13 ++++++++++---
 arch/x86/kvm/vmx.c                   |  8 ++++++++
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 3a2843062dfb..79c5032a4d40 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -389,6 +389,7 @@ do {								\
 
 #ifndef __ASSEMBLY__
 void clear_cpu_buffers_idle(void);
+void clear_cpu_buffers(void);
 #endif
 
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 91619f7d90be..75a1d2eb17d5 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1107,10 +1107,8 @@ static void mds_select_mitigation(void)
  * get cleared. There are some callers that violate this,
  * but only in very uncommon cases.
  */
-void clear_cpu_buffers_idle(void)
+void clear_cpu_buffers(void)
 {
-	if (cpu_smt_control != CPU_SMT_ENABLED)
-		return;
 	/* Has to be the memory form, don't modify to use a register */
 	alternative_input_2("",
 		"push %[kernelds]; verw (%%" _ASM_SP ") ; add $8,%%" _ASM_SP "\n",
@@ -1120,6 +1118,15 @@ void clear_cpu_buffers_idle(void)
 		[kernelds] "i" (__KERNEL_DS));
 }
 
+EXPORT_SYMBOL(clear_cpu_buffers);
+
+void clear_cpu_buffers_idle(void)
+{
+	if (cpu_smt_control != CPU_SMT_ENABLED)
+		return;
+	clear_cpu_buffers();
+}
+
 EXPORT_SYMBOL(clear_cpu_buffers_idle);
 
 #ifdef CONFIG_SYSFS
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 02edd9960e9d..75c69f799a54 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10680,6 +10680,14 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 
 	vcpu->stat.l1d_flush++;
 
+	/*
+	 * When the CPU has MB_CLEAR the CPU buffer flush is done implicitly
+	 * by the L1D_FLUSH below. But if software sequences are used
+	 * we need to call them explicitly.
+	 */
+	if (boot_cpu_has(X86_BUG_MDS) && !boot_cpu_has(X86_FEATURE_MB_CLEAR))
+		clear_cpu_buffers();
+
 	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
 		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
 		return;
-- 
2.17.2


* [MODERATED] [PATCH v2 8/8] MDSv2 2
  2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
                   ` (6 preceding siblings ...)
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 7/8] MDSv2 6 Andi Kleen
@ 2018-12-10 17:53 ` Andi Kleen
  7 siblings, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-10 17:53 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

Clear buffers on kernel exit for the MDS mitigation on 32-bit kernels.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/entry_32.S | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d309f30cf7af..de03a0d8426c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -921,6 +921,8 @@ ENTRY(entry_SYSENTER_32)
 	popl	%edi			/* pt_regs->di */
 	popl	%ebp			/* pt_regs->bp */
 
+	EXIT_MDS
+
 	/* Switch to entry stack */
 	movl	%eax, %esp
 
@@ -1006,10 +1008,12 @@ ENTRY(entry_INT80_32)
 
 restore_all:
 	TRACE_IRQS_IRET
+	EXIT_MDS
 	SWITCH_TO_ENTRY_STACK
 .Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
+
 	/* Switch back to user CR3 */
 	SWITCH_TO_USER_CR3 scratch_reg=%eax
 
@@ -1028,6 +1032,7 @@ restore_all:
 restore_all_kernel:
 	TRACE_IRQS_IRET
 	PARANOID_EXIT_TO_KERNEL_MODE
+	EXIT_MDS
 	BUG_IF_WRONG_CR3
 	RESTORE_REGS 4
 	jmp	.Lirq_return
-- 
2.17.2


* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 2/8] MDSv2 1 Andi Kleen
@ 2018-12-10 22:49   ` Jiri Kosina
  2018-12-11  0:03     ` Andi Kleen
  2018-12-11  0:13     ` Kanth Ghatraju
  0 siblings, 2 replies; 59+ messages in thread
From: Jiri Kosina @ 2018-12-10 22:49 UTC (permalink / raw)
  To: speck

On Mon, 10 Dec 2018, speck for Andi Kleen wrote:

> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/speculation/mds: Clear CPU buffers on kernel exit
>  for 64bit
[ ... snip ... ]
> +.macro EXIT_MDS
> +	/* Clear CPU buffers that could leak. Instruction must be in memory form. */
> +	ALTERNATIVE_2 "", __stringify(push $__USER_DS ; verw (% _ASM_SP ) ; add $8, % _ASM_SP ),\
> +		X86_FEATURE_MB_CLEAR, \
> +		"call do_clear_cpu", \
> +		X86_BUG_MDS_CLEAR_CPU

The software clearing sequence is introduced later in the series, so this 
seems not to be bisectable.

Thanks,

-- 
Jiri Kosina
SUSE Labs


* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 3/8] MDSv2 5 Andi Kleen
@ 2018-12-10 23:00   ` Linus Torvalds
  2018-12-11  0:03     ` Andi Kleen
  2018-12-11  0:09     ` Andrew Cooper
  0 siblings, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2018-12-10 23:00 UTC (permalink / raw)
  To: speck

Honestly, this looks completely bogus.

On Mon, Dec 10, 2018 at 2:37 PM speck for Andi Kleen
<speck@linutronix.de> wrote:
>
> +void clear_cpu_buffers_idle(void)
> +{
> +       if (cpu_smt_control != CPU_SMT_ENABLED)
> +               return;
> +       /* Has to be memory form, don't modify to use an register */
> +       alternative_input("",
> +               "pushq %[kernelds]; verw (%%rsp) ; addq $8,%%rsp \n",
> +               X86_FEATURE_MB_CLEAR,
> +               [kernelds] "i" (__KERNEL_DS));
> +}

So you have a non-inline function that

 - has a test to return early

 - does a *single* instruction

 - the instruction gets nop'ed out if it's not valid

all of which just *screams* "that's wrong" to me. Why isn't it inline
and a static branch?

The actual asm sequence also seems bad to me.

Why isn't that just using

        int val = __KERNEL_DS;

with the asm just doing

        "verw %[kernelds]" ..  [kernelds] "m" (val)

instead, letting gcc do the stack setup part.
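
IOW, the whole helper could be just something like this (untested
sketch of that shape):

        static inline void clear_cpu_buffers(void)
        {
                int val = __KERNEL_DS;

                asm volatile("verw %[kernelds]" :: [kernelds] "m" (val));
        }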

Maybe none of this matters simply because it's in the idle case, but
the exact same issues happen for the kernel exit case.

And no, we're not doing that insane kernel exit software case by
default. Not without a whole lot of examples of horrid badness.

The data that system calls touch is basically already user data. Sure,
there may be kernel pointers etc there, but we've already accepted
those leaking locally for all the usual reasons that we have no
control over.

So the whole "let's add crazy long sequences to every kernel exit" is
not going to happen. Not without a lot more explanations.

                   Linus


* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-10 23:00   ` [MODERATED] " Linus Torvalds
@ 2018-12-11  0:03     ` Andi Kleen
  2018-12-11  0:43       ` Linus Torvalds
  2018-12-11  0:09     ` Andrew Cooper
  1 sibling, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2018-12-11  0:03 UTC (permalink / raw)
  To: speck


Hi Linus,

> So you have a non-inline function that
> 
>  - has a test to return early
> 
>  - does a *single* instruction
> 
>  - the instruction gets nop'ed out if it's not valid
> 
> all of which just *screams* "that's wrong" to me. Why isn't it inline
> and a static branch?

I tried inline originally, but it caused bad include loops with alternative.h
and mwait.h.

It could be done, but would need some restructuring in the includes.

I don't think it's worth it because going into idle 
isn't really that performance sensitive. Also if the
side effect is there its overhead really swamps any 
call overhead.

alternative already handles the noping, so I don't think we would
need a static branch, unless you want to toggle it at runtime?

> 
> The actual asm sequence also seems bad to me.
> 
> Why isn't that just using
> 
>         int val = __KERNEL_DS;
> 
> with the asm just doing
> 
>         "verw %[kernelds]" ..  [kernelds] "m" (val)
> 
> instead, letting gcc do the stack setup part.

It has to be the memory form, only that has the necessary side effect
in the microcode.

> 
> Maybe none of this matters simply because it's in the idle case, but
> the exact same issues happen for the kernel exit case.

kernel exit doesn't use the C code, it's custom assembler.

The only case where it may matter is KVM entry, but that is
only used with the software sequence, and that is quite
expensive anyways.

> 
> And no, we're not doing that insane kernel exit software case by
> default. Not without a whole lot of examples of horrid badness.
> 
> The data that system calls touch is basically already user data. Sure,
> there may be kernel pointers etc there, but we've already accepted
> those leaking locally for all the usual reasons that we have no
> control over.

It's encryption keys etc. too. But yes.

Another case is interrupt data -- for example the TCP stack
sometimes copies packet data.  Ok you could argue they 
should encrypt on the wire, but it still seems bogus 
to leak it.

Then if there are drivers which copy data in interrupts
they may also leak, but presumably that's rare these
days.

> 
> So the whole "let's add crazy long sequences to every kernel exit" is
> not going to happen. Not without a lot more explanations.

So you would prefer to only flush on context switch, and perhaps
for kernel crypto code and perhaps TCP stack if it copied anything?

One issue here is that to be really sure there are no such other
cases it would need some way to audit all the code. I'm not 
sure there's a practical way to do such an audit.

FWIW from our tests so far the performance loss from the kernel exit overhead
doesn't seem to be that bad.

-Andi


* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-10 22:49   ` [MODERATED] " Jiri Kosina
@ 2018-12-11  0:03     ` Andi Kleen
  2018-12-11  0:13     ` Kanth Ghatraju
  1 sibling, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-11  0:03 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 11:49:43PM +0100, speck for Jiri Kosina wrote:
> On Mon, 10 Dec 2018, speck for Andi Kleen wrote:
> 
> > From: Andi Kleen <ak@linux.intel.com>
> > Subject:  x86/speculation/mds: Clear CPU buffers on kernel exit
> >  for 64bit
> [ ... snip ... ]
> > +.macro EXIT_MDS
> > +	/* Clear CPU buffers that could leak. Instruction must be in memory form. */
> > +	ALTERNATIVE_2 "", __stringify(push $__USER_DS ; verw (% _ASM_SP ) ; add $8, % _ASM_SP ),\
> > +		X86_FEATURE_MB_CLEAR, \
> > +		"call do_clear_cpu", \
> > +		X86_BUG_MDS_CLEAR_CPU
> 
> The software clearing sequence is introduced later in the series, so this 
> seems not to be bisectable.

Yes good point. Looks like I mismerged a hunk. Will fix.

-Andi


* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-10 23:00   ` [MODERATED] " Linus Torvalds
  2018-12-11  0:03     ` Andi Kleen
@ 2018-12-11  0:09     ` Andrew Cooper
  1 sibling, 0 replies; 59+ messages in thread
From: Andrew Cooper @ 2018-12-11  0:09 UTC (permalink / raw)
  To: speck

On 10/12/2018 23:00, speck for Linus Torvalds wrote:
> The data that system calls touch is basically already user data. Sure,
> there may be kernel pointers etc there, but we've already accepted
> those leaking locally for all the usual reasons that we have no
> control over.
>
> So the whole "let's add crazy long sequences to every kernel exit" is
> not going to happen. Not without a lot more explanations.

The load buffers and fill buffers are very unlikely to have interesting
data in by the time you return to userspace, because of normal things we
do on the exit path.

It is the store buffers which are the problem case.

At some future point, e.g. when a userspace store takes an assist
(setting A/D bits is the example given, but there are other cases which
manifest this behaviour), a subsequent load-to-store-forward can leak
the stale contents of store buffer entry.

The store buffers are as wide as the vector pipeline, so while the lower
64 bits are almost certainly clobbered, the upper bits will be from the
last vector operation which was allocated to this store buffer.  As
XSAVE doesn't use all the store buffer entries, and doesn't make
uniformly-wide writes, the upper bits of the store buffer could be from
several context switches ago.

In the case of using SIMD-accelerated crypto, you've got a chance of
being able to find plaintext in the upper bits of the store buffer.

I can't speak to how easy this is to exploit in practice (I've not tried
yet), or what else perturbs the store buffers (apparently on some CPUs,
rep operations with a sufficient %ecx will use the entire width of the
store buffer), but there is a plausible risk that you really can get
interesting data from the store buffers.  That said, I expect the chance
is somewhere between rare and astronomical outside of a demo setup.

~Andrew



* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-10 22:49   ` [MODERATED] " Jiri Kosina
  2018-12-11  0:03     ` Andi Kleen
@ 2018-12-11  0:13     ` Kanth Ghatraju
  2018-12-11  2:00       ` Andi Kleen
                         ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Kanth Ghatraju @ 2018-12-11  0:13 UTC (permalink / raw)
  To: speck



> On Dec 10, 2018, at 5:49 PM, speck for Jiri Kosina <speck@linutronix.de> wrote:
> 
> On Mon, 10 Dec 2018, speck for Andi Kleen wrote:
> 
>> From: Andi Kleen <ak@linux.intel.com>
>> Subject:  x86/speculation/mds: Clear CPU buffers on kernel exit
>> for 64bit
> [ ... snip ... ]
>> +.macro EXIT_MDS
>> +	/* Clear CPU buffers that could leak. Instruction must be in memory form. */
>> +	ALTERNATIVE_2 "", __stringify(push $__USER_DS ; verw (% _ASM_SP ) ; add $8, % _ASM_SP ),\
>> +		X86_FEATURE_MB_CLEAR, \
>> +		"call do_clear_cpu", \
>> +		X86_BUG_MDS_CLEAR_CPU
> 
> The software clearing sequence is introduced later in the series, so this
> seems not to be bisectable.
> 
> Thanks,
> 
> --
> Jiri Kosina
> SUSE Labs

I am forwarding some additional comments from Jamie Iles:

- altinstructions are probably not what we want at all for any of these
mitigations.

If you have a patched kernel but old microcode and then do a late load
of the microcode, you have already thrown away the alternatives by that
point and you'll show as being mitigated but in fact have nops all over
the place.  A static branch would solve this.
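
For illustration, a static-branch variant might look like this (a
sketch using <linux/jump_label.h>, not code from this series):

	DEFINE_STATIC_KEY_FALSE(mds_clear_key);

	static inline void clear_cpu_buffers(void)
	{
		int val = __KERNEL_DS;

		if (static_branch_unlikely(&mds_clear_key))
			asm volatile("verw %[ds]" :: [ds] "m" (val));
	}

	/* from the late microcode load path, once MB_CLEAR is detected: */
	static_branch_enable(&mds_clear_key);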

Jamie




* [MODERATED] Re: [PATCH v2 5/8] MDSv2 7
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 5/8] MDSv2 7 Andi Kleen
@ 2018-12-11  0:33   ` Andrew Cooper
  2018-12-12 18:05     ` Andrew Cooper
  2018-12-12 21:41   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 59+ messages in thread
From: Andrew Cooper @ 2018-12-11  0:33 UTC (permalink / raw)
  To: speck

On 10/12/2018 17:53, speck for Andi Kleen wrote:
> diff --git a/arch/x86/lib/clear_cpu.S b/arch/x86/lib/clear_cpu.S
> new file mode 100644
> index 000000000000..5af33baf5427
> --- /dev/null
> +++ b/arch/x86/lib/clear_cpu.S
> @@ -0,0 +1,107 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#include <linux/linkage.h>
> +#include <asm/alternative-asm.h>
> +#include <asm/cpufeatures.h>
> +
> +/*
> + * Clear internal CPU buffers on kernel boundaries.
> + *
> + * These sequences are somewhat fragile, please don't add
> + * or change instructions in the middle of the areas marked with
> + * start/end.
> + *
> + * Interrupts and NMIs are dealt with by re-clearing. We clear parts
> + * of the kernel stack, which has other advantages too.
> + *
> + * Save all registers to make it easier to use for callers.
> + *
> + * This sequence is for Nehalem-IvyBridge. For Haswell we jump
> + * to hsw_clear_cpu.
> + *
> + * These functions need to be called on a full stack, as they may
> + * use up to 1.5k of stack. They should also be called with
> + * interrupts disabled. NMIs etc. are handled by letting every
> + * NMI do its own clear sequence.
> + */
> +ENTRY(ivb_clear_cpu)
> +GLOBAL(do_clear_cpu)
> +	/*
> +	 * obj[tf]ool complains about unreachable code here,
> +	 * which appears to be spurious?!?
> +	 */
> +	ALTERNATIVE "", "jmp hsw_clear_cpu", X86_BUG_MDS_CLEAR_CPU_HSW
> +	push %__ASM_REG(si)
> +	push %__ASM_REG(di)
> +	push %__ASM_REG(cx)
> +	mov %_ASM_SP, %__ASM_REG(si)
> +	sub  $2*16, %_ASM_SP
> +	and  $-16,%_ASM_SP
> +	movdqa %xmm0, (%_ASM_SP)
> +	movdqa %xmm1, 1*16(%_ASM_SP)

You don't need to preserve %xmm1 here.  It is unmodified by the
sequence, because the orpd pulls zero out of its memory operand. 
Similarly...

> +	sub  $672, %_ASM_SP
> +	xorpd %xmm0,%xmm0
> +	movdqa %xmm0, (%_ASM_SP)
> +	movdqa %xmm0, 16(%_ASM_SP)

... this store doesn't appear to do anything useful, as that stack slot
isn't read again, and...

> +	mov %_ASM_SP, %__ASM_REG(di)
> +	/* Clear sequence start */
> +	movdqu %xmm0,(%__ASM_REG(di))
> +	lfence
> +	orpd (%__ASM_REG(di)), %xmm0
> +	orpd (%__ASM_REG(di)), %xmm1
> +	mfence
> +	movl $40, %ecx
> +	add  $32, %__ASM_REG(di)

... I know this was in the recommended sequence, but bytes 16-31 aren't
used at all, and it feels fishy.

Either this wants to be $16, or the second orpd wants a displacement of
16 (and we do need to retain the second zeroing write) so all 32 bytes
are used.

> +1:	movntdq %xmm0, (%__ASM_REG(di))
> +	add  $16, %__ASM_REG(di)
> +	decl %ecx
> +	jnz  1b
> +	mfence
> +	/* Clear sequence end */
> +	add  $672, %_ASM_SP
> +	movdqu (%_ASM_SP), %xmm0
> +	movdqu 1*16(%_ASM_SP), %xmm1
> +	mov  %__ASM_REG(si),%_ASM_SP
> +	pop %__ASM_REG(cx)
> +	pop %__ASM_REG(di)
> +	pop %__ASM_REG(si)
> +	ret
> +END(ivb_clear_cpu)
> +
> +/*
> + * Version for Haswell/Broadwell.
> + */
> +ENTRY(hsw_clear_cpu)
> +	push %__ASM_REG(si)
> +	push %__ASM_REG(di)
> +	push %__ASM_REG(cx)
> +	push %__ASM_REG(ax)
> +	mov  %_ASM_SP, %__ASM_REG(ax)
> +	sub  $16, %_ASM_SP
> +	and  $-16,%_ASM_SP
> +	movdqa %xmm0, (%_ASM_SP)
> +	/* Clear sequence start */
> +	xorpd %xmm0,%xmm0
> +	sub  $1536,%_ASM_SP
> +	mov  %_ASM_SP, %__ASM_REG(si)
> +	mov  %__ASM_REG(si), %__ASM_REG(di)

Strictly speaking, these 3 instructions aren't part of the clear
sequence.  They are just for setting up the buffer.  In particular, sub
$1536, %rsp is unbalanced WRT its matching add.

~Andrew

> +	movl $40,%ecx
> +1:	movntdq %xmm0, (%__ASM_REG(di))
> +	add  $16, %__ASM_REG(di)
> +	decl %ecx
> +	jnz  1b
> +	mfence
> +	mov  %__ASM_REG(si), %__ASM_REG(di)
> +	mov $1536, %ecx
> +	rep movsb
> +	lfence
> +	/* Clear sequence end */
> +	add $1536,%_ASM_SP
> +	movdqa (%_ASM_SP), %xmm0
> +	mov %__ASM_REG(ax),%_ASM_SP
> +	pop %__ASM_REG(ax)
> +	pop %__ASM_REG(cx)
> +	pop %__ASM_REG(di)
> +	pop %__ASM_REG(si)
> +	ret
> +END(hsw_clear_cpu)




* [MODERATED] Re: [PATCH v2 6/8] MDSv2 3
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 6/8] MDSv2 3 Andi Kleen
@ 2018-12-11  0:37   ` Andrew Cooper
  2018-12-11  0:46     ` Luck, Tony
  0 siblings, 1 reply; 59+ messages in thread
From: Andrew Cooper @ 2018-12-11  0:37 UTC (permalink / raw)
  To: speck

On 10/12/2018 17:53, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/speculation/mds: Force MDS_EXIT on paranoid exit
>  for sw sequence
>
> When we use a software sequence for clearing CPU buffers an NMI
> or similar interrupt could interrupt the clearing sequence.
> In this case make sure we really flush by always doing the extra
> clearing on paranoid interrupt exit.
>
> This is only needed for the software sequence because VERW
> is an instruction that cannot be interrupted.

Interrupting the middle of the software sequence is only one half of the
problem.

The other half is when an NMI/#MC/etc hits on the return to guest path
after executing VERW, at which point you've just refilled all the
buffers between trying to clear them, and returning to userspace.

~Andrew



* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11  0:03     ` Andi Kleen
@ 2018-12-11  0:43       ` Linus Torvalds
  2018-12-11  1:33         ` Linus Torvalds
  2018-12-11  2:10         ` Andi Kleen
  0 siblings, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2018-12-11  0:43 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018, 16:03 speck for Andi Kleen <speck@linutronix.de> wrote:

>
>
> I don't think it's worth it because going into idle
> isn't really that performance sensitive. Also if the
> side effect is there its overhead really swamps any
> call overhead.
>

It's exactly the "it's not even there" overhead we should look at.

> alternative already handles the noping, so I don't think we would
> need a static branch, unless you want to toggle it at runtime?
>

The whole point is that you're adding a conditional branch and a call for
something that does NOTHING!

>         "verw %[kernelds]" ..  [kernelds] "m" (val)
> >
> > instead, letting gcc do the stack setup part.
>
> It has to be the memory form, only that has the necessary side effect
> in the microcode.
>

That's exactly what the "m" there does.

kernel exit doesn't use the C code, it's custom assembler.


Yes, and that custom assembler has that disgusting call to the SW version
that the code then enables by default.

It's encryption keys etc. too. But yes.
>


I think the encryption key case could easily have a "let's scrub cpu state"
thing.

It already does other strange things, like memset_safe() or whatever it is
that forces a memset even if the compiler decides it's dead.

FWIW from our tests so far the performance loss from the kernel exit
> overhead
> doesn't seem to be that bad.
>

I haven't seen any numbers, and whilei can believe it's true for the verw
case if there is hw acceleration, I doubt the SW case isn't noticeable.

Plus we've already seen that people have been way too eager to apply
patches just because there is alleged security implications, without doing
any kind of risk vs cost analysis.

      Linus

>
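
For reference, the memory-operand VERW form under discussion could be
wrapped roughly as below. This is a sketch only; the helper name is
illustrative rather than taken from the posted patches, and it assumes
__KERNEL_DS as the referenced selector.

	/*
	 * VERW needs a memory operand referencing a valid selector for
	 * the microcode to perform the buffer-clearing side effect;
	 * with the "m" constraint gcc materializes the operand itself,
	 * so no manual stack setup is required.
	 */
	static inline void verw_clear_cpu_buffers(void)
	{
		static const u16 ds = __KERNEL_DS;

		asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
	}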

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 6/8] MDSv2 3
  2018-12-11  0:37   ` [MODERATED] " Andrew Cooper
@ 2018-12-11  0:46     ` Luck, Tony
  2018-12-11  1:02       ` Andrew Cooper
  2018-12-11  1:53       ` Andi Kleen
  0 siblings, 2 replies; 59+ messages in thread
From: Luck, Tony @ 2018-12-11  0:46 UTC (permalink / raw)
  To: speck

On Tue, Dec 11, 2018 at 12:37:49AM +0000, speck for Andrew Cooper wrote:
> On 10/12/2018 17:53, speck for Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> Interrupting the middle of the software sequence is only one half of the
> problem.
> 
> The other half is when an NMI/#MC/etc hits on the return to guest path
> after executing VERW, at which point you've just refilled all the
> buffers between trying to clear them, and returning to userspace.

NMI would seem to be the only exploitable option (since a user might
use perf to arrange an NMI in this window ... a user can't force #MC
or SMI on command).

Would NMI fill the microarchitectural buffers with secrets?

-Tony

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 6/8] MDSv2 3
  2018-12-11  0:46     ` Luck, Tony
@ 2018-12-11  1:02       ` Andrew Cooper
  2018-12-11  1:53       ` Andi Kleen
  1 sibling, 0 replies; 59+ messages in thread
From: Andrew Cooper @ 2018-12-11  1:02 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 1223 bytes --]

On 11/12/2018 00:46, speck for Luck, Tony wrote:
> On Tue, Dec 11, 2018 at 12:37:49AM +0000, speck for Andrew Cooper wrote:
>> On 10/12/2018 17:53, speck for Andi Kleen wrote:
>>> From: Andi Kleen <ak@linux.intel.com>
>> Interrupting the middle of the software sequence is only one half of the
>> problem.
>>
>> The other half is when an NMI/#MC/etc hits on the return to guest path
>> after executing VERW, at which point you've just refilled all the
>> buffers between trying to clear them, and returning to userspace.
> NMI would seem to be the only exploitable option (since a user might
> use perf to arrange an NMI in this window ... a user can't force #MC
> or SMI on command).

There are plenty of indirect ways to generate an SMI, and SMI handlers
all need to rendezvous to play this state game.

Given a very quick system call that you can call very frequently, it's
feasible to create a race where it is the rendezvous SMI which hits in
the exit path.

> Would NMI fill the microarchitectural buffers with secrets?

That's a very different question, and might be ok to document our way
out of, but at the end of the day does depend on what code is hooked off
the NMI handler.

~Andrew


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11  0:43       ` Linus Torvalds
@ 2018-12-11  1:33         ` Linus Torvalds
  2018-12-11  2:12           ` Andi Kleen
  2018-12-11  2:20           ` Linus Torvalds
  2018-12-11  2:10         ` Andi Kleen
  1 sibling, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2018-12-11  1:33 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 4:43 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I think the encryption key case could easily have a "let's scrub cpu state" thing.

In fact, maybe we could have something like a TIF_SCRUBME thread flag,
which we react to on return to user space, and on context switch.

That way anybody who thinks they are really working with sensitive
information could just set the flag. We could set it automatically in
kzfree() and memzero_explicit().

Using a TIF_xyz flag would seem to be very natural. Sure, there's a
window between the "handle TIF_xyz" flags and the actual return to
user space, but nothing in that window should touch anything remotely
sensitive. It's mainly just restoring user state.

                   Linus
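
A minimal sketch of the proposal, assuming a hypothetical TIF_SCRUBME
bit (nothing like it exists in the posted patches); the hook into
memzero_explicit() follows the suggestion above:

	/*
	 * lib/string.c: mark the task after it wiped sensitive data,
	 * so the exit path scrubs CPU buffers before returning to
	 * userspace. TIF_SCRUBME is hypothetical.
	 */
	void memzero_explicit(void *s, size_t count)
	{
		memset(s, 0, count);
		barrier_data(s);	/* keep the memset alive */
		set_thread_flag(TIF_SCRUBME);
	}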

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 6/8] MDSv2 3
  2018-12-11  0:46     ` Luck, Tony
  2018-12-11  1:02       ` Andrew Cooper
@ 2018-12-11  1:53       ` Andi Kleen
  1 sibling, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-11  1:53 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 04:46:22PM -0800, speck for Luck, Tony wrote:
> On Tue, Dec 11, 2018 at 12:37:49AM +0000, speck for Andrew Cooper wrote:
> > On 10/12/2018 17:53, speck for Andi Kleen wrote:
> > > From: Andi Kleen <ak@linux.intel.com>
> > Interrupting the middle of the software sequence is only one half of the
> > problem.
> > 
> > The other half is when an NMI/#MC/etc hits on the return to guest path
> > after executing VERW, at which point you've just refilled all the
> > buffers between trying to clear them, and returning to userspace.
> 
> NMI would seem to be the only exploitable option (since a user might
> use perf to arrange an NMI in this window ... a user can't force #MC
> or SMI on command).
> 
> Would NMI fill the microarchitectural buffers with secrets?

The problem is not NMI filling (afaik it doesn't), but NMI preventing the
proper clearing.

-Andi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-11  0:13     ` Kanth Ghatraju
@ 2018-12-11  2:00       ` Andi Kleen
  2018-12-11  5:36       ` Jiri Kosina
  2018-12-11 10:03       ` Borislav Petkov
  2 siblings, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-11  2:00 UTC (permalink / raw)
  To: speck

> - altinstructions are probably not what we want at all for any of these
> mitigations.
> 
> If you have a patched kernel but old microcode and then do a late load
> of the microcode, you have already thrown away the alternatives by that
> point and you'll show as being mitigated but in fact have nops all over
> the place.  A static branch would solve this.

Nothing else in the Linux kernel handles changed CPUIDs from
a microcode load. The microcode loader even detects this case
and warns about it.

The assumption was that if you update the kernel you also update
the microcode, and you do it at boot.

If you do a hot update, you are unlikely to have the new patched
kernel anyway, and if you try to hot-patch the hooks in too,
you can handle that in your patch.

But I don't see any point in complicating the upstream kernel
for such hotpatch only scenarios.

-Andi
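
For completeness, the static-branch variant being debated would look
roughly like the sketch below; the function names are illustrative, and
the clearing primitive is whichever one (VERW or the software sequence)
is in use:

	DEFINE_STATIC_KEY_FALSE(mds_clear_enabled);

	/*
	 * Could be re-run after a late microcode load has re-read
	 * CPUID, unlike an alternative that was patched once at boot.
	 */
	void mds_update_mitigation(void)
	{
		if (boot_cpu_has(X86_FEATURE_MB_CLEAR))
			static_branch_enable(&mds_clear_enabled);
	}

	static inline void clear_cpu_buffers(void)
	{
		if (static_branch_unlikely(&mds_clear_enabled))
			mds_clear_cpu_buffers();	/* VERW or sw sequence */
	}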

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11  0:43       ` Linus Torvalds
  2018-12-11  1:33         ` Linus Torvalds
@ 2018-12-11  2:10         ` Andi Kleen
  1 sibling, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-11  2:10 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 04:43:56PM -0800, speck for Linus Torvalds wrote:

Ok I will inline it.

FWIW the overhead is currently ~7 cycles on SKL, see [1]



>      It's encryption keys etc. too. But yes.
> 
>    I think the encryption key case could easily have a "let's scrub cpu
>    state" thing.
>    It already does other strange things, like memset_safe() or whatever it is
>    that forces a memset even if the compiler decides it's dead.

But the problem is how do we find all cases where someone else's data
is touched? 

Even if I write a patch for the cases known to me, I could
never guarantee I found them all.

FWIW I suspect crypto is actually not that big an issue because
most uses should be in their own threads, which would be handled
by the context switch flush. But at least softirqs/timers copying
some user data is a real danger.

> 
>      FWIW from our tests so far the performance loss from the kernel exit
>      overhead
>      doesn't seem to be that bad.
> 
>    I haven't seen any numbers, and whilei can believe it's true for the verw
>    case if there is hw acceleration, I doubt the SW case isn't noticeable.
>    Plus we've already seen that people have been way too eager to apply
>    patches just because there is alleged security implications, without doing
>    any kind of risk vs cost analysis.

My understanding is that while this is much harder to exploit than L1TF or
Meltdown, there are working exploits.

-Andi


[1]


        ffffffff818f5946                        callq  0xffffffff81091960               # PRED 1 cycles [507]
        clear_cpu_buffers_idle:
        ffffffff81091960                        nopl  %eax, (%rax,%rax,1) 
        ffffffff81091965                        movl  0x12a3ab9(%rip), %eax 
        ffffffff8109196b                        test %eax, %eax 
        ffffffff8109196d                        jz 0xffffffff81091970                           # PRED 1 cycles [508] 3.00 IPC
        ffffffff81091970                        pushq  %rbp 
        ffffffff81091971                        mov %rsp, %rbp 
        ffffffff81091974                        callq  0xffffffff81091940               # PRED 1 cycles [509] 2.00 IPC

(it would help if we didn't disable tail calls with debug info I guess ...) 

        clear_cpu_buffers:
        ffffffff81091940                        nopl  %eax, (%rax,%rax,1) 
        ffffffff81091945                        pushq  %rbp 
        ffffffff81091946                        mov %rsp, %rbp 
        ffffffff81091949                        nopl  %eax, (%rax,%rax,1) 
        ffffffff81091951                        data16 nop  
        ffffffff81091953                        popq  %rbp 
        ffffffff81091954                        retq                            # PRED 4 cycles [513] 1.50 IPC
        clear_cpu_buffers_idle+25:
        ffffffff81091979                        popq  %rbp 
        ffffffff8109197a                        retq                            # PRED 1 cycles [514] 1.00 IPC

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11  1:33         ` Linus Torvalds
@ 2018-12-11  2:12           ` Andi Kleen
  2018-12-11  2:20           ` Linus Torvalds
  1 sibling, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-11  2:12 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 05:33:27PM -0800, speck for Linus Torvalds wrote:

Linus, for some reason that mail seemed to be SMIME encrypted?

Not sure if it was due to the mailing list setup.

At least it was so secure that mutt couldn't decrypt it.

-Andi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11  1:33         ` Linus Torvalds
  2018-12-11  2:12           ` Andi Kleen
@ 2018-12-11  2:20           ` Linus Torvalds
  2018-12-11  3:25             ` Andi Kleen
  1 sibling, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2018-12-11  2:20 UTC (permalink / raw)
  To: speck

Re-sending, because apparently the previous email was so well
encrypted that Andi couldn't even read it.

Oh well. I was so happy to get my s/mime setup fixed up and finally
talking to the speck list automatically and properly, but I was
apparently celebrating too early, so this downgrades to the earlier
(somewhat broken) setup.

I suspect Thomas could write a book about his frustrations by now.

                 Linus

On Mon, Dec 10, 2018 at 5:33 PM Linus Torvalds wrote:
>
> On Mon, Dec 10, 2018 at 4:43 PM Linus Torvalds wrote:
> >
> > I think the encryption key case could easily have a "let's scrub cpu state" thing.
>
> In fact, maybe we could have something like a TIF_SCRUBME thread flag,
> which we react to on return to user space, and on context switch.
>
> That way anybody who thinks they are really working with sensitive
> information could just set the flag. We could set it automatically in
> kzfree() and memzero_explicit().
>
> Using a TIF_xyz flag would seem to be very natural. Sure, there's a
> window between the "handle TIF_xyz" flags and the actual return to
> user space, but nothing in that window should touch anything remotely
> sensitive. It's mainly just restoring user state.
>
>                    Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11  2:20           ` Linus Torvalds
@ 2018-12-11  3:25             ` Andi Kleen
  2018-12-11 17:55               ` Linus Torvalds
  2018-12-12 14:02               ` [MODERATED] " Paolo Bonzini
  0 siblings, 2 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-11  3:25 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 06:20:16PM -0800, speck for Linus Torvalds wrote:
> Re-sending, because apparently the previous email was so well
> encrypted that Andi couldn't even read it.
> 
> Oh well. I was so happy to get my s/mime setup fixed up and finally
> talking to the speck list automatically and properly, but I was
> apparently celebrating too early, so this downgrades to the earlier
> (somewhat broken) setup.
> 
> I suspect Thomas could write a book about his frustrations by now.
> 
>                  Linus
> 
> On Mon, Dec 10, 2018 at 5:33 PM Linus Torvalds wrote:
> >
> > On Mon, Dec 10, 2018 at 4:43 PM Linus Torvalds wrote:
> > >
> > > I think the encryption key case could easily have a "let's scrub cpu state" thing.
> >
> > In fact, maybe we could have something like a TIF_SCRUBME thread flag,
> > which we react to on return to user space, and on context switch.
> >
> > That way anybody who thinks they are really working with sensitive
> > information could just set the flag. We could set it automatically in
> > kzfree() and memzero_explicit().
> >
> > Using a TIF_xyz flag would seem to be very natural. Sure, there's a
> > window between the "handle TIF_xyz" flags and the actual return to
> > user space, but nothing in that window should touch anything remotely
> > sensitive. It's mainly just restoring user state.

Okay.

We'll have to disable interrupts in that window, but that
should be ok.

I guess it could be also set in some skb_* functions to catch
the network cases.

Or maybe setting when calling into unaudited code in interrupts/timers?

And the flag would allow a "paranoid" mode which sets
it unconditionally. 

-Andi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-11  0:13     ` Kanth Ghatraju
  2018-12-11  2:00       ` Andi Kleen
@ 2018-12-11  5:36       ` Jiri Kosina
  2018-12-11 10:03       ` Borislav Petkov
  2 siblings, 0 replies; 59+ messages in thread
From: Jiri Kosina @ 2018-12-11  5:36 UTC (permalink / raw)
  To: speck

On Mon, 10 Dec 2018, speck for Kanth Ghatraju wrote:

> - altinstructions are probably not what we want at all for any of these
> mitigations.
> 
> If you have a patched kernel but old microcode and then do a late load
> of the microcode, you have already thrown away the alternatives by that
> point and you'll show as being mitigated but in fact have nops all over
> the place.  

Many other ucode-based mitigations for previous issues are in the same 
boat. Late ucode loading is more or less unsupported in these scenarios. 
Especially CPU caps handling is not really ready for dynamically changing 
feature values during runtime.

I would suggest not digging into that hole.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-11  0:13     ` Kanth Ghatraju
  2018-12-11  2:00       ` Andi Kleen
  2018-12-11  5:36       ` Jiri Kosina
@ 2018-12-11 10:03       ` Borislav Petkov
  2018-12-12 21:31         ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 59+ messages in thread
From: Borislav Petkov @ 2018-12-11 10:03 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 07:13:18PM -0500, speck for Kanth Ghatraju wrote:
> If you have a patched kernel but old microcode and then do a late load
> of the microcode,

And to extend what the others said: late microcode loading is a can
of worms which we shouldn't have opened back then. Especially since it
requires a complicated dance on the OS side to quiesce the cores for it
to even work without locking up a core or whatever other craziness can
happen.

So a microcode update should be handled just like updating your kernel -
you need to reboot the machine.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 1/8] MDSv2 4
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 1/8] MDSv2 4 Andi Kleen
@ 2018-12-11 14:14   ` Paolo Bonzini
  2018-12-12 21:22   ` Konrad Rzeszutek Wilk
  2018-12-12 21:25   ` Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2018-12-11 14:14 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

On 10/12/18 18:53, speck for Andi Kleen wrote:
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -342,6 +342,7 @@
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
>  #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
> +#define X86_FEATURE_MB_CLEAR		(18*32+10) /* Flush state on VERW */

KVM also needs the CPUID bit.

Paolo


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11  3:25             ` Andi Kleen
@ 2018-12-11 17:55               ` Linus Torvalds
  2018-12-11 18:10                 ` Borislav Petkov
  2018-12-11 18:21                 ` Linus Torvalds
  2018-12-12 14:02               ` [MODERATED] " Paolo Bonzini
  1 sibling, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2018-12-11 17:55 UTC (permalink / raw)
  To: speck

[ Let's see if this comes through legibly ]

On Mon, Dec 10, 2018 at 7:27 PM speck for Andi Kleen
<speck@linutronix.de> wrote:
>
> > In fact, maybe we could have something like a TIF_SCRUBME thread flag,
> > which we react to on return to user space, and on context switch.
>
> We'll have to disable interrupts in that window, but that
> should be ok.

I don't think we even need to disable interrupts.

If an interrupt comes in while we're scrubbing, either that interrupt
will touch sensitive data (and set TIF_SCRUBME) or it won't.

If it *does* set TIF_SCRUBME, we'll just re-do the scrub, and if it
*doesn't* touch sensitive data, we'll return to the scrubbing having
touched even more cachelines so any old sensitive data should be fine.

Yeah, yeah, it might affect some cache replacement logic that the SW
scrubbing code might depend on in theory, but that's pretty damn
theoretical. Good luck to anybody trying to attack that.

                Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11 17:55               ` Linus Torvalds
@ 2018-12-11 18:10                 ` Borislav Petkov
  2018-12-11 18:21                 ` Linus Torvalds
  1 sibling, 0 replies; 59+ messages in thread
From: Borislav Petkov @ 2018-12-11 18:10 UTC (permalink / raw)
  To: speck

On Tue, Dec 11, 2018 at 09:55:05AM -0800, speck for Linus Torvalds wrote:

That's too encrypted. :-)

All I see in mutt is:

----------
[-- Begin signature information --]
Good signature from: speck@linutronix.de <speck@linutronix.de>
                aka: spexen@linutronix.de
                aka: spexen-owner@linutronix.de
                aka: speck@linutronix.de <speck-owner@linutronix.de>
                aka: speck@linutronix.de <speck-request@linutronix.de>
            created: Tue 11 Dec 2018 06:55:41 PM CET
WARNING: We have NO indication whether the key belongs to the person named as shown above
Fingerprint: D527 95F2 8E26 A155 4E7C  D269 E232 0546 8C06 0A6A
[-- End signature information --]

[-- The following data is PGP/MIME signed and encrypted --]

[-- Error: decryption failed: No secret key --]


[-- End of PGP/MIME signed and encrypted data --]
----------

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11 17:55               ` Linus Torvalds
  2018-12-11 18:10                 ` Borislav Petkov
@ 2018-12-11 18:21                 ` Linus Torvalds
  2018-12-11 18:26                   ` Borislav Petkov
                                     ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Linus Torvalds @ 2018-12-11 18:21 UTC (permalink / raw)
  To: speck

Ok, re-sending again.

It looks like speck doesn't have my other cert, so when I send with
the cert I want to use it ends up never being decrypted (and then
re-encrypted with the speck cert) by the mailing list.

               Linus

On Tue, Dec 11, 2018 at 9:55 AM Linus Torvalds wrote:
>
> [ Let's see if this comes through legibly ]
>
> On Mon, Dec 10, 2018 at 7:27 PM speck for Andi Kleen wrote:
> >
> > > In fact, maybe we could have something like a TIF_SCRUBME thread flag,
> > > which we react to on return to user space, and on context switch.
> >
> > We'll have to disable interrupts in that window, but that
> > should be ok.
>
> I don't think we even need to disable interrupts.
>
> If an interrupt comes in while we're scrubbing, either that interrupt
> will touch sensitive data (and set TIF_SCRUBME) or it won't.
>
> If it *does* set TIF_SCRUBME, we'll just re-do the scrub, and if it
> *doesn't* touch sensitive data, we'll return to the scrubbing having
> touched even more cachelines so any old sensitive data should be fine.
>
> Yeah, yeah, it might affect some cache replacement logic that the SW
> scrubbing code might depend on in theory, but that's pretty damn
> theoretical. Good luck to anybody trying to attack that.
>
>                 Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11 18:21                 ` Linus Torvalds
@ 2018-12-11 18:26                   ` Borislav Petkov
  2018-12-11 19:47                   ` Andi Kleen
  2018-12-11 21:22                   ` Thomas Gleixner
  2 siblings, 0 replies; 59+ messages in thread
From: Borislav Petkov @ 2018-12-11 18:26 UTC (permalink / raw)
  To: speck

On Tue, Dec 11, 2018 at 10:21:17AM -0800, speck for Linus Torvalds wrote:
> Ok, re-sending again.
> 
> It looks like speck doesn't have my other cert, so when I send with
> the cert I want to use it ends up never being decrypted (and then
> re-encrypted with the speck cert) by the mailing list.

Yeah, better. I can read it now.

tglx is looking into it, FWIW.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11 18:21                 ` Linus Torvalds
  2018-12-11 18:26                   ` Borislav Petkov
@ 2018-12-11 19:47                   ` Andi Kleen
  2018-12-11 21:22                   ` Thomas Gleixner
  2 siblings, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-11 19:47 UTC (permalink / raw)
  To: speck

> > If it *does* set TIF_SCRUBME, we'll just re-do the scrub, and if it
> > *doesn't* touch sensitive data, we'll return to the scrubbing having
> > touched even more cachelines so any old sensitive data should be fine.

Ah you mean we're ok if we recheck the flag with interrupts off 
at the end and repeat if needed? (like reschedule). I guess that will work.

-Andi
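
A sketch of that recheck loop on the return-to-user path, mirroring the
reschedule pattern; TIF_SCRUBME is still hypothetical, and do_clear_cpu
is the software sequence from patch 5/8:

	/*
	 * Entered with interrupts disabled. An interrupt that touches
	 * sensitive data re-sets the flag, so loop until it stays
	 * clear; the scrub itself can run with interrupts on.
	 */
	while (test_thread_flag(TIF_SCRUBME)) {
		clear_thread_flag(TIF_SCRUBME);
		local_irq_enable();
		do_clear_cpu();
		local_irq_disable();
	}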

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11 18:21                 ` Linus Torvalds
  2018-12-11 18:26                   ` Borislav Petkov
  2018-12-11 19:47                   ` Andi Kleen
@ 2018-12-11 21:22                   ` Thomas Gleixner
  2 siblings, 0 replies; 59+ messages in thread
From: Thomas Gleixner @ 2018-12-11 21:22 UTC (permalink / raw)
  To: speck

On Tue, 11 Dec 2018, speck for Linus Torvalds wrote:

> Ok, re-sending again.
> 
> It looks like speck doesn't have my other cert, so when I send with
> the cert I want to use it ends up never being decrypted (and then
> re-encrypted with the speck cert) by the mailing list.

Nah. It's just that gmail is the only MUA which does S/MIME differently,
and it hit remailer path handling that was never executed before.

Fixed and replayed the message.

Thanks,

	Thomas

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-11  3:25             ` Andi Kleen
  2018-12-11 17:55               ` Linus Torvalds
@ 2018-12-12 14:02               ` Paolo Bonzini
  2018-12-12 17:58                 ` Andi Kleen
  1 sibling, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2018-12-12 14:02 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 648 bytes --]

On 11/12/18 04:25, speck for Andi Kleen wrote:
> Okay.
> 
> We'll have to disable interrupts in that window, but that
> should be ok.
> 
> I guess it could be also set in some skb_* functions to catch
> the network cases.
> 
> Or maybe setting when calling into unaudited code in interrupts/timers?
> 
> And the flag would allow a "paranoid" mode which sets
> it unconditionally. 

That sounds a lot like kvm_get_cpu_l1tf_flush_l1d and
kvm_clear_cpu_l1tf_flush_l1d.  Maybe we can just add another bit in
irq_cpustat_t's kvm_cpu_l1tf_flush_l1d, which would be cleared on return
to userspace before doing the verw stuff.

Paolo
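
A sketch of what Paolo describes, modeled on the existing L1TF helpers
in arch/x86/include/asm/hardirq.h; it is shown here as a separate,
hypothetical field for clarity, and the helper names are made up:

	/* New member of irq_cpustat_t, next to kvm_cpu_l1tf_flush_l1d: */
	u8	mds_buffers_dirty;

	static inline void mds_set_cpu_buffers_dirty(void)
	{
		__this_cpu_write(irq_stat.mds_buffers_dirty, 1);
	}

	/*
	 * On return to userspace: verw only if something marked the
	 * CPU dirty since the last clear.
	 */
	static inline bool mds_test_and_clear_buffers_dirty(void)
	{
		if (!__this_cpu_read(irq_stat.mds_buffers_dirty))
			return false;
		__this_cpu_write(irq_stat.mds_buffers_dirty, 0);
		return true;
	}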


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-12 14:02               ` [MODERATED] " Paolo Bonzini
@ 2018-12-12 17:58                 ` Andi Kleen
  2018-12-12 18:47                   ` Linus Torvalds
  0 siblings, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2018-12-12 17:58 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 03:02:23PM +0100, speck for Paolo Bonzini wrote:
> On 11/12/18 04:25, speck for Andi Kleen wrote:
> > Okay.
> > 
> > We'll have to disable interrupts in that window, but that
> > should be ok.
> > 
> > I guess it could be also set in some skb_* functions to catch
> > the network cases.
> > 
> > Or maybe setting when calling into unaudited code in interrupts/timers?
> > 
> > And the flag would allow a "paranoid" mode which sets
> > it unconditionally. 
> 
> That sounds a lot like kvm_get_cpu_l1tf_flush_l1d and
> kvm_clear_cpu_l1tf_flush_l1d.  Maybe we can just add another bit in
> irq_cpustat_t's kvm_cpu_l1tf_flush_l1d, which would be cleared on return
> to userspace before doing the verw stuff.

This is not only for KVM, but also for any user space exits,
so TIF flags is the natural mechanism.

Right now the clearing on entry is unconditional.

I thought about using the L1TF mechanism for KVM to avoid
the clear for fast exits.

However there is one problem: if the guest is using
the software sequence and the sequence gets interrupted
by an exit, then the only way to make sure the clearing
happened correctly in the guest is to let the hypervisor
do it.

That would require unconditional clearing on entry.

Unless we can somehow prove that fast exits only happen
when the guest does something actively with an instruction.
The clear sequence will not execute any instruction that
should cause exits, other than exits that can always happen
(like interrupts or EPT violations). So we would need
to make sure that such "can always happen" exits
would not trigger the fast-exit no-flush case. I'm not
sure that is true today.

-Andi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 5/8] MDSv2 7
  2018-12-11  0:33   ` [MODERATED] " Andrew Cooper
@ 2018-12-12 18:05     ` Andrew Cooper
  0 siblings, 0 replies; 59+ messages in thread
From: Andrew Cooper @ 2018-12-12 18:05 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 3316 bytes --]

On 10/12/2018 16:33, speck for Andrew Cooper wrote:
> On 10/12/2018 17:53, speck for Andi Kleen wrote:
>> diff --git a/arch/x86/lib/clear_cpu.S b/arch/x86/lib/clear_cpu.S
>> new file mode 100644
>> index 000000000000..5af33baf5427
>> --- /dev/null
>> +++ b/arch/x86/lib/clear_cpu.S
>> @@ -0,0 +1,107 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#include <linux/linkage.h>
>> +#include <asm/alternative-asm.h>
>> +#include <asm/cpufeatures.h>
>> +
>> +/*
>> + * Clear internal CPU buffers on kernel boundaries.
>> + *
>> + * These sequences are somewhat fragile, please don't add
>> + * or change instructions in the middle of the areas marked with
>> + * start/end.
>> + *
>> + * Interrupts and NMIs we deal with by reclearing. We clear parts
>> + * of the kernel stack, which has other advantages too.
>> + *
>> + * Save all registers to make it easier to use for callers.
>> + *
>> + * This sequence is for Nehalem-IvyBridge. For Haswell we jump
>> + * to hsw_clear_buf.
>> + *
>> + * These functions need to be called on a full stack, as they may
>> + * use upto 1.5k of stack. They should be also called with
>> + * interrupts disabled. NMIs etc. are handled by letting every
>> + * NMI do its own clear sequence.
>> + */
>> +ENTRY(ivb_clear_cpu)
>> +GLOBAL(do_clear_cpu)
>> +	/*
>> +	 * obj[tf]ool complains about unreachable code here,
>> +	 * which appears to be spurious?!?
>> +	 */
>> +	ALTERNATIVE "", "jmp hsw_clear_cpu", X86_BUG_MDS_CLEAR_CPU_HSW
>> +	push %__ASM_REG(si)
>> +	push %__ASM_REG(di)
>> +	push %__ASM_REG(cx)
>> +	mov %_ASM_SP, %__ASM_REG(si)
>> +	sub  $2*16, %_ASM_SP
>> +	and  $-16,%_ASM_SP
>> +	movdqa %xmm0, (%_ASM_SP)
>> +	movdqa %xmm1, 1*16(%_ASM_SP)
> You don't need to preserve %xmm1 here.  It is unmodified by the
> sequence, because the orpd pulls zero out of its memory operand. 
> Similarly...
>
>> +	sub  $672, %_ASM_SP
>> +	xorpd %xmm0,%xmm0
>> +	movdqa %xmm0, (%_ASM_SP)
>> +	movdqa %xmm0, 16(%_ASM_SP)
> ... this store doesn't appear to do anything useful, as that stack slot
> isn't read again, and...
>
>> +	mov %_ASM_SP, %__ASM_REG(di)
>> +	/* Clear sequence start */
>> +	movdqu %xmm0,(%__ASM_REG(di))
>> +	lfence
>> +	orpd (%__ASM_REG(di)), %xmm0
>> +	orpd (%__ASM_REG(di)), %xmm1
>> +	mfence
>> +	movl $40, %ecx
>> +	add  $32, %__ASM_REG(di)
> ... I know this was in the recommended sequence, but bytes 16-31 aren't
> used at all, and it feels fishy.
>
> Either this wants to be $16, or the second orpd wants a displacement of
> 16 (and we do need to retain the second zeroing write) so all 32 bytes
> are used.

Based on what Ronak has said in person, two back-to-back loads are
guaranteed to be scheduled on alternate load ports.

Therefore, this can be a 16 byte change rather than 32.

However, there is a separate problem with synchronising the other
threads, because a pause waitloop will be racing with this sequence for
allocation of load ports.  There doesn't appear to be a viable option to
guarantee that these two orpd hit both load ports in the core.

Given the GPR restoration later in the return-to-guest path, which is
15-ish loads in a line, one option being discussed is to do away with the
first half of the software sequence entirely.

~Andrew


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-12 17:58                 ` Andi Kleen
@ 2018-12-12 18:47                   ` Linus Torvalds
  2018-12-13 19:44                     ` Linus Torvalds
  0 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2018-12-12 18:47 UTC (permalink / raw)
  To: speck

[ let's see if this comes through now ]

On Wed, Dec 12, 2018, 09:58 speck for Andi Kleen <speck@linutronix.de> wrote:

>
> However there is one problem: if the guest is using
> the software sequence and the sequence gets interrupted
> by an exit,
>

So I'm at the Intel partner meeting right now, and the official statement
seems to be that the software sequence really shouldn't be used. Nobody
seems to want it used for anything but some really odd case.

Just the verw instruction seems to be on the order of 400 cycles. It's
probably unnoticeable on any machine that has all the other mitigations in
place, but it looks like there are machines now that have all the other
things fixed, so 400 cycles then is still quite noticeable.

But the software sequence really sounds like something nobody should
realistically be using by default.

         Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 1/8] MDSv2 4
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 1/8] MDSv2 4 Andi Kleen
  2018-12-11 14:14   ` [MODERATED] " Paolo Bonzini
@ 2018-12-12 21:22   ` Konrad Rzeszutek Wilk
  2018-12-12 21:28     ` Andi Kleen
  2018-12-12 21:25   ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-12-12 21:22 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 09:53:33AM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/speculation/mds: Add basic bug infrastructure for
>  MDS
> 
> MDS is micro architectural data sampling, which is a side channel
> attack on internal buffers in Intel CPUs. They all have
> the same mitigations for single thread, so we lump them all
> together as a single MDS issue.
> 
> This patch adds the basic infrastructure to detect if the current
> CPU is affected by MDS, and if yes set the right BUG bits.
> 
> We also provide a command line option "mds_disable" to disable
> any workarounds.

Would it be worth referencing a not-yet-posted Intel doc,
or at least pointing to the kernel.org bugzilla where we dump all
the CPU-related issues?

> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 +++
>  arch/x86/include/asm/cpufeatures.h              |  2 ++
>  arch/x86/include/asm/msr-index.h                |  1 +
>  arch/x86/kernel/cpu/bugs.c                      | 10 ++++++++++
>  arch/x86/kernel/cpu/common.c                    | 14 ++++++++++++++
>  5 files changed, 30 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index aefd358a5ca3..48891572e825 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2341,6 +2341,9 @@
>  			Format: <first>,<last>
>  			Specifies range of consoles to be captured by the MDA.
>  
> +	mds_disable	[X86]

,Intel ?

> +			Disable workarounds for Micro-architectural Data Sampling.
> +
>  	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
>  			Amount of memory to be used when the kernel is not able
>  			to see the whole system memory or for test.
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 28c4a502b419..93fab3a1e046 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -342,6 +342,7 @@
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
>  #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
> +#define X86_FEATURE_MB_CLEAR		(18*32+10) /* Flush state on VERW */
>  #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
>  #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
>  #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
> @@ -379,5 +380,6 @@
>  #define X86_BUG_SPECTRE_V2		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
>  #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
>  #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
> +#define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
>  
>  #endif /* _ASM_X86_CPUFEATURES_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index c8f73efb4ece..303064a9a0a9 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -77,6 +77,7 @@
>  						    * attack, so no Speculative Store Bypass
>  						    * control required.
>  						    */
> +#define ARCH_CAP_MDS_NO			(1 << 5)   /* No Microarchitectural data sampling */

Should this say anything about RDCL?
>  
>  #define MSR_IA32_FLUSH_CMD		0x0000010b
>  #define L1D_FLUSH			(1 << 0)   /*
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 500278f5308e..5cac243849d8 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -35,6 +35,7 @@
>  static void __init spectre_v2_select_mitigation(void);
>  static void __init ssb_select_mitigation(void);
>  static void __init l1tf_select_mitigation(void);
> +static void __init mds_select_mitigation(void);
>  
>  /* The base value of the SPEC_CTRL MSR that always has to be preserved. */
>  u64 x86_spec_ctrl_base;
> @@ -99,6 +100,8 @@ void __init check_bugs(void)
>  
>  	l1tf_select_mitigation();
>  
> +	mds_select_mitigation();
> +
>  #ifdef CONFIG_X86_32
>  	/*
>  	 * Check whether we are able to run this kernel safely on SMP.
> @@ -1041,6 +1044,13 @@ early_param("l1tf", l1tf_cmdline);
>  
>  #undef pr_fmt
>  
> +static void mds_select_mitigation(void)
> +{
> +	if (cmdline_find_option_bool(boot_command_line, "mds_disable") ||
> +	    !boot_cpu_has_bug(X86_BUG_MDS))
> +		setup_clear_cpu_cap(X86_FEATURE_MB_CLEAR);
> +}
> +
>  #ifdef CONFIG_SYSFS
>  
>  #define L1TF_DEFAULT_MSG "Mitigation: PTE Inversion"
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index ffb181f959d2..bebeb67015fc 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -998,6 +998,14 @@ static const __initconst struct x86_cpu_id cpu_no_l1tf[] = {
>  	{}
>  };
>  
> +static const __initconst struct x86_cpu_id cpu_no_mds[] = {
> +	/* in addition to cpu_no_speculation */
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT	},
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT_X	},
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT_PLUS	},
> +	{}
> +};
> +
>  static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
>  {
>  	u64 ia32_cap = 0;
> @@ -1019,6 +1027,12 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
>  	if (ia32_cap & ARCH_CAP_IBRS_ALL)
>  		setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
>  
> +	if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> +	    !x86_match_cpu(cpu_no_mds)) &&
> +	    !(ia32_cap & ARCH_CAP_MDS_NO) &&
> +	    !(ia32_cap & ARCH_CAP_RDCL_NO))
> +		setup_force_cpu_bug(X86_BUG_MDS);
> +
>  	if (x86_match_cpu(cpu_no_meltdown))
>  		return;
>  
> -- 
> 2.17.2

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 1/8] MDSv2 4
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 1/8] MDSv2 4 Andi Kleen
  2018-12-11 14:14   ` [MODERATED] " Paolo Bonzini
  2018-12-12 21:22   ` Konrad Rzeszutek Wilk
@ 2018-12-12 21:25   ` Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-12-12 21:25 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 09:53:33AM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/speculation/mds: Add basic bug infrastructure for
>  MDS
> 
> MDS is micro architectural data sampling, which is a side channel
> attack on internal buffers in Intel CPUs. They all have
> the same mitigations for single thread, so we lump them all
> together as a single MDS issue.
> 
> This patch adds the basic infrastructure to detect if the current
> CPU is affected by MDS, and if yes set the right BUG bits.
> 
> We also provide a command line option "mds_disable" to disable
> any workarounds.

Perhaps follow the same approach as the other options by having:

mds=off
nomds

?

Also is it worth explaining in detail what this is? And including the
CVE number?
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 +++
>  arch/x86/include/asm/cpufeatures.h              |  2 ++
>  arch/x86/include/asm/msr-index.h                |  1 +
>  arch/x86/kernel/cpu/bugs.c                      | 10 ++++++++++
>  arch/x86/kernel/cpu/common.c                    | 14 ++++++++++++++
>  5 files changed, 30 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index aefd358a5ca3..48891572e825 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2341,6 +2341,9 @@
>  			Format: <first>,<last>
>  			Specifies range of consoles to be captured by the MDA.
>  
> +	mds_disable	[X86]
> +			Disable workarounds for Micro-architectural Data Sampling.
> +
>  	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
>  			Amount of memory to be used when the kernel is not able
>  			to see the whole system memory or for test.
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 28c4a502b419..93fab3a1e046 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -342,6 +342,7 @@
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
>  #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
> +#define X86_FEATURE_MB_CLEAR		(18*32+10) /* Flush state on VERW */
>  #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
>  #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
>  #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
> @@ -379,5 +380,6 @@
>  #define X86_BUG_SPECTRE_V2		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
>  #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
>  #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
> +#define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
>  
>  #endif /* _ASM_X86_CPUFEATURES_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index c8f73efb4ece..303064a9a0a9 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -77,6 +77,7 @@
>  						    * attack, so no Speculative Store Bypass
>  						    * control required.
>  						    */
> +#define ARCH_CAP_MDS_NO			(1 << 5)   /* No Microarchitectural data sampling */
>  
>  #define MSR_IA32_FLUSH_CMD		0x0000010b
>  #define L1D_FLUSH			(1 << 0)   /*
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 500278f5308e..5cac243849d8 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -35,6 +35,7 @@
>  static void __init spectre_v2_select_mitigation(void);
>  static void __init ssb_select_mitigation(void);
>  static void __init l1tf_select_mitigation(void);
> +static void __init mds_select_mitigation(void);
>  
>  /* The base value of the SPEC_CTRL MSR that always has to be preserved. */
>  u64 x86_spec_ctrl_base;
> @@ -99,6 +100,8 @@ void __init check_bugs(void)
>  
>  	l1tf_select_mitigation();
>  
> +	mds_select_mitigation();
> +
>  #ifdef CONFIG_X86_32
>  	/*
>  	 * Check whether we are able to run this kernel safely on SMP.
> @@ -1041,6 +1044,13 @@ early_param("l1tf", l1tf_cmdline);
>  
>  #undef pr_fmt
>  
> +static void mds_select_mitigation(void)
> +{
> +	if (cmdline_find_option_bool(boot_command_line, "mds_disable") ||
> +	    !boot_cpu_has_bug(X86_BUG_MDS))
> +		setup_clear_cpu_cap(X86_FEATURE_MB_CLEAR);
> +}
> +
>  #ifdef CONFIG_SYSFS
>  
>  #define L1TF_DEFAULT_MSG "Mitigation: PTE Inversion"
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index ffb181f959d2..bebeb67015fc 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -998,6 +998,14 @@ static const __initconst struct x86_cpu_id cpu_no_l1tf[] = {
>  	{}
>  };
>  
> +static const __initconst struct x86_cpu_id cpu_no_mds[] = {
> +	/* in addition to cpu_no_speculation */
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT	},
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT_X	},
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT_PLUS	},
> +	{}
> +};
> +
>  static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
>  {
>  	u64 ia32_cap = 0;
> @@ -1019,6 +1027,12 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
>  	if (ia32_cap & ARCH_CAP_IBRS_ALL)
>  		setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
>  
> +	if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> +	    !x86_match_cpu(cpu_no_mds)) &&
> +	    !(ia32_cap & ARCH_CAP_MDS_NO) &&
> +	    !(ia32_cap & ARCH_CAP_RDCL_NO))
> +		setup_force_cpu_bug(X86_BUG_MDS);
> +
>  	if (x86_match_cpu(cpu_no_meltdown))
>  		return;
>  
> -- 
> 2.17.2

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 1/8] MDSv2 4
  2018-12-12 21:22   ` Konrad Rzeszutek Wilk
@ 2018-12-12 21:28     ` Andi Kleen
  0 siblings, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-12 21:28 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 04:22:46PM -0500, speck for Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 10, 2018 at 09:53:33AM -0800, speck for Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > Subject:  x86/speculation/mds: Add basic bug infrastructure for
> >  MDS
> > 
> > MDS is micro architectural data sampling, which is a side channel
> > attack on internal buffers in Intel CPUs. They all have
> > the same mitigations for single thread, so we lump them all
> > together as a single MDS issue.
> > 
> > This patch adds the basic infrastructure to detect if the current
> > CPU is affected by MDS, and if yes set the right BUG bits.
> > 
> > We also provide a command line option "mds_disable" to disable
> > any workarounds.
> 
> Would it be worth referencing a not-yet-posted Intel doc,
> or at least point to the kernel.org bugzilla where we dump all
> the CPU related issues?

Can add that later once a URL is known.

> >  						    * attack, so no Speculative Store Bypass
> >  						    * control required.
> >  						    */
> > +#define ARCH_CAP_MDS_NO			(1 << 5)   /* No Microarchitectural data sampling */
> 
> Should this say anything about RDCL?

Don't see any point. 

-Andi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-11 10:03       ` Borislav Petkov
@ 2018-12-12 21:31         ` Konrad Rzeszutek Wilk
  2018-12-12 21:43           ` Andi Kleen
  2018-12-12 22:17           ` Borislav Petkov
  0 siblings, 2 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-12-12 21:31 UTC (permalink / raw)
  To: speck

On Tue, Dec 11, 2018 at 11:03:35AM +0100, speck for Borislav Petkov wrote:
> On Mon, Dec 10, 2018 at 07:13:18PM -0500, speck for Kanth Ghatraju wrote:
> > If you have a patched kernel but old microcode and then do a late load
> > of the microcode,
> 
> And to extend what the others said: late microcode loading is a can
> of worms which we shouldnt've opened back then. Especially since it
> requires a complicated dance on the OS part to quiesce the cores to even
> work without locking up the core or whatever crazy can happen.

We (Oracle, UEK kernel) have this can of worms working happily (well, we
did have to get rid of some bugs).

We can start that conversation upstream, but Thomas and Boris - are you
guys very much against this idea with the upstream kernel, or would you
be up for taking a sip from this can?

Thanks!
> 
> So microcode update should be handled just like updating your kernel -
> you need to reboot the machine.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> -- 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 5/8] MDSv2 7
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 5/8] MDSv2 7 Andi Kleen
  2018-12-11  0:33   ` [MODERATED] " Andrew Cooper
@ 2018-12-12 21:41   ` Konrad Rzeszutek Wilk
  2018-12-12 22:12     ` Andi Kleen
  1 sibling, 1 reply; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-12-12 21:41 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 09:53:37AM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/speculation/mds: Add software sequences for older
>  CPUs.
> 
> On older CPUs before Broadwell clearing the CPU buffer with VERW is not
> available, so we implement software sequences. These can then be
> automatically patched in as needed.
> 
> Support mitigation for Nehalem up to Broadwell. Broadwell strictly doesn't
> need it because it should have the microcode update for VERW, which
> is preferred.
> 
> We add command line options to force the two different sequences,
> mainly for regression testing and if someone wants to test
> on pre-Nehalem CPUs.
> 
> Note to backporters: this patch requires eager FPU support.
> 
> Note there is no Xeon Phi support so far.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  10 ++
>  arch/x86/include/asm/cpufeatures.h            |   2 +
>  arch/x86/kernel/cpu/bugs.c                    |  59 +++++++++-
>  arch/x86/lib/Makefile                         |   1 +
>  arch/x86/lib/clear_cpu.S                      | 107 ++++++++++++++++++
>  5 files changed, 176 insertions(+), 3 deletions(-)
>  create mode 100644 arch/x86/lib/clear_cpu.S
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 48891572e825..16e10f92f8dd 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2344,6 +2344,16 @@
>  	mds_disable	[X86]
>  			Disable workarounds for Micro-architectural Data Sampling.
>  
> +	mds=swclear	[X86]
> +			Force using software sequence for clearing data that
> +			could be exploited by Micro-architectural Data Sampling.
> +			Normally automatically enabled when needed.

s/Normally//?
> +
> +	mds=swclearhsw	[X86]
> +			Use Haswell/Broadwell specific sequence for clearing
> +			data that could be exploited by Micro-architectural Data
> +			Sampling. Normally automatically enabled when needed.

s/Normally//?
> +
>  	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
>  			Amount of memory to be used when the kernel is not able
>  			to see the whole system memory or for test.
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 93fab3a1e046..110759334c88 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -381,5 +381,7 @@
>  #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
>  #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
>  #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
> +#define X86_BUG_MDS_CLEAR_CPU		X86_BUG(20) /* CPU needs call to clear_cpu on kernel exit/idle for MDS */
> +#define X86_BUG_MDS_CLEAR_CPU_HSW	X86_BUG(21) /* CPU needs Haswell version of clear cpu */

Why do we want to expose these instead of just having 'MDS' be exposed as a bug?

>  
>  #endif /* _ASM_X86_CPUFEATURES_H */
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index accab3279068..91619f7d90be 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -31,6 +31,8 @@
>  #include <asm/intel-family.h>
>  #include <asm/e820/api.h>
>  #include <asm/hypervisor.h>
> +#include <asm/hypervisor.h>
> +#include <asm/cpu_device_id.h>
>  
>  static void __init spectre_v2_select_mitigation(void);
>  static void __init ssb_select_mitigation(void);
> @@ -1044,11 +1046,55 @@ early_param("l1tf", l1tf_cmdline);
>  
>  #undef pr_fmt
>  
> +static const __initconst struct x86_cpu_id cpu_mds_clear_cpu[] = {
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_NEHALEM	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_NEHALEM_G	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_NEHALEM_EP	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_NEHALEM_EX	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_WESTMERE	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_WESTMERE_EP	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_WESTMERE_EX	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_SANDYBRIDGE	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_SANDYBRIDGE_X },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_IVYBRIDGE	 },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_IVYBRIDGE_X	 },
> +	{}
> +};
> +
> +static const __initconst struct x86_cpu_id cpu_mds_clear_cpu_hsw[] = {
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_HASWELL_CORE	    },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_HASWELL_X	    },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_HASWELL_ULT	    },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_HASWELL_GT3E	    },
> +
> +	/* Have MB_CLEAR with microcode update, but list just in case: */
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_BROADWELL_CORE   },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_BROADWELL_GT3E   },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_BROADWELL_X	    },
> +	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_BROADWELL_XEON_D },
> +	{}
> +};
> +
>  static void mds_select_mitigation(void)
>  {
>  	if (cmdline_find_option_bool(boot_command_line, "mds_disable") ||
> -	    !boot_cpu_has_bug(X86_BUG_MDS))
> +	    !boot_cpu_has_bug(X86_BUG_MDS)) {
>  		setup_clear_cpu_cap(X86_FEATURE_MB_CLEAR);
> +		setup_clear_cpu_cap(X86_BUG_MDS_CLEAR_CPU_HSW);
> +		setup_clear_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
> +		return;
> +	}
> +
> +	if ((!boot_cpu_has(X86_FEATURE_MB_CLEAR) &&
> +		x86_match_cpu(cpu_mds_clear_cpu)) ||
> +		cmdline_find_option_bool(boot_command_line, "mds=swclear"))
> +		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
> +	if ((!boot_cpu_has(X86_FEATURE_MB_CLEAR) &&
> +		x86_match_cpu(cpu_mds_clear_cpu_hsw)) ||
> +		cmdline_find_option_bool(boot_command_line, "mds=swclearhsw")) {
> +		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
> +		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU_HSW);
> +	}

This all seems to scream for a loop over an array describing 'swclear',
'swclearhsw', 'off', etc.?
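
A table-driven version of just the probing part might look like the
sketch below, reusing the match tables and option strings from the
patch (sketch only; the early mds_disable/!X86_BUG_MDS bail-out stays
as it is):

	static const struct {
		const char *option;
		const struct x86_cpu_id *ids;
		unsigned int caps[2];
		int ncaps;
	} mds_seqs[] = {
		{ "mds=swclear",    cpu_mds_clear_cpu,
		  { X86_BUG_MDS_CLEAR_CPU }, 1 },
		{ "mds=swclearhsw", cpu_mds_clear_cpu_hsw,
		  { X86_BUG_MDS_CLEAR_CPU, X86_BUG_MDS_CLEAR_CPU_HSW }, 2 },
	};
	int i, j;

	for (i = 0; i < ARRAY_SIZE(mds_seqs); i++) {
		if ((!boot_cpu_has(X86_FEATURE_MB_CLEAR) &&
		     x86_match_cpu(mds_seqs[i].ids)) ||
		    cmdline_find_option_bool(boot_command_line,
					     mds_seqs[i].option))
			for (j = 0; j < mds_seqs[i].ncaps; j++)
				setup_force_cpu_cap(mds_seqs[i].caps[j]);
	}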

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-12 21:31         ` Konrad Rzeszutek Wilk
@ 2018-12-12 21:43           ` Andi Kleen
  2018-12-12 22:17           ` Borislav Petkov
  1 sibling, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-12 21:43 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 04:31:48PM -0500, speck for Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 11, 2018 at 11:03:35AM +0100, speck for Borislav Petkov wrote:
> > On Mon, Dec 10, 2018 at 07:13:18PM -0500, speck for Kanth Ghatraju wrote:
> > > If you have a patched kernel but old microcode and then do a late load
> > > of the microcode,
> > 
> > And to extend what the others said: late microcode loading is a can
> > of worms which we shouldnt've opened back then. Especially since it
> > requires a complicated dance on the OS part to quiesce the cores to even
> > work without locking up the core or whatever crazy can happen.
> 
> We (Oracle, UEK kernel) has this can of worms happily (well, we did have to
> get rid of some bugs) working.

> 
> We can start that conversation upstream, but Thomas and Boris -are you
> guys very much against this idea with the upstream kernel or would you
> be up for taking a sip from this can?

If you really want to support it, the right way would be to fix
alternative() to support it natively, rather than requiring every user
of alternatives to switch to static keys.

It's probably not that hard: just don't throw the alternatives tables
away, and rerun the patching sequence after the microcode load.  It will
likely need a stop_machine() to safely patch multiple-instruction
sequences, but that is probably not a big issue.
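
Something like this (a completely untested sketch; assumes the boot-time
tables __alt_instructions/__alt_instructions_end are kept around instead
of being discarded with initmem, and that kernel text can still be
written to after mark_rodata_ro(), e.g. via text_poke()):

        /* Rerun boot-time alternative patching after a late ucode load. */
        static int __reapply_alternatives(void *unused)
        {
                /*
                 * Walk the boot-time table again.  Only safe if feature
                 * bits get *added* by the microcode update; sites that
                 * were already patched are rewritten with the same
                 * replacement.
                 */
                apply_alternatives(__alt_instructions, __alt_instructions_end);
                return 0;
        }

        void reapply_alternatives(void)
        {
                /* All other CPUs spin with interrupts off while we patch. */
                stop_machine(__reapply_alternatives, NULL, cpu_online_mask);
        }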

If that were done, my patch would just work naturally without any changes.

-Andi

* [MODERATED] Re: [PATCH v2 4/8] MDSv2 0
  2018-12-10 17:53 ` [MODERATED] [PATCH v2 4/8] MDSv2 0 Andi Kleen
@ 2018-12-12 21:45   ` Konrad Rzeszutek Wilk
  2018-12-12 22:09     ` Andi Kleen
  0 siblings, 1 reply; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-12-12 21:45 UTC (permalink / raw)
  To: speck

On Mon, Dec 10, 2018 at 09:53:36AM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/speculation/mds: Add sysfs reporting
> 
> Report mds mitigation state in sysfs vulnerabilities.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  .../ABI/testing/sysfs-devices-system-cpu         |  1 +
>  arch/x86/kernel/cpu/bugs.c                       | 16 ++++++++++++++++
>  drivers/base/cpu.c                               |  8 ++++++++
>  3 files changed, 25 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index 73318225a368..02b7bb711214 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -477,6 +477,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
>  		/sys/devices/system/cpu/vulnerabilities/spectre_v2
>  		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
>  		/sys/devices/system/cpu/vulnerabilities/l1tf
> +		/sys/devices/system/cpu/vulnerabilities/mds
>  Date:		January 2018
>  Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
>  Description:	Information about CPU vulnerabilities
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 8200b41b8db9..accab3279068 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1175,6 +1175,16 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
>  		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
>  			return l1tf_show_state(buf);
>  		break;
> +
> +	case X86_BUG_MDS:
> +		/* Assumes Hypervisor exposed HT state to us if in guest */
> +		if (boot_cpu_has(X86_FEATURE_MB_CLEAR)) {
> +			if (cpu_smt_control != CPU_SMT_ENABLED)
> +				return sprintf(buf, "Mitigation: MB_CLEAR\n");

I am not sure a normal system admin would understand what MB_CLEAR means. Is there a way to
make this more easily grokked?
> +			return sprintf(buf, "Mitigation: MB_CLEAR, HT vulnerable\n");

Also you could use a "%s" and fold the two cases into one sprintf?
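
E.g. (untested):

                        return sprintf(buf, "Mitigation: MB_CLEAR%s\n",
                                       cpu_smt_control == CPU_SMT_ENABLED ?
                                       ", HT vulnerable" : "");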

> +		}
> +		return sprintf(buf, "Vulnerable\n");
> +
>  	default:
>  		break;
>  	}
> @@ -1206,4 +1216,10 @@ ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *b
>  {
>  	return cpu_show_common(dev, attr, buf, X86_BUG_L1TF);
>  }
> +
> +ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	return cpu_show_common(dev, attr, buf, X86_BUG_MDS);
> +}
> +
>  #endif
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index eb9443d5bae1..2fd6ca1021c2 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -546,11 +546,18 @@ ssize_t __weak cpu_show_l1tf(struct device *dev,
>  	return sprintf(buf, "Not affected\n");
>  }
>  
> +ssize_t __weak cpu_show_mds(struct device *dev,
> +			    struct device_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "Not affected\n");
> +}
> +
>  static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
>  static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
>  static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
>  static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
>  static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
> +static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
>  
>  static struct attribute *cpu_root_vulnerabilities_attrs[] = {
>  	&dev_attr_meltdown.attr,
> @@ -558,6 +565,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
>  	&dev_attr_spectre_v2.attr,
>  	&dev_attr_spec_store_bypass.attr,
>  	&dev_attr_l1tf.attr,
> +	&dev_attr_mds.attr,
>  	NULL
>  };
>  
> -- 
> 2.17.2

* [MODERATED] Re: [PATCH v2 4/8] MDSv2 0
  2018-12-12 21:45   ` [MODERATED] " Konrad Rzeszutek Wilk
@ 2018-12-12 22:09     ` Andi Kleen
  2018-12-12 22:36       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2018-12-12 22:09 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 04:45:06PM -0500, speck for Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 10, 2018 at 09:53:36AM -0800, speck for Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > Subject:  x86/speculation/mds: Add sysfs reporting
> > 
> > Report mds mitigation state in sysfs vulnerabilities.
> > 
> > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> > ---
> >  .../ABI/testing/sysfs-devices-system-cpu         |  1 +
> >  arch/x86/kernel/cpu/bugs.c                       | 16 ++++++++++++++++
> >  drivers/base/cpu.c                               |  8 ++++++++
> >  3 files changed, 25 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > index 73318225a368..02b7bb711214 100644
> > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > @@ -477,6 +477,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
> >  		/sys/devices/system/cpu/vulnerabilities/spectre_v2
> >  		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
> >  		/sys/devices/system/cpu/vulnerabilities/l1tf
> > +		/sys/devices/system/cpu/vulnerabilities/mds
> >  Date:		January 2018
> >  Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
> >  Description:	Information about CPU vulnerabilities
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index 8200b41b8db9..accab3279068 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1175,6 +1175,16 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
> >  		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
> >  			return l1tf_show_state(buf);
> >  		break;
> > +
> > +	case X86_BUG_MDS:
> > +		/* Assumes Hypervisor exposed HT state to us if in guest */
> > +		if (boot_cpu_has(X86_FEATURE_MB_CLEAR)) {
> > +			if (cpu_smt_control != CPU_SMT_ENABLED)
> > +				return sprintf(buf, "Mitigation: MB_CLEAR\n");
> 
> I am not sure a normal system admin would understand what MB_CLEAR means. Is there a way to
> make this more easily grokked?

You mean they are somehow unable to use Google?

We could call it

	"microcode"

I guess, but I have my doubts that it's actually better.

> > +			return sprintf(buf, "Mitigation: MB_CLEAR, HT vulnerable\n");
> 
> Also you could use a "%s" and fold the two cases into one sprintf?

This code will change later when the HT mitigations are added; it would
just need to be untangled again then.

-Andi

* [MODERATED] Re: [PATCH v2 5/8] MDSv2 7
  2018-12-12 21:41   ` Konrad Rzeszutek Wilk
@ 2018-12-12 22:12     ` Andi Kleen
  0 siblings, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-12 22:12 UTC (permalink / raw)
  To: speck

> > +	mds=swclear	[X86]
> > +			Force using software sequence for clearing data that
> > +			could be exploited by Micro-architectural Data Sampling.
> > +			Normally automatically enabled when needed.
> 
> s/Normally//?
> > +
> > +	mds=swclearhsw	[X86]
> > +			Use Haswell/Broadwell specific sequence for clearing
> > +			data that could be exploited by Micro-architectural Data
> > +			Sampling. Normally automatically enabled when needed.
> 
> s/Normally//?

It's not automatically enabled on really old pre-Nehalem CPUs, which may or may not have the problem.

> > +
> >  	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
> >  			Amount of memory to be used when the kernel is not able
> >  			to see the whole system memory or for test.
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 93fab3a1e046..110759334c88 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -381,5 +381,7 @@
> >  #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
> >  #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
> >  #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
> > +#define X86_BUG_MDS_CLEAR_CPU		X86_BUG(20) /* CPU needs call to clear_cpu on kernel exit/idle for MDS */
> > +#define X86_BUG_MDS_CLEAR_CPU_HSW	X86_BUG(21) /* CPU needs Haswell version of clear cpu */
> 
> Why do we want to expose these instead of just exposing 'MDS' as a bug?

The patch code needs it. And I find it useful to have them exposed
during my testing.

> >  	if (cmdline_find_option_bool(boot_command_line, "mds_disable") ||
> > -	    !boot_cpu_has_bug(X86_BUG_MDS))
> > +	    !boot_cpu_has_bug(X86_BUG_MDS)) {
> >  		setup_clear_cpu_cap(X86_FEATURE_MB_CLEAR);
> > +		setup_clear_cpu_cap(X86_BUG_MDS_CLEAR_CPU_HSW);
> > +		setup_clear_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
> > +		return;
> > +	}
> > +
> > +	if ((!boot_cpu_has(X86_FEATURE_MB_CLEAR) &&
> > +		x86_match_cpu(cpu_mds_clear_cpu)) ||
> > +		cmdline_find_option_bool(boot_command_line, "mds=swclear"))
> > +		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
> > +	if ((!boot_cpu_has(X86_FEATURE_MB_CLEAR) &&
> > +		x86_match_cpu(cpu_mds_clear_cpu_hsw)) ||
> > +		cmdline_find_option_bool(boot_command_line, "mds=swclearhsw")) {
> > +		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU);
> > +		setup_force_cpu_cap(X86_BUG_MDS_CLEAR_CPU_HSW);
> > +	}
> 
> This all seems to scream for a loop and an array to probe 'swclear', 'swclearhsw', 'off', etc.?

I don't believe in overdesign.

-Andi

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-12 21:31         ` Konrad Rzeszutek Wilk
  2018-12-12 21:43           ` Andi Kleen
@ 2018-12-12 22:17           ` Borislav Petkov
  2018-12-12 22:40             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 59+ messages in thread
From: Borislav Petkov @ 2018-12-12 22:17 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 04:31:48PM -0500, speck for Konrad Rzeszutek Wilk wrote:
> We can start that conversation upstream, but Thomas and Boris - are you
> guys very much against this idea with the upstream kernel or would you
> be up for taking a sip from this can?

Well, I'm willing to hear your arguments for why you absolutely must
load microcode during runtime, without rebooting, and cannot bundle a
microcode upgrade with a kernel upgrade and do a normal reboot during
scheduled system downtime.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* [MODERATED] Re: [PATCH v2 4/8] MDSv2 0
  2018-12-12 22:09     ` Andi Kleen
@ 2018-12-12 22:36       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-12-12 22:36 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 02:09:46PM -0800, speck for Andi Kleen wrote:
> On Wed, Dec 12, 2018 at 04:45:06PM -0500, speck for Konrad Rzeszutek Wilk wrote:
> > On Mon, Dec 10, 2018 at 09:53:36AM -0800, speck for Andi Kleen wrote:
> > > From: Andi Kleen <ak@linux.intel.com>
> > > Subject:  x86/speculation/mds: Add sysfs reporting
> > > 
> > > Report mds mitigation state in sysfs vulnerabilities.
> > > 
> > > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> > > ---
> > >  .../ABI/testing/sysfs-devices-system-cpu         |  1 +
> > >  arch/x86/kernel/cpu/bugs.c                       | 16 ++++++++++++++++
> > >  drivers/base/cpu.c                               |  8 ++++++++
> > >  3 files changed, 25 insertions(+)
> > > 
> > > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > > index 73318225a368..02b7bb711214 100644
> > > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> > > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > > @@ -477,6 +477,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
> > >  		/sys/devices/system/cpu/vulnerabilities/spectre_v2
> > >  		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
> > >  		/sys/devices/system/cpu/vulnerabilities/l1tf
> > > +		/sys/devices/system/cpu/vulnerabilities/mds
> > >  Date:		January 2018
> > >  Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
> > >  Description:	Information about CPU vulnerabilities
> > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > index 8200b41b8db9..accab3279068 100644
> > > --- a/arch/x86/kernel/cpu/bugs.c
> > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > @@ -1175,6 +1175,16 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
> > >  		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
> > >  			return l1tf_show_state(buf);
> > >  		break;
> > > +
> > > +	case X86_BUG_MDS:
> > > +		/* Assumes Hypervisor exposed HT state to us if in guest */
> > > +		if (boot_cpu_has(X86_FEATURE_MB_CLEAR)) {
> > > +			if (cpu_smt_control != CPU_SMT_ENABLED)
> > > +				return sprintf(buf, "Mitigation: MB_CLEAR\n");
> > 
> > I am not sure a normal system admin would understand what MB_CLEAR means. Is there a way to
> > make this more easily grokked?
> 
> You mean they are somehow unable to use Google?

Why do we want to have sharp corners? What is so wrong with making
it well-rounded and self-explanatory?

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-12 22:17           ` Borislav Petkov
@ 2018-12-12 22:40             ` Konrad Rzeszutek Wilk
  2018-12-12 22:45               ` Borislav Petkov
  0 siblings, 1 reply; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-12-12 22:40 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 11:17:31PM +0100, speck for Borislav Petkov wrote:
> On Wed, Dec 12, 2018 at 04:31:48PM -0500, speck for Konrad Rzeszutek Wilk wrote:
> > We can start that conversation upstream, but Thomas and Boris - are you
> > guys very much against this idea with the upstream kernel or would you
> > be up for taking a sip from this can?
> 
> Well, I'm willing to hear your arguments for why you absolutely must
> load microcode during runtime, without rebooting, and cannot bundle a
> microcode upgrade with a kernel upgrade and do a normal reboot during
> scheduled system downtime.

That is easy - "cloud" - you know, the $X billion market where everybody
has decided that 99.99999% uptime is needed, which means you can only
reboot for 1 second.

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-12 22:40             ` Konrad Rzeszutek Wilk
@ 2018-12-12 22:45               ` Borislav Petkov
  2018-12-13 15:15                 ` Andrew Cooper
  0 siblings, 1 reply; 59+ messages in thread
From: Borislav Petkov @ 2018-12-12 22:45 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 05:40:47PM -0500, speck for Konrad Rzeszutek Wilk wrote:
> That is easy - "cloud" - you know, the $X billion market where everybody
> has decided that 99.99999% uptime is needed, which means you can only
> reboot for 1 second.

And clowd doesn't reboot machines to update kernels?

And clowd gets guarantees from CPU vendor that live microcode upgrade
works fine every time even while the core executes instructions and with
any microcode revision being upgraded to?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-12 22:45               ` Borislav Petkov
@ 2018-12-13 15:15                 ` Andrew Cooper
  2018-12-13 16:52                   ` Borislav Petkov
  0 siblings, 1 reply; 59+ messages in thread
From: Andrew Cooper @ 2018-12-13 15:15 UTC (permalink / raw)
  To: speck

On 12/12/2018 14:45, speck for Borislav Petkov wrote:
> On Wed, Dec 12, 2018 at 05:40:47PM -0500, speck for Konrad Rzeszutek Wilk wrote:
>> That is easy - "cloud" - you know, the $X billion market where everybody
>> has decided that 99.99999% uptime is needed, which means you can only
>> reboot for 1 second.
> And clowd doesn't reboot machines to update kernels?

No - not if they can possibly avoid it.

> And clowd gets guarantees from CPU vendor that live microcode upgrade
> works fine every time even while the core executes instructions and with
> any microcode revision being upgraded to?

Within the constraints of "quiesce the system with stop_machine()", yes.


I've got some customers who have said in no uncertain terms that they
want to reboot to get mitigations for these forthcoming issues.  Said
customers are large enough that Intel is currently engaged, and has
tentatively said that it is fine to load ucode in parallel on every core
during stop_machine(), rather than serially.  I'm expecting Intel to
propose this change in Linux as well as Xen.

I personally think everyone should reboot and call it done, but at the
end of the day, I'm beholden to my customers, and they really really do
want late load microcode to work.

~Andrew


* [MODERATED] Re: [PATCH v2 2/8] MDSv2 1
  2018-12-13 15:15                 ` Andrew Cooper
@ 2018-12-13 16:52                   ` Borislav Petkov
  0 siblings, 0 replies; 59+ messages in thread
From: Borislav Petkov @ 2018-12-13 16:52 UTC (permalink / raw)
  To: speck

On Thu, Dec 13, 2018 at 07:15:29AM -0800, speck for Andrew Cooper wrote:
> No - not if they can possibly avoid it.

Well, from the engineers of the cloud company I know - probably the
biggest one - I hear a completely different story and they *do* have to
reboot.

And in the cases where they reboot, they can update microcode too. I'm
willing to bet a lot of beers that microcode updates come along a lot
more seldom than kernel updates.

And if it weren't for 2018, the year of mitigations, late loading
wouldn't even have been a topic at all, because microcode updates are
*really* that seldom. Patch memory being really limited and already
carrying some other patches comes to mind.

So the normal flow on the boxes out there is, at some point you get the
updated microcode package which gets installed onto the system.

The next time you update the kernel or you have to reboot the box, you
regenerate the initrd and you have your microcode update ready too.
Done. Problem solved and then some.

> Within the constraints of "quiesce the system with stop_machine()", yes.

Except that is not enough in every case:

static bool is_blacklisted(unsigned int cpu)
{
        struct cpuinfo_x86 *c = &cpu_data(cpu);

        /*
         * Late loading on model 79 with microcode revision less than 0x0b000021
         * and LLC size per core bigger than 2.5MB may result in a system hang.
         * This behavior is documented in item BDF90, #334165 (Intel Xeon
         * Processor E7-8800/4800 v4 Product Family).
         */
        if (c->x86 == 6 &&
            c->x86_model == INTEL_FAM6_BROADWELL_X &&
            c->x86_stepping == 0x01 &&
            llc_size_per_core > 2621440 &&
            c->microcode < 0x0b000021) {
                pr_err_once("Erratum BDF90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode);
                pr_err_once("Please consider either early loading through initrd/built-in or a potential BIOS update.\n");

Yap, that's an erratum for late loading. Those customers can't do late
loading because shit freezes the box. Good luck!

And why are we doing such a complex conditional, you ask? Well, we're
trying to salvage the situation because, yeah, late loading is just a
*bad* *bad* idea.

And this is just the first erratum I'm aware of. I betcha more beers that
there are other problems with late loading. The Debian people did blacklist
a couple of microcode revisions for that reason too, for example.

> I've got some customers to have said in no uncertain terms that they
> want to reboot to get mitigations for these forthcoming issues.  Said
> customers are large enough that Intel is currently engaged, and have
> tentatively said that it is fine to load ucode in parallel on every
> core during stop machine, rather than serially.

Who said that, Intel or the customers? If it is Intel, see above. If
it is the customers, they are more than free to support their own late
loading.

> I'm expecting Intel to propose this change in Linux as well as Xen.

I have seen the "proposal".

> I personally think everyone should reboot and call it done,

Exactly.

> but at the end of the day, I'm beholden to my customers, and they
> really really do want late load microcode to work.

Huh? They want to reboot but they still want late microcode loading? I
have no idea what your argument here is.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-12 18:47                   ` Linus Torvalds
@ 2018-12-13 19:44                     ` Linus Torvalds
  2018-12-13 20:48                       ` Andi Kleen
  0 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2018-12-13 19:44 UTC (permalink / raw)
  To: speck

On Wed, Dec 12, 2018 at 10:47 AM Linus Torvalds wrote:
>
> So I'm at the Intel partner meeting right now, and the official
> statement seems to be that the software sequence really shouldn't be
> used. Nobody seems to want it to be used for anything but some
> really odd case.

Half the people on this list were probably at the meeting, so I'm not
sure it needs expanding on, but just to clarify: the sw version is
known not to work in VM environments anyway, and really seems to be
more for "documentation" than any real use.

The vmware people in particular seemed to really want the verw even if
the cpuid bits didn't indicate that it does anything (because they
have long lead-times to enable new CPUID bits), so their preferred
sequence is basically

    if (!(ia32_cap & ARCH_CAP_MDS_NO))
        verw();

so that it uses verw even if the cpuid bits don't say that it does anything.
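
In kernel terms that presumably ends up as something like this (a
sketch; the ARCH_CAP_MDS_NO define doesn't exist in the tree yet):

    u64 ia32_cap = 0;

    if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
        rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);

    /* Use the VERW path unless the CPU says it is not affected,
       even when MB_CLEAR is not (yet) enumerated. */
    if (!(ia32_cap & ARCH_CAP_MDS_NO))
        setup_force_cpu_cap(X86_FEATURE_MB_CLEAR);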

This is actually fairly ok, because if the verw *really* doesn't do
anything, it also isn't very expensive (ie on the order of just 40
cycles instead of 400 cycles).  So I'd actually be inclined to say
"let's do that for the *initial* patches and back-ports" just to give
any vmware users the protection.

Longer term (ie after a few months), the proper sequence ends up
taking X86_FEATURE_MB_CLEAR into account, so this is basically just an
artificial "it takes a while for that feature to show up in a vmware
cluster even after the fixes have been applied" issue.

Note that vmware *really* really doesn't want that software sequence
in any case. Not only does it not work reliably in virtualization
anyway, it's likely that vmware cpu's *have* gotten the microcode
updates, they just don't show the cpuid bits. So the software version
is basically slower and less effective. Just don't do it.

Do we care hugely about the odd vmware situation? Maybe not, but
considering the low cost of verw in the absence of new microcode, I do
think the tradeoff is worth it. verw only gets expensive when it
actually does something, and the problem is that in a vmware cloud
that cpuid bit just isn't reliable.

Side note: that fixes the late microcode update issue too for *this*
particular case. If cpuid isn't a gating issue for the verw use, it
just doesn't matter. Of course, Thomas has some good reasons to think
that late microcode updates are fundamentally broken for reasons that
have nothing to do with our alternate instruction case, but that's a
separate issue.

              Linus

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-13 19:44                     ` Linus Torvalds
@ 2018-12-13 20:48                       ` Andi Kleen
  2018-12-13 20:56                         ` Linus Torvalds
  2018-12-15  0:30                         ` Andi Kleen
  0 siblings, 2 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-13 20:48 UTC (permalink / raw)
  To: speck

> This is actually fairly ok, because if the verw *really* doesn't do
> anything, it also isn't very expensive (ie on the order of just 40
> cycles instead of 400 cycles).  So I'd actually be inclined to say
> "let's do that for the *initial* patches and back-ports" just to give
> any vmware users the protection.

I think that's fine.

40 cycles will be hard to measure even in the most extreme syscall
microbenchmarks, and is unlikely to impact anything real.
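
(For reference, a crude userspace check one could use - an untested
sketch, not from the patch set:)

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <x86intrin.h>

    int main(void)
    {
            enum { N = 1000000 };
            unsigned long long t0, t1;
            int i;

            t0 = __rdtsc();
            for (i = 0; i < N; i++)
                    /* call via syscall() to avoid any libc-level caching */
                    syscall(SYS_getpid);
            t1 = __rdtsc();
            printf("%.1f cycles/syscall\n", (double)(t1 - t0) / N);
            return 0;
    }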

> Longer term (ie after a few months), the proper sequence ends up
> taking X86_FEATURE_MB_CLEAR into account, so this is basically just an
> artificial "it takes a while for that feature to show up in a vmware
> cluster even after the fixes have been applied" issue.

Seems odd to somehow tie our release cycle to VMware.

This is something the distribution vendors would need to do anyway,
as VMware is mostly used in enterprise environments, which are likely
mostly on SLES, RHEL and the like, and are unlikely to run raw
kernel.org kernels.

*Iff* we really care about this oddity, it would be best to ignore
MB_CLEAR forever.

AFAIK the software sequences should work in KVM, as long as the
administrator does not set a totally bogus CPUID (like no AVX on Sandy
Bridge, or AVX with a CPU model that doesn't support AVX). We can
add some documentation to make this clear.

> 
> Note that vmware *really* really doesn't want that software sequence
> in any case. Not only does it not work reliably in virtualization
> anyway, it's likely that vmware cpu's *have* gotten the microcode
> updates, they just don't show the cpuid bits. So the software version
> is basically slower and less effective. Just don't do it.

I don't think it has been worked out yet, but the reason
I added the software sequences is that some CPUs may not
be able to do the VERW in microcode updates.

In this case the only choice is to use software sequences,
and VMware will have to deal with it somehow.

But I can remove them again until that is settled.

-Andi

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-13 20:48                       ` Andi Kleen
@ 2018-12-13 20:56                         ` Linus Torvalds
  2018-12-15  0:30                         ` Andi Kleen
  1 sibling, 0 replies; 59+ messages in thread
From: Linus Torvalds @ 2018-12-13 20:56 UTC (permalink / raw)
  To: speck

On Thu, Dec 13, 2018 at 12:50 PM speck for Andi Kleen
<speck@linutronix.de> wrote:
>
> I don't think it has been worked out yet, but the reason
> I added the software sequences is that some CPUs may not
> be able to do the VERW in microcode updates.

Yes. But it really sounded like nobody cared, and people actually
argued against it (particularly the vmware case).

Or rather, the people who cared were pushing for the microcode
updates to be extended back to Westmere Xeon, rather than using a sw
sequence.

Most people who run hardware that is basically 10+ years old by now
won't be upgrading kernels _or_ microcode, so worrying about them
sounds pretty much pointless.

                    Linus

* [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
  2018-12-13 20:48                       ` Andi Kleen
  2018-12-13 20:56                         ` Linus Torvalds
@ 2018-12-15  0:30                         ` Andi Kleen
  1 sibling, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2018-12-15  0:30 UTC (permalink / raw)
  To: speck

On Thu, Dec 13, 2018 at 12:48:28PM -0800, speck for Andi Kleen wrote:
> > This is actually fairly ok, because if the verw *really* doesn't do
> > anything, it also isn't very expensive (ie on the order of just 40
> > cycles instead of 400 cycles).  So I'd actually be inclined to say
> > "let's do that for the *initial* patches and back-ports" just to give
> > any vmware users the protection.
> 
> I think that's fine.
> 
> 40 cycles will be hard to measure even in the most extreme syscall
> microbenchmarks, and is unlikely to impact anything real.

I looked into this.

One problem is that if we ignore MB_CLEAR there is no way to tell the
user whether they are mitigated or not in

/sys/devices/system/cpu/vulnerabilities/mds

I fear it would cause a lot of confusion.

So even if VMware users were fixed this way, there would be no
way for them to find out in Linux. And for other, non-VMware
users there would also be no way to find out, which would
be quite bad.

Based on that I think we should use MB_CLEAR as in the original
patches.
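
I.e. keep the reporting keyed on what is actually active, roughly
(untested sketch on top of the v2 patch):

        case X86_BUG_MDS:
                if (boot_cpu_has(X86_FEATURE_MB_CLEAR))
                        return sprintf(buf, "Mitigation: MB_CLEAR\n");
                if (boot_cpu_has_bug(X86_BUG_MDS_CLEAR_CPU))
                        return sprintf(buf, "Mitigation: SW sequence\n");
                return sprintf(buf, "Vulnerable\n");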

We could potentially have some kind of VMware-specific quirk to override
it, but it still seems fairly bogus.

-Andi

end of thread, other threads:[~2018-12-15  0:30 UTC | newest]

Thread overview: 59+ messages
2018-12-10 17:53 [MODERATED] [PATCH v2 0/8] MDSv2 8 Andi Kleen
2018-12-10 17:53 ` [MODERATED] [PATCH v2 1/8] MDSv2 4 Andi Kleen
2018-12-11 14:14   ` [MODERATED] " Paolo Bonzini
2018-12-12 21:22   ` Konrad Rzeszutek Wilk
2018-12-12 21:28     ` Andi Kleen
2018-12-12 21:25   ` Konrad Rzeszutek Wilk
2018-12-10 17:53 ` [MODERATED] [PATCH v2 2/8] MDSv2 1 Andi Kleen
2018-12-10 22:49   ` [MODERATED] " Jiri Kosina
2018-12-11  0:03     ` Andi Kleen
2018-12-11  0:13     ` Kanth Ghatraju
2018-12-11  2:00       ` Andi Kleen
2018-12-11  5:36       ` Jiri Kosina
2018-12-11 10:03       ` Borislav Petkov
2018-12-12 21:31         ` Konrad Rzeszutek Wilk
2018-12-12 21:43           ` Andi Kleen
2018-12-12 22:17           ` Borislav Petkov
2018-12-12 22:40             ` Konrad Rzeszutek Wilk
2018-12-12 22:45               ` Borislav Petkov
2018-12-13 15:15                 ` Andrew Cooper
2018-12-13 16:52                   ` Borislav Petkov
2018-12-10 17:53 ` [MODERATED] [PATCH v2 3/8] MDSv2 5 Andi Kleen
2018-12-10 23:00   ` [MODERATED] " Linus Torvalds
2018-12-11  0:03     ` Andi Kleen
2018-12-11  0:43       ` Linus Torvalds
2018-12-11  1:33         ` Linus Torvalds
2018-12-11  2:12           ` Andi Kleen
2018-12-11  2:20           ` Linus Torvalds
2018-12-11  3:25             ` Andi Kleen
2018-12-11 17:55               ` Linus Torvalds
2018-12-11 18:10                 ` Borislav Petkov
2018-12-11 18:21                 ` Linus Torvalds
2018-12-11 18:26                   ` Borislav Petkov
2018-12-11 19:47                   ` Andi Kleen
2018-12-11 21:22                   ` Thomas Gleixner
2018-12-12 14:02               ` [MODERATED] " Paolo Bonzini
2018-12-12 17:58                 ` Andi Kleen
2018-12-12 18:47                   ` Linus Torvalds
2018-12-13 19:44                     ` Linus Torvalds
2018-12-13 20:48                       ` Andi Kleen
2018-12-13 20:56                         ` Linus Torvalds
2018-12-15  0:30                         ` Andi Kleen
2018-12-11  2:10         ` Andi Kleen
2018-12-11  0:09     ` Andrew Cooper
2018-12-10 17:53 ` [MODERATED] [PATCH v2 4/8] MDSv2 0 Andi Kleen
2018-12-12 21:45   ` [MODERATED] " Konrad Rzeszutek Wilk
2018-12-12 22:09     ` Andi Kleen
2018-12-12 22:36       ` Konrad Rzeszutek Wilk
2018-12-10 17:53 ` [MODERATED] [PATCH v2 5/8] MDSv2 7 Andi Kleen
2018-12-11  0:33   ` [MODERATED] " Andrew Cooper
2018-12-12 18:05     ` Andrew Cooper
2018-12-12 21:41   ` Konrad Rzeszutek Wilk
2018-12-12 22:12     ` Andi Kleen
2018-12-10 17:53 ` [MODERATED] [PATCH v2 6/8] MDSv2 3 Andi Kleen
2018-12-11  0:37   ` [MODERATED] " Andrew Cooper
2018-12-11  0:46     ` Luck, Tony
2018-12-11  1:02       ` Andrew Cooper
2018-12-11  1:53       ` Andi Kleen
2018-12-10 17:53 ` [MODERATED] [PATCH v2 7/8] MDSv2 6 Andi Kleen
2018-12-10 17:53 ` [MODERATED] [PATCH v2 8/8] MDSv2 2 Andi Kleen
