All of lore.kernel.org
* [patch 0/8] MDS basics 0
@ 2019-02-19 12:44 Thomas Gleixner
  2019-02-19 12:44 ` [patch 1/8] MDS basics 1 Thomas Gleixner
                   ` (10 more replies)
  0 siblings, 11 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 0/8] MDS basics
From: Thomas Gleixner <tglx@linutronix.de>

Hi!

I got the following information yesterday night:

  "All - FYI.  There has been some chatter/ discussion on the subject.
   Hopefully this note will help clarify.  We received a report from a
   researcher who independently identified what we formerly referred to as
   PSF (aka Microarchitectural Store Buffer Data Sampling).  There were
   some initial indications (this week) this researcher would elect to
   release a paper publicly PRIOR to the May 14 embargo was lifted.

   We have been working closely with them, and it appears for now that will
   NOT be the case.  Were that to happen however, we DID begin prepping
   materials to disclose PSF ONLY.  I.e. we would disclose only that
   particular issue after having consulted with this team.  This includes a
   modified/ reduced section of the existing whitepaper, press statement
   and standard security advisory language.  We are finalizing this
   material and will then hold it in reserve.

   As we have done in the past, we would convene a meeting of reps from
   this group before activating those assets.  I will keep you apprised of
   any change in the situation, and can provide those assets for your use/
   adaptation once finalized."

This was posted on that Keybase.io chat on Friday night and of course not
made available to those who are not part of it. Even people who are
subscribed there missed the message because it scrolled away due to
other chit-chat.

Now we maybe got lucky this time, but I wouldn't hold my breath as the
probability that other people will figure that out as well is surely way
larger than 0.

If that happens, then it makes exactly ZERO sense to expose only the
MSBDS part as everything else is lumped together with this. But why am
I still trying to make sense of all this?

So while being grumpy about this communication fail, I'm even more
grumpy about the fact that we don't even have the minimal full/off
mitigation in place in a workable form. I asked for this specifically
weeks ago, just in case the embargo breaks early, so we don't stand
there with our pants down.

So being grumpy as hell made me sit down and write the basic
mitigation implementation myself (again).

It reuses a single patch from that Intel pile which defines the bug
and MSR bits. Guess what: it took me less than 4 hours to do so
and another 2 hours in the morning to write at least the basic admin
documentation. The latter surely still needs some work, but I wanted
to get the patches out. There is also another TODO mentioned further
down.

The series comes with:

  - A consistent command line interface

  - A consistent sysfs interface

  - Static key based control for the exit to user and idle invocations

  - Dynamic update of the idle invocation key according to the actual SMT
    state, similar to the STIBP update.

  - Idle invocations are inside the halt/mwait inlines and not randomly
    sprinkled all over the kernel tree.

It builds and boots, and while I was able to emit the VERW instruction by
hacking the mitigation selection to omit the MD_CLEAR support check, I
have no access to real hardware with updated microcode.

This is how it should have looked from the very beginning, and the extra
bits and pieces (cond mode) can be built on top of it. Please review and
give it a test ride when you have a machine with updated microcode
available.

The lot is also available from the speck git tree in the WIP.mds
branch.

Note that I moved the L1TF document to a separate folder so the hw
vulnerabilities do not show up as separate items at the top level index
of the admin guide. Should have thought about that back then
already...

TODO: 

For CPUs which are not affected by L1TF but are affected by MDS, there
needs to be a CPU buffer clear mitigation at VMENTER.  That applies at
least to XEON PHI, SILVERMONT and AIRMONT, and probably to some of the
newer models which have RDCL_NO set.

Thanks,

	tglx

8<-----------------------
 Documentation/ABI/testing/sysfs-devices-system-cpu |    1 
 Documentation/admin-guide/hw-vuln/index.rst        |   13 +
 Documentation/admin-guide/hw-vuln/l1tf.rst         |    1 
 Documentation/admin-guide/hw-vuln/mds.rst          |  230 +++++++++++++++++++++
 Documentation/admin-guide/index.rst                |    6 
 Documentation/admin-guide/kernel-parameters.txt    |   27 ++
 arch/x86/entry/common.c                            |    3 
 arch/x86/include/asm/cpufeatures.h                 |    2 
 arch/x86/include/asm/irqflags.h                    |    4 
 arch/x86/include/asm/msr-index.h                   |    5 
 arch/x86/include/asm/mwait.h                       |    7 
 arch/x86/include/asm/nospec-branch.h               |   22 ++
 arch/x86/include/asm/processor.h                   |    6 
 arch/x86/kernel/cpu/bugs.c                         |  102 +++++++++
 arch/x86/kernel/cpu/common.c                       |   13 +
 drivers/base/cpu.c                                 |    6 
 include/linux/cpu.h                                |    2 
 17 files changed, 443 insertions(+), 7 deletions(-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 1/8] MDS basics 1
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
@ 2019-02-19 12:44 ` Thomas Gleixner
  2019-02-19 14:00   ` [MODERATED] " Borislav Petkov
  2019-02-19 12:44 ` [patch 2/8] MDS basics 2 Thomas Gleixner
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 1/8] x86/speculation/mds: Add basic bug infrastructure for MDS
From: Andi Kleen <ak@linux.intel.com>

Microarchitectural Data Sampling (MDS) is a class of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward can
also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding
is not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned, which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

All variants have the same mitigation for the single CPU thread case (SMT
off), so the kernel can treat them as one MDS issue.

Add the basic infrastructure to detect if the current CPU is affected by
MDS.
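
The affectedness check added below boils down to a simple predicate. A
minimal user-space sketch of that logic, where the two bool parameters are
hypothetical stand-ins for the kernel's vendor check and
x86_match_cpu(cpu_no_mds):

```c
#include <stdbool.h>
#include <stdint.h>

#define ARCH_CAP_MDS_NO (1ULL << 5)

/*
 * Sketch of the check added to cpu_set_bug_bits(): a CPU is marked with
 * X86_BUG_MDS when it is an Intel part, is not on the cpu_no_mds
 * whitelist, and does not advertise MDS_NO in IA32_ARCH_CAPABILITIES.
 * The bool parameters replace the kernel helpers for illustration only.
 */
static bool cpu_has_bug_mds(bool is_intel, bool on_no_mds_list,
			    uint64_t ia32_cap)
{
	return is_intel && !on_no_mds_list && !(ia32_cap & ARCH_CAP_MDS_NO);
}
```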

[ tglx: Rewrote changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/cpufeatures.h |    2 ++
 arch/x86/include/asm/msr-index.h   |    5 +++++
 arch/x86/kernel/cpu/common.c       |   13 +++++++++++++
 3 files changed, 20 insertions(+)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -344,6 +344,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_MD_CLEAR		(18*32+10) /* VERW flushes CPU state */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
@@ -381,5 +382,6 @@
 #define X86_BUG_SPECTRE_V2		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
 #define X86_BUG_SPEC_STORE_BYPASS	X86_BUG(17) /* CPU is affected by speculative store bypass attack */
 #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
+#define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -77,6 +77,11 @@
 						    * attack, so no Speculative Store Bypass
 						    * control required.
 						    */
+#define ARCH_CAP_MDS_NO			(1 << 5)   /*
+						    * Not susceptible to
+						    * Microarchitectural Data
+						    * Sampling (MDS) vulnerabilities.
+						    */
 
 #define MSR_IA32_FLUSH_CMD		0x0000010b
 #define L1D_FLUSH			(1 << 0)   /*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -998,6 +998,14 @@ static const __initconst struct x86_cpu_
 	{}
 };
 
+static const __initconst struct x86_cpu_id cpu_no_mds[] = {
+	/* in addition to cpu_no_speculation */
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT_X	},
+	{ X86_VENDOR_INTEL,	6,	INTEL_FAM6_ATOM_GOLDMONT_PLUS	},
+	{}
+};
+
 static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_cap = 0;
@@ -1019,6 +1027,11 @@ static void __init cpu_set_bug_bits(stru
 	if (ia32_cap & ARCH_CAP_IBRS_ALL)
 		setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
 
+	if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+	    !x86_match_cpu(cpu_no_mds)) &&
+	    !(ia32_cap & ARCH_CAP_MDS_NO))
+		setup_force_cpu_bug(X86_BUG_MDS);
+
 	if (x86_match_cpu(cpu_no_meltdown))
 		return;
 


* [patch 2/8] MDS basics 2
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
  2019-02-19 12:44 ` [patch 1/8] MDS basics 1 Thomas Gleixner
@ 2019-02-19 12:44 ` Thomas Gleixner
  2019-02-19 12:44 ` [patch 3/8] MDS basics 3 Thomas Gleixner
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 2/8] x86/speculation/mds: Add mds_clear_cpu_buffers()
From: Thomas Gleixner <tglx@linutronix.de>

The Microarchitectural Data Sampling (MDS) vulnerabilities are mitigated by
clearing the affected CPU buffers. The mechanism for clearing the buffers
uses the unused and obsolete VERW instruction in combination with a
microcode update which triggers a CPU buffer clear when VERW is executed.

Provide an inline function with the assembly magic. The argument of the VERW
instruction must be a memory operand.

The function takes a pointer to a static key, so different call sites can
depend on different static keys for controlling the invocation. This avoids
conditionals at the call sites and allows for fine-grained control,
e.g. the SMT-only CPU buffer clearing on idle entry can be enabled
independently of the exit-to-user-space clear.
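
The per-call-site gating can be modeled in plain C. A hedged user-space
sketch, where plain bools stand in for the kernel's static keys and a
counter replaces the actual VERW invocation (names are illustrative, not
the kernel's):

```c
#include <stdbool.h>

static bool user_clear_key;	/* models user_mds_clear_cpu_buffers */
static bool idle_clear_key;	/* models idle_mds_clear_cpu_buffers */
static unsigned int verw_invocations;

/*
 * Models mds_clear_cpu_buffers(): the key pointer selects which control
 * gates the (here simulated) buffer clear, so each call site can be
 * switched on and off independently.
 */
static void clear_cpu_buffers(const bool *key)
{
	if (*key)			/* static_branch_likely() in the real code */
		verw_invocations++;	/* VERW with a memory operand there */
}
```

Flipping one key leaves call sites gated by the other key untouched, which
is exactly the independence the changelog describes.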

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/nospec-branch.h |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -314,6 +314,25 @@ do {									\
 	preempt_enable();						\
 } while (0)
 
+#include <asm/segment.h>
+
+/**
+ * mds_clear_cpu_buffers - Mitigation for MDS vulnerability
+ *
+ * This uses the otherwise unused and obsolete VERW instruction in
+ * combination with microcode which triggers a CPU buffer flush when the
+ * instruction is executed.
+ */
+static inline void mds_clear_cpu_buffers(struct static_key_false *key)
+{
+	if (static_branch_likely(key)) {
+		static const u16 ds = __KERNEL_DS;
+
+		/* Has to be memory form, don't modify to use a register */
+		asm volatile("verw %[ds]" : : "i" (0), [ds] "m" (ds));
+	}
+}
+
 DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp);
 DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);


* [patch 3/8] MDS basics 3
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
  2019-02-19 12:44 ` [patch 1/8] MDS basics 1 Thomas Gleixner
  2019-02-19 12:44 ` [patch 2/8] MDS basics 2 Thomas Gleixner
@ 2019-02-19 12:44 ` Thomas Gleixner
  2019-02-19 16:04   ` [MODERATED] " Andi Kleen
  2019-02-19 12:44 ` [patch 4/8] MDS basics 4 Thomas Gleixner
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 3/8] x86/speculation/mds: Clear CPU buffers on exit to user
From: Thomas Gleixner <tglx@linutronix.de>

Add a static key which controls the invocation of the CPU buffer clear
mechanism on exit to user space and add the call into
prepare_exit_to_usermode() right before actually returning.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/common.c              |    3 +++
 arch/x86/include/asm/nospec-branch.h |    2 ++
 arch/x86/kernel/cpu/bugs.c           |    4 +++-
 3 files changed, 8 insertions(+), 1 deletion(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -31,6 +31,7 @@
 #include <asm/vdso.h>
 #include <linux/uaccess.h>
 #include <asm/cpufeature.h>
+#include <asm/nospec-branch.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -211,6 +212,8 @@ static void exit_to_usermode_loop(struct
 	ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
 #endif
 
+	mds_clear_cpu_buffers(&user_mds_clear_cpu_buffers);
+
 	user_enter_irqoff();
 }
 
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -337,6 +337,8 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_
 DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
+DECLARE_STATIC_KEY_FALSE(user_mds_clear_cpu_buffers);
+
 #endif /* __ASSEMBLY__ */
 
 /*
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -63,10 +63,12 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_i
 /* Control unconditional IBPB in switch_mm() */
 DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
+/* Control MDS CPU buffer clear before returning to user space */
+DEFINE_STATIC_KEY_FALSE(user_mds_clear_cpu_buffers);
+
 void __init check_bugs(void)
 {
 	identify_boot_cpu();
-
 	/*
 	 * identify_boot_cpu() initialized SMT support information, let the
 	 * core code know.


* [patch 4/8] MDS basics 4
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-02-19 12:44 ` [patch 3/8] MDS basics 3 Thomas Gleixner
@ 2019-02-19 12:44 ` Thomas Gleixner
  2019-02-19 13:54   ` [MODERATED] " Andrew Cooper
  2019-02-19 16:07   ` Andi Kleen
  2019-02-19 12:44 ` [patch 5/8] MDS basics 5 Thomas Gleixner
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 4/8] x86/speculation/mds: Conditionally clear CPU buffers on idle entry
From: Thomas Gleixner <tglx@linutronix.de>

Add a static key which controls the invocation of the CPU buffer clear
mechanism on idle entry. This is independent of other MDS mitigations
because the idle entry invocation to mitigate the potential leakage due to
store buffer repartitioning is only necessary on SMT systems.

Add the actual invocations to the different halt/mwait variants which
covers all usage sites. mwaitx is not patched as it's not available on
Intel CPUs.
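
The ordering matters: the clear has to run before the CPU parks, since the
store buffer repartitioning on sleep entry/exit is what can expose data to
the sibling. A hedged sketch of that ordering, with strings standing in for
the real VERW and hlt instructions (all names hypothetical):

```c
#include <string.h>

static char trace[32];

static void clear_cpu_buffers_sketch(void) { strcat(trace, "clear;"); }
static void halt_sketch(void)              { strcat(trace, "hlt;"); }

/*
 * Models native_safe_halt(): the buffer clear precedes the sleep
 * instruction, mirroring the placement inside the halt/mwait inlines.
 */
static void native_safe_halt_sketch(void)
{
	clear_cpu_buffers_sketch();	/* VERW in the real inline */
	halt_sketch();			/* sti; hlt in the real inline */
}
```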

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/irqflags.h      |    4 ++++
 arch/x86/include/asm/mwait.h         |    7 +++++++
 arch/x86/include/asm/nospec-branch.h |    1 +
 arch/x86/kernel/cpu/bugs.c           |    2 ++
 4 files changed, 14 insertions(+)

--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -6,6 +6,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/nospec-branch.h>
+
 /* Provide __cpuidle; we can't safely include <linux/cpu.h> */
 #define __cpuidle __attribute__((__section__(".cpuidle.text")))
 
@@ -54,11 +56,13 @@ static inline void native_irq_enable(voi
 
 static inline __cpuidle void native_safe_halt(void)
 {
+	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
 	asm volatile("sti; hlt": : :"memory");
 }
 
 static inline __cpuidle void native_halt(void)
 {
+	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
 	asm volatile("hlt": : :"memory");
 }
 
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -6,6 +6,7 @@
 #include <linux/sched/idle.h>
 
 #include <asm/cpufeature.h>
+#include <asm/nospec-branch.h>
 
 #define MWAIT_SUBSTATE_MASK		0xf
 #define MWAIT_CSTATE_MASK		0xf
@@ -40,6 +41,8 @@ static inline void __monitorx(const void
 
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
+	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
+
 	/* "mwait %eax, %ecx;" */
 	asm volatile(".byte 0x0f, 0x01, 0xc9;"
 		     :: "a" (eax), "c" (ecx));
@@ -74,6 +77,8 @@ static inline void __mwait(unsigned long
 static inline void __mwaitx(unsigned long eax, unsigned long ebx,
 			    unsigned long ecx)
 {
+	/* No MDS buffer clear as this is AMD/HYGON only */
+
 	/* "mwaitx %eax, %ebx, %ecx;" */
 	asm volatile(".byte 0x0f, 0x01, 0xfb;"
 		     :: "a" (eax), "b" (ebx), "c" (ecx));
@@ -81,6 +86,8 @@ static inline void __mwaitx(unsigned lon
 
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
+	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
+
 	trace_hardirqs_on();
 	/* "mwait %eax, %ecx;" */
 	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -338,6 +338,7 @@ DECLARE_STATIC_KEY_FALSE(switch_mm_cond_
 DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
 DECLARE_STATIC_KEY_FALSE(user_mds_clear_cpu_buffers);
+DECLARE_STATIC_KEY_FALSE(idle_mds_clear_cpu_buffers);
 
 #endif /* __ASSEMBLY__ */
 
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -65,6 +65,8 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_always
 
 /* Control MDS CPU buffer clear before returning to user space */
 DEFINE_STATIC_KEY_FALSE(user_mds_clear_cpu_buffers);
+/* Control MDS CPU buffer clear before idling (halt, mwait) */
+DEFINE_STATIC_KEY_FALSE(idle_mds_clear_cpu_buffers);
 
 void __init check_bugs(void)
 {


* [patch 5/8] MDS basics 5
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-02-19 12:44 ` [patch 4/8] MDS basics 4 Thomas Gleixner
@ 2019-02-19 12:44 ` Thomas Gleixner
  2019-02-19 15:07   ` Thomas Gleixner
  2019-02-19 16:03   ` [MODERATED] " Andi Kleen
  2019-02-19 12:44 ` [patch 6/8] MDS basics 6 Thomas Gleixner
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 5/8] x86/speculation/mds: Add mitigation control for MDS
From: Thomas Gleixner <tglx@linutronix.de>

Now that the mitigations are in place, add a command line parameter to
control the mitigation, a mitigation selector function and a SMT update
mechanism.

This is the minimal, straightforward initial implementation which just
provides an always-on/off mode. The command line parameter is:

  mds=[full|off|auto]

This is consistent with the existing mitigations for other speculative
hardware vulnerabilities.

The idle invocation is dynamically updated according to the SMT state of
the system similar to the dynamic update of the STIBP mitigation.
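
The selection collapses auto and full into the same decision. A pure-function
sketch of the logic in mds_select_mitigation() (the kernel version also flips
the user static key; this model only computes the resulting mode):

```c
#include <stdbool.h>

enum mds_mitigations {
	MDS_MITIGATION_OFF,
	MDS_MITIGATION_AUTO,
	MDS_MITIGATION_FULL,
};

/*
 * auto and full both resolve to full when the MD_CLEAR microcode is
 * present; without it the CPU stays unmitigated. Unaffected CPUs are
 * always off.
 */
static enum mds_mitigations select_mds(enum mds_mitigations cmdline,
				       bool bug_mds, bool has_md_clear)
{
	if (!bug_mds || cmdline == MDS_MITIGATION_OFF)
		return MDS_MITIGATION_OFF;
	return has_md_clear ? MDS_MITIGATION_FULL : MDS_MITIGATION_OFF;
}
```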

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/admin-guide/kernel-parameters.txt |   27 ++++++++
 arch/x86/include/asm/processor.h                |    6 +
 arch/x86/kernel/cpu/bugs.c                      |   76 ++++++++++++++++++++++++
 3 files changed, 109 insertions(+)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2362,6 +2362,33 @@
 			Format: <first>,<last>
 			Specifies range of consoles to be captured by the MDA.
 
+	mds=		[X86,INTEL]
+			Control mitigation for the Micro-architectural Data
+			Sampling (MDS) vulnerability.
+
+			Certain CPUs are vulnerable to an exploit against CPU
+			internal buffers which can forward information to a
+			disclosure gadget under certain conditions.
+
+			In vulnerable processors, the speculatively
+			forwarded data can be used in a cache side channel
+			attack, to access data to which the attacker does
+			not have direct access.
+
+			This parameter controls the MDS mitigation. The
+			options are:
+
+			full    - Unconditionally enable MDS mitigation
+			off     - Unconditionally disable MDS mitigation
+			auto    - Kernel detects whether the CPU model is
+				  vulnerable to MDS and picks the most
+				  appropriate mitigation. If the CPU is not
+				  vulnerable, "off" is selected. If the CPU
+				  is vulnerable "full" is selected.
+
+			Not specifying this option is equivalent to
+			mds=auto.
+
 	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
 			Amount of memory to be used when the kernel is not able
 			to see the whole system memory or for test.
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -991,4 +991,10 @@ enum l1tf_mitigations {
 
 extern enum l1tf_mitigations l1tf_mitigation;
 
+enum mds_mitigations {
+	MDS_MITIGATION_OFF,
+	MDS_MITIGATION_AUTO,
+	MDS_MITIGATION_FULL,
+};
+
 #endif /* _ASM_X86_PROCESSOR_H */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -37,6 +37,7 @@
 static void __init spectre_v2_select_mitigation(void);
 static void __init ssb_select_mitigation(void);
 static void __init l1tf_select_mitigation(void);
+static void __init mds_select_mitigation(void);
 
 /* The base value of the SPEC_CTRL MSR that always has to be preserved. */
 u64 x86_spec_ctrl_base;
@@ -105,6 +106,8 @@ void __init check_bugs(void)
 
 	l1tf_select_mitigation();
 
+	mds_select_mitigation();
+
 #ifdef CONFIG_X86_32
 	/*
 	 * Check whether we are able to run this kernel safely on SMP.
@@ -211,6 +214,59 @@ static void x86_amd_ssb_disable(void)
 }
 
 #undef pr_fmt
+#define pr_fmt(fmt)	"MDS: " fmt
+
+/* Default mitigation for MDS-affected CPUs */
+static enum mds_mitigations mds_mitigation __ro_after_init = MDS_MITIGATION_AUTO;
+
+static const char * const mds_strings[] = {
+	[MDS_MITIGATION_OFF]	= "Vulnerable",
+	[MDS_MITIGATION_FULL]	= "Mitigation: Clear CPU buffers"
+};
+
+static void mds_select_mitigation(void)
+{
+	if (!boot_cpu_has_bug(X86_BUG_MDS)) {
+		mds_mitigation = MDS_MITIGATION_OFF;
+		return;
+	}
+
+	switch (mds_mitigation) {
+	case MDS_MITIGATION_OFF:
+		break;
+	case MDS_MITIGATION_AUTO:
+	case MDS_MITIGATION_FULL:
+		if (boot_cpu_has(X86_FEATURE_MD_CLEAR)) {
+			mds_mitigation = MDS_MITIGATION_FULL;
+			static_branch_enable(&user_mds_clear_cpu_buffers);
+		} else {
+			mds_mitigation = MDS_MITIGATION_OFF;
+		}
+		break;
+	}
+	pr_info("%s\n", mds_strings[mds_mitigation]);
+}
+
+static int __init mds_cmdline(char *str)
+{
+	if (!boot_cpu_has_bug(X86_BUG_MDS))
+		return 0;
+
+	if (!str)
+		return -EINVAL;
+
+	if (!strcmp(str, "off"))
+		mds_mitigation = MDS_MITIGATION_OFF;
+	else if (!strcmp(str, "auto"))
+		mds_mitigation = MDS_MITIGATION_AUTO;
+	else if (!strcmp(str, "full"))
+		mds_mitigation = MDS_MITIGATION_FULL;
+
+	return 0;
+}
+early_param("mds", mds_cmdline);
+
+#undef pr_fmt
 #define pr_fmt(fmt)     "Spectre V2 : " fmt
 
 static enum spectre_v2_mitigation spectre_v2_enabled __ro_after_init =
@@ -614,6 +670,15 @@ static void update_indir_branch_cond(voi
 		static_branch_disable(&switch_to_cond_stibp);
 }
 
+/* Update the static key controlling the MDS CPU buffer clear in idle */
+static void update_mds_branch_idle(void)
+{
+	if (sched_smt_active())
+		static_branch_enable(&idle_mds_clear_cpu_buffers);
+	else
+		static_branch_disable(&idle_mds_clear_cpu_buffers);
+}
+
 void arch_smt_update(void)
 {
 	/* Enhanced IBRS implies STIBP. No update required. */
@@ -635,6 +700,17 @@ void arch_smt_update(void)
 		break;
 	}
 
+	switch (mds_mitigation) {
+	case MDS_MITIGATION_OFF:
+		break;
+	case MDS_MITIGATION_FULL:
+		update_mds_branch_idle();
+		break;
+	/* Keep GCC happy */
+	case MDS_MITIGATION_AUTO:
+		break;
+	}
+
 	mutex_unlock(&spec_ctrl_mutex);
 }
 


* [patch 6/8] MDS basics 6
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
                   ` (4 preceding siblings ...)
  2019-02-19 12:44 ` [patch 5/8] MDS basics 5 Thomas Gleixner
@ 2019-02-19 12:44 ` Thomas Gleixner
  2019-02-19 12:44 ` [patch 7/8] MDS basics 7 Thomas Gleixner
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 6/8] x86/speculation/mds: Add sysfs reporting for MDS
From: Thomas Gleixner <tglx@linutronix.de>

Add the sysfs reporting file for MDS. It exposes the vulnerability and
mitigation state similar to the existing files for the other speculative
hardware vulnerabilities.
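
The reported string depends on whether the host's SMT state is visible. A
hedged sketch of the composition done in mds_show_state() (function name and
parameters are illustrative; the kernel version queries the hypervisor type
and scheduler directly):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * When running as a guest the host's SMT state is unknown, so the
 * sysfs report hedges; on bare metal it states whether SMT makes the
 * CPU cross-thread vulnerable.
 */
static int mds_show_sketch(char *buf, size_t len, const char *state,
			   bool bare_metal, bool smt_active)
{
	if (!bare_metal)
		return snprintf(buf, len, "%s; SMT Host state unknown\n", state);
	return snprintf(buf, len, "%s; SMT %s\n", state,
			smt_active ? "vulnerable" : "disabled");
}
```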

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |    1 +
 arch/x86/kernel/cpu/bugs.c                         |   20 ++++++++++++++++++++
 drivers/base/cpu.c                                 |    6 ++++--
 include/linux/cpu.h                                |    2 ++
 4 files changed, 27 insertions(+), 2 deletions(-)

--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -484,6 +484,7 @@ What:		/sys/devices/system/cpu/vulnerabi
 		/sys/devices/system/cpu/vulnerabilities/spectre_v2
 		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
 		/sys/devices/system/cpu/vulnerabilities/l1tf
+		/sys/devices/system/cpu/vulnerabilities/mds
 Date:		January 2018
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Information about CPU vulnerabilities
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1187,6 +1187,17 @@ static ssize_t l1tf_show_state(char *buf
 }
 #endif
 
+static ssize_t mds_show_state(char *buf)
+{
+	if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
+		return sprintf(buf, "%s; SMT Host state unknown\n",
+			       mds_strings[mds_mitigation]);
+	}
+
+	return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
+		       sched_smt_active() ? "vulnerable" : "disabled");
+}
+
 static char *stibp_state(void)
 {
 	if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED)
@@ -1253,6 +1264,10 @@ static ssize_t cpu_show_common(struct de
 		if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
 			return l1tf_show_state(buf);
 		break;
+
+	case X86_BUG_MDS:
+		return mds_show_state(buf);
+
 	default:
 		break;
 	}
@@ -1284,4 +1299,9 @@ ssize_t cpu_show_l1tf(struct device *dev
 {
 	return cpu_show_common(dev, attr, buf, X86_BUG_L1TF);
 }
+
+ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return cpu_show_common(dev, attr, buf, X86_BUG_MDS);
+}
 #endif
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -540,8 +540,8 @@ ssize_t __weak cpu_show_spec_store_bypas
 	return sprintf(buf, "Not affected\n");
 }
 
-ssize_t __weak cpu_show_l1tf(struct device *dev,
-			     struct device_attribute *attr, char *buf)
+ssize_t __weak cpu_show_mds(struct device *dev,
+			    struct device_attribute *attr, char *buf)
 {
 	return sprintf(buf, "Not affected\n");
 }
@@ -551,6 +551,7 @@ static DEVICE_ATTR(spectre_v1, 0444, cpu
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
 static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
 static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
+static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
@@ -558,6 +559,7 @@ static struct attribute *cpu_root_vulner
 	&dev_attr_spectre_v2.attr,
 	&dev_attr_spec_store_bypass.attr,
 	&dev_attr_l1tf.attr,
+	&dev_attr_mds.attr,
 	NULL
 };
 
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -57,6 +57,8 @@ extern ssize_t cpu_show_spec_store_bypas
 					  struct device_attribute *attr, char *buf);
 extern ssize_t cpu_show_l1tf(struct device *dev,
 			     struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_mds(struct device *dev,
+			    struct device_attribute *attr, char *buf);
 
 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,


* [patch 7/8] MDS basics 7
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
                   ` (5 preceding siblings ...)
  2019-02-19 12:44 ` [patch 6/8] MDS basics 6 Thomas Gleixner
@ 2019-02-19 12:44 ` Thomas Gleixner
  2019-02-19 12:44 ` [patch 8/8] MDS basics 8 Thomas Gleixner
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 7/8] Documentation: Move L1TF to separate directory
From: Thomas Gleixner <tglx@linutronix.de>

Move L1TF to a separate directory so the MDS documentation can be added
alongside it. Otherwise all hardware vulnerabilities would have their own
top level entry. Should have done that right away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/admin-guide/hw-vuln/index.rst |   12 
 Documentation/admin-guide/hw-vuln/l1tf.rst  |  614 ++++++++++++++++++++++++++++
 Documentation/admin-guide/index.rst         |    6 
 Documentation/admin-guide/l1tf.rst          |  614 ----------------------------
 4 files changed, 628 insertions(+), 618 deletions(-)

--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -0,0 +1,12 @@
+========================
+Hardware vulnerabilities
+========================
+
+This section describes CPU vulnerabilities and provides an overview of the
+possible mitigations along with guidance for selecting mitigations if they
+are configurable at compile, boot or run time.
+
+.. toctree::
+   :maxdepth: 1
+
+   l1tf
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/l1tf.rst
@@ -0,0 +1,614 @@
+L1TF - L1 Terminal Fault
+========================
+
+L1 Terminal Fault is a hardware vulnerability which allows unprivileged
+speculative access to data which is available in the Level 1 Data Cache
+when the page table entry controlling the virtual address, which is used
+for the access, has the Present bit cleared or other reserved bits set.
+
+Affected processors
+-------------------
+
+This vulnerability affects a wide range of Intel processors. The
+vulnerability is not present on:
+
+   - Processors from AMD, Centaur and other non Intel vendors
+
+   - Older processor models, where the CPU family is < 6
+
+   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
+     Penwell, Pineview, Silvermont, Airmont, Merrifield)
+
+   - The Intel XEON PHI family
+
+   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
+     IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
+     by the Meltdown vulnerability either. These CPUs should become
+     available by the end of 2018.
+
+Whether a processor is affected or not can be read out from the L1TF
+vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
+
+Related CVEs
+------------
+
+The following CVE entries are related to the L1TF vulnerability:
+
+   =============  =================  ==============================
+   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
+   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
+   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
+   =============  =================  ==============================
+
+Problem
+-------
+
+If an instruction accesses a virtual address for which the relevant page
+table entry (PTE) has the Present bit cleared or other reserved bits set,
+then speculative execution ignores the invalid PTE and loads the referenced
+data if it is present in the Level 1 Data Cache, as if the page referenced
+by the address bits in the PTE was still present and accessible.
+
+While this is a purely speculative mechanism and the instruction will raise
+a page fault when it is retired eventually, the pure act of loading the
+data and making it available to other speculative instructions opens up the
+opportunity for side channel attacks to unprivileged malicious code,
+similar to the Meltdown attack.
+
+While Meltdown breaks the user space to kernel space protection, L1TF
+allows attacking any physical memory address in the system and the attack
+works across all protection domains. It allows an attack of SGX and also
+works from inside virtual machines because the speculation bypasses the
+extended page table (EPT) protection mechanism.
+
+
+Attack scenarios
+----------------
+
+1. Malicious user space
+^^^^^^^^^^^^^^^^^^^^^^^
+
+   Operating Systems store arbitrary information in the address bits of a
+   PTE which is marked non present. This allows a malicious user space
+   application to attack the physical memory to which these PTEs resolve.
+   In some cases user-space can maliciously influence the information
+   encoded in the address bits of the PTE, thus making attacks more
+   deterministic and more practical.
+
+   The Linux kernel contains a mitigation for this attack vector, PTE
+   inversion, which is permanently enabled and has no performance
+   impact. The kernel ensures that the address bits of PTEs, which are not
+   marked present, never point to cacheable physical memory space.
+
+   A system with an up to date kernel is protected against attacks from
+   malicious user space applications.
+
+2. Malicious guest in a virtual machine
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The fact that L1TF breaks all domain protections allows malicious guest
+   OSes, which can control the PTEs directly, and malicious guest user
+   space applications, which run on an unprotected guest kernel lacking the
+   PTE inversion mitigation for L1TF, to attack physical host memory.
+
+   A special aspect of L1TF in the context of virtualization is symmetric
+   multi threading (SMT). The Intel implementation of SMT is called
+   HyperThreading. The fact that Hyperthreads on the affected processors
+   share the L1 Data Cache (L1D) is important for this. As the flaw only
+   allows attacking data which is present in the L1D, a malicious guest
+   running on one Hyperthread can attack the data which is brought into
+   the L1D by the context which runs on the sibling Hyperthread of the
+   same physical core. This context can be host OS, host user space or a
+   different guest.
+
+   If the processor does not support Extended Page Tables, the attack is
+   only possible when the hypervisor does not sanitize the content of the
+   effective (shadow) page tables.
+
+   While solutions exist to mitigate these attack vectors fully, these
+   mitigations are not enabled by default in the Linux kernel because they
+   can affect performance significantly. The kernel provides several
+   mechanisms which can be utilized to address the problem depending on the
+   deployment scenario. The mitigations, their protection scope and impact
+   are described in the next sections.
+
+   The default mitigations and the rationale for choosing them are explained
+   at the end of this document. See :ref:`default_mitigations`.
+
+.. _l1tf_sys_info:
+
+L1TF system information
+-----------------------
+
+The Linux kernel provides a sysfs interface to enumerate the current L1TF
+status of the system: whether the system is vulnerable, and which
+mitigations are active. The relevant sysfs file is:
+
+/sys/devices/system/cpu/vulnerabilities/l1tf
+
+The possible values in this file are:
+
+  ===========================   ===============================
+  'Not affected'		The processor is not vulnerable
+  'Mitigation: PTE Inversion'	The host protection is active
+  ===========================   ===============================
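As a quick check, the status file can simply be read. A minimal sketch
(the guard is an addition for illustration, so the snippet also behaves
sanely on kernels that predate the vulnerabilities sysfs interface):

```shell
# Read the L1TF vulnerability status from sysfs; the file only exists
# on kernels which implement the vulnerabilities interface.
f=/sys/devices/system/cpu/vulnerabilities/l1tf
if [ -r "$f" ]; then
    l1tf_status=$(cat "$f")
else
    l1tf_status="unknown (sysfs file not available)"
fi
echo "L1TF: $l1tf_status"
```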
+
+If KVM/VMX is enabled and the processor is vulnerable then the following
+information is appended to the 'Mitigation: PTE Inversion' part:
+
+  - SMT status:
+
+    =====================  ================
+    'VMX: SMT vulnerable'  SMT is enabled
+    'VMX: SMT disabled'    SMT is disabled
+    =====================  ================
+
+  - L1D Flush mode:
+
+    ================================  ====================================
+    'L1D vulnerable'		      L1D flushing is disabled
+
+    'L1D conditional cache flushes'   L1D flush is conditionally enabled
+
+    'L1D cache flushes'		      L1D flush is unconditionally enabled
+    ================================  ====================================
+
+The resulting grade of protection is discussed in the following sections.
+
+
+Host mitigation mechanism
+-------------------------
+
+The kernel is unconditionally protected against L1TF attacks from malicious
+user space running on the host.
+
+
+Guest mitigation mechanisms
+---------------------------
+
+.. _l1d_flush:
+
+1. L1D flush on VMENTER
+^^^^^^^^^^^^^^^^^^^^^^^
+
+   To make sure that a guest cannot attack data which is present in the
+   L1D, the hypervisor flushes the L1D before entering the guest.
+
+   Flushing the L1D evicts not only the data which should not be accessed
+   by a potentially malicious guest, it also flushes the guest
+   data. Flushing the L1D has a performance impact as the processor has to
+   bring the flushed guest data back into the L1D. Depending on the
+   frequency of VMEXIT/VMENTER and the type of computations in the guest,
+   performance degradation in the range of 1% to 50% has been observed. For
+   scenarios where guest VMEXIT/VMENTER are rare the performance impact is
+   minimal. Virtio and mechanisms like posted interrupts are designed to
+   confine the VMEXITs to a bare minimum, but specific configurations and
+   application scenarios might still suffer from a high VMEXIT rate.
+
+   The kernel provides two L1D flush modes:
+    - conditional ('cond')
+    - unconditional ('always')
+
+   The conditional mode avoids L1D flushing after VMEXITs which execute
+   only audited code paths before the corresponding VMENTER. These code
+   paths have been verified not to expose secrets or other interesting
+   data to an attacker, but they can leak information about the address
+   space layout of the hypervisor.
+
+   Unconditional mode flushes L1D on all VMENTER invocations and provides
+   maximum protection. It has a higher overhead than the conditional
+   mode. The overhead cannot be quantified correctly as it depends on the
+   workload scenario and the resulting number of VMEXITs.
+
+   The general recommendation is to enable L1D flush on VMENTER. The kernel
+   defaults to conditional mode on affected processors.
+
+   **Note** that the L1D flush does not prevent the SMT problem, because
+   the sibling thread will bring its data back into the L1D, which makes
+   it attackable again.
+
+   L1D flush can be controlled by the administrator via the kernel command
+   line and sysfs control files. See :ref:`mitigation_control_command_line`
+   and :ref:`mitigation_control_kvm`.
+
+.. _guest_confinement:
+
+2. Guest VCPU confinement to dedicated physical cores
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   To address the SMT problem, it is possible to make a guest or a group of
+   guests affine to one or more physical cores. The proper mechanism for
+   that is to utilize exclusive cpusets to ensure that no other guest or
+   host tasks can run on these cores.
+
+   If only a single guest or related guests run on sibling SMT threads on
+   the same physical core then they can only attack their own memory and
+   restricted parts of the host memory.
+
+   Host memory is attackable when one of the sibling SMT threads runs in
+   host OS (hypervisor) context and the other in guest context. The amount
+   of valuable information from the host OS context depends on what the
+   host OS executes in that context, i.e. interrupts, soft interrupts and
+   kernel threads. The amount of valuable data from these contexts cannot
+   be declared as non-interesting for an attacker without deep inspection
+   of the code.
+
+   **Note** that assigning guests to a fixed set of physical cores affects
+   the ability of the scheduler to do load balancing and might have
+   negative effects on CPU utilization depending on the hosting
+   scenario. Disabling SMT might be a viable alternative for particular
+   scenarios.
+
+   For further information about confining guests to a single or to a group
+   of cores consult the cpusets documentation:
+
+   https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
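The confinement described above can be sketched with the cgroup-v1 cpuset
interface. The mount point, the cpuset name 'guest1' and the CPU numbers
2-3 (assumed to be one physical core plus its SMT sibling) are purely
illustrative assumptions; the commands only take effect when the cpuset
filesystem is mounted there and the script runs as root:

```shell
# Sketch: create an exclusive cpuset for a guest on CPUs 2-3.
CPUSET=/sys/fs/cgroup/cpuset        # common mount point, may differ
if [ -d "$CPUSET" ] && [ "$(id -u)" = 0 ]; then
    mkdir -p "$CPUSET/guest1"
    echo 2-3 > "$CPUSET/guest1/cpuset.cpus"
    echo 0   > "$CPUSET/guest1/cpuset.mems"
    echo 1   > "$CPUSET/guest1/cpuset.cpu_exclusive"
    # The guest's VCPU threads would then be moved into the set, e.g.:
    # echo "$QEMU_PID" > "$CPUSET/guest1/tasks"
    confined=yes
else
    confined="skipped (no cpuset fs or not root)"
fi
echo "confinement: $confined"
```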
+
+.. _interrupt_isolation:
+
+3. Interrupt affinity
+^^^^^^^^^^^^^^^^^^^^^
+
+   Interrupts can be made affine to logical CPUs. This is not universally
+   possible because some types of interrupts are truly per CPU interrupts,
+   e.g. the local timer interrupt. Aside from that, multi queue
+   devices affine their interrupts to single CPUs or groups of CPUs per
+   queue without allowing the administrator to control the affinities.
+
+   Moving the interrupts, which can be affinity controlled, away from CPUs
+   which run untrusted guests, reduces the attack vector space.
+
+   Whether the interrupts which are affine to CPUs running untrusted
+   guests provide interesting data for an attacker depends on the system
+   configuration and the scenarios which run on the system. While for some
+   of the interrupts it can be assumed that they won't expose interesting
+   information beyond exposing hints about the host OS memory layout, there
+   is no way to make general assumptions.
+
+   Interrupt affinity can be controlled by the administrator via the
+   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
+   available at:
+
+   https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
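A hedged sketch of moving a controllable interrupt away from guest CPUs;
the IRQ number 24 and the CPU split (guests on 2-3, interrupts on 0-1)
are arbitrary example assumptions, and the write is skipped when the IRQ
does not exist or the script lacks the privileges:

```shell
# Sketch: steer IRQ 24 onto CPUs 0-1, away from assumed guest CPUs 2-3.
IRQ=24
AFF=/proc/irq/$IRQ/smp_affinity_list
if [ -w "$AFF" ]; then
    # Per-CPU interrupts reject affinity changes, hence the fallback.
    echo 0-1 > "$AFF" 2>/dev/null || echo "IRQ $IRQ cannot be moved"
else
    echo "IRQ $IRQ not present or not writable, skipping"
fi
irq_done=yes
```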
+
+.. _smt_control:
+
+4. SMT control
+^^^^^^^^^^^^^^
+
+   To prevent the SMT issues of L1TF it might be necessary to disable SMT
+   completely. Disabling SMT can have a significant performance impact, but
+   the impact depends on the hosting scenario and the type of workloads.
+   The impact of disabling SMT also needs to be weighed against the impact
+   of other mitigation solutions like confining guests to dedicated cores.
+
+   The kernel provides a sysfs interface to retrieve the status of SMT and
+   to control it. It also provides a kernel command line interface to
+   control SMT.
+
+   The kernel command line interface consists of the following options:
+
+     =========== ==========================================================
+     nosmt	 Affects the bring up of the secondary CPUs during boot. The
+		 kernel tries to bring all present CPUs online during the
+		 boot process. "nosmt" makes sure that from each physical
+		 core only one - the so called primary (hyper) thread is
+		 activated. Due to a design flaw of Intel processors related
+		 to Machine Check Exceptions the non primary siblings have
+		 to be brought up at least partially and are then shut down
+		 again.  "nosmt" can be undone via the sysfs interface.
+
+     nosmt=force Has the same effect as "nosmt" but it does not allow
+		 re-enabling SMT via the sysfs interface.
+     =========== ==========================================================
+
+   The sysfs interface provides two files:
+
+   - /sys/devices/system/cpu/smt/control
+   - /sys/devices/system/cpu/smt/active
+
+   /sys/devices/system/cpu/smt/control:
+
+     This file shows the current SMT control state and allows disabling
+     or (re)enabling SMT. The possible states are:
+
+	==============  ===================================================
+	on		SMT is supported by the CPU and enabled. All
+			logical CPUs can be onlined and offlined without
+			restrictions.
+
+	off		SMT is supported by the CPU and disabled. Only
+			the so called primary SMT threads can be onlined
+			and offlined without restrictions. An attempt to
+			online a non-primary sibling is rejected.
+
+	forceoff	Same as 'off' but the state cannot be controlled.
+			Attempts to write to the control file are rejected.
+
+	notsupported	The processor does not support SMT. It's therefore
+			not affected by the SMT implications of L1TF.
+			Attempts to write to the control file are rejected.
+	==============  ===================================================
+
+     The possible states which can be written into this file to control SMT
+     state are:
+
+     - on
+     - off
+     - forceoff
+
+   /sys/devices/system/cpu/smt/active:
+
+     This file reports whether SMT is enabled and active, i.e. if on any
+     physical core two or more sibling threads are online.
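The two sysfs files can be inspected as follows; this is a sketch with a
guard for kernels lacking the SMT control interface, and the actual
disable command is shown only as a comment because it requires root and
takes effect immediately:

```shell
# Query the SMT control state and whether SMT is currently active.
SMT=/sys/devices/system/cpu/smt
if [ -r "$SMT/control" ]; then
    echo "SMT control: $(cat "$SMT/control"), active: $(cat "$SMT/active")"
    # To disable SMT at runtime (as root):
    # echo off > "$SMT/control"
else
    echo "SMT sysfs interface not available"
fi
smt_checked=yes
```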
+
+   SMT control is also possible at boot time via the l1tf kernel command
+   line parameter in combination with L1D flush control. See
+   :ref:`mitigation_control_command_line`.
+
+5. Disabling EPT
+^^^^^^^^^^^^^^^^
+
+  Disabling EPT for virtual machines provides full mitigation for L1TF even
+  with SMT enabled, because the effective page tables for guests are
+  managed and sanitized by the hypervisor. Though disabling EPT has a
+  significant performance impact especially when the Meltdown mitigation
+  KPTI is enabled.
+
+  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
+
+There is ongoing research and development for new mitigation mechanisms to
+address the performance impact of disabling SMT or EPT.
+
+.. _mitigation_control_command_line:
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+The kernel command line allows controlling the L1TF mitigations at boot
+time with the option "l1tf=". The valid arguments for this option are:
+
+  ============  =============================================================
+  full		Provides all available mitigations for the L1TF
+		vulnerability. Disables SMT and enables all mitigations in
+		the hypervisors, i.e. unconditional L1D flushing
+
+		SMT control and L1D flush control via the sysfs interface
+		is still possible after boot.  Hypervisors will issue a
+		warning when the first VM is started in a potentially
+		insecure configuration, i.e. SMT enabled or L1D flush
+		disabled.
+
+  full,force	Same as 'full', but disables SMT and L1D flush runtime
+		control. Implies the 'nosmt=force' command line option.
+		(i.e. sysfs control of SMT is disabled.)
+
+  flush		Leaves SMT enabled and enables the default hypervisor
+		mitigation, i.e. conditional L1D flushing
+
+		SMT control and L1D flush control via the sysfs interface
+		is still possible after boot.  Hypervisors will issue a
+		warning when the first VM is started in a potentially
+		insecure configuration, i.e. SMT enabled or L1D flush
+		disabled.
+
+  flush,nosmt	Disables SMT and enables the default hypervisor mitigation,
+		i.e. conditional L1D flushing.
+
+		SMT control and L1D flush control via the sysfs interface
+		is still possible after boot.  Hypervisors will issue a
+		warning when the first VM is started in a potentially
+		insecure configuration, i.e. SMT enabled or L1D flush
+		disabled.
+
+  flush,nowarn	Same as 'flush', but hypervisors will not warn when a VM is
+		started in a potentially insecure configuration.
+
+  off		Disables hypervisor mitigations and doesn't emit any
+		warnings.
+		It also drops the swap size and available RAM limit restrictions
+		on both hypervisor and bare metal.
+
+  ============  =============================================================
+
+The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
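Which option the running kernel booted with can be checked against the
command line; a minimal sketch, where the absence of an explicit "l1tf="
option means the default applies:

```shell
# Look for an explicit l1tf= boot option on the kernel command line.
if grep -q 'l1tf=' /proc/cmdline 2>/dev/null; then
    echo "boot option: $(grep -o 'l1tf=[^ ]*' /proc/cmdline)"
else
    echo "no l1tf= option, kernel default ('flush') applies"
fi
cmdline_checked=yes
```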
+
+
+.. _mitigation_control_kvm:
+
+Mitigation control for KVM - module parameter
+-------------------------------------------------------------
+
+The KVM hypervisor mitigation mechanism, flushing the L1D cache when
+entering a guest, can be controlled with a module parameter.
+
+The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
+following arguments:
+
+  ============  ==============================================================
+  always	L1D cache flush on every VMENTER.
+
+  cond		Flush L1D on VMENTER only when the code between VMEXIT and
+		VMENTER can leak host memory which is considered
+		interesting for an attacker. This still can leak host memory
+		which allows e.g. determining the host's address space layout.
+
+  never		Disables the mitigation
+  ============  ==============================================================
+
+The parameter can be provided on the kernel command line, as a module
+parameter when loading the module, and modified at runtime via the sysfs
+file:
+
+/sys/module/kvm_intel/parameters/vmentry_l1d_flush
+
+The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
+line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
+module parameter is ignored and writes to the sysfs file are rejected.
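The runtime side of this can be sketched as follows; the guard is there
because the parameter file only exists while the kvm_intel module is
loaded, and the mode change is shown as a comment since it requires root:

```shell
# Read the current KVM L1D flush mode for VMENTER.
P=/sys/module/kvm_intel/parameters/vmentry_l1d_flush
if [ -r "$P" ]; then
    echo "vmentry_l1d_flush: $(cat "$P")"
    # Switch to unconditional flushing (as root; the write is rejected
    # when l1tf=full,force was given on the kernel command line):
    # echo always > "$P"
else
    echo "kvm_intel not loaded, parameter not present"
fi
param_checked=yes
```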
+
+
+Mitigation selection guide
+--------------------------
+
+1. No virtualization in use
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The system is protected by the kernel unconditionally and no further
+   action is required.
+
+2. Virtualization with trusted guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   If the guest comes from a trusted source and the guest OS kernel is
+   guaranteed to have the L1TF mitigations in place the system is fully
+   protected against L1TF and no further action is required.
+
+   To avoid the overhead of the default L1D flushing on VMENTER the
+   administrator can disable the flushing via the kernel command line and
+   sysfs control files. See :ref:`mitigation_control_command_line` and
+   :ref:`mitigation_control_kvm`.
+
+
+3. Virtualization with untrusted guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+3.1. SMT not supported or disabled
+""""""""""""""""""""""""""""""""""
+
+  If SMT is not supported by the processor or disabled in the BIOS or by
+  the kernel, it's only required to enforce L1D flushing on VMENTER.
+
+  Conditional L1D flushing is the default behaviour and can be tuned. See
+  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
+
+3.2. EPT not supported or disabled
+""""""""""""""""""""""""""""""""""
+
+  If EPT is not supported by the processor or disabled in the hypervisor,
+  the system is fully protected. SMT can stay enabled and L1D flushing on
+  VMENTER is not required.
+
+  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
+
+3.3. SMT and EPT supported and active
+"""""""""""""""""""""""""""""""""""""
+
+  If SMT and EPT are supported and active then various degrees of
+  mitigations can be employed:
+
+  - L1D flushing on VMENTER:
+
+    L1D flushing on VMENTER is the minimal protection requirement, but it
+    is only potent in combination with other mitigation methods.
+
+    Conditional L1D flushing is the default behaviour and can be tuned. See
+    :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
+
+  - Guest confinement:
+
+    Confinement of guests to a single or a group of physical cores which
+    are not running any other processes, can reduce the attack surface
+    significantly, but interrupts, soft interrupts and kernel threads can
+    still expose valuable data to a potential attacker. See
+    :ref:`guest_confinement`.
+
+  - Interrupt isolation:
+
+    Isolating the guest CPUs from interrupts can reduce the attack surface
+    further, but still allows a malicious guest to explore a limited amount
+    of host physical memory. This can at least be used to gain knowledge
+    about the host address space layout. The interrupts which have a fixed
+    affinity to the CPUs which run the untrusted guests can, depending on
+    the scenario, still trigger soft interrupts and schedule kernel threads
+    which might expose valuable information. See
+    :ref:`interrupt_isolation`.
+
+The above three mitigation methods combined can provide protection to a
+certain degree, but the risk of the remaining attack surface has to be
+carefully analyzed. For full protection the following methods are
+available:
+
+  - Disabling SMT:
+
+    Disabling SMT and enforcing the L1D flushing provides the maximum
+    amount of protection. This mitigation does not depend on any of the
+    above mitigation methods.
+
+    SMT control and L1D flushing can be tuned by the command line
+    parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
+    time with the matching sysfs control files. See :ref:`smt_control`,
+    :ref:`mitigation_control_command_line` and
+    :ref:`mitigation_control_kvm`.
+
+  - Disabling EPT:
+
+    Disabling EPT provides the maximum amount of protection as well. It
+    does not depend on any of the above mitigation methods. SMT can stay
+    enabled and L1D flushing is not required, but the performance impact is
+    significant.
+
+    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
+    parameter.
+
+3.4. Nested virtual machines
+""""""""""""""""""""""""""""
+
+When nested virtualization is in use, three operating systems are involved:
+the bare metal hypervisor, the nested hypervisor and the nested virtual
+machine.  VMENTER operations from the nested hypervisor into the nested
+guest will always be processed by the bare metal hypervisor. If KVM is the
+bare metal hypervisor it will:
+
+ - Flush the L1D cache on every switch from the nested hypervisor to the
+   nested virtual machine, so that the nested hypervisor's secrets are not
+   exposed to the nested virtual machine;
+
+ - Flush the L1D cache on every switch from the nested virtual machine to
+   the nested hypervisor; this is a complex operation, and flushing the L1D
+   cache prevents the bare metal hypervisor's secrets from being exposed
+   to the nested virtual machine;
+
+ - Instruct the nested hypervisor to not perform any L1D cache flush. This
+   is an optimization to avoid double L1D flushing.
+
+
+.. _default_mitigations:
+
+Default mitigations
+-------------------
+
+  The kernel default mitigations for vulnerable processors are:
+
+  - PTE inversion to protect against malicious user space. This is done
+    unconditionally and cannot be controlled. The swap storage is limited
+    to ~16TB.
+
+  - L1D conditional flushing on VMENTER when EPT is enabled for
+    a guest.
+
+  The kernel does not by default enforce the disabling of SMT, which leaves
+  SMT systems vulnerable when running untrusted guests with EPT enabled.
+
+  The rationale for this choice is:
+
+  - Force disabling SMT can break existing setups, especially with
+    unattended updates.
+
+  - If regular users run untrusted guests on their machine, then L1TF is
+    just an add on to other malware which might be embedded in an untrusted
+    guest, e.g. spam-bots or attacks on the local network.
+
+    There is no technical way to prevent a user from running untrusted code
+    on their machines blindly.
+
+  - It's technically extremely unlikely and from today's knowledge even
+    impossible that L1TF can be exploited via the most popular attack
+    mechanisms like JavaScript because these mechanisms have no way to
+    control PTEs. If this were possible and no other mitigation were
+    available, then the default might be different.
+
+  - The administrators of cloud and hosting setups have to carefully
+    analyze the risk for their scenarios and make the appropriate
+    mitigation choices, which might even vary across their deployed
+    machines and also result in other changes of their overall setup.
+    There is no way for the kernel to provide a sensible default for these
+    kinds of scenarios.
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -17,14 +17,12 @@ etc.
    kernel-parameters
    devices
 
-This section describes CPU vulnerabilities and provides an overview of the
-possible mitigations along with guidance for selecting mitigations if they
-are configurable at compile, boot or run time.
+This section describes CPU vulnerabilities and their mitigations.
 
 .. toctree::
    :maxdepth: 1
 
-   l1tf
+   hw-vuln/index
 
 Here is a set of documents aimed at users who are trying to track down
 problems and bugs in particular.
--- a/Documentation/admin-guide/l1tf.rst
+++ /dev/null
@@ -1,614 +0,0 @@
-L1TF - L1 Terminal Fault
-========================
-
-L1 Terminal Fault is a hardware vulnerability which allows unprivileged
-speculative access to data which is available in the Level 1 Data Cache
-when the page table entry controlling the virtual address, which is used
-for the access, has the Present bit cleared or other reserved bits set.
-
-Affected processors
--------------------
-
-This vulnerability affects a wide range of Intel processors. The
-vulnerability is not present on:
-
-   - Processors from AMD, Centaur and other non Intel vendors
-
-   - Older processor models, where the CPU family is < 6
-
-   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
-     Penwell, Pineview, Silvermont, Airmont, Merrifield)
-
-   - The Intel XEON PHI family
-
-   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
-     IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
-     by the Meltdown vulnerability either. These CPUs should become
-     available by end of 2018.
-
-Whether a processor is affected or not can be read out from the L1TF
-vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
-
-Related CVEs
-------------
-
-The following CVE entries are related to the L1TF vulnerability:
-
-   =============  =================  ==============================
-   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
-   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
-   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
-   =============  =================  ==============================
-
-Problem
--------
-
-If an instruction accesses a virtual address for which the relevant page
-table entry (PTE) has the Present bit cleared or other reserved bits set,
-then speculative execution ignores the invalid PTE and loads the referenced
-data if it is present in the Level 1 Data Cache, as if the page referenced
-by the address bits in the PTE was still present and accessible.
-
-While this is a purely speculative mechanism and the instruction will raise
-a page fault when it is retired eventually, the pure act of loading the
-data and making it available to other speculative instructions opens up the
-opportunity for side channel attacks to unprivileged malicious code,
-similar to the Meltdown attack.
-
-While Meltdown breaks the user space to kernel space protection, L1TF
-allows to attack any physical memory address in the system and the attack
-works across all protection domains. It allows an attack of SGX and also
-works from inside virtual machines because the speculation bypasses the
-extended page table (EPT) protection mechanism.
-
-
-Attack scenarios
-----------------
-
-1. Malicious user space
-^^^^^^^^^^^^^^^^^^^^^^^
-
-   Operating Systems store arbitrary information in the address bits of a
-   PTE which is marked non present. This allows a malicious user space
-   application to attack the physical memory to which these PTEs resolve.
-   In some cases user-space can maliciously influence the information
-   encoded in the address bits of the PTE, thus making attacks more
-   deterministic and more practical.
-
-   The Linux kernel contains a mitigation for this attack vector, PTE
-   inversion, which is permanently enabled and has no performance
-   impact. The kernel ensures that the address bits of PTEs, which are not
-   marked present, never point to cacheable physical memory space.
-
-   A system with an up to date kernel is protected against attacks from
-   malicious user space applications.
-
-2. Malicious guest in a virtual machine
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-   The fact that L1TF breaks all domain protections allows malicious guest
-   OSes, which can control the PTEs directly, and malicious guest user
-   space applications, which run on an unprotected guest kernel lacking the
-   PTE inversion mitigation for L1TF, to attack physical host memory.
-
-   A special aspect of L1TF in the context of virtualization is symmetric
-   multi threading (SMT). The Intel implementation of SMT is called
-   HyperThreading. The fact that Hyperthreads on the affected processors
-   share the L1 Data Cache (L1D) is important for this. As the flaw allows
-   only to attack data which is present in L1D, a malicious guest running
-   on one Hyperthread can attack the data which is brought into the L1D by
-   the context which runs on the sibling Hyperthread of the same physical
-   core. This context can be host OS, host user space or a different guest.
-
-   If the processor does not support Extended Page Tables, the attack is
-   only possible, when the hypervisor does not sanitize the content of the
-   effective (shadow) page tables.
-
-   While solutions exist to mitigate these attack vectors fully, these
-   mitigations are not enabled by default in the Linux kernel because they
-   can affect performance significantly. The kernel provides several
-   mechanisms which can be utilized to address the problem depending on the
-   deployment scenario. The mitigations, their protection scope and impact
-   are described in the next sections.
-
-   The default mitigations and the rationale for choosing them are explained
-   at the end of this document. See :ref:`default_mitigations`.
-
-.. _l1tf_sys_info:
-
-L1TF system information
------------------------
-
-The Linux kernel provides a sysfs interface to enumerate the current L1TF
-status of the system: whether the system is vulnerable, and which
-mitigations are active. The relevant sysfs file is:
-
-/sys/devices/system/cpu/vulnerabilities/l1tf
-
-The possible values in this file are:
-
-  ===========================   ===============================
-  'Not affected'		The processor is not vulnerable
-  'Mitigation: PTE Inversion'	The host protection is active
-  ===========================   ===============================
-
-If KVM/VMX is enabled and the processor is vulnerable then the following
-information is appended to the 'Mitigation: PTE Inversion' part:
-
-  - SMT status:
-
-    =====================  ================
-    'VMX: SMT vulnerable'  SMT is enabled
-    'VMX: SMT disabled'    SMT is disabled
-    =====================  ================
-
-  - L1D Flush mode:
-
-    ================================  ====================================
-    'L1D vulnerable'		      L1D flushing is disabled
-
-    'L1D conditional cache flushes'   L1D flush is conditionally enabled
-
-    'L1D cache flushes'		      L1D flush is unconditionally enabled
-    ================================  ====================================
-
-The resulting grade of protection is discussed in the following sections.
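As a hedged illustration, a status line composed from the documented parts above can be picked apart with plain shell parameter expansion. The composed format used here (a semicolon before the VMX part, a comma between flush mode and SMT state) is an assumption for the example, not a guaranteed kernel output format:

```shell
# Hypothetical example string, assembled from the documented components;
# a real system would provide it via:
#   /sys/devices/system/cpu/vulnerabilities/l1tf
status="Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable"

host_part=${status%%;*}      # text before the first ';'
vmx_part=${status#*; }       # everything after '; '
flush_mode=${vmx_part%%,*}   # text before the first ','
smt_state=${vmx_part#*, }    # everything after ', '

echo "host mitigation: $host_part"
echo "L1D flush mode:  $flush_mode"
echo "SMT state:       $smt_state"
```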
-
-
-Host mitigation mechanism
--------------------------
-
-The kernel is unconditionally protected against L1TF attacks from malicious
-user space running on the host.
-
-
-Guest mitigation mechanisms
----------------------------
-
-.. _l1d_flush:
-
-1. L1D flush on VMENTER
-^^^^^^^^^^^^^^^^^^^^^^^
-
-   To make sure that a guest cannot attack data which is present in the
-   L1D, the hypervisor flushes the L1D before entering the guest.
-
-   Flushing the L1D evicts not only the data which should not be accessed
-   by a potentially malicious guest, it also evicts the guest's own
-   data. Flushing the L1D has a performance impact as the processor has to
-   bring the flushed guest data back into the L1D. Depending on the
-   frequency of VMEXIT/VMENTER and the type of computations in the guest,
-   performance degradation in the range of 1% to 50% has been observed. For
-   scenarios where guest VMEXIT/VMENTER are rare the performance impact is
-   minimal. Virtio and mechanisms like posted interrupts are designed to
-   confine the VMEXITs to a bare minimum, but specific configurations and
-   application scenarios might still suffer from a high VMEXIT rate.
-
-   The kernel provides two L1D flush modes:
-    - conditional ('cond')
-    - unconditional ('always')
-
-   The conditional mode avoids L1D flushing after VMEXITs which execute
-   only audited code paths before the corresponding VMENTER. These code
-   paths have been verified not to expose secrets or other interesting
-   data to an attacker, although they can still leak information about the
-   address space layout of the hypervisor.
-
-   Unconditional mode flushes L1D on all VMENTER invocations and provides
-   maximum protection. It has a higher overhead than the conditional
-   mode. The overhead cannot be quantified correctly as it depends on the
-   workload scenario and the resulting number of VMEXITs.
-
-   The general recommendation is to enable L1D flush on VMENTER. The kernel
-   defaults to conditional mode on affected processors.
-
-   **Note** that the L1D flush does not address the SMT problem because
-   the sibling thread will bring its data back into the L1D, which makes
-   it attackable again.
-
-   L1D flush can be controlled by the administrator via the kernel command
-   line and sysfs control files. See :ref:`mitigation_control_command_line`
-   and :ref:`mitigation_control_kvm`.
-
-.. _guest_confinement:
-
-2. Guest VCPU confinement to dedicated physical cores
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-   To address the SMT problem, it is possible to make a guest or a group of
-   guests affine to one or more physical cores. The proper mechanism for
-   that is to utilize exclusive cpusets to ensure that no other guest or
-   host tasks can run on these cores.
-
-   If only a single guest or related guests run on sibling SMT threads on
-   the same physical core then they can only attack their own memory and
-   restricted parts of the host memory.
-
-   Host memory is attackable when one of the sibling SMT threads runs in
-   host OS (hypervisor) context and the other in guest context. The amount
-   of valuable information from the host OS context depends on the context
-   in which the host OS executes, i.e. interrupts, soft interrupts and
-   kernel threads. The amount of valuable data from these contexts cannot
-   be declared non-interesting for an attacker without deep inspection of
-   the code.
-
-   **Note** that assigning guests to a fixed set of physical cores affects
-   the ability of the scheduler to do load balancing and might have
-   negative effects on CPU utilization depending on the hosting
-   scenario. Disabling SMT might be a viable alternative for particular
-   scenarios.
-
-   For further information about confining guests to a single or to a group
-   of cores consult the cpusets documentation:
-
-   https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
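As a sketch of the cpuset approach (the mount point, the cgroup name "guest0" and the CPU numbers below are assumptions for illustration; adjust them to the actual topology, e.g. as reported by `lscpu -e`):

```shell
# Assumes cpuset v1 is mounted at /sys/fs/cgroup/cpuset and that CPUs 2-3
# are the two SMT siblings of one physical core.
cd /sys/fs/cgroup/cpuset
mkdir guest0
echo 2-3 > guest0/cpuset.cpus          # dedicate one physical core
echo 0   > guest0/cpuset.mems
echo 1   > guest0/cpuset.cpu_exclusive # no other cpuset may use these CPUs
# Then move the guest's vCPU thread PIDs into the set, e.g.:
# echo "$VCPU_PID" > guest0/tasks      # $VCPU_PID is illustrative
```

This keeps host tasks and other guests off the dedicated core, at the cost of the load-balancing restrictions noted above.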
-
-.. _interrupt_isolation:
-
-3. Interrupt affinity
-^^^^^^^^^^^^^^^^^^^^^
-
-   Interrupts can be made affine to logical CPUs. This is not universally
-   true because there are types of interrupts which are truly per-CPU
-   interrupts, e.g. the local timer interrupt. Aside from that, multi-queue
-   devices affine their interrupts to single CPUs or groups of CPUs per
-   queue without allowing the administrator to control the affinities.
-
-   Moving the interrupts which can be affinity controlled away from CPUs
-   which run untrusted guests reduces the attack vector space.
-
-   Whether the interrupts which are affine to CPUs running untrusted
-   guests provide interesting data for an attacker depends on the system
-   configuration and the scenarios which run on the system. While for some
-   of the interrupts it can be assumed that they won't expose interesting
-   information beyond hints about the host OS memory layout, there is no
-   way to make general assumptions.
-
-   Interrupt affinity can be controlled by the administrator via the
-   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
-   available at:
-
-   https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
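For instance, the hex bitmask expected by the smp_affinity file can be computed as follows. The 8-CPU split used here (housekeeping on CPUs 0-3, guests on CPUs 4-7) is an assumed example:

```shell
# Build a hex mask that keeps affinity-controllable interrupts on
# CPUs 0-3, away from CPUs 4-7 which are assumed to run guests.
mask=0
for cpu in 0 1 2 3; do
    mask=$(( mask | (1 << cpu) ))
done
hexmask=$(printf '%x' "$mask")
echo "$hexmask"
# Applying it requires root; $NR is the interrupt number:
#   echo "$hexmask" > /proc/irq/$NR/smp_affinity
```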
-
-.. _smt_control:
-
-4. SMT control
-^^^^^^^^^^^^^^
-
-   To prevent the SMT issues of L1TF it might be necessary to disable SMT
-   completely. Disabling SMT can have a significant performance impact, but
-   the impact depends on the hosting scenario and the type of workloads.
-   The impact of disabling SMT also needs to be weighed against the impact
-   of other mitigation solutions like confining guests to dedicated cores.
-
-   The kernel provides a sysfs interface to retrieve the status of SMT and
-   to control it. It also provides a kernel command line interface to
-   control SMT.
-
-   The kernel command line interface consists of the following options:
-
-     =========== ==========================================================
-     nosmt	 Affects the bring up of the secondary CPUs during boot. The
-		 kernel tries to bring all present CPUs online during the
-		 boot process. "nosmt" makes sure that from each physical
-		 core only one - the so called primary (hyper) thread is
-		 activated. Due to a design flaw of Intel processors related
-		 to Machine Check Exceptions the non primary siblings have
-		 to be brought up at least partially and are then shut down
-		 again.  "nosmt" can be undone via the sysfs interface.
-
-     nosmt=force Has the same effect as "nosmt" but it does not allow to
-		 undo the SMT disable via the sysfs interface.
-     =========== ==========================================================
-
-   The sysfs interface provides two files:
-
-   - /sys/devices/system/cpu/smt/control
-   - /sys/devices/system/cpu/smt/active
-
-   /sys/devices/system/cpu/smt/control:
-
-     This file shows the current SMT control state and provides the
-     ability to disable or (re)enable SMT. The possible states are:
-
-	==============  ===================================================
-	on		SMT is supported by the CPU and enabled. All
-			logical CPUs can be onlined and offlined without
-			restrictions.
-
-	off		SMT is supported by the CPU and disabled. Only
-			the so called primary SMT threads can be onlined
-			and offlined without restrictions. An attempt to
-			online a non-primary sibling is rejected
-
-	forceoff	Same as 'off' but the state cannot be controlled.
-			Attempts to write to the control file are rejected.
-
-	notsupported	The processor does not support SMT. It's therefore
-			not affected by the SMT implications of L1TF.
-			Attempts to write to the control file are rejected.
-	==============  ===================================================
-
-     The possible states which can be written into this file to control SMT
-     state are:
-
-     - on
-     - off
-     - forceoff
-
-   /sys/devices/system/cpu/smt/active:
-
-     This file reports whether SMT is enabled and active, i.e. if on any
-     physical core two or more sibling threads are online.
-
-   SMT control is also possible at boot time via the l1tf kernel command
-   line parameter in combination with L1D flush control. See
-   :ref:`mitigation_control_command_line`.
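A possible administrative sequence using the files documented above (requires root; this is a sketch, not a recommendation):

```shell
cat /sys/devices/system/cpu/smt/control   # e.g. "on"
echo off > /sys/devices/system/cpu/smt/control
cat /sys/devices/system/cpu/smt/active    # "0" once the siblings are offline
```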
-
-5. Disabling EPT
-^^^^^^^^^^^^^^^^
-
-  Disabling EPT for virtual machines provides full mitigation for L1TF even
-  with SMT enabled, because the effective page tables for guests are
-  managed and sanitized by the hypervisor. Though disabling EPT has a
-  significant performance impact especially when the Meltdown mitigation
-  KPTI is enabled.
-
-  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
-
-There is ongoing research and development for new mitigation mechanisms to
-address the performance impact of disabling SMT or EPT.
-
-.. _mitigation_control_command_line:
-
-Mitigation control on the kernel command line
----------------------------------------------
-
-The kernel command line allows controlling the L1TF mitigations at boot
-time with the option "l1tf=". The valid arguments for this option are:
-
-  ============  =============================================================
-  full		Provides all available mitigations for the L1TF
-		vulnerability. Disables SMT and enables all mitigations in
-		the hypervisors, i.e. unconditional L1D flushing
-
-		SMT control and L1D flush control via the sysfs interface
-		is still possible after boot.  Hypervisors will issue a
-		warning when the first VM is started in a potentially
-		insecure configuration, i.e. SMT enabled or L1D flush
-		disabled.
-
-  full,force	Same as 'full', but disables SMT and L1D flush runtime
-		control. Implies the 'nosmt=force' command line option.
-		(i.e. sysfs control of SMT is disabled.)
-
-  flush		Leaves SMT enabled and enables the default hypervisor
-		mitigation, i.e. conditional L1D flushing
-
-		SMT control and L1D flush control via the sysfs interface
-		is still possible after boot.  Hypervisors will issue a
-		warning when the first VM is started in a potentially
-		insecure configuration, i.e. SMT enabled or L1D flush
-		disabled.
-
-  flush,nosmt	Disables SMT and enables the default hypervisor mitigation,
-		i.e. conditional L1D flushing.
-
-		SMT control and L1D flush control via the sysfs interface
-		is still possible after boot.  Hypervisors will issue a
-		warning when the first VM is started in a potentially
-		insecure configuration, i.e. SMT enabled or L1D flush
-		disabled.
-
-  flush,nowarn	Same as 'flush', but hypervisors will not warn when a VM is
-		started in a potentially insecure configuration.
-
-  off		Disables hypervisor mitigations and doesn't emit any
-		warnings.
-		It also drops the swap size and available RAM limit restrictions
-		on both hypervisor and bare metal.
-
-  ============  =============================================================
-
-The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
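As an illustration of how such an option typically reaches the kernel, assuming a GRUB-based distribution (the file layout is distribution dependent):

```shell
# In /etc/default/grub, append the option, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet l1tf=full,force"
# Then regenerate the configuration and reboot:
#   update-grub    # or: grub2-mkconfig -o /boot/grub2/grub.cfg
# Verify after reboot:
cat /proc/cmdline
grep . /sys/devices/system/cpu/vulnerabilities/l1tf
```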
-
-
-.. _mitigation_control_kvm:
-
-Mitigation control for KVM - module parameter
--------------------------------------------------------------
-
-The KVM hypervisor mitigation mechanism, flushing the L1D cache when
-entering a guest, can be controlled with a module parameter.
-
-The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
-following arguments:
-
-  ============  ==============================================================
-  always	L1D cache flush on every VMENTER.
-
-  cond		Flush L1D on VMENTER only when the code between VMEXIT and
-		VMENTER can leak host memory which is considered
-		interesting for an attacker. This can still leak host memory
-		which allows e.g. determining the host's address space layout.
-
-  never		Disables the mitigation
-  ============  ==============================================================
-
-The parameter can be provided on the kernel command line, as a module
-parameter when loading the module, and it can be modified at runtime via
-the sysfs file:
-
-/sys/module/kvm_intel/parameters/vmentry_l1d_flush
-
-The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
-line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
-module parameter is ignored and writes to the sysfs file are rejected.
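The parameter can be exercised, for example, like this (requires root and the kvm_intel module; a sketch only):

```shell
# At module load time:
#   modprobe kvm_intel vmentry_l1d_flush=always
# Or changed at runtime:
echo cond > /sys/module/kvm_intel/parameters/vmentry_l1d_flush
cat /sys/module/kvm_intel/parameters/vmentry_l1d_flush
```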
-
-
-Mitigation selection guide
---------------------------
-
-1. No virtualization in use
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-   The system is protected by the kernel unconditionally and no further
-   action is required.
-
-2. Virtualization with trusted guests
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-   If the guest comes from a trusted source and the guest OS kernel is
-   guaranteed to have the L1TF mitigations in place the system is fully
-   protected against L1TF and no further action is required.
-
-   To avoid the overhead of the default L1D flushing on VMENTER the
-   administrator can disable the flushing via the kernel command line and
-   sysfs control files. See :ref:`mitigation_control_command_line` and
-   :ref:`mitigation_control_kvm`.
-
-
-3. Virtualization with untrusted guests
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-3.1. SMT not supported or disabled
-""""""""""""""""""""""""""""""""""
-
-  If SMT is not supported by the processor or disabled in the BIOS or by
-  the kernel, it's only required to enforce L1D flushing on VMENTER.
-
-  Conditional L1D flushing is the default behaviour and can be tuned. See
-  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
-
-3.2. EPT not supported or disabled
-""""""""""""""""""""""""""""""""""
-
-  If EPT is not supported by the processor or disabled in the hypervisor,
-  the system is fully protected. SMT can stay enabled and L1D flushing on
-  VMENTER is not required.
-
-  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
-
-3.3. SMT and EPT supported and active
-"""""""""""""""""""""""""""""""""""""
-
-  If SMT and EPT are supported and active then various degrees of
-  mitigations can be employed:
-
-  - L1D flushing on VMENTER:
-
-    L1D flushing on VMENTER is the minimal protection requirement, but it
-    is only potent in combination with other mitigation methods.
-
-    Conditional L1D flushing is the default behaviour and can be tuned. See
-    :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
-
-  - Guest confinement:
-
-    Confinement of guests to a single physical core or a group of cores
-    which are not running any other processes can reduce the attack surface
-    significantly, but interrupts, soft interrupts and kernel threads can
-    still expose valuable data to a potential attacker. See
-    :ref:`guest_confinement`.
-
-  - Interrupt isolation:
-
-    Isolating the guest CPUs from interrupts can reduce the attack surface
-    further, but it still allows a malicious guest to explore a limited
-    amount of host physical memory. This can at least be used to gain
-    knowledge about the host address space layout. Interrupts which have a
-    fixed affinity to the CPUs running the untrusted guests can, depending
-    on the scenario, still trigger soft interrupts and schedule kernel
-    threads which might expose valuable information. See
-    :ref:`interrupt_isolation`.
-
-The above three mitigation methods combined can provide protection to a
-certain degree, but the risk of the remaining attack surface has to be
-carefully analyzed. For full protection the following methods are
-available:
-
-  - Disabling SMT:
-
-    Disabling SMT and enforcing the L1D flushing provides the maximum
-    amount of protection. This mitigation does not depend on any of the
-    above mitigation methods.
-
-    SMT control and L1D flushing can be tuned by the command line
-    parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
-    time with the matching sysfs control files. See :ref:`smt_control`,
-    :ref:`mitigation_control_command_line` and
-    :ref:`mitigation_control_kvm`.
-
-  - Disabling EPT:
-
-    Disabling EPT provides the maximum amount of protection as well. It
-    does not depend on any of the above mitigation methods. SMT can stay
-    enabled and L1D flushing is not required, but the performance impact is
-    significant.
-
-    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
-    parameter.
-
-3.4. Nested virtual machines
-""""""""""""""""""""""""""""
-
-When nested virtualization is in use, three operating systems are involved:
-the bare metal hypervisor, the nested hypervisor and the nested virtual
-machine.  VMENTER operations from the nested hypervisor into the nested
-guest will always be processed by the bare metal hypervisor. If KVM is the
-bare metal hypervisor it will:
-
- - Flush the L1D cache on every switch from the nested hypervisor to the
-   nested virtual machine, so that the nested hypervisor's secrets are not
-   exposed to the nested virtual machine;
-
- - Flush the L1D cache on every switch from the nested virtual machine to
-   the nested hypervisor; this is a complex operation, and flushing the L1D
-   cache prevents the bare metal hypervisor's secrets from being exposed to
-   the nested virtual machine;
-
- - Instruct the nested hypervisor to not perform any L1D cache flush. This
-   is an optimization to avoid double L1D flushing.
-
-
-.. _default_mitigations:
-
-Default mitigations
--------------------
-
-  The kernel default mitigations for vulnerable processors are:
-
-  - PTE inversion to protect against malicious user space. This is done
-    unconditionally and cannot be controlled. The swap storage is limited
-    to ~16TB.
-
-  - L1D conditional flushing on VMENTER when EPT is enabled for
-    a guest.
-
-  The kernel does not by default enforce the disabling of SMT, which leaves
-  SMT systems vulnerable when running untrusted guests with EPT enabled.
-
-  The rationale for this choice is:
-
-  - Force disabling SMT can break existing setups, especially with
-    unattended updates.
-
-  - If regular users run untrusted guests on their machine, then L1TF is
-    just an add-on to other malware which might be embedded in an untrusted
-    guest, e.g. spam-bots or attacks on the local network.
-
-    There is no technical way to prevent a user from running untrusted code
-    on their machines blindly.
-
-  - It's technically extremely unlikely, and from today's knowledge even
-    impossible, that L1TF can be exploited via the most popular attack
-    mechanisms like JavaScript because these mechanisms have no way to
-    control PTEs. If this were possible, and no other mitigation were
-    available, then the default might be different.
-
-  - The administrators of cloud and hosting setups have to carefully
-    analyze the risk for their scenarios and make the appropriate
-    mitigation choices, which might even vary across their deployed
-    machines and also result in other changes of their overall setup.
-    There is no way for the kernel to provide a sensible default for these
-    kinds of scenarios.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 8/8] MDS basics 8
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
                   ` (6 preceding siblings ...)
  2019-02-19 12:44 ` [patch 7/8] MDS basics 7 Thomas Gleixner
@ 2019-02-19 12:44 ` Thomas Gleixner
  2019-02-19 14:17   ` [MODERATED] " Greg KH
  2019-02-19 17:27   ` [MODERATED] " Andrew Cooper
  2019-02-19 14:03 ` [MODERATED] Re: [patch 0/8] MDS basics 0 Andrew Cooper
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 12:44 UTC (permalink / raw)
  To: speck

Subject: [patch 8/8] Documentation: Add MDS vulnerability documentation
From: Thomas Gleixner <tglx@linutronix.de>

Add the initial MDS vulnerability documentation.

Still needs a lot of work....

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/admin-guide/hw-vuln/index.rst |    1 
 Documentation/admin-guide/hw-vuln/l1tf.rst  |    1 
 Documentation/admin-guide/hw-vuln/mds.rst   |  230 ++++++++++++++++++++++++++++
 3 files changed, 232 insertions(+)

--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -10,3 +10,4 @@ are configurable at compile, boot or run
    :maxdepth: 1
 
    l1tf
+   mds
--- a/Documentation/admin-guide/hw-vuln/l1tf.rst
+++ b/Documentation/admin-guide/hw-vuln/l1tf.rst
@@ -445,6 +445,7 @@ The default is 'cond'. If 'l1tf=full,for
 line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
 module parameter is ignored and writes to the sysfs file are rejected.
 
+.. _mitigation_selection:
 
 Mitigation selection guide
 --------------------------
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/mds.rst
@@ -0,0 +1,230 @@
+MDS - Microarchitectural Data Sampling
+======================================
+
+Microarchitectural Data Sampling is a hardware vulnerability which allows
+unprivileged speculative access to data which is available in various CPU
+internal buffers.
+
+Affected processors
+-------------------
+
+This vulnerability affects a wide range of Intel processors. The
+vulnerability is not present on:
+
+   - Processors from AMD, Centaur and other non Intel vendors
+
+   - Older processor models, where the CPU family is < 6
+
+   - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus)
+
+   - Intel processors which have the ARCH_CAP_MDS_NO bit set in the
+     IA32_ARCH_CAPABILITIES MSR.
+
+Whether a processor is affected or not can be read out from the MDS
+vulnerability file in sysfs. See :ref:`mds_sys_info`.
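As a sketch, the MDS_NO bit can be tested in a raw IA32_ARCH_CAPABILITIES value. The MSR index 0x10a and bit position 5 reflect Intel's published definitions; the sample value below is made up. On a live system the raw value could be obtained with the msr-tools utility (`rdmsr 0x10a`, as root, with the msr module loaded):

```shell
caps=0x2b                     # made-up sample value of IA32_ARCH_CAPABILITIES
mds_no=$(( (caps >> 5) & 1 )) # bit 5 is MDS_NO per Intel's documentation
echo "MDS_NO=$mds_no"         # 1 means the processor reports it is unaffected
```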
+
+Related CVEs
+------------
+
+The following CVE entries are related to the MDS vulnerability:
+
+   ==============  =====  ==============================================
+   CVE-2018-12126  MSBDS  Microarchitectural Store Buffer Data Sampling
+   CVE-2018-12130  MFBDS  Microarchitectural Fill Buffer Data Sampling
+   CVE-2018-12127  MLPDS  Microarchitectural Load Port Data Sampling
+   ==============  =====  ==============================================
+
+Problem
+-------
+
+When performing store, load or L1 refill operations, processors write data
+into temporary microarchitectural structures (buffers). The data in the
+buffers can be forwarded to load operations as an optimization.
+
+Under certain conditions, usually a fault/assist caused by a load
+operation, data unrelated to the load memory address can be speculatively
+forwarded from the buffers. Because the load operation causes a fault or
+assist and its result will be discarded, the forwarded data does not cause
+incorrect program execution or state changes. But a malicious operation
+may be able to forward this speculative data to a disclosure gadget which
+in turn allows inferring the value via a cache side channel attack.
+
+Because the buffers are potentially shared between Hyper-Threads, cross
+Hyper-Thread attacks may be possible.
+
+As the buffer sizes are smaller than the L1 cache, which was the target of
+previous vulnerabilities, e.g. Meltdown and L1TF, the vulnerability is
+harder to exploit than those attack vectors.
+
+
+Attack scenarios
+----------------
+
+  TBD
+
+.. _mds_sys_info:
+
+MDS system information
+-----------------------
+
+The Linux kernel provides a sysfs interface to enumerate the current MDS
+status of the system: whether the system is vulnerable, and which
+mitigations are active. The relevant sysfs file is:
+
+/sys/devices/system/cpu/vulnerabilities/mds
+
+The possible values in this file are:
+
+  ==============================   ====================================
+  'Not affected'		   The processor is not vulnerable
+  'Vulnerable'			   The processor is vulnerable, but no
+				   mitigation enabled
+  'Mitigation: CPU buffer clear'   The processor is vulnerable and the
+				   CPU buffer clearing mitigation is
+				   enabled.
+  ==============================   ====================================
+
+If the processor is vulnerable then the following information is appended
+to the above information:
+
+  - SMT status:
+
+    ========================  ============================================
+    'SMT vulnerable'          SMT is enabled
+    'SMT disabled'            SMT is disabled
+    'SMT Host state unknown'  Kernel runs in a VM, Host SMT state unknown
+    ========================  ============================================
+
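The appended SMT status can be split off with plain shell parameter expansion; the composed format used here (a '; ' separator between the two parts) is an assumption for this sketch:

```shell
# Hypothetical composed value of /sys/devices/system/cpu/vulnerabilities/mds
status="Mitigation: CPU buffer clear; SMT vulnerable"

mitigation=${status%%;*}   # "Mitigation: CPU buffer clear"
smt=${status#*; }          # "SMT vulnerable"

echo "$mitigation"
echo "$smt"
```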
+
+Mitigation mechanism
+-------------------------
+
+The kernel detects the affected CPUs and the presence of the required
+microcode.
+
+If a CPU is affected and the microcode is available, then the kernel
+enables the mitigation by default. The mitigation can be controlled at boot
+time via a kernel command line option. See
+:ref:`mds_mitigation_control_command_line`.
+
+.. _cpu_buffer_clear_full:
+
+Unconditional CPU buffer clearing
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The mitigation for MDS clears the affected CPU buffers unconditionally
+   on return to user space and when entering a guest.
+
+   If SMT is enabled it also clears the buffers on idle entry, but that's
+   not a sufficient SMT protection for all MDS variants; it covers solely
+   MSBDS.
+
+.. _virt_mechanism:
+
+Virtualization mitigation
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   If the CPU is also affected by L1TF, the L1D flush mitigation is
+   enabled and up-to-date microcode is available, the L1D flush mitigation
+   automatically protects the guest transition. For details on L1TF
+   and virtualization see:
+   :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <mitigation_control_kvm>`.
+
+   If the L1D flush mitigation is disabled or the microcode is not
+   available the guest transition is unprotected.
+
+.. _xeon_phi:
+
+XEON PHI specific considerations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The XEON PHI processor family is affected by MSBDS which can be
+   exploited across Hyper-Threads when entering idle states. Some XEON PHI
+   variants allow the use of MWAIT in user space (Ring 3) which opens a
+   potential attack vector for malicious user space. The exposure can be
+   disabled on the kernel command line with the 'ring3mwait=disable'
+   command line option.
+
+.. _mds_smt_control:
+
+SMT control
+^^^^^^^^^^^
+
+   To prevent the SMT issues of MDS, it might be necessary to disable SMT
+   completely. Disabling SMT can have a significant performance impact, but
+   the impact depends on the type of workloads.
+
+   See the relevant chapter in the L1TF mitigation documentation for details:
+   :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
+
+.. _mds_mitigation_control_command_line:
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+The kernel command line allows controlling the MDS mitigations at boot
+time with the option "mds=". The valid arguments for this option are:
+
+  ============  =============================================================
+  full		Provides all available mitigations for the MDS
+		vulnerability: unconditional CPU buffer clearing on exit to
+		userspace and when entering a VM.
+
+		It does not automatically disable SMT.
+
+  off		Disables MDS mitigations completely.
+
+  ============  =============================================================
+
+The default is 'full'. For details see :ref:`cpu_buffer_clear_full`.
+
+
+Mitigation selection guide
+--------------------------
+
+1. Trusted userspace
+^^^^^^^^^^^^^^^^^^^^
+
+   If all userspace applications are from a trusted source and do not
+   execute untrusted code which is supplied externally, then the mitigation
+   can be disabled.
+
+
+2. Virtualization with trusted guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The same considerations as above versus trusted user space apply. See
+   also: :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <mitigation_selection>`.
+
+
+3. Virtualization with untrusted guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   See :ref:`virt_mechanism`.
+
+3.1. Nested virtual machines
+""""""""""""""""""""""""""""
+
+When nested virtualization is in use, three operating systems are involved:
+the bare metal hypervisor, the nested hypervisor and the nested virtual
+machine.  VMENTER operations from the nested hypervisor into the nested
+guest will always be processed by the bare metal hypervisor. If KVM is the
+bare metal hypervisor it will:
+
+ - Invoke the enabled mitigation mechanism, L1D flush or CPU buffer
+   clearing on every switch from the nested hypervisor to the nested
+   virtual machine.
+
+.. _mds_default_mitigations:
+
+Default mitigations
+-------------------
+
+  The kernel default mitigations for vulnerable processors are:
+
+  - Enable CPU buffer clearing
+
+  The kernel does not by default enforce the disabling of SMT, which leaves
+  SMT systems vulnerable when running untrusted code. The same rationale as
+  for L1TF applies.
+  See :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <default_mitigations>`.


* [MODERATED] Re: [patch 4/8] MDS basics 4
  2019-02-19 12:44 ` [patch 4/8] MDS basics 4 Thomas Gleixner
@ 2019-02-19 13:54   ` Andrew Cooper
  2019-02-19 14:02     ` Thomas Gleixner
  2019-02-19 16:08     ` [MODERATED] " Andi Kleen
  2019-02-19 16:07   ` Andi Kleen
  1 sibling, 2 replies; 42+ messages in thread
From: Andrew Cooper @ 2019-02-19 13:54 UTC (permalink / raw)
  To: speck

On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Add a static key which controls the invocation of the CPU buffer clear
> mechanism on idle entry. This is independent of other MDS mitigations
> because the idle entry invocation to mitigate the potential leakage due to
> store buffer repartitioning is only necessary on SMT systems.
>
> Add the actual invocations to the different halt/mwait variants which
> covers all usage sites. mwaitx is not patched as it's not available on
> Intel CPUs.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Unfortunately, clearing is needed on the exit from idle as well as the
entry.

This only impacts the store buffer (MSBDS, previously PSF) because it
gets statically re-partitioned when a thread comes in and out of idle.

From the point of view of the thread going idle, when going idle my half
of the store buffers get given to the other thread and potentially leak
my secrets, whereas when coming out of idle, the other thread's store
buffers get split with me, potentially leaking their secrets.

~Andrew


* [MODERATED] Re: [patch 1/8] MDS basics 1
  2019-02-19 12:44 ` [patch 1/8] MDS basics 1 Thomas Gleixner
@ 2019-02-19 14:00   ` Borislav Petkov
  0 siblings, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2019-02-19 14:00 UTC (permalink / raw)
  To: speck

On Tue, Feb 19, 2019 at 01:44:07PM +0100, speck for Thomas Gleixner wrote:
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -344,6 +344,7 @@
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
>  #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
> +#define X86_FEATURE_MD_CLEAR		(18*32+10) /* VERW flushs CPU state */

"flushes"

Otherwise:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 4/8] MDS basics 4
  2019-02-19 13:54   ` [MODERATED] " Andrew Cooper
@ 2019-02-19 14:02     ` Thomas Gleixner
  2019-02-19 14:07       ` Thomas Gleixner
  2019-02-19 17:16       ` Thomas Gleixner
  2019-02-19 16:08     ` [MODERATED] " Andi Kleen
  1 sibling, 2 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 14:02 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Andrew Cooper wrote:

> On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> > Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> > From: Thomas Gleixner <tglx@linutronix.de>
> >
> > Add a static key which controls the invocation of the CPU buffer clear
> > mechanism on idle entry. This is independent of other MDS mitigations
> > because the idle entry invocation to mitigate the potential leakage due to
> > store buffer repartitioning is only necessary on SMT systems.
> >
> > Add the actual invocations to the different halt/mwait variants which
> > covers all usage sites. mwaitx is not patched as it's not available on
> > Intel CPUs.
> >
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Unfortunately, clearing is needed on the exit from idle as well as the
> entry.
> 
> This only impacts the store buffer (MSBDS, previously PSF) because it
> gets statically re-partitioned when a thread comes in and out of idle.
> 
> From the point of view of the thread going idle, when going idle my half
> of the store buffers get given to the other thread and potentially leak
> my secrets, whereas when coming out of idle, the other threads store
> buffers get split with me, potentially leaking their secrets.

Duh, indeed. Easy enough to fix.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 0/8] MDS basics 0
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
                   ` (7 preceding siblings ...)
  2019-02-19 12:44 ` [patch 8/8] MDS basics 8 Thomas Gleixner
@ 2019-02-19 14:03 ` Andrew Cooper
  2019-02-19 14:09   ` Thomas Gleixner
  2019-02-19 14:10   ` [MODERATED] " Tyler Hicks
  2019-02-19 15:56 ` Andi Kleen
  2019-02-21 16:14 ` [MODERATED] Encrypted Message Jon Masters
  10 siblings, 2 replies; 42+ messages in thread
From: Andrew Cooper @ 2019-02-19 14:03 UTC (permalink / raw)
  To: speck

On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> Subject: [patch 0/8] MDS basics
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Hi!
>
> I got the following information yesterday night:
>
>   "All - FYI.  There has been some chatter/ discussion on the subject.
>    Hopefully this note will help clarify.  We received a report from a
>    researcher who independently identified what we formerly referred to as
>    PSF (aka Microarchitectural Store Buffer Data Sampling).  There were
>    some initial indications (this week) this researcher would elect to
>    release a paper publicly PRIOR to the May 14 embargo was lifted.

I do apologize.  I did post the information that I had to the specxen
list, but it didn't occur to me that it was liable to be missed here.

In future, I'll cross-post any updates which don't appear to have
already made their way here.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 4/8] MDS basics 4
  2019-02-19 14:02     ` Thomas Gleixner
@ 2019-02-19 14:07       ` Thomas Gleixner
  2019-02-19 16:09         ` [MODERATED] " Andi Kleen
  2019-02-19 17:16       ` Thomas Gleixner
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 14:07 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:
> On Tue, 19 Feb 2019, speck for Andrew Cooper wrote:
> 
> > On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> > > Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> > > From: Thomas Gleixner <tglx@linutronix.de>
> > >
> > > Add a static key which controls the invocation of the CPU buffer clear
> > > mechanism on idle entry. This is independent of other MDS mitigations
> > > because the idle entry invocation to mitigate the potential leakage due to
> > > store buffer repartitioning is only necessary on SMT systems.
> > >
> > > Add the actual invocations to the different halt/mwait variants which
> > > covers all usage sites. mwaitx is not patched as it's not available on
> > > Intel CPUs.
> > >
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > 
> > Unfortunately, clearing is needed on the exit from idle as well as the
> > entry.
> > 
> > This only impacts the store buffer (MSBDS, previously PSF) because it
> > gets statically re-partitioned when a thread comes in and out of idle.
> > 
> > From the point of view of the thread going idle, when going idle my half
> > of the store buffers get given to the other thread and potentially leak
> > my secrets, whereas when coming out of idle, the other threads store
> > buffers get split with me, potentially leaking their secrets.
> 
> Duh, indeed. Easy enough to fix.

Delta patch below. Stupid me even mentioned the repartitioning on both sides
in the changelog.

Thanks,
 
	tglx

8<------------------
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -58,12 +58,14 @@ static inline __cpuidle void native_safe
 {
 	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
 	asm volatile("sti; hlt": : :"memory");
+	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
 }
 
 static inline __cpuidle void native_halt(void)
 {
 	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
 	asm volatile("hlt": : :"memory");
+	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
 }
 
 #endif
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -46,6 +46,8 @@ static inline void __mwait(unsigned long
 	/* "mwait %eax, %ecx;" */
 	asm volatile(".byte 0x0f, 0x01, 0xc9;"
 		     :: "a" (eax), "c" (ecx));
+
+	mds_clear_cpu_buffers(&idle_mds_clear_cpu_buffers);
 }
 
 /*

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 0/8] MDS basics 0
  2019-02-19 14:03 ` [MODERATED] Re: [patch 0/8] MDS basics 0 Andrew Cooper
@ 2019-02-19 14:09   ` Thomas Gleixner
  2019-02-19 14:10   ` [MODERATED] " Tyler Hicks
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 14:09 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 1103 bytes --]

Andrew,

On Tue, 19 Feb 2019, speck for Andrew Cooper wrote:

> On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> > Subject: [patch 0/8] MDS basics
> > From: Thomas Gleixner <tglx@linutronix.de>
> >
> > Hi!
> >
> > I got the following information yesterday night:
> >
> >   "All - FYI.  There has been some chatter/ discussion on the subject.
> >    Hopefully this note will help clarify.  We received a report from a
> >    researcher who independently identified what we formerly referred to as
> >    PSF (aka Microarchitectural Store Buffer Data Sampling).  There were
> >    some initial indications (this week) this researcher would elect to
> >    release a paper publicly PRIOR to the May 14 embargo was lifted.
> 
> I do apologize.  I did post the information that I had the specxen list,
> but it didn't occur to me that it was liable to have been missed from here.
> 
> In future, I'll cross post any updates which don't appear to have
> already made their way here.

It's surely neither your fault nor your problem, but I appreciate the offer.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 0/8] MDS basics 0
  2019-02-19 14:03 ` [MODERATED] Re: [patch 0/8] MDS basics 0 Andrew Cooper
  2019-02-19 14:09   ` Thomas Gleixner
@ 2019-02-19 14:10   ` Tyler Hicks
  1 sibling, 0 replies; 42+ messages in thread
From: Tyler Hicks @ 2019-02-19 14:10 UTC (permalink / raw)
  To: speck

On 2019-02-19 14:03:21, speck for Andrew Cooper wrote:
> On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> > Subject: [patch 0/8] MDS basics
> > From: Thomas Gleixner <tglx@linutronix.de>
> >
> > Hi!
> >
> > I got the following information yesterday night:
> >
> >   "All - FYI.  There has been some chatter/ discussion on the subject.
> >    Hopefully this note will help clarify.  We received a report from a
> >    researcher who independently identified what we formerly referred to as
> >    PSF (aka Microarchitectural Store Buffer Data Sampling).  There were
> >    some initial indications (this week) this researcher would elect to
> >    release a paper publicly PRIOR to the May 14 embargo was lifted.
> 
> I do apologize.  I did post the information that I had the specxen list,
> but it didn't occur to me that it was liable to have been missed from here.
> 
> In future, I'll cross post any updates which don't appear to have
> already made their way here.

Good idea. I'll also try to make sure nothing important slips through
the cracks.

Tyler

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 8/8] MDS basics 8
  2019-02-19 12:44 ` [patch 8/8] MDS basics 8 Thomas Gleixner
@ 2019-02-19 14:17   ` Greg KH
  2019-02-19 14:22     ` Thomas Gleixner
  2019-02-19 17:27   ` [MODERATED] " Andrew Cooper
  1 sibling, 1 reply; 42+ messages in thread
From: Greg KH @ 2019-02-19 14:17 UTC (permalink / raw)
  To: speck

On Tue, Feb 19, 2019 at 01:44:14PM +0100, speck for Thomas Gleixner wrote:
> +Mitigation control on the kernel command line
> +---------------------------------------------
> +
> +The kernel command line allows to control the MDS mitigations at boot
> +time with the option "mds=". The valid arguments for this option are:
> +
> +  ============  =============================================================
> +  full		Provides all available mitigations for the MDS vulnerability
> +		vulnerability, unconditional CPU buffer clearing on exit to
> +		userspace and when entering a VM.
> +
> +		It does not automatically disable SMT.
> +
> +  off		Disables MDS mitigations completely.
> +
> +  ============  =============================================================
> +
> +The default is 'full'. For details see :ref:`cpu_buffer_clear_full`.

I think default is 'auto' according to the patch that enabled it :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 8/8] MDS basics 8
  2019-02-19 14:17   ` [MODERATED] " Greg KH
@ 2019-02-19 14:22     ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 14:22 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Greg KH wrote:

> On Tue, Feb 19, 2019 at 01:44:14PM +0100, speck for Thomas Gleixner wrote:
> > +Mitigation control on the kernel command line
> > +---------------------------------------------
> > +
> > +The kernel command line allows to control the MDS mitigations at boot
> > +time with the option "mds=". The valid arguments for this option are:
> > +
> > +  ============  =============================================================
> > +  full		Provides all available mitigations for the MDS vulnerability
> > +		vulnerability, unconditional CPU buffer clearing on exit to
> > +		userspace and when entering a VM.
> > +
> > +		It does not automatically disable SMT.
> > +
> > +  off		Disables MDS mitigations completely.
> > +
> > +  ============  =============================================================
> > +
> > +The default is 'full'. For details see :ref:`cpu_buffer_clear_full`.
> 
> I think default is 'auto' according to the patch that enabled it :)

Yes. Forgot to document auto as well.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 5/8] MDS basics 5
  2019-02-19 12:44 ` [patch 5/8] MDS basics 5 Thomas Gleixner
@ 2019-02-19 15:07   ` Thomas Gleixner
  2019-02-19 16:13     ` [MODERATED] " Andi Kleen
  2019-02-19 16:03   ` [MODERATED] " Andi Kleen
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 15:07 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:
> +/* Update the static key controlling the MDS CPU buffer clear in idle */
> +static void update_mds_branch_idle(void)
> +{
> +	if (sched_smt_active())
> +		static_branch_enable(&user_mds_clear_cpu_buffers);
> +	else
> +		static_branch_disable(&user_mds_clear_cpu_buffers);

Those obviously need s/user/idle/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 0/8] MDS basics 0
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
                   ` (8 preceding siblings ...)
  2019-02-19 14:03 ` [MODERATED] Re: [patch 0/8] MDS basics 0 Andrew Cooper
@ 2019-02-19 15:56 ` Andi Kleen
  2019-02-19 17:42   ` Thomas Gleixner
  2019-02-21 16:14 ` [MODERATED] Encrypted Message Jon Masters
  10 siblings, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-02-19 15:56 UTC (permalink / raw)
  To: speck



> So while being grumpy about this communication fail, I'm even more
> grumpy about the fact, that we don't have even the minimal full/off
> mitigation in place in a workable form. I asked specifically for this

There certainly seems to be a communication failure here.

The minimal version was posted on this mailing list in late December
(although it only showed up in early January because ...)


-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 5/8] MDS basics 5
  2019-02-19 12:44 ` [patch 5/8] MDS basics 5 Thomas Gleixner
  2019-02-19 15:07   ` Thomas Gleixner
@ 2019-02-19 16:03   ` Andi Kleen
  2019-02-19 17:40     ` Thomas Gleixner
  1 sibling, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-02-19 16:03 UTC (permalink / raw)
  To: speck

> +	case MDS_MITIGATION_AUTO:
> +	case MDS_MITIGATION_FULL:
> +		if (boot_cpu_has(X86_FEATURE_MD_CLEAR)) {
> +			mds_mitigation = MDS_MITIGATION_FULL;
> +			static_branch_enable(&user_mds_clear_cpu_buffers);


This violates Linus' feedback of unconditionally enabling for one year
to work around VMWare breakage.

-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 3/8] MDS basics 3
  2019-02-19 12:44 ` [patch 3/8] MDS basics 3 Thomas Gleixner
@ 2019-02-19 16:04   ` Andi Kleen
  2019-02-19 21:44     ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-02-19 16:04 UTC (permalink / raw)
  To: speck

On Tue, Feb 19, 2019 at 01:44:09PM +0100, speck for Thomas Gleixner wrote:
> Subject: [patch 3/8] x86/speculation/mds: Clear CPU buffers on exit to user
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add a static key which controls the invocation of the CPU buffer clear
> mechanism on exit to user space and add the call into
> prepare_exit_to_usermode() right before actually returning.

This is missing the NMI exit case.

-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 4/8] MDS basics 4
  2019-02-19 12:44 ` [patch 4/8] MDS basics 4 Thomas Gleixner
  2019-02-19 13:54   ` [MODERATED] " Andrew Cooper
@ 2019-02-19 16:07   ` Andi Kleen
  2019-02-19 18:29     ` Thomas Gleixner
  1 sibling, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-02-19 16:07 UTC (permalink / raw)
  To: speck

On Tue, Feb 19, 2019 at 01:44:10PM +0100, speck for Thomas Gleixner wrote:
> Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Add a static key which controls the invocation of the CPU buffer clear
> mechanism on idle entry. This is independent of other MDS mitigations
> because the idle entry invocation to mitigate the potential leakage due to
> store buffer repartitioning is only necessary on SMT systems.
> 
> Add the actual invocations to the different halt/mwait variants which
> covers all usage sites. mwaitx is not patched as it's not available on
> Intel CPUs.

This doesn't handle ACPI IO port idling correctly.


-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 4/8] MDS basics 4
  2019-02-19 13:54   ` [MODERATED] " Andrew Cooper
  2019-02-19 14:02     ` Thomas Gleixner
@ 2019-02-19 16:08     ` Andi Kleen
  2019-02-19 16:23       ` Andrew Cooper
  1 sibling, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-02-19 16:08 UTC (permalink / raw)
  To: speck

On Tue, Feb 19, 2019 at 01:54:55PM +0000, speck for Andrew Cooper wrote:
> On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> > Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> > From: Thomas Gleixner <tglx@linutronix.de>
> >
> > Add a static key which controls the invocation of the CPU buffer clear
> > mechanism on idle entry. This is independent of other MDS mitigations
> > because the idle entry invocation to mitigate the potential leakage due to
> > store buffer repartitioning is only necessary on SMT systems.
> >
> > Add the actual invocations to the different halt/mwait variants which
> > covers all usage sites. mwaitx is not patched as it's not available on
> > Intel CPUs.
> >
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Unfortunately, clearing is needed on the exit from idle as well as the
> entry.

Not with the full flush. There the buffers would always be cleared on the
next kernel exit.

-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 4/8] MDS basics 4
  2019-02-19 14:07       ` Thomas Gleixner
@ 2019-02-19 16:09         ` Andi Kleen
  2019-02-19 16:17           ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-02-19 16:09 UTC (permalink / raw)
  To: speck

On Tue, Feb 19, 2019 at 03:07:59PM +0100, speck for Thomas Gleixner wrote:
> On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:
> > On Tue, 19 Feb 2019, speck for Andrew Cooper wrote:
> > 
> > > On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> > > > Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> > > > From: Thomas Gleixner <tglx@linutronix.de>
> > > >
> > > > Add a static key which controls the invocation of the CPU buffer clear
> > > > mechanism on idle entry. This is independent of other MDS mitigations
> > > > because the idle entry invocation to mitigate the potential leakage due to
> > > > store buffer repartitioning is only necessary on SMT systems.
> > > >
> > > > Add the actual invocations to the different halt/mwait variants which
> > > > covers all usage sites. mwaitx is not patched as it's not available on
> > > > Intel CPUs.
> > > >
> > > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > 
> > > Unfortunately, clearing is needed on the exit from idle as well as the
> > > entry.
> > > 
> > > This only impacts the store buffer (MSBDS, previously PSF) because it
> > > gets statically re-partitioned when a thread comes in and out of idle.
> > > 
> > > From the point of view of the thread going idle, when going idle my half
> > > of the store buffers get given to the other thread and potentially leak
> > > my secrets, whereas when coming out of idle, the other threads store
> > > buffers get split with me, potentially leaking their secrets.
> > 
> > Duh, indeed. Easy enough to fix.
> 
> Delta patch below. Stupid me even mentioned the repartioning on both sides
> in the changelog.

It's not needed.

You can drop it.

-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 5/8] MDS basics 5
  2019-02-19 15:07   ` Thomas Gleixner
@ 2019-02-19 16:13     ` Andi Kleen
  2019-02-19 17:37       ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-02-19 16:13 UTC (permalink / raw)
  To: speck

On Tue, Feb 19, 2019 at 04:07:11PM +0100, speck for Thomas Gleixner wrote:
> On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:
> > +/* Update the static key controlling the MDS CPU buffer clear in idle */
> > +static void update_mds_branch_idle(void)
> > +{
> > +	if (sched_smt_active())
> > +		static_branch_enable(&user_mds_clear_cpu_buffers);
> > +	else
> > +		static_branch_disable(&user_mds_clear_cpu_buffers);
> 
> Those obviously need s/user/idle/

Please don't post untested crap like this. It would be really
bad if someone applied it.

This can be easily verified just by looking at a few PT traces.

Or just use MDSv2 which actually worked.

BTW, even for the minimum version, virtualization is definitely needed.

-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 4/8] MDS basics 4
  2019-02-19 16:09         ` [MODERATED] " Andi Kleen
@ 2019-02-19 16:17           ` Peter Zijlstra
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2019-02-19 16:17 UTC (permalink / raw)
  To: speck

On Tue, Feb 19, 2019 at 08:09:13AM -0800, speck for Andi Kleen wrote:
> On Tue, Feb 19, 2019 at 03:07:59PM +0100, speck for Thomas Gleixner wrote:
> > On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:
> > > On Tue, 19 Feb 2019, speck for Andrew Cooper wrote:
> > > 
> > > > On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> > > > > Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> > > > > From: Thomas Gleixner <tglx@linutronix.de>
> > > > >
> > > > > Add a static key which controls the invocation of the CPU buffer clear
> > > > > mechanism on idle entry. This is independent of other MDS mitigations
> > > > > because the idle entry invocation to mitigate the potential leakage due to
> > > > > store buffer repartitioning is only necessary on SMT systems.
> > > > >
> > > > > Add the actual invocations to the different halt/mwait variants which
> > > > > covers all usage sites. mwaitx is not patched as it's not available on
> > > > > Intel CPUs.
> > > > >
> > > > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > > 
> > > > Unfortunately, clearing is needed on the exit from idle as well as the
> > > > entry.
> > > > 
> > > > This only impacts the store buffer (MSBDS, previously PSF) because it
> > > > gets statically re-partitioned when a thread comes in and out of idle.
> > > > 
> > > > From the point of view of the thread going idle, when going idle my half
> > > > of the store buffers get given to the other thread and potentially leak
> > > > my secrets, whereas when coming out of idle, the other threads store
> > > > buffers get split with me, potentially leaking their secrets.
> > > 
> > > Duh, indeed. Easy enough to fix.
> > 
> > Delta patch below. Stupid me even mentioned the repartioning on both sides
> > in the changelog.
> 
> It's not needed.

But will the part of the store buffer that gets repartitioned not
contain state of the other sibling?

If it can see our data after we go idle; why can't we see its data when
we come out of idle?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 4/8] MDS basics 4
  2019-02-19 16:08     ` [MODERATED] " Andi Kleen
@ 2019-02-19 16:23       ` Andrew Cooper
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Cooper @ 2019-02-19 16:23 UTC (permalink / raw)
  To: speck

On 19/02/2019 16:08, speck for Andi Kleen wrote:
> On Tue, Feb 19, 2019 at 01:54:55PM +0000, speck for Andrew Cooper wrote:
>> On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
>>> Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
>>> From: Thomas Gleixner <tglx@linutronix.de>
>>>
>>> Add a static key which controls the invocation of the CPU buffer clear
>>> mechanism on idle entry. This is independent of other MDS mitigations
>>> because the idle entry invocation to mitigate the potential leakage due to
>>> store buffer repartitioning is only necessary on SMT systems.
>>>
>>> Add the actual invocations to the different halt/mwait variants which
>>> covers all usage sites. mwaitx is not patched as it's not available on
>>> Intel CPUs.
>>>
>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Unfortunately, clearing is needed on the exit from idle as well as the
>> entry.
> Not with the full flush. There it would always clear on next kernel exit.

What about a PSF gadget between idle and the return to userspace?  That
can end up leaking data before the next flush happens.

What is the latency of a microcoded VERW?  It's surely unmeasurable in
the noise compared to everything else involved in going idle.

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 4/8] MDS basics 4
  2019-02-19 14:02     ` Thomas Gleixner
  2019-02-19 14:07       ` Thomas Gleixner
@ 2019-02-19 17:16       ` Thomas Gleixner
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 17:16 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:

> On Tue, 19 Feb 2019, speck for Andrew Cooper wrote:
> 
> > On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> > > Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> > > From: Thomas Gleixner <tglx@linutronix.de>
> > >
> > > Add a static key which controls the invocation of the CPU buffer clear
> > > mechanism on idle entry. This is independent of other MDS mitigations
> > > because the idle entry invocation to mitigate the potential leakage due to
> > > store buffer repartitioning is only necessary on SMT systems.
> > >
> > > Add the actual invocations to the different halt/mwait variants which
> > > covers all usage sites. mwaitx is not patched as it's not available on
> > > Intel CPUs.
> > >
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > 
> > Unfortunately, clearing is needed on the exit from idle as well as the
> > entry.
> > 
> > This only impacts the store buffer (MSBDS, previously PSF) because it
> > gets statically re-partitioned when a thread comes in and out of idle.
> > 
> > From the point of view of the thread going idle, when going idle my half
> > of the store buffers get given to the other thread and potentially leak
> > my secrets, whereas when coming out of idle, the other threads store
> > buffers get split with me, potentially leaking their secrets.
> 
> Duh, indeed. Easy enough to fix.

I take that back. It's a pointless exercise. Either the flush happens on
exit to user or on vmenter; up to that point there is nothing to worry
about. If we have to worry about the kernel being the malicious entity, we
have other problems.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [MODERATED] Re: [patch 8/8] MDS basics 8
  2019-02-19 12:44 ` [patch 8/8] MDS basics 8 Thomas Gleixner
  2019-02-19 14:17   ` [MODERATED] " Greg KH
@ 2019-02-19 17:27   ` Andrew Cooper
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Cooper @ 2019-02-19 17:27 UTC (permalink / raw)
  To: speck

On 19/02/2019 12:44, speck for Thomas Gleixner wrote:
> --- /dev/null
> +++ b/Documentation/admin-guide/hw-vuln/mds.rst
> @@ -0,0 +1,230 @@
> +MDS - Microarchitectural Data Sampling
> +======================================
> +
> +Microarchitectural Data Sampling is a hardware vulnerability which allows
> +unprivileged speculative access to data which is available in various CPU
> +internal buffers.

Strictly speaking, it is a group of related vulnerabilities.

The distinction is further complicated because some processors are only
affected by a subset of the group.

> +
> +Affected processors
> +-------------------
> +
> +This vulnerability affects a wide range of Intel processors. The
> +vulnerability is not present on:
> +
> +   - Processors from AMD, Centaur and other non Intel vendors
> +
> +   - Older processor models, where the CPU family is < 6
> +
> +   - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus)
> +
> +   - Intel processors which have the ARCH_CAP_MDS_NO bit set in the
> +     IA32_ARCH_CAPABILITIES MSR.
> +
> +Whether a processor is affected or not can be read out from the MDS
> +vulnerability file in sysfs. See :ref:`mds_sys_info`.
> +
> +Related CVEs
> +------------
> +
> +The following CVE entries are related to the MDS vulnerability:
> +
> +   ==============  =====  ==============================================
> +   CVE-2018-12126  MSBDS  Microarchitectural Store Buffer Data Sampling
> +   CVE-2018-12130  MFBDS  Microarchitectural Fill Buffer Data Sampling
> +   CVE-2018-12127  MLPDS  Microarchitectural Load Port Data Sampling
> +   ==============  =====  ==============================================

Any chance of listing in CVE order?  Something feels weird having it
like this.

> +
> +Problem
> +-------
> +
> +When performing store, load, L1 refill operations, processors write data

Possibly just limit it to loads and stores?  From what I've been told,
the fill buffers get all data exiting the pipeline (including WC and UC)
and the L1 refill bit is just one aspect of their functionality.

> +into temporary microarchitectural structures (buffers). The data in the
> +buffer can be forwarded to load operations as an optimization.
> +
> +Under certain conditions, usually a fault/assist caused by a load
> +operation, data unrelated to the load memory address can be speculatively
> +forwarded from the buffers. Because the load operation causes a fault or
> +assist and its result will be discarded, the forwarded data will not cause
> +incorrect programm execution or state changes. But a malicious operation

Hmm - Today I learnt that there is a difference between the English and
German spelling of programm.

> +may be able to forward this speculative data to a disclosure gadget which
> +allows in turn to infer the value via a cache side channel attack.

It's not restricted to cache side channels.  There are other options
available to a sufficiently crafty attacker.

> +
> +Because the buffers are potentially shared between Hyper-Threads cross
> +Hyper-Thread attacks may be possible.

The cooperating-hyperthread attack has been demonstrated in practice.

Another piece of information which may not have filtered through from
Keybase a while ago is an (indirect) report of a researcher's
description of their PoC against Linux.

One thread sits and makes setrlimit() syscalls with a userspace pointer
to train the branch predictor to strongly take the copy_from_user()
path.  It then switches the pointer to a kernel address of interest for
the next syscall.

Speculation takes the strongly taken path, reads from kernel space and
writes into the local buffer (also in kernel space), but this uses a
fill buffer in the pipeline.  The other thread can snatch the data by
repeatedly sampling the fill buffers.

The data rate wasn't very high (6 bytes per second) when trying to leak
a 4k page containing known ASCII text, but it clearly demonstrates that
the attack is possible.

This particular PoC also depends on SMAP being disabled, as STAC/CLAC
instructions have lfence properties, but this is an implementation
detail rather than an architectural guarantee.  There are also plenty of
opportunities to pull this attack off outside of a STAC/CLAC pair.

> +
> +As the buffer sizes are smaller than the L1 cache, which was target of
> +previous vulnerabilities, e.g. Meltdown, L1TF, the vulnerability is harder
> +to exploit than with those attack vectors.
> +
> +
> +Attack scenarios
> +----------------
> +
> +  TBD
> +
> +.. _mds_sys_info:
> +
> +MDS system information
> +-----------------------
> +
> +The Linux kernel provides a sysfs interface to enumerate the current MDS
> +status of the system: whether the system is vulnerable, and which
> +mitigations are active. The relevant sysfs file is:
> +
> +/sys/devices/system/cpu/vulnerabilities/mds
> +
> +The possible values in this file are:
> +
> +  ==============================   ====================================
> +  'Not affected'		   The processor is not vulnerable
> +  'Vulnerable'			   The processor is vulnerable, but no
> +				   mitigation enabled
> +  'Mitigation: CPU buffer clear'   The processor is vulnerable and the
> +				   CPU buffer clearing mitigation is
> +				   enabled.
> +  ==============================   ====================================
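As a quick aside for anyone wanting to check a running system: querying
this interface is a plain file read, with the path exactly as documented
above (the fallback message here is my own, for kernels without the
interface):

```shell
# Print the MDS status string; the file only exists on kernels that
# provide the vulnerabilities sysfs interface.
f=/sys/devices/system/cpu/vulnerabilities/mds
if [ -r "$f" ]; then
    cat "$f"
else
    echo "mds sysfs entry not present on this kernel"
fi
```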
> +
> +If the processor is vulnerable then the following information is appended
> +to the above information:
> +
> +  - SMT status:
> +
> +    ========================  ============================================
> +    'SMT vulnerable'          SMT is enabled
> +    'SMT disabled'            SMT is disabled
> +    'SMT Host state unknown'  Kernel runs in a VM, Host SMT state unknown
> +    ========================  ============================================

On the virtualisation side of things, in Xen we are working to ensure
that the reported topology is accurate and that guests are scheduled in
a safe way.

There is a proposed extension to the Hyper-V Viridian specification,
which will be a "topology can be trusted for safety" bit, and Xen will
gain a similar mechanism as soon as we've made it work.

> +
> +
> +Mitigation mechanism
> +-------------------------
> +
> +The kernel detects the affected CPUs and the presence of the required
> +microcode.
> +
> +If a CPU is affected and the microcode is available, then the kernel
> +enables the mitigation by default. The mitigation can be controlled at boot
> +time via a kernel command line option. See
> +:ref:`mds_mitigation_control_command_line`.
> +
> +.. _cpu_buffer_clear_full:
> +
> +Unconditional CPU buffer clearing
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +   The mitigation for MDS clears the affected CPU buffers unconditionally
> +   on return to user space and when entering a guest.
> +
> +   If SMT is enabled it also clears the buffers on idle entry, but that is
> +   not sufficient SMT protection for all MDS variants; it covers only
> +   MSBDS.
> +
> +.. _virt_mechanism:
> +
> +Virtualization mitigation
> +^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +   If the CPU is also affected by L1TF and the L1D flush mitigation is
> +   enabled and up to date microcode is available, the L1D flush mitigation
> +   is automatically protecting the guest transition. For details on L1TF
> +   and virtualization see:
> +   :ref:`Documentation/admin-guide/hw-vuln//l1tf.rst <mitigation_control_kvm>`.

Spurious double slash?

~Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 5/8] MDS basics 5
  2019-02-19 16:13     ` [MODERATED] " Andi Kleen
@ 2019-02-19 17:37       ` Thomas Gleixner
  2019-02-20  0:05         ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 17:37 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Andi Kleen wrote:

> On Tue, Feb 19, 2019 at 04:07:11PM +0100, speck for Thomas Gleixner wrote:
> > On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:
> > > +/* Update the static key controlling the MDS CPU buffer clear in idle */
> > > +static void update_mds_branch_idle(void)
> > > +{
> > > +	if (sched_smt_active())
> > > +		static_branch_enable(&user_mds_clear_cpu_buffers);
> > > +	else
> > > +		static_branch_disable(&user_mds_clear_cpu_buffers);
> > 
> > Those obviously need s/user/idle/
> 
> Please don't post untested crap like this. It would be really
> bad if someone applied it.

Your last series contained serious bugs and did not even compile...

> This can be easily verified just by looking at a few PT traces.
> 
> Or just use MDSv2 which actually worked.
> 
> BTW even for the minimum version virtualization is definitely needed.

Did you read the cover letter? It's a todo, and nowhere did I claim that it
is complete and perfect.

Thanks,

	tglx


* Re: [patch 5/8] MDS basics 5
  2019-02-19 16:03   ` [MODERATED] " Andi Kleen
@ 2019-02-19 17:40     ` Thomas Gleixner
  2019-02-19 17:44       ` [MODERATED] " Andrew Cooper
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 17:40 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Andi Kleen wrote:

> > +	case MDS_MITIGATION_AUTO:
> > +	case MDS_MITIGATION_FULL:
> > +		if (boot_cpu_has(X86_FEATURE_MD_CLEAR)) {
> > +			mds_mitigation = MDS_MITIGATION_FULL;
> > +			static_branch_enable(&user_mds_clear_cpu_buffers);
> 
> 
> This violates Linus' feedback of unconditionally enabling for one year
> to work around VMWare breakage.

That's easy enough to turn the other way round. That's what review is for,
isn't it?

What's the VMware breakage? I can't remember that I've read that mail.

Thanks,

	tglx


* Re: [patch 0/8] MDS basics 0
  2019-02-19 15:56 ` Andi Kleen
@ 2019-02-19 17:42   ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 17:42 UTC (permalink / raw)
  To: speck


On Tue, 19 Feb 2019, speck for Andi Kleen wrote:
> > So while being grumpy about this communication fail, I'm even more
> > grumpy about the fact, that we don't have even the minimal full/off
> > mitigation in place in a workable form. I asked specifically for this
> 
There certainly seems to be a communication fail here.

Definitely not. I asked you to provide one based on review feedback.

> The minimal version has been posted on this mailing list in late December

You just forgot to mention that it's in unmergeable state as is any later
version.

> (although it only showed up in early January because ...)

Because I went on vacation and did not serve as Intel's butler for a while.

Thanks,

	tglx


* [MODERATED] Re: [patch 5/8] MDS basics 5
  2019-02-19 17:40     ` Thomas Gleixner
@ 2019-02-19 17:44       ` Andrew Cooper
  2019-02-19 17:52         ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2019-02-19 17:44 UTC (permalink / raw)
  To: speck

On 19/02/2019 17:40, speck for Thomas Gleixner wrote:
> On Tue, 19 Feb 2019, speck for Andi Kleen wrote:
>
>>> +	case MDS_MITIGATION_AUTO:
>>> +	case MDS_MITIGATION_FULL:
>>> +		if (boot_cpu_has(X86_FEATURE_MD_CLEAR)) {
>>> +			mds_mitigation = MDS_MITIGATION_FULL;
>>> +			static_branch_enable(&user_mds_clear_cpu_buffers);
>>
>> This violates Linus' feedback of unconditionally enabling for one year
>> to work around VMWare breakage.
> That's easy enough to turn the other way round. That's what review is for,
> isn't it?
>
> What's the VMware breakage? I can't remember that I've read that mail.

VMware reckon they will have updated microcode long before guest VMs
get to see the MD_CLEAR CPUID bit, and wanted guests to issue the VERW
instruction in the hope that it might do something.

~Andrew


* Re: [patch 5/8] MDS basics 5
  2019-02-19 17:44       ` [MODERATED] " Andrew Cooper
@ 2019-02-19 17:52         ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 17:52 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Andrew Cooper wrote:

> On 19/02/2019 17:40, speck for Thomas Gleixner wrote:
> > On Tue, 19 Feb 2019, speck for Andi Kleen wrote:
> >
> >>> +	case MDS_MITIGATION_AUTO:
> >>> +	case MDS_MITIGATION_FULL:
> >>> +		if (boot_cpu_has(X86_FEATURE_MD_CLEAR)) {
> >>> +			mds_mitigation = MDS_MITIGATION_FULL;
> >>> +			static_branch_enable(&user_mds_clear_cpu_buffers);
> >>
> >> This violates Linus' feedback of unconditionally enabling for one year
> >> to work around VMWare breakage.
> > That's easy enough to turn the other way round. That's what review is for,
> > isn't it?
> >
> > What's the VMware breakage? I can't remember that I've read that mail.
> 
> VMware reckon they will have updated microcode long before guest VMs
> get to see the MD_CLEAR CPUID bit, and wanted guests to issue the VERW
> instruction in the hope that it might do something.

Thanks for the reminder. I was in the room and chuckled when this was
discussed, but it slipped my mind.

Oh well...

Thanks,

	tglx


* Re: [patch 4/8] MDS basics 4
  2019-02-19 16:07   ` Andi Kleen
@ 2019-02-19 18:29     ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 18:29 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Andi Kleen wrote:
> On Tue, Feb 19, 2019 at 01:44:10PM +0100, speck for Thomas Gleixner wrote:
> > Subject: [patch 4/8] x86/speculation/mds: Conditionaly clear CPU buffers on idle entry
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > Add a static key which controls the invocation of the CPU buffer clear
> > mechanism on idle entry. This is independent of other MDS mitigations
> > because the idle entry invocation to mitigate the potential leakage due to
> > store buffer repartitioning is only necessary on SMT systems.
> > 
> > Add the actual invocations to the different halt/mwait variants which
> > covers all usage sites. mwaitx is not patched as it's not available on
> > Intel CPUs.
> 
> This doesn't handle ACPI IO port idling correctly.

But it breaks neither ARM64 nor IA64, as cpu_clear_idle() is not defined
in acpi/processor_idle.c. Talking about untested crap...

That aside, it's easy enough to fix that.

OTOH, the question is whether it matters. Everything that matters moved
to intel_idle almost a decade ago, and even my Nehalem box does not use
the old ACPI processor idle stuff.

If you can come up with a compelling reason to address that, I'm happy to
fix it. If not, it'd be just wasted time and disk space.

Thanks,

	tglx


* Re: [patch 3/8] MDS basics 3
  2019-02-19 16:04   ` [MODERATED] " Andi Kleen
@ 2019-02-19 21:44     ` Thomas Gleixner
  2019-02-19 22:13       ` Thomas Gleixner
  2019-02-20 16:59       ` [MODERATED] " Andi Kleen
  0 siblings, 2 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 21:44 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Andi Kleen wrote:

> On Tue, Feb 19, 2019 at 01:44:09PM +0100, speck for Thomas Gleixner wrote:
> > Subject: [patch 3/8] x86/speculation/mds: Clear CPU buffers on exit to user
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > Add a static key which controls the invocation of the CPU buffer clear
> > mechanism on exit to user space and add the call into
> > prepare_exit_to_usermode() right before actually returning.
> 
> This is missing the NMI exit case.

And that exit case is what? Security voodoo just in case?

Your NMI covering patch 'changelog':

  "NMIs don't go through the normal exit code when exiting to user
   space. Normally we consider NMIs not sensitive anyways, but they need
   special handling with mds=full.
   So add an explicit check to do_nmi to clear the CPU with mds=full"

is useful in that regard as always. It neither explains why NMIs are not
considered sensitive nor does it provide any useful argument why mds=full
would make any difference.

Can you please answer the following questions:

    1) Who is 'we'?

       Andi Kleen using pluralis majestatis or are you referring to the
       kernel community as a whole?

       I don't care either way because none of the options provides a
       rationale.

    2) What is 'normally' ?

       I'm not aware of any normative statement regarding this matter.

    3) Why are NMIs considered not sensitive?

       This is missing any explanation of why this is the case.

    4) What makes mds=full different?

       Just because it's named 'full'? That surely does not qualify as a
       technical argument which provides a useful rationale for adding this
       flush.

Please provide a conclusive technical explanation why an NMI entered from
user space and returning to it would expose sensitive information to an
attacker in an exploitable way.

Thanks,

	tglx


* Re: [patch 3/8] MDS basics 3
  2019-02-19 21:44     ` Thomas Gleixner
@ 2019-02-19 22:13       ` Thomas Gleixner
  2019-02-20 16:59       ` [MODERATED] " Andi Kleen
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-19 22:13 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:
> 
> Please provide a conclusive technical explanation why an NMI entered from
> user space and returning to it would expose sensitive information to an
> attacker in an exploitable way.

And while at it, please provide an explanation of why this does not apply
to exceptions which hit the paranoid section, i.e. in the middle of
returning to user space where the regular exit path cannot be invoked.

Thanks,

	tglx


* Re: [patch 5/8] MDS basics 5
  2019-02-19 17:37       ` Thomas Gleixner
@ 2019-02-20  0:05         ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-20  0:05 UTC (permalink / raw)
  To: speck

On Tue, 19 Feb 2019, speck for Thomas Gleixner wrote:
> On Tue, 19 Feb 2019, speck for Andi Kleen wrote:
> > BTW even for the minimum version virtualization is definitely needed.
> 
> Did you read the cover letter? It's a todo, and nowhere did I claim that it
> is complete and perfect.

And for completeness sake:

Do you refer to that so complicated exposure of the MD_CLEAR feature bit to
the guest?

I assume so, because in your patches nothing else is done regarding
virtualization.

But if you've read my cover letter carefully you might have noticed that
there is no support for the case where the CPU is not affected by L1TF, but
is affected by MDS. That's not a theoretical case at all.

I fail to see that case addressed in your so complete and working patchset
version whatever. That's a way more interesting problem than adding a CPU
feature flag to some existing initializer, which is an obvious and purely
mechanical operation.

Thanks,

	tglx


* [MODERATED] Re: [patch 3/8] MDS basics 3
  2019-02-19 21:44     ` Thomas Gleixner
  2019-02-19 22:13       ` Thomas Gleixner
@ 2019-02-20 16:59       ` Andi Kleen
  2019-02-20 21:28         ` Thomas Gleixner
  1 sibling, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-02-20 16:59 UTC (permalink / raw)
  To: speck


Thomas,

> Please provide a conclusive technical explanation why an NMI entered from
> user space and returning to it would expose sensitive information to an
> attacker in an exploitable way.

Okay so you're not actually implementing full, like you stated earlier,
but a strange undocumented minimal version of lazy.

Lazy is always a trade-off. Since we still run with a single address
space, the CPU can always do some prefetching or speculative execution
and fetch something nearby that the code doesn't intentionally access,
which may end up leaking through some buffer.

In such a case even a NMI could leak.

Of course such a case is unlikely, but in theory it could happen.

The intention of the full option was to provide an option for people
who cannot accept any risk at all. Of course that's not the right
approach for most users, but for a few it might be.

If you implement full you should implement it properly
and not leave holes.

If you're not interested in code review please state so and I will
not bother anymore.

-Andi


* Re: [patch 3/8] MDS basics 3
  2019-02-20 16:59       ` [MODERATED] " Andi Kleen
@ 2019-02-20 21:28         ` Thomas Gleixner
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2019-02-20 21:28 UTC (permalink / raw)
  To: speck

Andi,

On Wed, 20 Feb 2019, speck for Andi Kleen wrote:
> > Please provide a conclusive technical explanation why an NMI entered from
> > user space and returning to it would expose sensitive information to an
> > attacker in an exploitable way.
> 
> Okay so you're not actually implementing full, like you stated earlier,
> but a strange undocumented minimal version of lazy.
> 
> Lazy is always a trade-off. Since we still run with a single address
> space, the CPU can always do some prefetching or speculative execution
> and fetch something nearby that the code doesn't intentionally access,
> which may end up leaking through some buffer.
> 
> In such a case even a NMI could leak.
> 
> Of course such a case is unlikely, but in theory it could happen.
> 
> The intention of the full option was to provide an option for people
> who cannot accept any risk at all. Of course that's not the right
> approach for most users, but for a few it might be.

That makes sense if you map full to paranoid, which I tried to
avoid, but fair enough.

Thanks for the explanation.

       tglx


* [MODERATED] Encrypted Message
  2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
                   ` (9 preceding siblings ...)
  2019-02-19 15:56 ` Andi Kleen
@ 2019-02-21 16:14 ` Jon Masters
  10 siblings, 0 replies; 42+ messages in thread
From: Jon Masters @ 2019-02-21 16:14 UTC (permalink / raw)
  To: speck


From: Jon Masters <jcm@redhat.com>
To: speck for Thomas Gleixner <speck@linutronix.de>
Subject: Re: [patch 0/8] MDS basics 0


Hi Thomas,

Just a note on testing. I built a few Coffee Lake client systems for Red
Hat using the 8086K anniversary processor, for which we have test ucode.
I will build and test these patches and ask the RH perf team to test.

Jon.

-- 
Computer Architect | Sent with my Fedora powered laptop



end of thread, other threads:[~2019-02-21 16:14 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
2019-02-19 12:44 [patch 0/8] MDS basics 0 Thomas Gleixner
2019-02-19 12:44 ` [patch 1/8] MDS basics 1 Thomas Gleixner
2019-02-19 14:00   ` [MODERATED] " Borislav Petkov
2019-02-19 12:44 ` [patch 2/8] MDS basics 2 Thomas Gleixner
2019-02-19 12:44 ` [patch 3/8] MDS basics 3 Thomas Gleixner
2019-02-19 16:04   ` [MODERATED] " Andi Kleen
2019-02-19 21:44     ` Thomas Gleixner
2019-02-19 22:13       ` Thomas Gleixner
2019-02-20 16:59       ` [MODERATED] " Andi Kleen
2019-02-20 21:28         ` Thomas Gleixner
2019-02-19 12:44 ` [patch 4/8] MDS basics 4 Thomas Gleixner
2019-02-19 13:54   ` [MODERATED] " Andrew Cooper
2019-02-19 14:02     ` Thomas Gleixner
2019-02-19 14:07       ` Thomas Gleixner
2019-02-19 16:09         ` [MODERATED] " Andi Kleen
2019-02-19 16:17           ` Peter Zijlstra
2019-02-19 17:16       ` Thomas Gleixner
2019-02-19 16:08     ` [MODERATED] " Andi Kleen
2019-02-19 16:23       ` Andrew Cooper
2019-02-19 16:07   ` Andi Kleen
2019-02-19 18:29     ` Thomas Gleixner
2019-02-19 12:44 ` [patch 5/8] MDS basics 5 Thomas Gleixner
2019-02-19 15:07   ` Thomas Gleixner
2019-02-19 16:13     ` [MODERATED] " Andi Kleen
2019-02-19 17:37       ` Thomas Gleixner
2019-02-20  0:05         ` Thomas Gleixner
2019-02-19 16:03   ` [MODERATED] " Andi Kleen
2019-02-19 17:40     ` Thomas Gleixner
2019-02-19 17:44       ` [MODERATED] " Andrew Cooper
2019-02-19 17:52         ` Thomas Gleixner
2019-02-19 12:44 ` [patch 6/8] MDS basics 6 Thomas Gleixner
2019-02-19 12:44 ` [patch 7/8] MDS basics 7 Thomas Gleixner
2019-02-19 12:44 ` [patch 8/8] MDS basics 8 Thomas Gleixner
2019-02-19 14:17   ` [MODERATED] " Greg KH
2019-02-19 14:22     ` Thomas Gleixner
2019-02-19 17:27   ` [MODERATED] " Andrew Cooper
2019-02-19 14:03 ` [MODERATED] Re: [patch 0/8] MDS basics 0 Andrew Cooper
2019-02-19 14:09   ` Thomas Gleixner
2019-02-19 14:10   ` [MODERATED] " Tyler Hicks
2019-02-19 15:56 ` Andi Kleen
2019-02-19 17:42   ` Thomas Gleixner
2019-02-21 16:14 ` [MODERATED] Encrypted Message Jon Masters
