From mboxrd@z Thu Jan  1 00:00:00 1970
Return-path:
Received: from mga14.intel.com ([192.55.52.115]) by Galois.linutronix.de with
 esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from )
 id 1fDgSL-0000lP-LZ for speck@linutronix.de; Wed, 02 May 2018 03:21:18 +0200
Date: Tue, 1 May 2018 18:21:12 -0700
From: Andi Kleen
Subject: [MODERATED] Re: Updated L1TF native OS patch
Message-ID: <20180502012112.GQ75137@tassilo.jf.intel.com>
References: <20180501234247.GA41910@tassilo.jf.intel.com>
 <20180502000512.GO75137@tassilo.jf.intel.com>
MIME-Version: 1.0
In-Reply-To: <20180502000512.GO75137@tassilo.jf.intel.com>
Content-Type: multipart/mixed; boundary="SLauP2uySp+9cKYP"
Content-Disposition: inline
To: speck@linutronix.de
List-ID:

--SLauP2uySp+9cKYP
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, May 01, 2018 at 05:05:12PM -0700, speck for Andi Kleen wrote:
> On Tue, May 01, 2018 at 04:59:53PM -0700, speck for Linus Torvalds wrote:
> > On Tue, 1 May 2018, speck for Andi Kleen wrote:
> > > Here's a v2 of the L1TF native OS patch kit.
> >
> > Hmm. I got 1/6 three times, but not any of the others..
>
> Sorry, I had some trouble with the gpg scripting. I sent
> them manually now. Hopefully this works.

Also attaching the unencrypted mbox for easier applying etc.

-Andi

--SLauP2uySp+9cKYP
Content-Type: application/vnd.wolfram.mathematica.package
Content-Disposition: attachment; filename=m

From 4c1c59c9fe67df207d4623ef3ccd1a25b0700184 Mon Sep 17 00:00:00 2001
From: Linus Torvalds
Date: Fri, 27 Apr 2018 09:06:34 -0700
Subject: [PATCH 1/6] x86, l1tf: Protect swap entries against L1TF
To: speck@linutronix.de
Status: RO
Content-Length: 4433
Lines: 108

With L1 terminal fault the CPU speculates into unmapped PTEs, and
the resulting side effects allow reading the memory the PTE is
pointing to, if its contents are still in the L1 cache.

For swapped out pages Linux uses unmapped PTEs and stores a swap entry
into them.

We need to make sure the swap entry is not pointing to valid memory,
which requires setting higher bits (between bit 36 and bit 45) that
are inside the CPU's physical address space, but outside any real
memory.

To do this we invert the offset to make sure the higher bits are always
set, as long as the swap file is not too big.

Here's a patch that switches the order of "type" and
"offset" in the x86-64 encoding, in addition to doing the binary 'not' on
the offset.

That means that now the offset is bits 9-58 in the page table, and that
the offset is in the bits that hardware generally doesn't care about.

That, in turn, means that if you have a desktop chip with only 40 bits of
physical addressing, now that the offset starts at bit 9, you still have
to have 30 bits of offset actually *in use* until bit 39 ends up being
clear.

So that's 4 terabytes of swap space (because the offset is counted in
pages, so 30 bits of offset is 42 bits of actual coverage). With bigger
physical addressing, that obviously grows further, until you hit the limit
of the offset (at 50 bits of offset - 62 bits of actual swap file
coverage).

Note there is no workaround for 32bit !PAE, or on systems which
have more than MAX_PA/2 memory.
The latter case is very unlikely
to happen on real systems.

[updated description and minor tweaks by AK]

XXX Linus to add his SOB here

Signed-off-by: Andi Kleen
Tested-by: Andi Kleen
---
 arch/x86/include/asm/pgtable_64.h | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 877bc27718ae..593c3cf259dd 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -273,7 +273,7 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
  *
  * | ...            | 11| 10|  9|8|7|6|5| 4| 3|2| 1|0| <- bit number
  * | ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
- * | OFFSET (14->63) | TYPE (9-13)  |0|0|X|X| X| X|X|SD|0| <- swp entry
+ * | TYPE (59-63) | ~OFFSET (9-58)  |0|0|X|X| X| X|X|SD|0| <- swp entry
  *
  * G (8) is aliased and used as a PROT_NONE indicator for
  * !present ptes.  We need to start storing swap entries above
@@ -286,20 +286,34 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
  *
  * Bit 7 in swp entry should be 0 because pmd_present checks not only P,
  * but also L and G.
+ *
+ * The offset is inverted by a binary not operation to make the high
+ * physical bits set.
  */
-#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
-#define SWP_TYPE_BITS 5
-/* Place the offset above the type: */
-#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)
+#define SWP_TYPE_BITS		5
+
+#define SWP_OFFSET_FIRST_BIT	(_PAGE_BIT_PROTNONE + 1)
+
+/* We always extract/encode the offset by shifting it all the way up, and then down again */
+#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS)
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 
-#define __swp_type(x)			(((x).val >> (SWP_TYPE_FIRST_BIT)) \
-					 & ((1U << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x)			((x).val >> SWP_OFFSET_FIRST_BIT)
-#define __swp_entry(type, offset)	((swp_entry_t) { \
-					 ((type) << (SWP_TYPE_FIRST_BIT)) \
-					 | ((offset) << SWP_OFFSET_FIRST_BIT) })
+/* Extract the high bits for type */
+#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
+
+/* Shift up (to get rid of type), then down to get value */
+#define __swp_offset(x) (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
+
+/*
+ * Shift the offset up "too far" by TYPE bits, then down again
+ * The offset is inverted by a binary not operation to make the high
+ * physical bits set.
+ */
+#define __swp_entry(type, offset) ((swp_entry_t) { \
+	(~(unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+	| ((unsigned long)(type) << (64-SWP_TYPE_BITS)) })
+
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
 #define __pmd_to_swp_entry(pmd)		((swp_entry_t) { pmd_val((pmd)) })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
-- 
2.15.0
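
[Not part of the series: a minimal userspace model of the new encoding,
with SWP_OFFSET_FIRST_BIT hard-coded to 9 (_PAGE_BIT_PROTNONE + 1). It
only illustrates that the round trip is lossless while the binary 'not'
keeps the unused high bits of the entry set:]

/* Standalone model of the PATCH 1/6 swap entry encoding; not kernel code. */
#include <assert.h>
#include <stdio.h>

#define SWP_TYPE_BITS		5
#define SWP_OFFSET_FIRST_BIT	9
#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

static unsigned long mk_swp_entry(unsigned long type, unsigned long offset)
{
	/* Shift the inverted offset up "too far", then down again */
	return (~offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
	       (type << (64 - SWP_TYPE_BITS));
}

static unsigned long swp_type(unsigned long val)
{
	return val >> (64 - SWP_TYPE_BITS);
}

static unsigned long swp_offset(unsigned long val)
{
	return ~val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT;
}

int main(void)
{
	unsigned long e = mk_swp_entry(3, 0x1234);

	/* The round trip is lossless for offsets below 2^50... */
	assert(swp_type(e) == 3 && swp_offset(e) == 0x1234);
	/* ...and a small offset leaves the high offset bits of the
	 * entry set, so it never looks like a valid physical address. */
	printf("entry = %016lx\n", e);
	return 0;
}
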
From 73a8594bdc5d88bdb125e458a4147669b8ff1cd1 Mon Sep 17 00:00:00 2001
From: Andi Kleen
Date: Fri, 27 Apr 2018 09:47:37 -0700
Subject: [PATCH 2/6] x86, l1tf: Protect PROT_NONE PTEs against speculation
To: speck@linutronix.de
Status: RO
Content-Length: 7879
Lines: 257

We also need to protect PTEs that are set to PROT_NONE against
L1TF speculation attacks.

This is important inside guests, because L1TF speculation
bypasses physical page remapping. While the VM has its own
mitigations that prevent leaking data from other VMs into
the guest, this would still risk leaking the wrong page
inside the current guest.

This uses the same technique as Linus' swap entry patch:
while an entry is in PROTNONE state we invert the
complete PFN part of it. This ensures that the
highest bits point to non-existent memory.

The inversion is done by pte/pmd/pud_modify and pfn_pte/pfn_pmd/pfn_pud
for PROTNONE entries, and pte/pmd/pud_pfn undo it.

We assume that no one tries to touch the PFN part of
a PTE without using these primitives.

This doesn't handle the case that MMIO is at the top
of the CPU physical memory. If such an MMIO region
was exposed by an unprivileged driver for mmap
it would be possible to attack some real memory.
However this situation is all rather unlikely.

For 32bit non-PAE we don't try inversion because
there are really not enough bits to protect anything.

Signed-off-by: Andi Kleen
---
 arch/x86/include/asm/pgtable-2level.h | 27 +++++++++++++++++++
 arch/x86/include/asm/pgtable-3level.h |  2 ++
 arch/x86/include/asm/pgtable-invert.h | 45 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/pgtable.h        | 50 ++++++++++++++++++++++++---------
 arch/x86/include/asm/pgtable_64.h     |  2 ++
 5 files changed, 113 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/pgtable-invert.h

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index 685ffe8a0eaf..a1de6ae0c443 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -95,4 +95,31 @@ static inline unsigned long pte_bitop(unsigned long value, unsigned int rightshi
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { (pte).pte_low })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
 
+/* No inverted PFNs on 2 level page tables */
+
+static inline bool pte_pfn_inverted(pte_t pte)
+{
+	return false;
+}
+
+static inline bool pmd_pfn_inverted(pmd_t pmd)
+{
+	return false;
+}
+
+static inline bool pud_pfn_inverted(pud_t pud)
+{
+	return false;
+}
+
+static inline bool pgprot_pfn_inverted(pgprot_t prot)
+{
+	return false;
+}
+
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
+{
+	return val;
+}
+
 #endif /* _ASM_X86_PGTABLE_2LEVEL_H */
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index f24df59c40b2..76ab26a99e6e 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -295,4 +295,6 @@ static inline pte_t gup_get_pte(pte_t *ptep)
 	return pte;
 }
 
+#include <asm/pgtable-invert.h>
+
 #endif /* _ASM_X86_PGTABLE_3LEVEL_H */
diff --git a/arch/x86/include/asm/pgtable-invert.h b/arch/x86/include/asm/pgtable-invert.h
new file mode 100644
index 000000000000..045eb77411cc
--- /dev/null
+++ b/arch/x86/include/asm/pgtable-invert.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_PGTABLE_INVERT_H
+#define _ASM_PGTABLE_INVERT_H 1
+
+#ifndef __ASSEMBLY__
+
+static inline bool pte_pfn_inverted(pte_t pte)
+{
+	u64 val = pte_val(pte);
+	return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+}
+
+static inline bool pmd_pfn_inverted(pmd_t pmd)
+{
+	u64 val = pmd_val(pmd);
+	return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+}
+
+static inline bool pud_pfn_inverted(pud_t pud)
+{
+	u64 val = pud_val(pud);
+	return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+}
+
+static inline bool pgprot_pfn_inverted(pgprot_t prot)
+{
+	u64 val = pgprot_val(prot);
+	return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+}
+
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
+{
+	/*
+	 * When a PTE transitions from NONE to !NONE or vice-versa
+	 * invert the PFN part to stop speculation.
+	 * pte_pfn undoes this when needed.
+	 */
+	if ((oldval & _PAGE_PROTNONE) != (val & _PAGE_PROTNONE))
+		val = (val & ~mask) | (~val & mask);
+	return val;
+}
+
+#endif /* __ASSEMBLY__ */
+
+#endif
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5f49b4ff0c24..d8b6189cc97d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -185,19 +185,35 @@ static inline int pte_special(pte_t pte)
 	return pte_flags(pte) & _PAGE_SPECIAL;
 }
 
+/* Entries that were set to PROT_NONE are inverted */
+
+static inline bool pte_pfn_inverted(pte_t pte);
+static inline bool pmd_pfn_inverted(pmd_t pmd);
+static inline bool pud_pfn_inverted(pud_t pud);
+static inline bool pgprot_pfn_inverted(pgprot_t prot);
+
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
+	unsigned long pfn = pte_val(pte);
+	if (pte_pfn_inverted(pte))
+		pfn = ~pfn;
+	return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pmd_pfn(pmd_t pmd)
 {
-	return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
+	unsigned long pfn = pmd_val(pmd);
+	if (pmd_pfn_inverted(pmd))
+		pfn = ~pfn;
+	return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pud_pfn(pud_t pud)
 {
-	return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
+	unsigned long pfn = pud_val(pud);
+	if (pud_pfn_inverted(pud))
+		pfn = ~pfn;
+	return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;
 }
 
 static inline unsigned long p4d_pfn(p4d_t p4d)
@@ -545,25 +561,33 @@ static inline pgprotval_t check_pgprot(pgprot_t pgprot)
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
-	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     check_pgprot(pgprot));
+	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	if (pgprot_pfn_inverted(pgprot))
+		pfn = ~pfn & PTE_PFN_MASK;
+	return __pte(pfn | check_pgprot(pgprot));
 }
 
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
-	return __pmd(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     check_pgprot(pgprot));
+	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	if (pgprot_pfn_inverted(pgprot))
+		pfn = ~pfn & PHYSICAL_PMD_PAGE_MASK;
+	return __pmd(pfn | check_pgprot(pgprot));
 }
 
 static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
 {
-	return __pud(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     check_pgprot(pgprot));
+	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	if (pgprot_pfn_inverted(pgprot))
+		pfn = ~pfn & PHYSICAL_PUD_PAGE_MASK;
+	return __pud(pfn | check_pgprot(pgprot));
 }
 
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-	pteval_t val = pte_val(pte);
+	pteval_t val = pte_val(pte), oldval = val;
 
 	/*
 	 * Chop off the NX bit (if present), and add the NX portion of
@@ -571,17 +595,17 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	 */
 	val &= _PAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
-
+	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
 	return __pte(val);
 }
 
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
-	pmdval_t val = pmd_val(pmd);
+	pmdval_t val = pmd_val(pmd), oldval = val;
 
 	val &= _HPAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
-
+	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
 	return __pmd(val);
 }
 
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 593c3cf259dd..ea99272ab63e 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -357,5 +357,7 @@ static inline bool gup_fast_permitted(unsigned long start, int nr_pages,
 	return true;
 }
 
+#include <asm/pgtable-invert.h>
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_PGTABLE_64_H */
-- 
2.15.0
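
[Not part of the series: a userspace model of flip_protnone_guard() and
the pte_pfn() undo. The _PAGE_* values and the PFN mask below are
stand-ins, not the real pgtable_types.h constants:]

#include <assert.h>
#include <stdint.h>

#define PAGE_PRESENT	(1ULL << 0)
#define PAGE_PROTNONE	(1ULL << 8)	/* aliases the Global bit in reality */
#define PFN_MASK	0x000ffffffffff000ULL

static uint64_t flip_protnone_guard(uint64_t oldval, uint64_t val, uint64_t mask)
{
	/* Invert the PFN part whenever the PROTNONE state changes */
	if ((oldval & PAGE_PROTNONE) != (val & PAGE_PROTNONE))
		val = (val & ~mask) | (~val & mask);
	return val;
}

static uint64_t pte_pfn(uint64_t val)
{
	/* Undo the inversion for !present PROTNONE entries */
	if ((val & (PAGE_PRESENT | PAGE_PROTNONE)) == PAGE_PROTNONE)
		val = ~val;
	return (val & PFN_MASK) >> 12;
}

int main(void)
{
	uint64_t pte = (0x1234ULL << 12) | PAGE_PRESENT;
	/* mprotect(PROT_NONE) style transition: present -> protnone */
	uint64_t none = flip_protnone_guard(pte,
			(pte & ~PAGE_PRESENT) | PAGE_PROTNONE, PFN_MASK);

	/* The raw PFN bits no longer point at the real page... */
	assert(((none & PFN_MASK) >> 12) != 0x1234);
	/* ...but the accessor still recovers the original PFN. */
	assert(pte_pfn(none) == 0x1234);
	return 0;
}
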
From 17df1843b8d59783742f2c0becad3eb9f275b76a Mon Sep 17 00:00:00 2001
From: Andi Kleen
Date: Mon, 23 Apr 2018 15:57:54 -0700
Subject: [PATCH 3/6] x86, l1tf: Make sure the first page is always reserved
To: speck@linutronix.de
Status: RO
Content-Length: 985
Lines: 31

The L1TF workaround doesn't make any attempt to mitigate speculative
accesses to the first physical page for zeroed PTEs. Normally
it only contains some data from the early real mode BIOS.

I couldn't convince myself we always reserve the first page in
all configurations, so add an extra reservation call to
make sure it is really reserved. In most configurations (e.g.
with the standard reservations) it's likely a nop.

Signed-off-by: Andi Kleen
---
 arch/x86/kernel/setup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6285697b6e56..fadbd41094d2 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -817,6 +817,9 @@ void __init setup_arch(char **cmdline_p)
 	memblock_reserve(__pa_symbol(_text),
 			 (unsigned long)__bss_stop - (unsigned long)_text);
 
+	/* Make sure page 0 is always reserved */
+	memblock_reserve(0, PAGE_SIZE);
+
 	early_reserve_initrd();
 
 	/*
-- 
2.15.0


From 8865a468fa92e1e507b820f74e8d051c50ef49dc Mon Sep 17 00:00:00 2001
From: Andi Kleen
Date: Fri, 27 Apr 2018 14:44:53 -0700
Subject: [PATCH 4/6] x86, l1tf: Add sysfs report for l1tf
To: speck@linutronix.de
Status: RO
Content-Length: 4922
Lines: 124

L1TF core kernel workarounds are cheap and generally always enabled.
However we still want to report in sysfs if the system is vulnerable
or mitigated. Add the necessary checks.

- We use the same checks as Meltdown to determine if the system is
  vulnerable.
This excludes some Atom CPUs which don't have this
  problem.
- We check for the (very unlikely) memory > MAX_PA/2 case.
- We check for 32bit non-PAE and warn.

Note this patch will likely conflict with some other workaround patches
floating around, but should be straightforward to fix.

Signed-off-by: Andi Kleen
---
 arch/x86/include/asm/cpufeatures.h |  2 ++
 arch/x86/kernel/cpu/bugs.c         | 11 +++++++++++
 arch/x86/kernel/cpu/common.c       |  8 +++++++-
 drivers/base/cpu.c                 |  8 ++++++++
 include/linux/cpu.h                |  2 ++
 5 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d554c11e01ff..f51549640f64 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -214,6 +214,7 @@
 
 #define X86_FEATURE_USE_IBPB		( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
 #define X86_FEATURE_USE_IBRS_FW		( 7*32+22) /* "" Use IBRS during runtime firmware calls */
+#define X86_FEATURE_NO_L1TF_FIX		( 7*32+23) /* "" L1TF workaround needed, but disabled */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
@@ -362,5 +363,6 @@
 #define X86_BUG_CPU_MELTDOWN		X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */
 #define X86_BUG_SPECTRE_V1		X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */
 #define X86_BUG_SPECTRE_V2		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
+#define X86_BUG_L1TF			X86_BUG(17) /* CPU is affected by L1 Terminal Fault */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bfca937bdcc3..141a0135a8ca 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -340,4 +340,15 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, c
 		       boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
 		       spectre_v2_module_string());
 }
+
+ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return sprintf(buf, "Not affected\n");
+
+	if (boot_cpu_has(X86_FEATURE_NO_L1TF_FIX))
+		return sprintf(buf, "Mitigation Unavailable\n");
+
+	return sprintf(buf, "Mitigated\n");
+}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8a5b185735e1..2b292aa237ee 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -989,8 +989,14 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
 
 	if (!x86_match_cpu(cpu_no_speculation)) {
-		if (cpu_vulnerable_to_meltdown(c))
+		if (cpu_vulnerable_to_meltdown(c)) {
 			setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
+			setup_force_cpu_bug(X86_BUG_L1TF);
+#if CONFIG_PGTABLE_LEVELS == 2
+			pr_warn("Kernel not compiled for PAE. No workaround for L1TF\n");
+			setup_force_cpu_cap(X86_FEATURE_NO_L1TF_FIX);
+#endif
+		}
 		setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
 		setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
 	}
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 2da998baa75c..ed7b8591d461 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -534,14 +534,22 @@ ssize_t __weak cpu_show_spectre_v2(struct device *dev,
 	return sprintf(buf, "Not affected\n");
 }
 
+ssize_t __weak cpu_show_l1tf(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
+static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
 	&dev_attr_spectre_v1.attr,
 	&dev_attr_spectre_v2.attr,
+	&dev_attr_l1tf.attr,
 	NULL
 };
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 7b01bc11c692..75c430046ca0 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -53,6 +53,8 @@ extern ssize_t cpu_show_spectre_v1(struct device *dev,
 				   struct device_attribute *attr, char *buf);
 extern ssize_t cpu_show_spectre_v2(struct device *dev,
 				   struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_l1tf(struct device *dev,
+			     struct device_attribute *attr, char *buf);
 
 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,
-- 
2.15.0
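
[Not part of the series: the new attribute lands next to the existing
meltdown/spectre files, so userspace can poll it like this (path assumed
from the vulnerabilities attribute group above):]

#include <stdio.h>

int main(void)
{
	char line[64];
	FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/l1tf", "r");

	if (!f) {
		perror("no l1tf attribute (kernel without this patch?)");
		return 1;
	}
	/* One of "Not affected", "Mitigation Unavailable", "Mitigated" */
	if (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}
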
From 398745eb3e9f03778e8e910d0b315b3c76a4de56 Mon Sep 17 00:00:00 2001
From: Andi Kleen
Date: Fri, 9 Feb 2018 10:36:15 -0800
Subject: [PATCH 5/6] x86, l1tf: Report if too much memory for L1TF workaround
To: speck@linutronix.de
Status: RO
Content-Length: 2088
Lines: 72

If the system has more than MAX_PA/2 physical memory the
invert page workarounds don't protect the system against
the L1TF attack anymore, because an inverted physical address
will point to valid memory.

We cannot do much here, after all users want to use the
memory, but at least print a warning and report the system as
vulnerable in sysfs.

Note this is all extremely unlikely to happen on a real machine
because MAX_PA is typically far larger than what the DIMM
slots can hold.

Typical MAX_PA sizes on Intel and the respective thresholds:

Nehalem Client                                    39  >=0.25TB
Nehalem Server                                    44  >=8TB
SandyBridge/IvyBridge/Haswell/Broadwell/Skylake   46  >=32TB

Some VMs also report fairly small PAs to the guest, e.g. only
36bits. In this case the threshold will be lower, but it applies
only to the maximum guest size.

Signed-off-by: Andi Kleen
---
 arch/x86/kernel/setup.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index fadbd41094d2..62dbfc533b99 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -779,7 +779,27 @@ static void __init trim_low_memory_range(void)
 {
 	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 }
- 
+
+static __init void check_maxpa_memory(void)
+{
+	u64 len;
+
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return;
+
+	len = (1ULL << (boot_cpu_data.x86_phys_bits - 1)) - 1;
+
+	/*
+	 * This is extremely unlikely to happen because systems nearly
+	 * always have far more MAX_PA than installed memory.
+	 */
+	if (e820__mapped_any(len, ULLONG_MAX - len,
+			     E820_TYPE_RAM)) {
+		pr_warn("System has more than MAX_PA/2 memory. Disabled L1TF workaround\n");
+		setup_force_cpu_cap(X86_FEATURE_NO_L1TF_FIX);
+	}
+}
+
 /*
  * Dump out kernel offset information on panic.
  */
@@ -1016,6 +1036,8 @@ void __init setup_arch(char **cmdline_p)
 	insert_resource(&iomem_resource, &data_resource);
 	insert_resource(&iomem_resource, &bss_resource);
 
+	check_maxpa_memory();
+
 	e820_add_kernel_range();
 	trim_bios_range();
 #ifdef CONFIG_X86_32
-- 
2.15.0
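
[Not part of the series: the thresholds in the table are just
1 << (MAX_PA - 1) bytes; a quick standalone check of the arithmetic:]

#include <stdio.h>

int main(void)
{
	static const struct { const char *cpu; int phys_bits; } t[] = {
		{ "Nehalem Client", 39 },
		{ "Nehalem Server", 44 },
		{ "SandyBridge..Skylake", 46 },
	};
	unsigned int i;

	for (i = 0; i < sizeof(t) / sizeof(t[0]); i++) {
		unsigned long long bytes = 1ULL << (t[i].phys_bits - 1);
		printf("%-22s MAX_PA=%d  workaround off at >=%.2f TB\n",
		       t[i].cpu, t[i].phys_bits,
		       bytes / (1024. * 1024 * 1024 * 1024));
	}
	return 0;
}
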
From aaedeb15cb5c75e44b29e895b60c2dbffa1a7e14 Mon Sep 17 00:00:00 2001
From: Andi Kleen
Date: Fri, 27 Apr 2018 15:29:17 -0700
Subject: [PATCH 6/6] x86, l1tf: Limit swap file size to MAX_PA/2
To: speck@linutronix.de
Status: RO
Content-Length: 4670
Lines: 131

For the L1TF workaround we want to limit the swap file size to below
MAX_PA/2, so that the inverted high bits of the swap offset never
point to valid memory.

Add a way for the architecture to override the swap file
size check in swapfile.c and add an x86 specific max swapfile check
function that enforces that limit.

The check is only enabled if the CPU is vulnerable to L1TF.

In VMs with 42bit MAX_PA the typical limit is 2TB now,
on a native system with 46bit PA it is 32TB. The limit
is only per individual swap file, so it's always possible
to exceed these limits with multiple swap files or
partitions.

Signed-off-by: Andi Kleen
---
 arch/x86/mm/init.c       | 17 +++++++++++++++++
 include/linux/swapfile.h |  2 ++
 mm/swapfile.c            | 44 ++++++++++++++++++++++++++------------------
 3 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index fec82b577c18..9f571225f5db 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -4,6 +4,8 @@
 #include <linux/swap.h>
 #include <linux/memblock.h>
 #include <linux/bootmem.h>	/* for max_low_pfn */
+#include <linux/swapfile.h>
+#include <linux/swapops.h>
 
 #include <asm/set_memory.h>
 #include <asm/e820/api.h>
@@ -878,3 +880,18 @@ void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache)
 	__cachemode2pte_tbl[cache] = __cm_idx2pte(entry);
 	__pte2cachemode_tbl[entry] = cache;
 }
+
+unsigned long max_swapfile_size(void)
+{
+	unsigned long pages;
+
+	pages = generic_max_swapfile_size();
+
+	if (boot_cpu_has_bug(X86_BUG_L1TF)) {
+		/* Limit the swap file size to MAX_PA/2 for the L1TF workaround */
+		pages = min_t(unsigned long,
+			      1ULL << (boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT),
+			      pages);
+	}
+	return pages;
+}
diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h
index 06bd7b096167..e06febf62978 100644
--- a/include/linux/swapfile.h
+++ b/include/linux/swapfile.h
@@ -10,5 +10,7 @@ extern spinlock_t swap_lock;
 extern struct plist_head swap_active_head;
 extern struct swap_info_struct *swap_info[];
 extern int try_to_unuse(unsigned int, bool, unsigned long);
+extern unsigned long generic_max_swapfile_size(void);
+extern unsigned long max_swapfile_size(void);
 
 #endif /* _LINUX_SWAPFILE_H */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index cc2cf04d9018..413f48424194 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2909,6 +2909,33 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 	return 0;
 }
 
+
+/*
+ * Find out how many pages are allowed for a single swap
+ * device. There are two limiting factors: 1) the number
+ * of bits for the swap offset in the swp_entry_t type, and
+ * 2) the number of bits in the swap pte as defined by the
+ * different architectures. In order to find the
+ * largest possible bit mask, a swap entry with swap type 0
+ * and swap offset ~0UL is created, encoded to a swap pte,
+ * decoded to a swp_entry_t again, and finally the swap
+ * offset is extracted. This will mask all the bits from
+ * the initial ~0UL mask that can't be encoded in either
+ * the swp_entry_t or the architecture definition of a
+ * swap pte.
+ */
+unsigned long generic_max_swapfile_size(void)
+{
+	return swp_offset(pte_to_swp_entry(
+			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+}
+
+/* Can be overridden by an architecture for additional checks. */
+__weak unsigned long max_swapfile_size(void)
+{
+	return generic_max_swapfile_size();
+}
+
 static unsigned long read_swap_header(struct swap_info_struct *p,
 					union swap_header *swap_header,
 					struct inode *inode)
@@ -2944,22 +2971,7 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	p->cluster_next = 1;
 	p->cluster_nr = 0;
 
-	/*
-	 * Find out how many pages are allowed for a single swap
-	 * device. There are two limiting factors: 1) the number
-	 * of bits for the swap offset in the swp_entry_t type, and
-	 * 2) the number of bits in the swap pte as defined by the
-	 * different architectures. In order to find the
-	 * largest possible bit mask, a swap entry with swap type 0
-	 * and swap offset ~0UL is created, encoded to a swap pte,
-	 * decoded to a swp_entry_t again, and finally the swap
-	 * offset is extracted. This will mask all the bits from
-	 * the initial ~0UL mask that can't be encoded in either
-	 * the swp_entry_t or the architecture definition of a
-	 * swap pte.
-	 */
-	maxpages = swp_offset(pte_to_swp_entry(
-			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	maxpages = max_swapfile_size();
 	last_page = swap_header->info.last_page;
 	if (!last_page) {
 		pr_warn("Empty swap-file\n");
-- 
2.15.0
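
[Not part of the series: plugging the PATCH 1/6 encoding into the
generic_max_swapfile_size() trick shows where the limits come from.
Constants hard-coded as in the earlier model:]

#include <stdio.h>

#define SWP_TYPE_BITS		5
#define SWP_OFFSET_SHIFT	(9 + SWP_TYPE_BITS)

static unsigned long mk_swp_entry(unsigned long type, unsigned long offset)
{
	return (~offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
	       (type << (64 - SWP_TYPE_BITS));
}

static unsigned long swp_offset(unsigned long val)
{
	return ~val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT;
}

int main(void)
{
	/* Encode type 0 / offset ~0UL, decode again: the bits that survive
	 * the round trip are the usable offset bits (2^50 pages here). */
	unsigned long generic = swp_offset(mk_swp_entry(0, ~0UL)) + 1;
	/* The x86 override then clamps to MAX_PA/2, e.g. 46 phys bits: */
	unsigned long clamped = 1UL << (46 - 1 - 12);

	printf("generic limit: %lu pages\n", generic);
	printf("x86 limit at MAX_PA=46: %lu pages (32 TB)\n", clamped);
	return 0;
}
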
--SLauP2uySp+9cKYP--