From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andi Kleen <ak@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Dave Hansen <dave.hansen@intel.com>,
	David Woodhouse <dwmw@amazon.co.uk>,
	Guenter Roeck <linux@roeck-us.net>
Subject: [PATCH 4.4 27/43] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
Date: Tue, 14 Aug 2018 19:18:03 +0200
Message-ID: <20180814171518.909559393@linuxfoundation.org>
In-Reply-To: <20180814171517.014285600@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andi Kleen <ak@linux.intel.com>

commit 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c upstream

When PTEs are set to PROT_NONE the kernel just clears the Present bit and
preserves the PFN, which creates attack surface for L1TF speculation
attacks.
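
To make the problem concrete, here is a minimal userspace sketch (not
kernel code; the constants mirror the x86-64 bit layout, where
_PAGE_PROTNONE reuses the Global bit) of what a naive PROT_NONE
transition leaves in the entry:

  #include <stdint.h>
  #include <stdio.h>

  #define _PAGE_PRESENT  (1ULL << 0)
  #define _PAGE_PROTNONE (1ULL << 8)   /* x86 reuses the Global bit */
  #define PTE_PFN_MASK   0x000ffffffffff000ULL

  int main(void)
  {
          uint64_t pte = (0x12345ULL << 12) | _PAGE_PRESENT;

          /* Naive PROT_NONE: clear Present, set PROTNONE, keep the PFN. */
          pte = (pte & ~_PAGE_PRESENT) | _PAGE_PROTNONE;

          /* The PFN is still there for L1TF speculation to use. */
          printf("leftover PFN: %#llx\n",
                 (unsigned long long)((pte & PTE_PFN_MASK) >> 12));
          return 0;
  }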

This is important inside guests, because L1TF speculation bypasses physical
page remapping. While the host has its own mitigations that prevent leaking
data from other VMs into the guest, this would still risk leaking the wrong
page inside the current guest.

This uses the same technique as Linus' swap entry patch: while an entry is
in PROT_NONE state, invert the complete PFN part of it. This ensures that
the highest bits point at non-existent memory.
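
As a sanity check on the scheme, here is a small self-contained sketch
(assuming 64-bit entries and the x86-64 PFN mask; the XOR form matches
flip_protnone_guard() in the patch below):

  #include <stdint.h>
  #include <assert.h>

  #define PTE_PFN_MASK 0x000ffffffffff000ULL

  /* Complement only the PFN bits, leave the flag bits alone. */
  static uint64_t invert_pfn(uint64_t val)
  {
          return (val & ~PTE_PFN_MASK) | (~val & PTE_PFN_MASK);
  }

  int main(void)
  {
          uint64_t pte = (0x12345ULL << 12) | 0x25;  /* PFN + some flags */
          uint64_t inv = invert_pfn(pte);

          assert((inv & ~PTE_PFN_MASK) == (pte & ~PTE_PFN_MASK)); /* flags kept */
          assert(invert_pfn(inv) == pte);  /* inverting twice restores it */
          return 0;
  }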

The inversion is done by pte_modify/pmd_modify and pfn_pte/pfn_pmd for
PROT_NONE entries, and pte_pfn/pmd_pfn/pud_pfn undo it.

This assumes that no code path touches the PFN part of a PTE directly
without using these primitives.
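
A self-contained model of those primitives (hypothetical stand-ins
mirroring protnone_mask()/pfn_pte()/pte_pfn() from the patch, with 64-bit
constants assumed) shows that the round trip is lossless even though the
stored entry is inverted:

  #include <stdint.h>
  #include <assert.h>

  #define _PAGE_PRESENT  (1ULL << 0)
  #define _PAGE_PROTNONE (1ULL << 8)
  #define PAGE_SHIFT     12
  #define PTE_PFN_MASK   0x000ffffffffff000ULL

  static int pte_needs_invert(uint64_t val)
  {
          return (val & (_PAGE_PRESENT | _PAGE_PROTNONE)) == _PAGE_PROTNONE;
  }

  /* All-ones mask when the entry is PROT_NONE, zero otherwise. */
  static uint64_t protnone_mask(uint64_t val)
  {
          return pte_needs_invert(val) ? ~0ULL : 0;
  }

  static uint64_t mk_pte(uint64_t pfn, uint64_t prot)
  {
          uint64_t paddr = (pfn << PAGE_SHIFT) ^ protnone_mask(prot);
          return (paddr & PTE_PFN_MASK) | prot;
  }

  static uint64_t pte_to_pfn(uint64_t pte)
  {
          return ((pte ^ protnone_mask(pte)) & PTE_PFN_MASK) >> PAGE_SHIFT;
  }

  int main(void)
  {
          uint64_t pte = mk_pte(0x12345, _PAGE_PROTNONE);

          /* The stored PFN bits are inverted ... */
          assert((pte & PTE_PFN_MASK) != (0x12345ULL << PAGE_SHIFT));
          /* ... but the accessor recovers the original PFN. */
          assert(pte_to_pfn(pte) == 0x12345);
          return 0;
  }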

This doesn't handle the case where MMIO is at the top of CPU physical
memory. If such an MMIO region were exposed by an unprivileged driver for
mmap, it would be possible to attack some real memory.  However, this
situation is rather unlikely.

For 32-bit non-PAE the inversion is not done: the PFN fills the upper 20
bits of the 32-bit entry and covers the whole 4GB physical address space,
so there are really not enough bits to protect anything.

Q: Why does the guest need to be protected when the hypervisor already has
   L1TF mitigations?

A: Here's an example:

   Physical pages 1 2 get mapped into a guest as
   GPA 1 -> PA 2
   GPA 2 -> PA 1
   through EPT.

   The L1TF speculation ignores the EPT remapping.

   Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
   they belong to different users and should be isolated.

   Process A sets the GPA 1 -> PA 2 PTE to PROT_NONE. The L1TF
   speculation bypasses the EPT remapping and gives read access to the
   raw PFN in that PTE, which in this case points at PA 1, the page
   holding process B's data. If that data happens to be in L1, process A
   can read it, so isolation inside the guest is broken.

   There's nothing the hypervisor can do about this. This mitigation has to
   be done in the guest itself.
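
   A toy model of this example (plain illustration with hypothetical
   arrays, not kernel code):

     #include <stdio.h>

     /* EPT from the example: GPA 1 -> PA 2, GPA 2 -> PA 1. */
     static const int ept[] = { 0, 2, 1 };

     int main(void)
     {
             int pfn = 1; /* raw PFN left in process A's PROT_NONE PTE */

             /* An architectural access would be remapped through EPT ... */
             printf("architectural: PA %d (process A's own page)\n", ept[pfn]);
             /* ... but L1TF speculation uses the raw PFN as a host address. */
             printf("speculative:   PA %d (process B's page)\n", pfn);
             return 0;
     }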

[ tglx: Massaged changelog ]
[ dwmw2: backported to 4.9 ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/pgtable-2level.h |   17 +++++++++++++++
 arch/x86/include/asm/pgtable-3level.h |    2 +
 arch/x86/include/asm/pgtable-invert.h |   32 ++++++++++++++++++++++++++++
 arch/x86/include/asm/pgtable.h        |   38 ++++++++++++++++++++++++----------
 arch/x86/include/asm/pgtable_64.h     |    2 +
 5 files changed, 80 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/include/asm/pgtable-invert.h

--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -77,4 +77,21 @@ static inline unsigned long pte_bitop(un
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { (pte).pte_low })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
 
+/* No inverted PFNs on 2 level page tables */
+
+static inline u64 protnone_mask(u64 val)
+{
+	return 0;
+}
+
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
+{
+	return val;
+}
+
+static inline bool __pte_needs_invert(u64 val)
+{
+	return false;
+}
+
 #endif /* _ASM_X86_PGTABLE_2LEVEL_H */
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -184,4 +184,6 @@ static inline pmd_t native_pmdp_get_and_
 #define __pte_to_swp_entry(pte)		((swp_entry_t){ (pte).pte_high })
 #define __swp_entry_to_pte(x)		((pte_t){ { .pte_high = (x).val } })
 
+#include <asm/pgtable-invert.h>
+
 #endif /* _ASM_X86_PGTABLE_3LEVEL_H */
--- /dev/null
+++ b/arch/x86/include/asm/pgtable-invert.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_PGTABLE_INVERT_H
+#define _ASM_PGTABLE_INVERT_H 1
+
+#ifndef __ASSEMBLY__
+
+static inline bool __pte_needs_invert(u64 val)
+{
+	return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE;
+}
+
+/* Get a mask to xor with the page table entry to get the correct pfn. */
+static inline u64 protnone_mask(u64 val)
+{
+	return __pte_needs_invert(val) ?  ~0ull : 0;
+}
+
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
+{
+	/*
+	 * When a PTE transitions from NONE to !NONE or vice-versa
+	 * invert the PFN part to stop speculation.
+	 * pte_pfn undoes this when needed.
+	 */
+	if (__pte_needs_invert(oldval) != __pte_needs_invert(val))
+		val = (val & ~mask) | (~val & mask);
+	return val;
+}
+
+#endif /* __ASSEMBLY__ */
+
+#endif
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -148,19 +148,29 @@ static inline int pte_special(pte_t pte)
 	return pte_flags(pte) & _PAGE_SPECIAL;
 }
 
+/* Entries that were set to PROT_NONE are inverted */
+
+static inline u64 protnone_mask(u64 val);
+
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
+	unsigned long pfn = pte_val(pte);
+	pfn ^= protnone_mask(pfn);
+	return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pmd_pfn(pmd_t pmd)
 {
-	return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
+	unsigned long pfn = pmd_val(pmd);
+	pfn ^= protnone_mask(pfn);
+	return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pud_pfn(pud_t pud)
 {
-	return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
+	unsigned long pfn = pud_val(pud);
+	pfn ^= protnone_mask(pfn);
+	return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;
 }
 
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
@@ -359,19 +369,25 @@ static inline pgprotval_t massage_pgprot
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
-	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	pfn ^= protnone_mask(pgprot_val(pgprot));
+	pfn &= PTE_PFN_MASK;
+	return __pte(pfn | massage_pgprot(pgprot));
 }
 
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
-	return __pmd(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+	phys_addr_t pfn = page_nr << PAGE_SHIFT;
+	pfn ^= protnone_mask(pgprot_val(pgprot));
+	pfn &= PHYSICAL_PMD_PAGE_MASK;
+	return __pmd(pfn | massage_pgprot(pgprot));
 }
 
+static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-	pteval_t val = pte_val(pte);
+	pteval_t val = pte_val(pte), oldval = val;
 
 	/*
 	 * Chop off the NX bit (if present), and add the NX portion of
@@ -379,17 +395,17 @@ static inline pte_t pte_modify(pte_t pte
 	 */
 	val &= _PAGE_CHG_MASK;
 	val |= massage_pgprot(newprot) & ~_PAGE_CHG_MASK;
-
+	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
 	return __pte(val);
 }
 
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
-	pmdval_t val = pmd_val(pmd);
+	pmdval_t val = pmd_val(pmd), oldval = val;
 
 	val &= _HPAGE_CHG_MASK;
 	val |= massage_pgprot(newprot) & ~_HPAGE_CHG_MASK;
-
+	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
 	return __pmd(val);
 }
 
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -235,6 +235,8 @@ extern void cleanup_highmap(void);
 extern void init_extra_mapping_uc(unsigned long phys, unsigned long size);
 extern void init_extra_mapping_wb(unsigned long phys, unsigned long size);
 
+#include <asm/pgtable-invert.h>
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_64_H */



Thread overview: 51+ messages
2018-08-14 17:17 [PATCH 4.4 00/43] 4.4.148-stable review Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 01/43] ext4: fix check to prevent initializing reserved inodes Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 02/43] tpm: fix race condition in tpm_common_write() Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 03/43] ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 04/43] fork: unconditionally clear stack on fork Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 05/43] parisc: Enable CONFIG_MLONGCALLS by default Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 07/43] xen/netfront: dont cache skb_shinfo() Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 08/43] ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 09/43] scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 10/43] root dentries need RCU-delayed freeing Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 11/43] fix mntput/mntput race Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 12/43] fix __legitimize_mnt()/mntput() race Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 13/43] IB/core: Make testing MR flags for writability a static inline function Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 14/43] IB/mlx4: Mark user MR as writable if actual virtual memory is writable Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 15/43] IB/ocrdma: fix out of bounds access to local buffer Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 16/43] ARM: dts: imx6sx: fix irq for pcie bridge Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 17/43] x86/paravirt: Fix spectre-v2 mitigations for paravirt guests Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 18/43] x86/speculation: Protect against userspace-userspace spectreRSB Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 19/43] kprobes/x86: Fix %p uses in error messages Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 20/43] x86/irqflags: Provide a declaration for native_save_fl Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 21/43] x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 22/43] x86/mm: Move swap offset/type up in PTE to work around erratum Greg Kroah-Hartman
2018-08-14 17:17 ` [PATCH 4.4 23/43] x86/mm: Fix swap entry comment and macro Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 24/43] mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1 Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 25/43] x86/speculation/l1tf: Change order of offset/type in swap entry Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 26/43] x86/speculation/l1tf: Protect swap entries against L1TF Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 27/43] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation Greg Kroah-Hartman [this message]
2018-08-14 17:18 ` [PATCH 4.4 28/43] x86/speculation/l1tf: Make sure the first page is always reserved Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 29/43] x86/speculation/l1tf: Add sysfs reporting for l1tf Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 30/43] mm: Add vm_insert_pfn_prot() Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 31/43] mm: fix cache mode tracking in vm_insert_mixed() Greg Kroah-Hartman
2018-09-07 17:05   ` Ben Hutchings
2018-09-07 20:03     ` Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 32/43] x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 33/43] x86/speculation/l1tf: Limit swap file size to MAX_PA/2 Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 34/43] x86/bugs: Move the l1tf function and define pr_fmt properly Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 35/43] x86/speculation/l1tf: Extend 64bit swap file size limit Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 36/43] x86/cpufeatures: Add detection of L1D cache flush support Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 37/43] x86/speculation/l1tf: Protect PAE swap entries against L1TF Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 38/43] x86/speculation/l1tf: Fix up pte->pfn conversion for PAE Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 39/43] x86/speculation/l1tf: Invert all not present mappings Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 40/43] x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 41/43] x86/mm/pat: Make set_memory_np() L1TF safe Greg Kroah-Hartman
2018-09-09 16:46   ` Ben Hutchings
2018-09-09 17:06     ` Guenter Roeck
2018-09-10  7:16       ` Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 42/43] x86/mm/kmmio: Make the tracer robust against L1TF Greg Kroah-Hartman
2018-08-14 17:18 ` [PATCH 4.4 43/43] x86/speculation/l1tf: Fix up CPU feature flags Greg Kroah-Hartman
2018-08-15  6:15 ` [PATCH 4.4 00/43] 4.4.148-stable review Greg Kroah-Hartman
2018-08-15 13:10 ` Guenter Roeck
2018-08-15 20:52 ` Dan Rue
