* [RFC][PATCH 00/10] Use global pages with PTI
@ 2018-02-22 20:36 Dave Hansen
  2018-02-22 20:36 ` [RFC][PATCH 01/10] x86/mm: factor out pageattr _PAGE_GLOBAL setting Dave Hansen
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit

The later versions of the KAISER patches (pre-PTI) allowed the user/kernel
shared areas to be GLOBAL.  The thought was that this would reduce the
TLB overhead of keeping two copies of these mappings.

During the switch over to PTI, we seem to have lost our ability to have
GLOBAL mappings.  This adds them back.

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 01/10] x86/mm: factor out pageattr _PAGE_GLOBAL setting
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
@ 2018-02-22 20:36 ` Dave Hansen
  2018-02-22 20:36 ` [RFC][PATCH 02/10] x86/mm: undo double _PAGE_PSE clearing Dave Hansen
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The pageattr code has a pattern repeated where it sets
_PAGE_GLOBAL for present PTEs but clears it for non-present PTEs.
The intention is to keep _PAGE_GLOBAL from getting confused
with _PAGE_PROTNONE, since _PAGE_GLOBAL is for present PTEs and
_PAGE_PROTNONE is for non-present PTEs.
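
For reference, the aliasing comes from the flag definitions themselves
(abridged from arch/x86/include/asm/pgtable_types.h): the two flags
share bit 8, which is why a stale _PAGE_GLOBAL on a non-present PTE
would read back as _PAGE_PROTNONE:

	#define _PAGE_BIT_GLOBAL	8
	#define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL /* if !present */
	#define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
	#define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)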

But, this pattern makes no sense.  Effectively, it says: if you
use the pageattr code, _PAGE_GLOBAL is always set whenever
_PAGE_PRESENT is.  canon_pgprot() will later clear it if the CPU
does not support it, but the pageattr code *always* sets it first.

This gets confusing when we have PTI and non-PTI and we want
some areas to have _PAGE_GLOBAL and some not.

This updated version of the code says:
1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
2. Never set _PAGE_GLOBAL implicitly
3. Allow _PAGE_GLOBAL to be in cpa.set_mask
4. Allow _PAGE_GLOBAL to be inherited from previous PTE

Aside: _PAGE_GLOBAL is ignored when CR4.PGE=0, so why do we
even go to the trouble of filtering it anywhere?

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/pageattr.c |   68 ++++++++++++++++-------------------------------
 1 file changed, 24 insertions(+), 44 deletions(-)

diff -puN arch/x86/mm/pageattr.c~kpti-centralize-global-setting arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~kpti-centralize-global-setting	2018-02-22 12:36:17.531036556 -0800
+++ b/arch/x86/mm/pageattr.c	2018-02-22 12:36:17.535036556 -0800
@@ -512,6 +512,23 @@ static void __set_pmd_pte(pte_t *kpte, u
 #endif
 }
 
+static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
+{
+	/*
+	 * _PAGE_GLOBAL means "global page" for present PTEs.
+	 * But, it is also used to indicate _PAGE_PROTNONE
+	 * for non-present PTEs.
+	 *
+	 * This ensures that a _PAGE_GLOBAL PTE going from
+	 * present to non-present is not confused as
+	 * _PAGE_PROTNONE.
+	 */
+	if (!(pgprot_val(prot) & _PAGE_PRESENT))
+		pgprot_val(prot) &= ~_PAGE_GLOBAL;
+
+	return prot;
+}
+
 static int
 try_preserve_large_page(pte_t *kpte, unsigned long address,
 			struct cpa_data *cpa)
@@ -577,18 +594,11 @@ try_preserve_large_page(pte_t *kpte, uns
 	 * different bit positions in the two formats.
 	 */
 	req_prot = pgprot_4k_2_large(req_prot);
-
-	/*
-	 * Set the PSE and GLOBAL flags only if the PRESENT flag is
-	 * set otherwise pmd_present/pmd_huge will return true even on
-	 * a non present pmd. The canon_pgprot will clear _PAGE_GLOBAL
-	 * for the ancient hardware that doesn't support it.
-	 */
-	if (pgprot_val(req_prot) & _PAGE_PRESENT)
-		pgprot_val(req_prot) |= _PAGE_PSE | _PAGE_GLOBAL;
+	req_prot = pgprot_clear_protnone_bits(req_prot);
+        if (pgprot_val(req_prot) & _PAGE_PRESENT)
+		pgprot_val(req_prot) |= _PAGE_PSE;
 	else
-		pgprot_val(req_prot) &= ~(_PAGE_PSE | _PAGE_GLOBAL);
-
+		pgprot_val(req_prot) &= ~_PAGE_PSE;
 	req_prot = canon_pgprot(req_prot);
 
 	/*
@@ -698,16 +708,7 @@ __split_large_page(struct cpa_data *cpa,
 		return 1;
 	}
 
-	/*
-	 * Set the GLOBAL flags only if the PRESENT flag is set
-	 * otherwise pmd/pte_present will return true even on a non
-	 * present pmd/pte. The canon_pgprot will clear _PAGE_GLOBAL
-	 * for the ancient hardware that doesn't support it.
-	 */
-	if (pgprot_val(ref_prot) & _PAGE_PRESENT)
-		pgprot_val(ref_prot) |= _PAGE_GLOBAL;
-	else
-		pgprot_val(ref_prot) &= ~_PAGE_GLOBAL;
+	ref_prot = pgprot_clear_protnone_bits(ref_prot);
 
 	/*
 	 * Get the target pfn from the original entry:
@@ -930,18 +931,7 @@ static void populate_pte(struct cpa_data
 
 	pte = pte_offset_kernel(pmd, start);
 
-	/*
-	 * Set the GLOBAL flags only if the PRESENT flag is
-	 * set otherwise pte_present will return true even on
-	 * a non present pte. The canon_pgprot will clear
-	 * _PAGE_GLOBAL for the ancient hardware that doesn't
-	 * support it.
-	 */
-	if (pgprot_val(pgprot) & _PAGE_PRESENT)
-		pgprot_val(pgprot) |= _PAGE_GLOBAL;
-	else
-		pgprot_val(pgprot) &= ~_PAGE_GLOBAL;
-
+	pgprot = pgprot_clear_protnone_bits(pgprot);
 	pgprot = canon_pgprot(pgprot);
 
 	while (num_pages-- && start < end) {
@@ -1234,17 +1224,7 @@ repeat:
 
 		new_prot = static_protections(new_prot, address, pfn);
 
-		/*
-		 * Set the GLOBAL flags only if the PRESENT flag is
-		 * set otherwise pte_present will return true even on
-		 * a non present pte. The canon_pgprot will clear
-		 * _PAGE_GLOBAL for the ancient hardware that doesn't
-		 * support it.
-		 */
-		if (pgprot_val(new_prot) & _PAGE_PRESENT)
-			pgprot_val(new_prot) |= _PAGE_GLOBAL;
-		else
-			pgprot_val(new_prot) &= ~_PAGE_GLOBAL;
+		new_prot = pgprot_clear_protnone_bits(new_prot);
 
 		/*
 		 * We need to keep the pfn from the existing PTE,
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 02/10] x86/mm: undo double _PAGE_PSE clearing
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
  2018-02-22 20:36 ` [RFC][PATCH 01/10] x86/mm: factor out pageattr _PAGE_GLOBAL setting Dave Hansen
@ 2018-02-22 20:36 ` Dave Hansen
  2018-02-22 20:36 ` [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask Dave Hansen
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

When clearing _PAGE_PRESENT on a huge page, we need to be careful
to also clear _PAGE_PSE, otherwise it might still get confused
for a valid large page table entry.

We do that near the spot where we *set* _PAGE_PSE.  That's fine,
but it's unnecessary.  pgprot_large_2_4k() already did it.

BTW, I also noticed that pgprot_large_2_4k() and
pgprot_4k_2_large() are not symmetric.  pgprot_large_2_4k() clears
_PAGE_PSE (because it is aliased to _PAGE_PAT) but
pgprot_4k_2_large() does not put _PAGE_PSE back.  Bummer.
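
For reference, the bit positions involved (from pgtable_types.h): bit 7
is _PAGE_PSE in a huge-page entry but doubles as the PAT bit in a 4k
PTE, while huge pages carry their PAT bit at bit 12 instead:

	#define _PAGE_BIT_PSE		7	/* 4 MB (or 2MB) page */
	#define _PAGE_BIT_PAT		7	/* on 4KB pages */
	#define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */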

Also, add some comments and change "promote" to "move".  "Promote"
seems an odd word to use when we are logically moving a bit to a
lower bit position.  Also add an extra line return to make it clear
to which line the comment applies.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/pageattr.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff -puN arch/x86/mm/pageattr.c~kpti-undo-double-_PAGE_PSE-clear arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~kpti-undo-double-_PAGE_PSE-clear	2018-02-22 12:36:18.072036555 -0800
+++ b/arch/x86/mm/pageattr.c	2018-02-22 12:36:18.076036555 -0800
@@ -583,6 +583,7 @@ try_preserve_large_page(pte_t *kpte, uns
 	 * up accordingly.
 	 */
 	old_pte = *kpte;
+	/* Clear PSE (aka _PAGE_PAT) and move PAT bit to correct position */
 	req_prot = pgprot_large_2_4k(old_prot);
 
 	pgprot_val(req_prot) &= ~pgprot_val(cpa->mask_clr);
@@ -597,8 +598,6 @@ try_preserve_large_page(pte_t *kpte, uns
 	req_prot = pgprot_clear_protnone_bits(req_prot);
         if (pgprot_val(req_prot) & _PAGE_PRESENT)
 		pgprot_val(req_prot) |= _PAGE_PSE;
-	else
-		pgprot_val(req_prot) &= ~_PAGE_PSE;
 	req_prot = canon_pgprot(req_prot);
 
 	/*
@@ -684,8 +683,12 @@ __split_large_page(struct cpa_data *cpa,
 	switch (level) {
 	case PG_LEVEL_2M:
 		ref_prot = pmd_pgprot(*(pmd_t *)kpte);
-		/* clear PSE and promote PAT bit to correct position */
+		/*
+		 * Clear PSE (aka _PAGE_PAT) and move
+		 * PAT bit to correct position.
+		 */
 		ref_prot = pgprot_large_2_4k(ref_prot);
+
 		ref_pfn = pmd_pfn(*(pmd_t *)kpte);
 		break;
 
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
  2018-02-22 20:36 ` [RFC][PATCH 01/10] x86/mm: factor out pageattr _PAGE_GLOBAL setting Dave Hansen
  2018-02-22 20:36 ` [RFC][PATCH 02/10] x86/mm: undo double _PAGE_PSE clearing Dave Hansen
@ 2018-02-22 20:36 ` Dave Hansen
  2018-02-22 22:21   ` Nadav Amit
  2018-02-22 20:36 ` [RFC][PATCH 04/10] x86/espfix: use kernel-default " Dave Hansen
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The __PAGE_KERNEL_* page permissions are "raw".  They contain bits
that may or may not be supported on the current processor.  They
need to be filtered by a mask (currently __supported_pte_mask) to
turn them into a value that we can actually set in a PTE.

These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL.  But, with
PTI, we want to be able to support _PAGE_GLOBAL (have the bit set
in __supported_pte_mask) but not have it appear in any of these
masks by default.

This patch creates a new mask, __default_kernel_pte_mask, and
applies it when creating all of the PAGE_KERNEL_* masks.  This
makes PAGE_KERNEL_* safe to use anywhere (they only contain
supported bits).  It also ensures that PAGE_KERNEL_* contains
_PAGE_GLOBAL on PTI=n kernels but clears _PAGE_GLOBAL when
PTI=y.
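
As a minimal sketch of the intended effect (illustrative only, not
part of the patch): after probe_page_size_mask() runs on a PTI=y
kernel, a PGE-capable CPU still reports _PAGE_GLOBAL as supported,
but the bit no longer shows up in the PAGE_KERNEL_* values:

	/* Hypothetical sanity check, assuming a CPU with PGE and PTI=y: */
	WARN_ON(!(__supported_pte_mask & _PAGE_GLOBAL));
	WARN_ON(pgprot_val(PAGE_KERNEL) & _PAGE_GLOBAL);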

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/include/asm/pgtable_types.h |   29 ++++++++++++++++-------------
 b/arch/x86/mm/init.c                   |    6 ++++++
 b/arch/x86/mm/init_32.c                |    7 ++++++-
 b/arch/x86/mm/init_64.c                |    4 ++++
 4 files changed, 32 insertions(+), 14 deletions(-)

diff -puN arch/x86/include/asm/pgtable_types.h~KERN-pgprot-default arch/x86/include/asm/pgtable_types.h
--- a/arch/x86/include/asm/pgtable_types.h~KERN-pgprot-default	2018-02-22 12:36:18.608036554 -0800
+++ b/arch/x86/include/asm/pgtable_types.h	2018-02-22 12:36:18.617036554 -0800
@@ -145,6 +145,7 @@ enum page_cache_mode {
 	_PAGE_CACHE_MODE_WP = 5,
 	_PAGE_CACHE_MODE_NUM = 8
 };
+extern unsigned long __default_kernel_pte_mask;
 #endif
 
 #define _PAGE_CACHE_MASK	(_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
@@ -197,20 +198,22 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_NOENC	(__PAGE_KERNEL)
 #define __PAGE_KERNEL_NOENC_WP	(__PAGE_KERNEL_WP)
 
-#define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
-#define PAGE_KERNEL_NOENC	__pgprot(__PAGE_KERNEL)
-#define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
-#define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
-#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
-#define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
-#define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
-#define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
-#define PAGE_KERNEL_LARGE_EXEC	__pgprot(__PAGE_KERNEL_LARGE_EXEC | _PAGE_ENC)
-#define PAGE_KERNEL_VSYSCALL	__pgprot(__PAGE_KERNEL_VSYSCALL | _PAGE_ENC)
-#define PAGE_KERNEL_VVAR	__pgprot(__PAGE_KERNEL_VVAR | _PAGE_ENC)
+#define default_pgprot(x)	__pgprot((x) & __default_kernel_pte_mask)
 
-#define PAGE_KERNEL_IO		__pgprot(__PAGE_KERNEL_IO)
-#define PAGE_KERNEL_IO_NOCACHE	__pgprot(__PAGE_KERNEL_IO_NOCACHE)
+#define PAGE_KERNEL		default_pgprot(__PAGE_KERNEL | _PAGE_ENC)
+#define PAGE_KERNEL_NOENC	default_pgprot(__PAGE_KERNEL)
+#define PAGE_KERNEL_RO		default_pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC	default_pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC_NOENC	default_pgprot(__PAGE_KERNEL_EXEC)
+#define PAGE_KERNEL_RX		default_pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
+#define PAGE_KERNEL_NOCACHE	default_pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE	default_pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE_EXEC	default_pgprot(__PAGE_KERNEL_LARGE_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_VSYSCALL	default_pgprot(__PAGE_KERNEL_VSYSCALL | _PAGE_ENC)
+#define PAGE_KERNEL_VVAR	default_pgprot(__PAGE_KERNEL_VVAR | _PAGE_ENC)
+
+#define PAGE_KERNEL_IO		default_pgprot(__PAGE_KERNEL_IO)
+#define PAGE_KERNEL_IO_NOCACHE	default_pgprot(__PAGE_KERNEL_IO_NOCACHE)
 
 #endif	/* __ASSEMBLY__ */
 
diff -puN arch/x86/mm/init_32.c~KERN-pgprot-default arch/x86/mm/init_32.c
--- a/arch/x86/mm/init_32.c~KERN-pgprot-default	2018-02-22 12:36:18.610036554 -0800
+++ b/arch/x86/mm/init_32.c	2018-02-22 12:36:18.618036554 -0800
@@ -543,8 +543,13 @@ static void __init pagetable_init(void)
 	permanent_kmaps_init(pgd_base);
 }
 
-pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL);
+#define DEFAULT_PTE_MASK ~(_PAGE_NX | _PAGE_GLOBAL)
+/* Bits supported by the hardware: */
+pteval_t __supported_pte_mask __read_mostly = DEFAULT_PTE_MASK;
+/* Bits allowed in normal kernel mappings: */
+pteval_t __default_kernel_pte_mask __read_mostly = DEFAULT_PTE_MASK;
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
+EXPORT_SYMBOL_GPL(__default_kernel_pte_mask);
 
 /* user-defined highmem size */
 static unsigned int highmem_pages = -1;
diff -puN arch/x86/mm/init_64.c~KERN-pgprot-default arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c~KERN-pgprot-default	2018-02-22 12:36:18.612036554 -0800
+++ b/arch/x86/mm/init_64.c	2018-02-22 12:36:18.618036554 -0800
@@ -65,8 +65,12 @@
  * around without checking the pgd every time.
  */
 
+/* Bits supported by the hardware: */
 pteval_t __supported_pte_mask __read_mostly = ~0;
+/* Bits allowed in normal kernel mappings: */
+pteval_t __default_kernel_pte_mask __read_mostly = ~0;
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
+EXPORT_SYMBOL_GPL(__default_kernel_pte_mask);
 
 int force_personality32;
 
diff -puN arch/x86/mm/init.c~KERN-pgprot-default arch/x86/mm/init.c
--- a/arch/x86/mm/init.c~KERN-pgprot-default	2018-02-22 12:36:18.614036554 -0800
+++ b/arch/x86/mm/init.c	2018-02-22 12:36:18.619036554 -0800
@@ -190,6 +190,12 @@ static void __init probe_page_size_mask(
 		enable_global_pages();
 	}
 
+	/* By the default is everything supported: */
+	__default_kernel_pte_mask = __supported_pte_mask;
+	/* Except when with PTI where the kernel is mostly non-Global: */
+	if (cpu_feature_enabled(X86_FEATURE_PTI))
+		__default_kernel_pte_mask &= ~_PAGE_GLOBAL;
+
 	/* Enable 1 GB linear kernel mappings if available: */
 	if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
 		printk(KERN_INFO "Using GB pages for direct mapping\n");
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 04/10] x86/espfix: use kernel-default PTE mask
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
                   ` (2 preceding siblings ...)
  2018-02-22 20:36 ` [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask Dave Hansen
@ 2018-02-22 20:36 ` Dave Hansen
  2018-02-22 21:27   ` Nadav Amit
  2018-02-22 20:37 ` [RFC][PATCH 05/10] x86/mm: do not auto-massage page protections Dave Hansen
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

In creating its page tables, the espfix code masks its PGTABLE_PROT
value with the supported mask: __supported_pte_mask.  This ensures
that unsupported bits are not set in the final PTE.  But, it also
sets _PAGE_GLOBAL, which we do not want with PTI.  Use
__default_kernel_pte_mask instead, which clears _PAGE_GLOBAL for PTI.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/kernel/espfix_64.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN arch/x86/kernel/espfix_64.c~espfix-use-kern-defaults-not-supported arch/x86/kernel/espfix_64.c
--- a/arch/x86/kernel/espfix_64.c~espfix-use-kern-defaults-not-supported	2018-02-22 12:36:19.217036552 -0800
+++ b/arch/x86/kernel/espfix_64.c	2018-02-22 12:36:19.221036552 -0800
@@ -167,7 +167,7 @@ void init_espfix_ap(int cpu)
 		goto unlock_done;
 
 	node = cpu_to_node(cpu);
-	ptemask = __supported_pte_mask;
+	ptemask = __default_kernel_pte_mask;
 
 	pud_p = &espfix_pud_page[pud_index(addr)];
 	pud = *pud_p;
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 05/10] x86/mm: do not auto-massage page protections
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
                   ` (3 preceding siblings ...)
  2018-02-22 20:36 ` [RFC][PATCH 04/10] x86/espfix: use kernel-default " Dave Hansen
@ 2018-02-22 20:37 ` Dave Hansen
  2018-02-22 21:46   ` Nadav Amit
  2018-02-22 20:37 ` [RFC][PATCH 06/10] x86/mm: remove extra filtering in pageattr code Dave Hansen
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

A PTE is constructed from a physical address and a pgprotval_t.
__PAGE_KERNEL, for instance, is a pgprot_t and must be converted
into a pgprotval_t before it can be used to create a PTE.  This is
done implicitly within functions like pfn_pte() by massage_pgprot().

However, this makes it very challenging to set bits (and keep them
set) if your bit is being filtered out by massage_pgprot().

This moves the bit filtering out of pfn_pte() and friends.  For
users of PAGE_KERNEL*, filtering will be done automatically inside
those macros, but users of __PAGE_KERNEL* now need to do their
own filtering.

Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
instead of massage_pgprot().  This way, we still *look* for
unsupported bits and properly warn about them if we find them.  This
might happen if an unfiltered __PAGE_KERNEL* value was passed in,
for instance.
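
A minimal sketch of the new convention (illustrative only, mirroring
the ldt.c hunk below): callers that start from a raw __PAGE_KERNEL*
value filter it themselves before building the PTE, and check_pgprot()
merely warns if something unsupported slips through:

	pgprot_t pte_prot = __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL);

	/* Explicit filtering is now the caller's job: */
	pgprot_val(pte_prot) &= __supported_pte_mask;
	pte = pfn_pte(pfn, pte_prot);	/* check_pgprot() only warns */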

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/include/asm/pgtable.h |   24 +++++++++++++++++++-----
 b/arch/x86/kernel/head64.c       |    2 ++
 b/arch/x86/kernel/ldt.c          |    6 +++++-
 b/arch/x86/mm/ident_map.c        |    3 +++
 b/arch/x86/mm/iomap_32.c         |    6 ++++++
 b/arch/x86/mm/kasan_init_64.c    |   14 +++++++++++++-
 b/arch/x86/power/hibernate_64.c  |   20 +++++++++++++++-----
 b/mm/early_ioremap.c             |    3 +++
 8 files changed, 66 insertions(+), 12 deletions(-)

diff -puN arch/x86/include/asm/pgtable.h~x86-no-auto-massage arch/x86/include/asm/pgtable.h
--- a/arch/x86/include/asm/pgtable.h~x86-no-auto-massage	2018-02-22 12:36:19.752036551 -0800
+++ b/arch/x86/include/asm/pgtable.h	2018-02-22 12:36:19.769036551 -0800
@@ -526,22 +526,36 @@ static inline pgprotval_t massage_pgprot
 	return protval;
 }
 
+static inline pgprotval_t check_pgprot(pgprot_t pgprot)
+{
+	pgprotval_t massaged_val = massage_pgprot(pgprot);
+
+	WARN_ONCE(pgprot_val(pgprot) != massaged_val,
+		  "attempted to set unsupported pgprot: %016lx "
+		  "bits: %016lx supported: %016lx\n",
+		  pgprot_val(pgprot),
+		  pgprot_val(pgprot) ^ massaged_val,
+		  __supported_pte_mask);
+
+	return massaged_val;
+}
+
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
 	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+		     check_pgprot(pgprot));
 }
 
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
 	return __pmd(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+		     check_pgprot(pgprot));
 }
 
 static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
 {
 	return __pud(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+		     check_pgprot(pgprot));
 }
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
@@ -553,7 +567,7 @@ static inline pte_t pte_modify(pte_t pte
 	 * the newprot (if present):
 	 */
 	val &= _PAGE_CHG_MASK;
-	val |= massage_pgprot(newprot) & ~_PAGE_CHG_MASK;
+	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
 
 	return __pte(val);
 }
@@ -563,7 +577,7 @@ static inline pmd_t pmd_modify(pmd_t pmd
 	pmdval_t val = pmd_val(pmd);
 
 	val &= _HPAGE_CHG_MASK;
-	val |= massage_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
 
 	return __pmd(val);
 }
diff -puN arch/x86/kernel/head64.c~x86-no-auto-massage arch/x86/kernel/head64.c
--- a/arch/x86/kernel/head64.c~x86-no-auto-massage	2018-02-22 12:36:19.754036551 -0800
+++ b/arch/x86/kernel/head64.c	2018-02-22 12:36:19.769036551 -0800
@@ -129,6 +129,8 @@ unsigned long __head __startup_64(unsign
 	pud[i + 1] = (pudval_t)pmd + pgtable_flags;
 
 	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
+	/* Filter out unsupported __PAGE_KERNEL_* bits: */
+	pmd_entry &= __supported_pte_mask;
 	pmd_entry += sme_get_me_mask();
 	pmd_entry +=  physaddr;
 
diff -puN arch/x86/kernel/ldt.c~x86-no-auto-massage arch/x86/kernel/ldt.c
--- a/arch/x86/kernel/ldt.c~x86-no-auto-massage	2018-02-22 12:36:19.756036551 -0800
+++ b/arch/x86/kernel/ldt.c	2018-02-22 12:36:19.769036551 -0800
@@ -145,6 +145,7 @@ map_ldt_struct(struct mm_struct *mm, str
 		unsigned long offset = i << PAGE_SHIFT;
 		const void *src = (char *)ldt->entries + offset;
 		unsigned long pfn;
+		pgprot_t pte_prot;
 		pte_t pte, *ptep;
 
 		va = (unsigned long)ldt_slot_va(slot) + offset;
@@ -163,7 +164,10 @@ map_ldt_struct(struct mm_struct *mm, str
 		 * target via some kernel interface which misses a
 		 * permission check.
 		 */
-		pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL));
+	       	pte_prot = __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL);
+		/* Filter out unsupported __PAGE_KERNEL* bits: */
+		pgprot_val(pte_prot) &= __supported_pte_mask;
+		pte = pfn_pte(pfn, pte_prot);
 		set_pte_at(mm, va, ptep, pte);
 		pte_unmap_unlock(ptep, ptl);
 	}
diff -puN arch/x86/mm/ident_map.c~x86-no-auto-massage arch/x86/mm/ident_map.c
--- a/arch/x86/mm/ident_map.c~x86-no-auto-massage	2018-02-22 12:36:19.758036551 -0800
+++ b/arch/x86/mm/ident_map.c	2018-02-22 12:36:19.769036551 -0800
@@ -98,6 +98,9 @@ int kernel_ident_mapping_init(struct x86
 	if (!info->kernpg_flag)
 		info->kernpg_flag = _KERNPG_TABLE;
 
+	/* Filter out unsupported __PAGE_KERNEL_* bits: */
+	info->kernpg_flag &= __default_kernel_pte_mask;
+
 	for (; addr < end; addr = next) {
 		pgd_t *pgd = pgd_page + pgd_index(addr);
 		p4d_t *p4d;
diff -puN arch/x86/mm/iomap_32.c~x86-no-auto-massage arch/x86/mm/iomap_32.c
--- a/arch/x86/mm/iomap_32.c~x86-no-auto-massage	2018-02-22 12:36:19.760036551 -0800
+++ b/arch/x86/mm/iomap_32.c	2018-02-22 12:36:19.770036551 -0800
@@ -44,6 +44,9 @@ int iomap_create_wc(resource_size_t base
 		return ret;
 
 	*prot = __pgprot(__PAGE_KERNEL | cachemode2protval(pcm));
+	/* Filter out unsupported __PAGE_KERNEL* bits: */
+	pgprot_val(*prot) &= __default_kernel_pte_mask;
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iomap_create_wc);
@@ -88,6 +91,9 @@ iomap_atomic_prot_pfn(unsigned long pfn,
 		prot = __pgprot(__PAGE_KERNEL |
 				cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
 
+	/* Filter out unsupported __PAGE_KERNEL* bits: */
+	pgprot_val(prot) &= __default_kernel_pte_mask;
+
 	return (void __force __iomem *) kmap_atomic_prot_pfn(pfn, prot);
 }
 EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
diff -puN arch/x86/mm/kasan_init_64.c~x86-no-auto-massage arch/x86/mm/kasan_init_64.c
--- a/arch/x86/mm/kasan_init_64.c~x86-no-auto-massage	2018-02-22 12:36:19.761036551 -0800
+++ b/arch/x86/mm/kasan_init_64.c	2018-02-22 12:36:19.770036551 -0800
@@ -263,6 +263,12 @@ void __init kasan_early_init(void)
 	pudval_t pud_val = __pa_nodebug(kasan_zero_pmd) | _KERNPG_TABLE;
 	p4dval_t p4d_val = __pa_nodebug(kasan_zero_pud) | _KERNPG_TABLE;
 
+	/* Mask out unsupported __PAGE_KERNEL bits: */
+	pte_val &= __default_kernel_pte_mask;
+	pmd_val &= __default_kernel_pte_mask;
+	pud_val &= __default_kernel_pte_mask;
+	p4d_val &= __default_kernel_pte_mask;
+
 	for (i = 0; i < PTRS_PER_PTE; i++)
 		kasan_zero_pte[i] = __pte(pte_val);
 
@@ -365,7 +371,13 @@ void __init kasan_init(void)
 	 */
 	memset(kasan_zero_page, 0, PAGE_SIZE);
 	for (i = 0; i < PTRS_PER_PTE; i++) {
-		pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO | _PAGE_ENC);
+		pte_t pte;
+		pgprot_t prot;
+
+		prot = __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC);
+		pgprot_val(prot) &= __default_kernel_pte_mask;
+
+		pte = __pte(__pa(kasan_zero_page) | pgprot_val(prot));
 		set_pte(&kasan_zero_pte[i], pte);
 	}
 	/* Flush TLBs again to be sure that write protection applied. */
diff -puN arch/x86/power/hibernate_64.c~x86-no-auto-massage arch/x86/power/hibernate_64.c
--- a/arch/x86/power/hibernate_64.c~x86-no-auto-massage	2018-02-22 12:36:19.763036551 -0800
+++ b/arch/x86/power/hibernate_64.c	2018-02-22 12:36:19.770036551 -0800
@@ -51,6 +51,12 @@ static int set_up_temporary_text_mapping
 	pmd_t *pmd;
 	pud_t *pud;
 	p4d_t *p4d;
+	pgprot_t pgtable_prot = __pgprot(_KERNPG_TABLE);
+	pgprot_t pmd_text_prot = __pgprot(__PAGE_KERNEL_LARGE_EXEC);
+
+	/* Filter out unsupported __PAGE_KERNEL* bits: */
+	pgprot_val(pmd_text_prot) &= __default_kernel_pte_mask;
+	pgprot_val(pgtable_prot)  &= __default_kernel_pte_mask;
 
 	/*
 	 * The new mapping only has to cover the page containing the image
@@ -81,15 +87,19 @@ static int set_up_temporary_text_mapping
 		return -ENOMEM;
 
 	set_pmd(pmd + pmd_index(restore_jump_address),
-		__pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
+		__pmd((jump_address_phys & PMD_MASK) | pgprot_val(pmd_text_prot)));
 	set_pud(pud + pud_index(restore_jump_address),
-		__pud(__pa(pmd) | _KERNPG_TABLE));
+		__pud(__pa(pmd) | pgprot_val(pgtable_prot)));
 	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
-		set_p4d(p4d + p4d_index(restore_jump_address), __p4d(__pa(pud) | _KERNPG_TABLE));
-		set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(p4d) | _KERNPG_TABLE));
+		p4d_t new_p4d = __p4d(__pa(pud) | pgprot_val(pgtable_prot));
+		pgd_t new_pgd = __pgd(__pa(p4d) | pgprot_val(pgtable_prot));
+
+		set_p4d(p4d + p4d_index(restore_jump_address), new_p4d);
+		set_pgd(pgd + pgd_index(restore_jump_address), new_pgd);
 	} else {
 		/* No p4d for 4-level paging: point the pgd to the pud page table */
-		set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(pud) | _KERNPG_TABLE));
+		pgd_t new_pgd = __pgd(__pa(pud) | pgprot_val(pgtable_prot));
+		set_pgd(pgd + pgd_index(restore_jump_address), new_pgd);
 	}
 
 	return 0;
diff -puN mm/early_ioremap.c~x86-no-auto-massage mm/early_ioremap.c
--- a/mm/early_ioremap.c~x86-no-auto-massage	2018-02-22 12:36:19.765036551 -0800
+++ b/mm/early_ioremap.c	2018-02-22 12:36:19.770036551 -0800
@@ -113,6 +113,9 @@ __early_ioremap(resource_size_t phys_add
 
 	WARN_ON(system_state >= SYSTEM_RUNNING);
 
+	/* Sanitize 'prot' against any unsupported bits: */
+	pgprot_val(prot) &= __default_kernel_pte_mask;
+
 	slot = -1;
 	for (i = 0; i < FIX_BTMAPS_SLOTS; i++) {
 		if (!prev_map[i]) {
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 06/10] x86/mm: remove extra filtering in pageattr code
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
                   ` (4 preceding siblings ...)
  2018-02-22 20:37 ` [RFC][PATCH 05/10] x86/mm: do not auto-massage page protections Dave Hansen
@ 2018-02-22 20:37 ` Dave Hansen
  2018-02-22 20:37 ` [RFC][PATCH 07/10] x86/mm: comment _PAGE_GLOBAL mystery Dave Hansen
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The pageattr code has a mode where it can set or clear PTE bits in
existing PTEs, so the page protections of the *new* PTEs come from
one of two places:
1. The set/clear masks: cpa->mask_clr / cpa->mask_set
2. The existing PTE

We filter ->mask_set/clr for supported PTE bits at entry to
__change_page_attr() so we never need to filter them again.

The only other place permissions can come from is an existing PTE
and those already presumably have good bits.  We do not need to filter
them again.
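
For context, this is the entry-point filtering referred to above
(abridged from the pageattr set/clear entry point,
change_page_attr_set_clr()):

	/* Check, if we are requested to change a not supported feature: */
	mask_set = canon_pgprot(mask_set);
	mask_clr = canon_pgprot(mask_clr);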

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>

---

 b/arch/x86/mm/pageattr.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff -puN arch/x86/mm/pageattr.c~x86-pageattr-dont-filter-global arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~x86-pageattr-dont-filter-global	2018-02-22 12:36:20.456036549 -0800
+++ b/arch/x86/mm/pageattr.c	2018-02-22 12:36:20.460036549 -0800
@@ -598,7 +598,6 @@ try_preserve_large_page(pte_t *kpte, uns
 	req_prot = pgprot_clear_protnone_bits(req_prot);
         if (pgprot_val(req_prot) & _PAGE_PRESENT)
 		pgprot_val(req_prot) |= _PAGE_PSE;
-	req_prot = canon_pgprot(req_prot);
 
 	/*
 	 * old_pfn points to the large page base pfn. So we need
@@ -718,7 +717,7 @@ __split_large_page(struct cpa_data *cpa,
 	 */
 	pfn = ref_pfn;
 	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
-		set_pte(&pbase[i], pfn_pte(pfn, canon_pgprot(ref_prot)));
+		set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
 
 	if (virt_addr_valid(address)) {
 		unsigned long pfn = PFN_DOWN(__pa(address));
@@ -935,7 +934,6 @@ static void populate_pte(struct cpa_data
 	pte = pte_offset_kernel(pmd, start);
 
 	pgprot = pgprot_clear_protnone_bits(pgprot);
-	pgprot = canon_pgprot(pgprot);
 
 	while (num_pages-- && start < end) {
 		set_pte(pte, pfn_pte(cpa->pfn, pgprot));
@@ -1234,7 +1232,7 @@ repeat:
 		 * after all we're only going to change it's attributes
 		 * not the memory it points to
 		 */
-		new_pte = pfn_pte(pfn, canon_pgprot(new_prot));
+		new_pte = pfn_pte(pfn, new_prot);
 		cpa->pfn = pfn;
 		/*
 		 * Do we really change anything ?
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 07/10] x86/mm: comment _PAGE_GLOBAL mystery
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
                   ` (5 preceding siblings ...)
  2018-02-22 20:37 ` [RFC][PATCH 06/10] x86/mm: remove extra filtering in pageattr code Dave Hansen
@ 2018-02-22 20:37 ` Dave Hansen
  2018-02-22 20:37 ` [RFC][PATCH 08/10] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init Dave Hansen
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

I was mystified as to where the _PAGE_GLOBAL in the kernel page tables
for kernel text came from.  I audited all the places I could find, but
I missed one: head_64.S.

The page tables that we create in here live for a long time, and they
also have _PAGE_GLOBAL set, regardless of whether the processor
supports it.  It's harmless, and we got *lucky* that the pageattr code
accidentally clears it when we wipe it out of __supported_pte_mask and
then later try to mark kernel text read-only.

Comment some of these properties to make it easier to find and
understand in the future.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/kernel/head_64.S |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff -puN arch/x86/kernel/head_64.S~comment-global-page arch/x86/kernel/head_64.S
--- a/arch/x86/kernel/head_64.S~comment-global-page	2018-02-22 12:36:20.997036548 -0800
+++ b/arch/x86/kernel/head_64.S	2018-02-22 12:36:21.000036548 -0800
@@ -399,8 +399,13 @@ NEXT_PAGE(level3_ident_pgt)
 	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
 	.fill	511, 8, 0
 NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
+	/*
+	 * Since I easily can, map the first 1G.
 	 * Don't set NX because code runs from these pages.
+	 *
+	 * Note: This sets _PAGE_GLOBAL regardless of whether
+	 * the CPU supports it or it is enabled.  But,
+	 * the CPU should ignore the bit.
 	 */
 	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
 #else
@@ -431,6 +436,10 @@ NEXT_PAGE(level2_kernel_pgt)
 	 * (NOTE: at +512MB starts the module area, see MODULES_VADDR.
 	 *  If you want to increase this then increase MODULES_VADDR
 	 *  too.)
+	 *
+	 *  This table is eventually used by the kernel during normal
+	 *  runtime.  Care must be taken to clear out undesired bits
+	 *  later, like _PAGE_RW or _PAGE_GLOBAL in some cases.
 	 */
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 08/10] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
                   ` (6 preceding siblings ...)
  2018-02-22 20:37 ` [RFC][PATCH 07/10] x86/mm: comment _PAGE_GLOBAL mystery Dave Hansen
@ 2018-02-22 20:37 ` Dave Hansen
  2018-02-22 20:53   ` Kees Cook
  2018-02-22 20:37 ` [RFC][PATCH 09/10] x86/pti: enable global pages for shared areas Dave Hansen
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

__ro_after_init data gets stuck in the .rodata section.  That's normally
fine because the kernel itself manages the R/W properties.

But, if we run __change_page_attr() on an area which is __ro_after_init,
the .rodata checks will trigger and force the area to be immediately
read-only, even if it is early-ish in boot.  This caused problems when
trying to clear the _PAGE_GLOBAL bit for these areas in the PTI code:
it cleared _PAGE_GLOBAL like I asked, but also took it upon itself
to clear _PAGE_RW.  The kernel then oopsed the next time it wrote to
a __ro_after_init data structure.

To fix this, add the kernel_set_to_readonly check, just like we have
for kernel text, just a few lines below in this function.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/pageattr.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff -puN arch/x86/mm/pageattr.c~check-kernel_set_to_readonly arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~check-kernel_set_to_readonly	2018-02-22 12:36:21.531036546 -0800
+++ b/arch/x86/mm/pageattr.c	2018-02-22 12:36:21.535036546 -0800
@@ -298,9 +298,11 @@ static inline pgprot_t static_protection
 
 	/*
 	 * The .rodata section needs to be read-only. Using the pfn
-	 * catches all aliases.
+	 * catches all aliases.  This also includes __ro_after_init,
+	 * so do not enforce until kernel_set_to_readonly is true.
 	 */
-	if (within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
+	if (kernel_set_to_readonly &&
+	    within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
 		   __pa_symbol(__end_rodata) >> PAGE_SHIFT))
 		pgprot_val(forbidden) |= _PAGE_RW;
 
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 09/10] x86/pti: enable global pages for shared areas
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
                   ` (7 preceding siblings ...)
  2018-02-22 20:37 ` [RFC][PATCH 08/10] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init Dave Hansen
@ 2018-02-22 20:37 ` Dave Hansen
  2018-02-22 20:37 ` [RFC][PATCH 10/10] x86/pti: clear _PAGE_GLOBAL for kernel image Dave Hansen
  2018-02-22 21:52 ` [RFC][PATCH 00/10] Use global pages with PTI Linus Torvalds
  10 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The entry/exit text and cpu_entry_area are mapped into userspace and
the kernel.  But, they are not _PAGE_GLOBAL.  This creates unnecessary
TLB misses.

Add the _PAGE_GLOBAL flag for these areas.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/cpu_entry_area.c |   10 +++++++++-
 b/arch/x86/mm/pti.c            |    9 ++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff -puN arch/x86/mm/cpu_entry_area.c~kpti-why-no-global arch/x86/mm/cpu_entry_area.c
--- a/arch/x86/mm/cpu_entry_area.c~kpti-why-no-global	2018-02-22 12:36:22.067036545 -0800
+++ b/arch/x86/mm/cpu_entry_area.c	2018-02-22 12:36:22.072036545 -0800
@@ -27,8 +27,16 @@ EXPORT_SYMBOL(get_cpu_entry_area);
 void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags)
 {
 	unsigned long va = (unsigned long) cea_vaddr;
+	pte_t pte = pfn_pte(pa >> PAGE_SHIFT, flags);
 
-	set_pte_vaddr(va, pfn_pte(pa >> PAGE_SHIFT, flags));
+	/*
+	 * The cpu_entry_area is shared between the user and kernel
+	 * page tables.  All of its ptes can safely be global.
+	 */
+	if (boot_cpu_has(X86_FEATURE_PGE))
+		pte = pte_set_flags(pte, _PAGE_GLOBAL);
+
+	set_pte_vaddr(va, pte);
 }
 
 static void __init
diff -puN arch/x86/mm/pti.c~kpti-why-no-global arch/x86/mm/pti.c
--- a/arch/x86/mm/pti.c~kpti-why-no-global	2018-02-22 12:36:22.069036545 -0800
+++ b/arch/x86/mm/pti.c	2018-02-22 12:36:22.072036545 -0800
@@ -300,6 +300,13 @@ pti_clone_pmds(unsigned long start, unsi
 			return;
 
 		/*
+		 * Setting 'target_pmd' below creates a mapping in both
+		 * the user and kernel page tables.  It is effectively
+		 * global, so set it as global in both copies.
+		 */
+		*pmd = pmd_set_flags(*pmd, _PAGE_GLOBAL);
+
+		/*
 		 * Copy the PMD.  That is, the kernelmode and usermode
 		 * tables will share the last-level page tables of this
 		 * address range
@@ -348,7 +355,7 @@ static void __init pti_clone_entry_text(
 {
 	pti_clone_pmds((unsigned long) __entry_text_start,
 			(unsigned long) __irqentry_text_end,
-		       _PAGE_RW | _PAGE_GLOBAL);
+		       _PAGE_RW);
 }
 
 /*
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 10/10] x86/pti: clear _PAGE_GLOBAL for kernel image
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
                   ` (8 preceding siblings ...)
  2018-02-22 20:37 ` [RFC][PATCH 09/10] x86/pti: enable global pages for shared areas Dave Hansen
@ 2018-02-22 20:37 ` Dave Hansen
  2018-02-22 21:52 ` [RFC][PATCH 00/10] Use global pages with PTI Linus Torvalds
  10 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 20:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, aarcange, luto, torvalds, keescook, hughd, jgross,
	x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The kernel page tables are inherited from head_64.S which rudely marks
them as _PAGE_GLOBAL.  For PTI, we have been relying on the grace of
$DEITY and some insane behavior in pageattr.c.  Now we do it properly.

First, stop filtering out "unsupported" bits from being cleared in the
pageattr code.  It's fine to filter out *setting* these bits but it
is insane to keep us from clearing them.

Then, *explicitly* go clear _PAGE_GLOBAL from the kernel identity map.
Do not rely on pageattr to do it magically.

After this patch, we can see that "GLB" shows up in each copy of the
page tables, that we have the same number of global entries in each
and that they are the *same* entries.

# grep -c GLB /sys/kernel/debug/page_tables/*
/sys/kernel/debug/page_tables/current_kernel:11
/sys/kernel/debug/page_tables/current_user:11
/sys/kernel/debug/page_tables/kernel:11

# for f in `ls /sys/kernel/debug/page_tables/`; do grep GLB /sys/kernel/debug/page_tables/$f > $f.GLB; done
# md5sum *.GLB
9caae8ad6a1fb53aca2407ec037f612d  current_kernel.GLB
9caae8ad6a1fb53aca2407ec037f612d  current_user.GLB
9caae8ad6a1fb53aca2407ec037f612d  kernel.GLB

A quick visual audit also shows that all the entries make sense.
0xfffffe0000000000 is the cpu_entry_area and 0xffffffff81c00000
is the entry/exit text:

# grep GLB /sys/kernel/debug/page_tables/current_user
0xfffffe0000000000-0xfffffe0000002000           8K     ro                 GLB NX pte
0xfffffe0000002000-0xfffffe0000003000           4K     RW                 GLB NX pte
0xfffffe0000003000-0xfffffe0000006000          12K     ro                 GLB NX pte
0xfffffe0000006000-0xfffffe0000007000           4K     ro                 GLB x  pte
0xfffffe0000007000-0xfffffe000000d000          24K     RW                 GLB NX pte
0xfffffe000002d000-0xfffffe000002e000           4K     ro                 GLB NX pte
0xfffffe000002e000-0xfffffe000002f000           4K     RW                 GLB NX pte
0xfffffe000002f000-0xfffffe0000032000          12K     ro                 GLB NX pte
0xfffffe0000032000-0xfffffe0000033000           4K     ro                 GLB x  pte
0xfffffe0000033000-0xfffffe0000039000          24K     RW                 GLB NX pte
0xffffffff81c00000-0xffffffff81e00000           2M     ro         PSE     GLB x  pmd

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/init.c     |    8 +-------
 b/arch/x86/mm/pageattr.c |   12 +++++++++---
 b/arch/x86/mm/pti.c      |   26 ++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 10 deletions(-)

diff -puN arch/x86/mm/init.c~clear-global-for-pti arch/x86/mm/init.c
--- a/arch/x86/mm/init.c~clear-global-for-pti	2018-02-22 12:36:22.627036544 -0800
+++ b/arch/x86/mm/init.c	2018-02-22 12:36:22.634036544 -0800
@@ -161,12 +161,6 @@ struct map_range {
 
 static int page_size_mask;
 
-static void enable_global_pages(void)
-{
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		__supported_pte_mask |= _PAGE_GLOBAL;
-}
-
 static void __init probe_page_size_mask(void)
 {
 	/*
@@ -187,7 +181,7 @@ static void __init probe_page_size_mask(
 	__supported_pte_mask &= ~_PAGE_GLOBAL;
 	if (boot_cpu_has(X86_FEATURE_PGE)) {
 		cr4_set_bits_and_update_boot(X86_CR4_PGE);
-		enable_global_pages();
+		__supported_pte_mask |= _PAGE_GLOBAL;
 	}
 
 	/* By the default is everything supported: */
diff -puN arch/x86/mm/pageattr.c~clear-global-for-pti arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~clear-global-for-pti	2018-02-22 12:36:22.629036544 -0800
+++ b/arch/x86/mm/pageattr.c	2018-02-22 12:36:22.635036544 -0800
@@ -1411,11 +1411,11 @@ static int change_page_attr_set_clr(unsi
 	memset(&cpa, 0, sizeof(cpa));
 
 	/*
-	 * Check, if we are requested to change a not supported
-	 * feature:
+	 * Check, if we are requested to set a not supported
+	 * feature.  Clearing non-supported features is OK.
 	 */
 	mask_set = canon_pgprot(mask_set);
-	mask_clr = canon_pgprot(mask_clr);
+
 	if (!pgprot_val(mask_set) && !pgprot_val(mask_clr) && !force_split)
 		return 0;
 
@@ -1758,6 +1758,12 @@ int set_memory_4k(unsigned long addr, in
 					__pgprot(0), 1, 0, NULL);
 }
 
+int set_memory_nonglobal(unsigned long addr, int numpages)
+{
+	return change_page_attr_clear(&addr, numpages,
+				      __pgprot(_PAGE_GLOBAL), 0);
+}
+
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 {
 	struct cpa_data cpa;
diff -puN arch/x86/mm/pti.c~clear-global-for-pti arch/x86/mm/pti.c
--- a/arch/x86/mm/pti.c~clear-global-for-pti	2018-02-22 12:36:22.631036544 -0800
+++ b/arch/x86/mm/pti.c	2018-02-22 12:36:22.635036544 -0800
@@ -359,6 +359,28 @@ static void __init pti_clone_entry_text(
 }
 
 /*
+ * This is the only user for it and it is not arch-generic like
+ * the other set_memory.h functions.  Just extern it.
+ */
+extern int set_memory_nonglobal(unsigned long addr, int numpages);
+void pti_set_kernel_image_nonglobal(void)
+{
+	/*
+	 * The identity map is created with PMDs, regardless of the
+	 * actual length of the kernel.  We need to clear
+	 * _PAGE_GLOBAL up to a PMD boundary, not just to the end
+	 * of the image.
+	 */
+        unsigned long start = PFN_ALIGN(_text);
+        unsigned long end = ALIGN((unsigned long)_end, PMD_PAGE_SIZE);
+
+        pr_debug("x86/pti: Set kernel image: %lx - %lx non-global\n",
+                 start, end);
+
+        set_memory_nonglobal(start, (end - start) >> PAGE_SHIFT);
+}
+
+/*
  * Initialize kernel page table isolation
  */
 void __init pti_init(void)
@@ -369,6 +391,10 @@ void __init pti_init(void)
 	pr_info("enabled\n");
 
 	pti_clone_user_shared();
+
+	/* Undo all global bits from the init pagetables in head_64.S: */
+	pti_set_kernel_image_nonglobal();
+	/* Replace some of the global bits just for shared entry text: */
 	pti_clone_entry_text();
 	pti_setup_espfix64();
 	pti_setup_vsyscall();
_

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 08/10] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init
  2018-02-22 20:37 ` [RFC][PATCH 08/10] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init Dave Hansen
@ 2018-02-22 20:53   ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2018-02-22 20:53 UTC (permalink / raw)
  To: Dave Hansen
  Cc: LKML, Andrea Arcangeli, Andy Lutomirski, Linus Torvalds,
	Hugh Dickins, Juergen Gross, X86 ML, namit

On Thu, Feb 22, 2018 at 12:37 PM, Dave Hansen
<dave.hansen@linux.intel.com> wrote:
>
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> __ro_after_init data gets stuck in the .rodata section.  That's normally
> fine because the kernel itself manages the R/W properties.
>
> But, if we run __change_page_attr() on an area which is __ro_after_init,
> the .rodata checks will trigger and force the area to be immediately
> read-only, even if it is early-ish in boot.  This caused problems when
> trying to clear the _PAGE_GLOBAL bit for these areas in the PTI code:
> it cleared _PAGE_GLOBAL like I asked, but also took it upon itself
> to clear _PAGE_RW.  The kernel then oopsed the next time it wrote to
> a __ro_after_init data structure.
>
> To fix this, add the kernel_set_to_readonly check, just like we have
> for kernel text, just a few lines below in this function.

Yup, looks sensible. Thanks!

>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Kees Cook <keescook@google.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: x86@kernel.org
> Cc: Nadav Amit <namit@vmware.com>

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>
>  b/arch/x86/mm/pageattr.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff -puN arch/x86/mm/pageattr.c~check-kernel_set_to_readonly arch/x86/mm/pageattr.c
> --- a/arch/x86/mm/pageattr.c~check-kernel_set_to_readonly       2018-02-22 12:36:21.531036546 -0800
> +++ b/arch/x86/mm/pageattr.c    2018-02-22 12:36:21.535036546 -0800
> @@ -298,9 +298,11 @@ static inline pgprot_t static_protection
>
>         /*
>          * The .rodata section needs to be read-only. Using the pfn
> -        * catches all aliases.
> +        * catches all aliases.  This also includes __ro_after_init,
> +        * so do not enforce until kernel_set_to_readonly is true.
>          */
> -       if (within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
> +       if (kernel_set_to_readonly &&
> +           within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
>                    __pa_symbol(__end_rodata) >> PAGE_SHIFT))
>                 pgprot_val(forbidden) |= _PAGE_RW;
>
> _



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 04/10] x86/espfix: use kernel-default PTE mask
  2018-02-22 20:36 ` [RFC][PATCH 04/10] x86/espfix: use kernel-default " Dave Hansen
@ 2018-02-22 21:27   ` Nadav Amit
  2018-02-22 21:30     ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2018-02-22 21:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, aarcange, luto, torvalds, keescook, hughd, jgross, x86

Dave Hansen <dave.hansen@linux.intel.com> wrote:

> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> In creating its page tables, the espfix code masks its PGTABLE_PROT
> value with the supported mask: __supported_pte_mask.  This ensures
> that unsupported bits are not set in the final PTE.  But, it also
> sets _PAGE_GLOBAL which we do not want for PTE.  Use
> __default_kernel_pte_mask instead which clears _PAGE_GLOBAL for PTI.

Can you please explain what is your concern? Exposing more gadgets for
speculative ROP attacks?

Or is it a general rule of not exposing any kernel code &data more than
absolutely necessary?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 04/10] x86/espfix: use kernel-default PTE mask
  2018-02-22 21:27   ` Nadav Amit
@ 2018-02-22 21:30     ` Dave Hansen
  2018-02-22 21:59       ` Andy Lutomirski
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 21:30 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-kernel, aarcange, luto, torvalds, keescook, hughd, jgross, x86

On 02/22/2018 01:27 PM, Nadav Amit wrote:
> Dave Hansen <dave.hansen@linux.intel.com> wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>> In creating its page tables, the espfix code masks its PGTABLE_PROT
>> value with the supported mask: __supported_pte_mask.  This ensures
>> that unsupported bits are not set in the final PTE.  But, it also
>> sets _PAGE_GLOBAL which we do not want for PTE.  Use
>> __default_kernel_pte_mask instead which clears _PAGE_GLOBAL for PTI.
> 
> Can you please explain what is your concern? Exposing more gadgets for
> speculative ROP attacks?
> 
> Or is it a general rule of not exposing any kernel code &data more than
> absolutely necessary?

I think it's good practice to just expose only the *minimal* amount of
data necessary.  It's easier to audit and less likely to expose things
accidentally.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 05/10] x86/mm: do not auto-massage page protections
  2018-02-22 20:37 ` [RFC][PATCH 05/10] x86/mm: do not auto-massage page protections Dave Hansen
@ 2018-02-22 21:46   ` Nadav Amit
  2018-02-22 21:52     ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2018-02-22 21:46 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, aarcange, luto, torvalds, keescook, hughd, jgross, x86

Dave Hansen <dave.hansen@linux.intel.com> wrote:

> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> 
> +static inline pgprotval_t check_pgprot(pgprot_t pgprot)
> +{
> +	pgprotval_t massaged_val = massage_pgprot(pgprot);
> +
> +	WARN_ONCE(pgprot_val(pgprot) != massaged_val,
> +		  "attempted to set unsupported pgprot: %016lx "
> +		  "bits: %016lx supported: %016lx\n",
> +		  pgprot_val(pgprot),
> +		  pgprot_val(pgprot) ^ massaged_val,
> +		  __supported_pte_mask);

Perhaps use VM_WARN_ONCE instead to avoid any overhead on production
systems?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 00/10] Use global pages with PTI
  2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
                   ` (9 preceding siblings ...)
  2018-02-22 20:37 ` [RFC][PATCH 10/10] x86/pti: clear _PAGE_GLOBAL for kernel image Dave Hansen
@ 2018-02-22 21:52 ` Linus Torvalds
  2018-02-24  1:49   ` Dave Hansen
  10 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2018-02-22 21:52 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linux Kernel Mailing List, Andrea Arcangeli, Andrew Lutomirski,
	Kees Cook, Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

Side note - and this may be crazy talk - I wonder if it might make
sense to have a mode where we allow executable read-only kernel pages
to be marked global too (but only in the kernel mapping).

Yes, yes, that potentially means that you can run meltdown on recently
run kernel code that is still in the TLB, but even if so, that may be
something people are ok with.

Because it would leak data that can generally be gotten other ways
(aka "just download the vendor kernel from a website"), and from a
security standpoint the biggest issue is likely that you can break
KASLR. But again, that is something that some people may be perfectly
fine with - it's certainly much less of an issue than leaking actual
*data*.

And it might be an easy option to give people, with something like
"pti=data" or similar.

It's also quite possible that depending on how separate the ITLB and
DTLB is, an ITLB entry might not even really be amenable to Meltdown
leaking at all.

Since the pages wouldn't actually exist in the user page tables, the
only sign of them would be in the TLB entries, and if the ITLB is
separate enough it might be practically impossible to really use those
ITLB entries for meltdown.

Of course, maybe the performance advantage from keeping the ITLB
entries around isn't huge, but this *may* be worth at least asking
some Intel architects about?

Because my gut feel is that a noticeable amount of the TLB cost from
PTI comes from the code side. I suspect a lot of kernel code has a
bigger instruction footprint than data.

(And yes, like this set of global bit patches, it probably matters a
whole lot more on CPU's that don't have PCID. With PCID it's probably
not all that noticeable at all).

             Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 05/10] x86/mm: do not auto-massage page protections
  2018-02-22 21:46   ` Nadav Amit
@ 2018-02-22 21:52     ` Dave Hansen
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 21:52 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-kernel, aarcange, luto, torvalds, keescook, hughd, jgross, x86

On 02/22/2018 01:46 PM, Nadav Amit wrote:
>>
>> +static inline pgprotval_t check_pgprot(pgprot_t pgprot)
>> +{
>> +	pgprotval_t massaged_val = massage_pgprot(pgprot);
>> +
>> +	WARN_ONCE(pgprot_val(pgprot) != massaged_val,
>> +		  "attempted to set unsupported pgprot: %016lx "
>> +		  "bits: %016lx supported: %016lx\n",
>> +		  pgprot_val(pgprot),
>> +		  pgprot_val(pgprot) ^ massaged_val,
>> +		  __supported_pte_mask);
> Perhaps use VM_WARN_ONCE instead to avoid any overhead on production
> systems?

Sounds sane enough.  I'll change it.
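
For reference, the change being discussed would look roughly like this
(a sketch only, not the actual follow-up patch; VM_WARN_ONCE lives in
<linux/mmdebug.h>, and header dependencies may force an open-coded
#ifdef CONFIG_DEBUG_VM plus WARN_ONCE instead):

static inline pgprotval_t check_pgprot(pgprot_t pgprot)
{
	pgprotval_t massaged_val = massage_pgprot(pgprot);

	/* No runtime cost on !CONFIG_DEBUG_VM builds: */
	VM_WARN_ONCE(pgprot_val(pgprot) != massaged_val,
		     "attempted to set unsupported pgprot: %016lx "
		     "bits: %016lx supported: %016lx\n",
		     pgprot_val(pgprot),
		     pgprot_val(pgprot) ^ massaged_val,
		     __supported_pte_mask);

	return massaged_val;
}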

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 04/10] x86/espfix: use kernel-default PTE mask
  2018-02-22 21:30     ` Dave Hansen
@ 2018-02-22 21:59       ` Andy Lutomirski
  2018-02-22 22:05         ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2018-02-22 21:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Nadav Amit, linux-kernel, aarcange, luto, torvalds, keescook,
	hughd, jgross, x86

On Thu, Feb 22, 2018 at 9:30 PM, Dave Hansen
<dave.hansen@linux.intel.com> wrote:
> On 02/22/2018 01:27 PM, Nadav Amit wrote:
>> Dave Hansen <dave.hansen@linux.intel.com> wrote:
>>> From: Dave Hansen <dave.hansen@linux.intel.com>
>>> In creating its page tables, the espfix code masks its PGTABLE_PROT
>>> value with the supported mask: __supported_pte_mask.  This ensures
>>> that unsupported bits are not set in the final PTE.  But, it also
>>> sets _PAGE_GLOBAL which we do not want for PTE.  Use
>>> __default_kernel_pte_mask instead which clears _PAGE_GLOBAL for PTI.
>>
>> Can you please explain what your concern is? Exposing more gadgets for
>> speculative ROP attacks?
>>
>> Or is it a general rule of not exposing any kernel code & data more than
>> absolutely necessary?
>
> I think it's good practice to just expose only the *minimal* amount of
> data necessary.  It's easier to audit and less likely to expose things
>> accidentally.

But espfix64 is genuinely global.  I'm confused.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 04/10] x86/espfix: use kernel-default PTE mask
  2018-02-22 21:59       ` Andy Lutomirski
@ 2018-02-22 22:05         ` Dave Hansen
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 22:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Nadav Amit, linux-kernel, aarcange, torvalds, keescook, hughd,
	jgross, x86

On 02/22/2018 01:59 PM, Andy Lutomirski wrote:
>> I think it's good practice to just expose only the *minimal* amount of
>> data necessary.  It's easier to audit and less likely to expose things
>> accidentally.
> But espfix64 is genuinely global.  I'm confused.

I'm the confused one.

In my *first* set of patches to do this, I missed making the espfix64
area global.  This new code apparently got it right, but I assumed the
"new" global area that showed up was a mistake, thus this patch.

I'll change this patch into a commenting patch that calls out the need
for keeping _PAGE_GLOBAL.
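
Presumably something along these lines in the espfix setup code (the
wording here is only a guess at what such a commenting patch would say):

/*
 * The espfix area is mapped into both the kernel and the user page
 * tables under PTI, so this mapping really is shared with userspace.
 * _PAGE_GLOBAL is therefore intentionally kept here; masking it out
 * would only cost extra TLB misses without hiding anything.
 */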

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask
  2018-02-22 20:36 ` [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask Dave Hansen
@ 2018-02-22 22:21   ` Nadav Amit
  2018-02-22 22:26     ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2018-02-22 22:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: LKML, Andrea Arcangeli, Andy Lutomirski, Linus Torvalds,
	keescook, hughd, jgross, x86

Dave Hansen <dave.hansen@linux.intel.com> wrote:

> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The __PAGE_KERNEL_* page permissions are "raw".  They contain bits
> that may or may not be supported on the current processor.  They
> need to be filtered by a mask (currently __supported_pte_mask) to
> turn them into a value that we can actually set in a PTE.
> 
> These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL.  But, with
> PTI, we want to be able to support _PAGE_GLOBAL (have the bit set
> in __supported_pte_mask) but not have it appear in any of these
> masks by default.

There might be a potential issue with this approach. __supported_pte_mask is
exported, so out-of-tree modules might use it. They therefore can
unknowingly use this value to set PTEs with _PAGE_GLOBAL set.

I do not know if it is a real issue, but leaving __supported_pte_mask as it
is now (with _PAGE_GLOBAL masked), and using a different variable with
_PAGE_GLOBAL unmasked (i.e., the real “__supported_pte_mask”) can solve it.
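
Concretely, something like the following split (the second variable
name is purely illustrative):

/* Unchanged: _PAGE_GLOBAL stays masked out of this under PTI, so
 * existing users, in-tree or not, keep today's behavior. */
pteval_t __supported_pte_mask __read_mostly = ~0;

/* Hypothetical new mask carrying the bits the hardware really
 * supports, _PAGE_GLOBAL included, for the places that want it. */
pteval_t __hw_supported_pte_mask __read_mostly = ~0;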

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask
  2018-02-22 22:21   ` Nadav Amit
@ 2018-02-22 22:26     ` Dave Hansen
  2018-02-22 23:11       ` Tom Lendacky
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2018-02-22 22:26 UTC (permalink / raw)
  To: Nadav Amit
  Cc: LKML, Andrea Arcangeli, Andy Lutomirski, Linus Torvalds,
	keescook, hughd, jgross, x86

On 02/22/2018 02:21 PM, Nadav Amit wrote:
> Dave Hansen <dave.hansen@linux.intel.com> wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>> The __PAGE_KERNEL_* page permissions are "raw".  They contain bits
>> that may or may not be supported on the current processor.  They
>> need to be filtered by a mask (currently __supported_pte_mask) to
>> turn them into a value that we can actually set in a PTE.
>>
>> These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL.  But, with
>> PTI, we want to be able to support _PAGE_GLOBAL (have the bit set
>> in __supported_pte_mask) but not have it appear in any of these
>> masks by default.
> 
> There might be a potential issue with this approach. __supported_pte_mask is
> exported, so out-of-tree modules might use it. They therefore can
> unknowingly use this value to set PTEs with _PAGE_GLOBAL set.

I don't think we can help out-of-tree modules getting this wrong.
They're OK if they use PAGE_KERNEL*, btw.
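
I.e. the safe pattern for a module stays the usual one, roughly (an
illustrative fragment; needs <linux/vmalloc.h>):

/* With this series, PAGE_KERNEL is already filtered through the
 * kernel-default PTE mask, so _PAGE_GLOBAL cannot sneak in here: */
static void *map_module_pages(struct page **pages, unsigned int nr)
{
	return vmap(pages, nr, VM_MAP, PAGE_KERNEL);
}

/* Hand-building a pgprot from __PAGE_KERNEL & __supported_pte_mask
 * is exactly the pattern that can now go wrong. */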

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask
  2018-02-22 22:26     ` Dave Hansen
@ 2018-02-22 23:11       ` Tom Lendacky
  2018-02-23 23:46         ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Tom Lendacky @ 2018-02-22 23:11 UTC (permalink / raw)
  To: Dave Hansen, Nadav Amit
  Cc: LKML, Andrea Arcangeli, Andy Lutomirski, Linus Torvalds,
	keescook, hughd, jgross, x86

On 2/22/2018 4:26 PM, Dave Hansen wrote:
> On 02/22/2018 02:21 PM, Nadav Amit wrote:
>> Dave Hansen <dave.hansen@linux.intel.com> wrote:
>>> From: Dave Hansen <dave.hansen@linux.intel.com>
>>> The __PAGE_KERNEL_* page permissions are "raw".  They contain bits
>>> that may or may not be supported on the current processor.  They
>>> need to be filtered by a mask (currently __supported_pte_mask) to
>>> turn them into a value that we can actually set in a PTE.
>>>
>>> These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL.  But, with
>>> PTI, we want to be able to support _PAGE_GLOBAL (have the bit set
>>> in __supported_pte_mask) but not have it appear in any of these
>>> masks by default.
>>
>> There might be a potential issue with this approach. __supported_pte_mask is
>> exported, so out-of-tree modules might use it. They therefore can
>> unknowingly use this value to set PTEs with _PAGE_GLOBAL set.
> 
> I don't think we can help out-of-tree modules getting this wrong.
> They're OK if they use PAGE_KERNEL*, btw.

You will probably need to change __default_kernel_pte_mask to be an
EXPORT_SYMBOL instead of an EXPORT_SYMBOL_GPL for (some) out-of-tree
modules.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask
  2018-02-22 23:11       ` Tom Lendacky
@ 2018-02-23 23:46         ` Dave Hansen
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2018-02-23 23:46 UTC (permalink / raw)
  To: Tom Lendacky, Nadav Amit
  Cc: LKML, Andrea Arcangeli, Andy Lutomirski, Linus Torvalds,
	keescook, hughd, jgross, x86

On 02/22/2018 03:11 PM, Tom Lendacky wrote:
>> I don't think we can help out-of-tree modules getting this wrong.
>> They're OK if they use PAGE_KERNEL*, btw.
> You will probably need to change __default_kernel_pte_mask to be an
> EXPORT_SYMBOL instead of an EXPORT_SYMBOL_GPL for (some) out-of-tree
> modules.

As much as I detest missing an opportunity to kick non-GPL drivers in
the shins, I guess it is pretty cruel to break them for doing things
like vmap(PAGE_KERNEL).

I'll make it plain EXPORT_SYMBOL().
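
I.e., presumably ending up as something like:

/* Default bits for new kernel mappings; _PAGE_GLOBAL is cleared from
 * this mask when PTI is enabled.  A plain export so that non-GPL
 * modules going through PAGE_KERNEL*-based interfaces keep linking. */
pteval_t __default_kernel_pte_mask __read_mostly = ~0;
EXPORT_SYMBOL(__default_kernel_pte_mask);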

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 00/10] Use global pages with PTI
  2018-02-22 21:52 ` [RFC][PATCH 00/10] Use global pages with PTI Linus Torvalds
@ 2018-02-24  1:49   ` Dave Hansen
  2018-02-24  4:20     ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2018-02-24  1:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrea Arcangeli, Andrew Lutomirski,
	Kees Cook, Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

On 02/22/2018 01:52 PM, Linus Torvalds wrote:
> Side note - and this may be crazy talk - I wonder if it might make
> sense to have a mode where we allow executable read-only kernel pages
> to be marked global too (but only in the kernel mapping).

We did that accidentally, somewhere.  It causes machine checks on K8's
iirc, which is fun (52994c256df fixed it).  So, we'd need to make sure
we avoid it there, or just make it global in the user mapping too.

> Of course, maybe the performance advantage from keeping the ITLB
> entries around isn't huge, but this *may* be worth at least asking
> some Intel architects about?

I kinda doubt it's worth the trouble.  Like you said, this probably
doesn't even matter when we have PCID support.  Also, we'll usually map
all of this text with 2M pages, minus whatever hangs over into the last
2M page of text.  My laptop looks like this:

> 0xffffffff81000000-0xffffffff81c00000          12M     ro         PSE         x  pmd
> 0xffffffff81c00000-0xffffffff81c0b000          44K     ro                     x  pte

So, even if we've flushed these entries, we can get all of them back
with a single cacheline worth of PMD entries.
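(The 12M PSE-mapped region above is six 2M PMD entries, 8 bytes each,
i.e. 48 bytes of page-table data, well within one 64-byte cacheline.)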

Just for fun, I tried a 4-core Skylake system with KPTI and nopcid and
compiled a random kernel 10 times.  I did three configs: no global, all
kernel text global + cpu_entry_area, and only cpu_entry_area + entry
text.  The delta percentages are from the Baseline.  The deltas are
measurable, but the largest bang for our buck is obviously the entry text.

			User Time	Kernel Time	Clock Elapsed
Baseline (33 GLB PTEs)	907.6	        81.6		264.7
Entry    (28 GLB PTEs)	910.9 (+0.4%)	84.0 (+2.9%)	265.2 (+0.2%)
No global( 0 GLB PTEs)  914.2 (+0.7%)	89.2 (+9.3%)	267.8 (+1.2%)

It's a single line of code to go from the "33" to "28" configuration, so
it's totally doable.  But, it means having and parsing another boot
option that confuses people and then I have to go write actual
documentation, which I detest. :)

My inclination would be to just do the "entry" stuff as global just as
this set left things and leave it at that.

I also measured frontend stalls with the toplev.py tool[1].  They show
roughly the same thing, but a bit magnified since I was only monitoring
the kernel and because in some of these cases, even if we stop being
iTLB bound we just bottleneck on something else.

I ran:

	python ~/pmu-tools/toplev.py --kernel --level 3 make -j8

And looked for the relevant ITLB misses in the output:

Baseline
> FE             Frontend_Bound:        24.33 % Slots  [  7.68%]
> FE                ITLB_Misses:         5.16 % Clocks [  7.73%]
Entry:
> FE             Frontend_Bound:        26.62 % Slots  [  7.75%]
> FE                ITLB_Misses:        12.50 % Clocks [  7.74%]
No global:
> FE             Frontend_Bound:        27.58 % Slots  [  7.65%]
> FE                ITLB_Misses:        14.74 % Clocks [  7.71%]

1. https://github.com/andikleen/pmu-tools

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 00/10] Use global pages with PTI
  2018-02-24  1:49   ` Dave Hansen
@ 2018-02-24  4:20     ` Linus Torvalds
  0 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2018-02-24  4:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linux Kernel Mailing List, Andrea Arcangeli, Andrew Lutomirski,
	Kees Cook, Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

On Fri, Feb 23, 2018 at 5:49 PM, Dave Hansen
<dave.hansen@linux.intel.com> wrote:
> On 02/22/2018 01:52 PM, Linus Torvalds wrote:
>> Side note - and this may be crazy talk - I wonder if it might make
>> sense to have a mode where we allow executable read-only kernel pages
>> to be marked global too (but only in the kernel mapping).
>
> We did that accidentally, somewhere.  It causes machine checks on K8's
> iirc, which is fun (52994c256df fixed it).  So, we'd need to make sure
> we avoid it there, or just make it global in the user mapping too.

They'd be missing _entirely_ in the user maps, which should be fine.
The problem that commit 52994c256df3 fixed was that they actually
existed in the user maps, just with different data, and then you can
have a ITLB and a DTLB entry for the same address that don't match
(because one has been loaded from the kernel mapping and the other
from the user one).

But when the address isn't mapped at all in the user map, that should
be fine - because there's no associated TLB entry to get mixed up
about.

It's no different from clearing a page from the page table before then
flushing the TLB entry later - which is the normal (and required)
behavior for unmapping a page. For a while it exists in the TLB
without existing in the page tables.

> Just for fun, I tried a 4-core Skylake system with KPTI and nopcid and
> compiled a random kernel 10 times.  I did three configs: no global, all
> kernel text global + cpu_entry_area, and only cpu_entry_area + entry
> text.  The delta percentages are from the Baseline.  The deltas are
> measurable, but the largest bang for our buck is obviously the entry text.
>
>                         User Time       Kernel Time     Clock Elapsed
> Baseline (33 GLB PTEs)  907.6           81.6            264.7
> Entry    (28 GLB PTEs)  910.9 (+0.4%)   84.0 (+2.9%)    265.2 (+0.2%)
> No global( 0 GLB PTEs)  914.2 (+0.7%)   89.2 (+9.3%)    267.8 (+1.2%)

That's actually noticeable. Maybe not so much in the final elapsed
time itself, but almost 3% for just the kernel side sounds meaningful.

Of course, that's with nopcid, so it's a fairly small special case, but still.

> It's a single line of code to go from the "33" to "28" configuration, so
> it's totally doable.  But, it means having and parsing another boot
> option that confuses people and then I have to go write actual
> documentation, which I detest. :)

Heh.

Ok, maybe the complexity isn't in the code, but in the concept.

               Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-03-02  6:22 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-22 20:36 [RFC][PATCH 00/10] Use global pages with PTI Dave Hansen
2018-02-22 20:36 ` [RFC][PATCH 01/10] x86/mm: factor out pageattr _PAGE_GLOBAL setting Dave Hansen
2018-02-22 20:36 ` [RFC][PATCH 02/10] x86/mm: undo double _PAGE_PSE clearing Dave Hansen
2018-02-22 20:36 ` [RFC][PATCH 03/10] x86/mm: introduce "default" kernel PTE mask Dave Hansen
2018-02-22 22:21   ` Nadav Amit
2018-02-22 22:26     ` Dave Hansen
2018-02-22 23:11       ` Tom Lendacky
2018-02-23 23:46         ` Dave Hansen
2018-02-22 20:36 ` [RFC][PATCH 04/10] x86/espfix: use kernel-default " Dave Hansen
2018-02-22 21:27   ` Nadav Amit
2018-02-22 21:30     ` Dave Hansen
2018-02-22 21:59       ` Andy Lutomirski
2018-02-22 22:05         ` Dave Hansen
2018-02-22 20:37 ` [RFC][PATCH 05/10] x86/mm: do not auto-massage page protections Dave Hansen
2018-02-22 21:46   ` Nadav Amit
2018-02-22 21:52     ` Dave Hansen
2018-02-22 20:37 ` [RFC][PATCH 06/10] x86/mm: remove extra filtering in pageattr code Dave Hansen
2018-02-22 20:37 ` [RFC][PATCH 07/10] x86/mm: comment _PAGE_GLOBAL mystery Dave Hansen
2018-02-22 20:37 ` [RFC][PATCH 08/10] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init Dave Hansen
2018-02-22 20:53   ` Kees Cook
2018-02-22 20:37 ` [RFC][PATCH 09/10] x86/pti: enable global pages for shared areas Dave Hansen
2018-02-22 20:37 ` [RFC][PATCH 10/10] x86/pti: clear _PAGE_GLOBAL for kernel image Dave Hansen
2018-02-22 21:52 ` [RFC][PATCH 00/10] Use global pages with PTI Linus Torvalds
2018-02-24  1:49   ` Dave Hansen
2018-02-24  4:20     ` Linus Torvalds
