linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/11] Use global pages with PTI
@ 2018-03-23 17:44 Dave Hansen
  2018-03-23 17:44 ` [PATCH 01/11] x86/mm: factor out pageattr _PAGE_GLOBAL setting Dave Hansen
                   ` (11 more replies)
  0 siblings, 12 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit

The later versions of the KAISER patches (pre-PTI) allowed the user/kernel
shared areas to be GLOBAL.  The thought was that this would reduce the
TLB overhead of keeping two copies of these mappings.

During the switchover to PTI, we lost the ability to have GLOBAL
mappings.  This series adds them back.

This adds one major change from the last version of the patch set
(present in the last patch): it makes all kernel text global on non-
PCID systems.  This keeps kernel data protected at all times, but means
that it will be easier to find kernel gadgets via Meltdown on old
systems without PCIDs.  This heuristic is, I think, a reasonable one,
and it keeps us from having to create any new pti=foo options.

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/11] x86/mm: factor out pageattr _PAGE_GLOBAL setting
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
@ 2018-03-23 17:44 ` Dave Hansen
  2018-03-23 17:44 ` [PATCH 02/11] x86/mm: undo double _PAGE_PSE clearing Dave Hansen
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The pageattr code repeats a pattern: it sets _PAGE_GLOBAL for
present PTEs but clears it for non-present PTEs.  The intention
is to keep _PAGE_GLOBAL from getting confused with _PAGE_PROTNONE,
since _PAGE_GLOBAL is for present PTEs and _PAGE_PROTNONE is for
non-present ones.

But this pattern makes no sense.  Effectively, it says: if you
use the pageattr code, _PAGE_GLOBAL is always set whenever
_PAGE_PRESENT is.  canon_pgprot() will clear it if unsupported,
but we *always* set it.

This gets confusing when we have PTI and non-PTI and we want
some areas to have _PAGE_GLOBAL and some not.

This updated version of the code says:
1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
2. Never set _PAGE_GLOBAL implicitly
3. Allow _PAGE_GLOBAL to be in cpa->mask_set
4. Allow _PAGE_GLOBAL to be inherited from previous PTE

Aside: _PAGE_GLOBAL is ignored when CR4.PGE=0, so why do we even
go to the trouble of filtering it anywhere?
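For reference, the aliasing and the new helper's behavior can be sketched
as a standalone userspace C program.  The bit positions below are
simplified stand-ins for the real pgtable_types.h definitions, and the
helper mirrors the pgprot_clear_protnone_bits() added by this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for arch/x86/include/asm/pgtable_types.h:
 * bit 0 is Present, bit 8 is Global, and _PAGE_PROTNONE reuses the
 * Global bit -- it is only meaningful when the PTE is not present. */
#define _PAGE_PRESENT  ((uint64_t)1 << 0)
#define _PAGE_GLOBAL   ((uint64_t)1 << 8)
#define _PAGE_PROTNONE _PAGE_GLOBAL

/* Userspace mirror of the helper: drop _PAGE_GLOBAL from non-present
 * protections so a global PTE going non-present cannot be misread as
 * PROT_NONE. */
static uint64_t pgprot_clear_protnone_bits(uint64_t prot)
{
	if (!(prot & _PAGE_PRESENT))
		prot &= ~_PAGE_GLOBAL;
	return prot;
}
```

A present+global value passes through untouched; a non-present value
loses the ambiguous bit.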

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/pageattr.c |   68 ++++++++++++++++-------------------------------
 1 file changed, 24 insertions(+), 44 deletions(-)

diff -puN arch/x86/mm/pageattr.c~kpti-centralize-global-setting arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~kpti-centralize-global-setting	2018-03-21 16:31:56.262192322 -0700
+++ b/arch/x86/mm/pageattr.c	2018-03-21 16:31:56.266192322 -0700
@@ -512,6 +512,23 @@ static void __set_pmd_pte(pte_t *kpte, u
 #endif
 }
 
+static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
+{
+	/*
+	 * _PAGE_GLOBAL means "global page" for present PTEs.
+	 * But, it is also used to indicate _PAGE_PROTNONE
+	 * for non-present PTEs.
+	 *
+	 * This ensures that a _PAGE_GLOBAL PTE going from
+	 * present to non-present is not confused as
+	 * _PAGE_PROTNONE.
+	 */
+	if (!(pgprot_val(prot) & _PAGE_PRESENT))
+		pgprot_val(prot) &= ~_PAGE_GLOBAL;
+
+	return prot;
+}
+
 static int
 try_preserve_large_page(pte_t *kpte, unsigned long address,
 			struct cpa_data *cpa)
@@ -577,18 +594,11 @@ try_preserve_large_page(pte_t *kpte, uns
 	 * different bit positions in the two formats.
 	 */
 	req_prot = pgprot_4k_2_large(req_prot);
-
-	/*
-	 * Set the PSE and GLOBAL flags only if the PRESENT flag is
-	 * set otherwise pmd_present/pmd_huge will return true even on
-	 * a non present pmd. The canon_pgprot will clear _PAGE_GLOBAL
-	 * for the ancient hardware that doesn't support it.
-	 */
-	if (pgprot_val(req_prot) & _PAGE_PRESENT)
-		pgprot_val(req_prot) |= _PAGE_PSE | _PAGE_GLOBAL;
+	req_prot = pgprot_clear_protnone_bits(req_prot);
+	if (pgprot_val(req_prot) & _PAGE_PRESENT)
+		pgprot_val(req_prot) |= _PAGE_PSE;
 	else
-		pgprot_val(req_prot) &= ~(_PAGE_PSE | _PAGE_GLOBAL);
-
+		pgprot_val(req_prot) &= ~_PAGE_PSE;
 	req_prot = canon_pgprot(req_prot);
 
 	/*
@@ -698,16 +708,7 @@ __split_large_page(struct cpa_data *cpa,
 		return 1;
 	}
 
-	/*
-	 * Set the GLOBAL flags only if the PRESENT flag is set
-	 * otherwise pmd/pte_present will return true even on a non
-	 * present pmd/pte. The canon_pgprot will clear _PAGE_GLOBAL
-	 * for the ancient hardware that doesn't support it.
-	 */
-	if (pgprot_val(ref_prot) & _PAGE_PRESENT)
-		pgprot_val(ref_prot) |= _PAGE_GLOBAL;
-	else
-		pgprot_val(ref_prot) &= ~_PAGE_GLOBAL;
+	ref_prot = pgprot_clear_protnone_bits(ref_prot);
 
 	/*
 	 * Get the target pfn from the original entry:
@@ -930,18 +931,7 @@ static void populate_pte(struct cpa_data
 
 	pte = pte_offset_kernel(pmd, start);
 
-	/*
-	 * Set the GLOBAL flags only if the PRESENT flag is
-	 * set otherwise pte_present will return true even on
-	 * a non present pte. The canon_pgprot will clear
-	 * _PAGE_GLOBAL for the ancient hardware that doesn't
-	 * support it.
-	 */
-	if (pgprot_val(pgprot) & _PAGE_PRESENT)
-		pgprot_val(pgprot) |= _PAGE_GLOBAL;
-	else
-		pgprot_val(pgprot) &= ~_PAGE_GLOBAL;
-
+	pgprot = pgprot_clear_protnone_bits(pgprot);
 	pgprot = canon_pgprot(pgprot);
 
 	while (num_pages-- && start < end) {
@@ -1234,17 +1224,7 @@ repeat:
 
 		new_prot = static_protections(new_prot, address, pfn);
 
-		/*
-		 * Set the GLOBAL flags only if the PRESENT flag is
-		 * set otherwise pte_present will return true even on
-		 * a non present pte. The canon_pgprot will clear
-		 * _PAGE_GLOBAL for the ancient hardware that doesn't
-		 * support it.
-		 */
-		if (pgprot_val(new_prot) & _PAGE_PRESENT)
-			pgprot_val(new_prot) |= _PAGE_GLOBAL;
-		else
-			pgprot_val(new_prot) &= ~_PAGE_GLOBAL;
+		new_prot = pgprot_clear_protnone_bits(new_prot);
 
 		/*
 		 * We need to keep the pfn from the existing PTE,
_


* [PATCH 02/11] x86/mm: undo double _PAGE_PSE clearing
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
  2018-03-23 17:44 ` [PATCH 01/11] x86/mm: factor out pageattr _PAGE_GLOBAL setting Dave Hansen
@ 2018-03-23 17:44 ` Dave Hansen
  2018-03-23 17:44 ` [PATCH 03/11] x86/mm: introduce "default" kernel PTE mask Dave Hansen
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

When clearing _PAGE_PRESENT on a huge page, we need to be careful
to also clear _PAGE_PSE, otherwise it might still be mistaken for
a valid large page table entry.

We do that near the spot where we *set* _PAGE_PSE.  That's fine,
but it's unnecessary.  pgprot_large_2_4k() already did it.

BTW, I also noticed that pgprot_large_2_4k() and
pgprot_4k_2_large() are not symmetric.  pgprot_large_2_4k() clears
_PAGE_PSE (because it is aliased to _PAGE_PAT) but
pgprot_4k_2_large() does not put _PAGE_PSE back.  Bummer.

Also, add some comments and change "promote" to "move".  "Promote"
seems an odd word to use when we are logically moving a bit to a
lower position.  Also add an extra blank line to make it clear to
which line the comment applies.
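The PSE/PAT aliasing above can be sketched in userspace C.  The bit
numbers match the x86 definitions (bit 7 is PSE in a large entry but PAT
in a 4k PTE; bit 12 is PAT in a large entry), and the function is a
simplified rendering of pgprot_large_2_4k(), not the kernel code itself:

```c
#include <assert.h>
#include <stdint.h>

#define _PAGE_BIT_PAT		7	/* PAT position in a 4k PTE */
#define _PAGE_BIT_PAT_LARGE	12	/* PAT position in a 2M/1G entry */
#define _PAGE_PSE	((uint64_t)1 << 7)	/* aliases the 4k PAT bit */
#define _PAGE_PAT	((uint64_t)1 << _PAGE_BIT_PAT)
#define _PAGE_PAT_LARGE	((uint64_t)1 << _PAGE_BIT_PAT_LARGE)

/* Move PAT from its large-page position (bit 12) down to its 4k
 * position (bit 7).  Because bit 7 is also _PAGE_PSE, the old PSE
 * value is cleared as a side effect -- which is why the explicit
 * _PAGE_PSE clearing removed by this patch was redundant. */
static uint64_t pgprot_large_2_4k(uint64_t val)
{
	return (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) |
	       ((val & _PAGE_PAT_LARGE) >>
		(_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT));
}
```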

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/pageattr.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff -puN arch/x86/mm/pageattr.c~kpti-undo-double-_PAGE_PSE-clear arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~kpti-undo-double-_PAGE_PSE-clear	2018-03-21 16:31:56.803192321 -0700
+++ b/arch/x86/mm/pageattr.c	2018-03-21 16:31:56.806192321 -0700
@@ -583,6 +583,7 @@ try_preserve_large_page(pte_t *kpte, uns
 	 * up accordingly.
 	 */
 	old_pte = *kpte;
+	/* Clear PSE (aka _PAGE_PAT) and move PAT bit to correct position */
 	req_prot = pgprot_large_2_4k(old_prot);
 
 	pgprot_val(req_prot) &= ~pgprot_val(cpa->mask_clr);
@@ -597,8 +598,6 @@ try_preserve_large_page(pte_t *kpte, uns
 	req_prot = pgprot_clear_protnone_bits(req_prot);
 	if (pgprot_val(req_prot) & _PAGE_PRESENT)
 		pgprot_val(req_prot) |= _PAGE_PSE;
-	else
-		pgprot_val(req_prot) &= ~_PAGE_PSE;
 	req_prot = canon_pgprot(req_prot);
 
 	/*
@@ -684,8 +683,12 @@ __split_large_page(struct cpa_data *cpa,
 	switch (level) {
 	case PG_LEVEL_2M:
 		ref_prot = pmd_pgprot(*(pmd_t *)kpte);
-		/* clear PSE and promote PAT bit to correct position */
+		/*
+		 * Clear PSE (aka _PAGE_PAT) and move
+		 * PAT bit to correct position.
+		 */
 		ref_prot = pgprot_large_2_4k(ref_prot);
+
 		ref_pfn = pmd_pfn(*(pmd_t *)kpte);
 		break;
 
_


* [PATCH 03/11] x86/mm: introduce "default" kernel PTE mask
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
  2018-03-23 17:44 ` [PATCH 01/11] x86/mm: factor out pageattr _PAGE_GLOBAL setting Dave Hansen
  2018-03-23 17:44 ` [PATCH 02/11] x86/mm: undo double _PAGE_PSE clearing Dave Hansen
@ 2018-03-23 17:44 ` Dave Hansen
  2018-03-23 17:44 ` [PATCH 04/11] x86/espfix: document use of _PAGE_GLOBAL Dave Hansen
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The __PAGE_KERNEL_* page permissions are "raw".  They contain bits
that may or may not be supported on the current processor.  They need
to be filtered by a mask (currently __supported_pte_mask) to turn them
into a value that we can actually set in a PTE.

These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL.  But, with PTI,
we want to be able to support _PAGE_GLOBAL (have the bit set in
__supported_pte_mask) but not have it appear in any of these masks by
default.

This patch creates a new mask, __default_kernel_pte_mask, and applies
it when creating all of the PAGE_KERNEL_* masks.  This makes
PAGE_KERNEL_* safe to use anywhere (they only contain supported bits).
It also ensures that PAGE_KERNEL_* contains _PAGE_GLOBAL on PTI=n
kernels but clears _PAGE_GLOBAL when PTI=y.

We also make __default_kernel_pte_mask a non-GPL exported symbol
because there are plenty of driver-available interfaces that take
PAGE_KERNEL_* permissions.
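The boot-time interaction between the two masks can be sketched as
follows; this is a simplified userspace model of the
probe_page_size_mask() hunk in this patch, not the kernel code itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define _PAGE_GLOBAL ((uint64_t)1 << 8)

/* Model: start from the hardware-supported bits, then strip
 * _PAGE_GLOBAL from the default kernel mask when PTI is enabled so
 * that PAGE_KERNEL_* stops handing out global mappings. */
static uint64_t compute_default_kernel_pte_mask(uint64_t supported_pte_mask,
						bool pti_enabled)
{
	uint64_t mask = supported_pte_mask;

	if (pti_enabled)
		mask &= ~_PAGE_GLOBAL;
	return mask;
}
```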

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/include/asm/pgtable_types.h |   27 +++++++++++++++------------
 b/arch/x86/mm/init.c                   |    6 ++++++
 b/arch/x86/mm/init_32.c                |    8 +++++++-
 b/arch/x86/mm/init_64.c                |    5 +++++
 4 files changed, 33 insertions(+), 13 deletions(-)

diff -puN arch/x86/include/asm/pgtable_types.h~KERN-pgprot-default arch/x86/include/asm/pgtable_types.h
--- a/arch/x86/include/asm/pgtable_types.h~KERN-pgprot-default	2018-03-21 16:31:57.339192320 -0700
+++ b/arch/x86/include/asm/pgtable_types.h	2018-03-21 16:31:57.348192320 -0700
@@ -196,19 +196,21 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_NOENC	(__PAGE_KERNEL)
 #define __PAGE_KERNEL_NOENC_WP	(__PAGE_KERNEL_WP)
 
-#define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
-#define PAGE_KERNEL_NOENC	__pgprot(__PAGE_KERNEL)
-#define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
-#define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
-#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
-#define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
-#define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
-#define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
-#define PAGE_KERNEL_LARGE_EXEC	__pgprot(__PAGE_KERNEL_LARGE_EXEC | _PAGE_ENC)
-#define PAGE_KERNEL_VVAR	__pgprot(__PAGE_KERNEL_VVAR | _PAGE_ENC)
+#define default_pgprot(x)	__pgprot((x) & __default_kernel_pte_mask)
 
-#define PAGE_KERNEL_IO		__pgprot(__PAGE_KERNEL_IO)
-#define PAGE_KERNEL_IO_NOCACHE	__pgprot(__PAGE_KERNEL_IO_NOCACHE)
+#define PAGE_KERNEL		default_pgprot(__PAGE_KERNEL | _PAGE_ENC)
+#define PAGE_KERNEL_NOENC	default_pgprot(__PAGE_KERNEL)
+#define PAGE_KERNEL_RO		default_pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC	default_pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC_NOENC	default_pgprot(__PAGE_KERNEL_EXEC)
+#define PAGE_KERNEL_RX		default_pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
+#define PAGE_KERNEL_NOCACHE	default_pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE	default_pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE_EXEC	default_pgprot(__PAGE_KERNEL_LARGE_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_VVAR	default_pgprot(__PAGE_KERNEL_VVAR | _PAGE_ENC)
+
+#define PAGE_KERNEL_IO		default_pgprot(__PAGE_KERNEL_IO)
+#define PAGE_KERNEL_IO_NOCACHE	default_pgprot(__PAGE_KERNEL_IO_NOCACHE)
 
 #endif	/* __ASSEMBLY__ */
 
@@ -483,6 +485,7 @@ static inline pgprot_t pgprot_large_2_4k
 typedef struct page *pgtable_t;
 
 extern pteval_t __supported_pte_mask;
+extern pteval_t __default_kernel_pte_mask;
 extern void set_nx(void);
 extern int nx_enabled;
 
diff -puN arch/x86/mm/init_32.c~KERN-pgprot-default arch/x86/mm/init_32.c
--- a/arch/x86/mm/init_32.c~KERN-pgprot-default	2018-03-21 16:31:57.341192320 -0700
+++ b/arch/x86/mm/init_32.c	2018-03-21 16:31:57.348192320 -0700
@@ -558,8 +558,14 @@ static void __init pagetable_init(void)
 	permanent_kmaps_init(pgd_base);
 }
 
-pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL);
+#define DEFAULT_PTE_MASK ~(_PAGE_NX | _PAGE_GLOBAL)
+/* Bits supported by the hardware: */
+pteval_t __supported_pte_mask __read_mostly = DEFAULT_PTE_MASK;
+/* Bits allowed in normal kernel mappings: */
+pteval_t __default_kernel_pte_mask __read_mostly = DEFAULT_PTE_MASK;
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
+/* Used in PAGE_KERNEL_* macros which are reasonably used out-of-tree: */
+EXPORT_SYMBOL(__default_kernel_pte_mask);
 
 /* user-defined highmem size */
 static unsigned int highmem_pages = -1;
diff -puN arch/x86/mm/init_64.c~KERN-pgprot-default arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c~KERN-pgprot-default	2018-03-21 16:31:57.343192320 -0700
+++ b/arch/x86/mm/init_64.c	2018-03-21 16:31:57.349192320 -0700
@@ -65,8 +65,13 @@
  * around without checking the pgd every time.
  */
 
+/* Bits supported by the hardware: */
 pteval_t __supported_pte_mask __read_mostly = ~0;
+/* Bits allowed in normal kernel mappings: */
+pteval_t __default_kernel_pte_mask __read_mostly = ~0;
 EXPORT_SYMBOL_GPL(__supported_pte_mask);
+/* Used in PAGE_KERNEL_* macros which are reasonably used out-of-tree: */
+EXPORT_SYMBOL(__default_kernel_pte_mask);
 
 int force_personality32;
 
diff -puN arch/x86/mm/init.c~KERN-pgprot-default arch/x86/mm/init.c
--- a/arch/x86/mm/init.c~KERN-pgprot-default	2018-03-21 16:31:57.345192320 -0700
+++ b/arch/x86/mm/init.c	2018-03-21 16:31:57.349192320 -0700
@@ -190,6 +190,12 @@ static void __init probe_page_size_mask(
 		enable_global_pages();
 	}
 
+	/* By default, everything is supported: */
+	__default_kernel_pte_mask = __supported_pte_mask;
+	/* Except with PTI, where the kernel is mostly non-Global: */
+	if (cpu_feature_enabled(X86_FEATURE_PTI))
+		__default_kernel_pte_mask &= ~_PAGE_GLOBAL;
+
 	/* Enable 1 GB linear kernel mappings if available: */
 	if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
 		printk(KERN_INFO "Using GB pages for direct mapping\n");
_


* [PATCH 04/11] x86/espfix: document use of _PAGE_GLOBAL
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (2 preceding siblings ...)
  2018-03-23 17:44 ` [PATCH 03/11] x86/mm: introduce "default" kernel PTE mask Dave Hansen
@ 2018-03-23 17:44 ` Dave Hansen
  2018-03-23 17:44 ` [PATCH 05/11] x86/mm: do not auto-massage page protections Dave Hansen
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The "normal" kernel page table creation mechanisms using
PAGE_KERNEL_* page protections will never set _PAGE_GLOBAL with PTI.
The few places in the kernel that always want _PAGE_GLOBAL must
avoid using PAGE_KERNEL_*.

Document that we want it here and its use is not accidental.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/kernel/espfix_64.c |    4 ++++
 1 file changed, 4 insertions(+)

diff -puN arch/x86/kernel/espfix_64.c~espfix-use-kern-defaults-not-supported arch/x86/kernel/espfix_64.c
--- a/arch/x86/kernel/espfix_64.c~espfix-use-kern-defaults-not-supported	2018-03-21 16:31:57.951192318 -0700
+++ b/arch/x86/kernel/espfix_64.c	2018-03-21 16:31:57.954192318 -0700
@@ -195,6 +195,10 @@ void init_espfix_ap(int cpu)
 
 	pte_p = pte_offset_kernel(&pmd, addr);
 	stack_page = page_address(alloc_pages_node(node, GFP_KERNEL, 0));
+	/*
+	 * __PAGE_KERNEL_* includes _PAGE_GLOBAL, which we want since
+	 * this is mapped to userspace.
+	 */
 	pte = __pte(__pa(stack_page) | ((__PAGE_KERNEL_RO | _PAGE_ENC) & ptemask));
 	for (n = 0; n < ESPFIX_PTE_CLONES; n++)
 		set_pte(&pte_p[n*PTE_STRIDE], pte);
_


* [PATCH 05/11] x86/mm: do not auto-massage page protections
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (3 preceding siblings ...)
  2018-03-23 17:44 ` [PATCH 04/11] x86/espfix: document use of _PAGE_GLOBAL Dave Hansen
@ 2018-03-23 17:44 ` Dave Hansen
  2018-03-23 19:15   ` Nadav Amit
                     ` (2 more replies)
  2018-03-23 17:44 ` [PATCH 06/11] x86/mm: remove extra filtering in pageattr code Dave Hansen
                   ` (6 subsequent siblings)
  11 siblings, 3 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

A PTE is constructed from a physical address and a pgprotval_t.
__PAGE_KERNEL, for instance, is a pgprot_t and must be converted
into a pgprotval_t before it can be used to create a PTE.  This is
done implicitly within functions like set_pte() by massage_pgprot().

However, this makes it very challenging to set bits (and keep them
set) if your bit is being filtered out by massage_pgprot().

This moves the bit filtering out of set_pte() and friends.  For
users of PAGE_KERNEL*, filtering is still done automatically
inside those macros, but users of __PAGE_KERNEL* must now do
their own filtering.

Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
instead of massage_pgprot().  This way, we still *look* for
unsupported bits and properly warn about them if we find them.  This
might happen if an unfiltered __PAGE_KERNEL* value was passed in,
for instance.
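The filter-and-warn idea can be sketched in userspace.  This is a
simplified model of the check_pgprot() added below; the real
massage_pgprot() also special-cases non-present PTEs, which is omitted
here, and the supported mask is an assumed value for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define _PAGE_GLOBAL	((uint64_t)1 << 8)
#define _PAGE_NX	((uint64_t)1 << 63)

/* Stand-in for the kernel's __supported_pte_mask: assume everything
 * but NX is supported on this hypothetical CPU. */
static const uint64_t supported_pte_mask = ~_PAGE_NX;

/* Filter like massage_pgprot() does, but complain when the caller
 * actually asked for an unsupported bit -- e.g. an unfiltered
 * __PAGE_KERNEL* value. */
static uint64_t check_pgprot(uint64_t prot)
{
	uint64_t masked = prot & supported_pte_mask;

	if (masked != prot)
		fprintf(stderr, "unsupported pgprot bits: %016llx\n",
			(unsigned long long)(prot ^ masked));
	return masked;
}
```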

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/include/asm/pgtable.h |   27 ++++++++++++++++++++++-----
 b/arch/x86/kernel/head64.c       |    2 ++
 b/arch/x86/kernel/ldt.c          |    6 +++++-
 b/arch/x86/mm/ident_map.c        |    3 +++
 b/arch/x86/mm/iomap_32.c         |    6 ++++++
 b/arch/x86/mm/kasan_init_64.c    |   14 +++++++++++++-
 b/arch/x86/power/hibernate_64.c  |   20 +++++++++++++++-----
 b/mm/early_ioremap.c             |    3 +++
 8 files changed, 69 insertions(+), 12 deletions(-)

diff -puN arch/x86/include/asm/pgtable.h~x86-no-auto-massage arch/x86/include/asm/pgtable.h
--- a/arch/x86/include/asm/pgtable.h~x86-no-auto-massage	2018-03-23 09:57:34.798820374 -0700
+++ b/arch/x86/include/asm/pgtable.h	2018-03-23 09:57:34.846820374 -0700
@@ -526,22 +526,39 @@ static inline pgprotval_t massage_pgprot
 	return protval;
 }
 
+static inline pgprotval_t check_pgprot(pgprot_t pgprot)
+{
+	pgprotval_t massaged_val = massage_pgprot(pgprot);
+
+	/* mmdebug.h can not be included here because of dependencies */
+#ifdef CONFIG_DEBUG_VM
+	WARN_ONCE(pgprot_val(pgprot) != massaged_val,
+		  "attempted to set unsupported pgprot: %016lx "
+		  "bits: %016lx supported: %016lx\n",
+		  pgprot_val(pgprot),
+		  pgprot_val(pgprot) ^ massaged_val,
+		  __supported_pte_mask);
+#endif
+
+	return massaged_val;
+}
+
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
 	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+		     check_pgprot(pgprot));
 }
 
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
 	return __pmd(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+		     check_pgprot(pgprot));
 }
 
 static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)
 {
 	return __pud(((phys_addr_t)page_nr << PAGE_SHIFT) |
-		     massage_pgprot(pgprot));
+		     check_pgprot(pgprot));
 }
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
@@ -553,7 +570,7 @@ static inline pte_t pte_modify(pte_t pte
 	 * the newprot (if present):
 	 */
 	val &= _PAGE_CHG_MASK;
-	val |= massage_pgprot(newprot) & ~_PAGE_CHG_MASK;
+	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
 
 	return __pte(val);
 }
@@ -563,7 +580,7 @@ static inline pmd_t pmd_modify(pmd_t pmd
 	pmdval_t val = pmd_val(pmd);
 
 	val &= _HPAGE_CHG_MASK;
-	val |= massage_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
 
 	return __pmd(val);
 }
diff -puN arch/x86/kernel/head64.c~x86-no-auto-massage arch/x86/kernel/head64.c
--- a/arch/x86/kernel/head64.c~x86-no-auto-massage	2018-03-23 09:57:34.800820374 -0700
+++ b/arch/x86/kernel/head64.c	2018-03-23 09:57:34.846820374 -0700
@@ -195,6 +195,8 @@ unsigned long __head __startup_64(unsign
 	pud[i + 1] = (pudval_t)pmd + pgtable_flags;
 
 	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
+	/* Filter out unsupported __PAGE_KERNEL_* bits: */
+	pmd_entry &= __supported_pte_mask;
 	pmd_entry += sme_get_me_mask();
 	pmd_entry +=  physaddr;
 
diff -puN arch/x86/kernel/ldt.c~x86-no-auto-massage arch/x86/kernel/ldt.c
--- a/arch/x86/kernel/ldt.c~x86-no-auto-massage	2018-03-23 09:57:34.801820374 -0700
+++ b/arch/x86/kernel/ldt.c	2018-03-23 09:57:34.847820374 -0700
@@ -145,6 +145,7 @@ map_ldt_struct(struct mm_struct *mm, str
 		unsigned long offset = i << PAGE_SHIFT;
 		const void *src = (char *)ldt->entries + offset;
 		unsigned long pfn;
+		pgprot_t pte_prot;
 		pte_t pte, *ptep;
 
 		va = (unsigned long)ldt_slot_va(slot) + offset;
@@ -163,7 +164,10 @@ map_ldt_struct(struct mm_struct *mm, str
 		 * target via some kernel interface which misses a
 		 * permission check.
 		 */
-		pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL));
+		pte_prot = __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL);
+		/* Filter out unsupported __PAGE_KERNEL* bits: */
+		pgprot_val(pte_prot) &= __supported_pte_mask;
+		pte = pfn_pte(pfn, pte_prot);
 		set_pte_at(mm, va, ptep, pte);
 		pte_unmap_unlock(ptep, ptl);
 	}
diff -puN arch/x86/mm/ident_map.c~x86-no-auto-massage arch/x86/mm/ident_map.c
--- a/arch/x86/mm/ident_map.c~x86-no-auto-massage	2018-03-23 09:57:34.803820374 -0700
+++ b/arch/x86/mm/ident_map.c	2018-03-23 09:57:34.847820374 -0700
@@ -98,6 +98,9 @@ int kernel_ident_mapping_init(struct x86
 	if (!info->kernpg_flag)
 		info->kernpg_flag = _KERNPG_TABLE;
 
+	/* Filter out unsupported __PAGE_KERNEL_* bits: */
+	info->kernpg_flag &= __default_kernel_pte_mask;
+
 	for (; addr < end; addr = next) {
 		pgd_t *pgd = pgd_page + pgd_index(addr);
 		p4d_t *p4d;
diff -puN arch/x86/mm/iomap_32.c~x86-no-auto-massage arch/x86/mm/iomap_32.c
--- a/arch/x86/mm/iomap_32.c~x86-no-auto-massage	2018-03-23 09:57:34.805820374 -0700
+++ b/arch/x86/mm/iomap_32.c	2018-03-23 09:57:34.847820374 -0700
@@ -44,6 +44,9 @@ int iomap_create_wc(resource_size_t base
 		return ret;
 
 	*prot = __pgprot(__PAGE_KERNEL | cachemode2protval(pcm));
+	/* Filter out unsupported __PAGE_KERNEL* bits: */
+	pgprot_val(*prot) &= __default_kernel_pte_mask;
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iomap_create_wc);
@@ -88,6 +91,9 @@ iomap_atomic_prot_pfn(unsigned long pfn,
 		prot = __pgprot(__PAGE_KERNEL |
 				cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
 
+	/* Filter out unsupported __PAGE_KERNEL* bits: */
+	pgprot_val(prot) &= __default_kernel_pte_mask;
+
 	return (void __force __iomem *) kmap_atomic_prot_pfn(pfn, prot);
 }
 EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
diff -puN arch/x86/mm/kasan_init_64.c~x86-no-auto-massage arch/x86/mm/kasan_init_64.c
--- a/arch/x86/mm/kasan_init_64.c~x86-no-auto-massage	2018-03-23 09:57:34.807820374 -0700
+++ b/arch/x86/mm/kasan_init_64.c	2018-03-23 09:57:34.848820374 -0700
@@ -269,6 +269,12 @@ void __init kasan_early_init(void)
 	pudval_t pud_val = __pa_nodebug(kasan_zero_pmd) | _KERNPG_TABLE;
 	p4dval_t p4d_val = __pa_nodebug(kasan_zero_pud) | _KERNPG_TABLE;
 
+	/* Mask out unsupported __PAGE_KERNEL bits: */
+	pte_val &= __default_kernel_pte_mask;
+	pmd_val &= __default_kernel_pte_mask;
+	pud_val &= __default_kernel_pte_mask;
+	p4d_val &= __default_kernel_pte_mask;
+
 	for (i = 0; i < PTRS_PER_PTE; i++)
 		kasan_zero_pte[i] = __pte(pte_val);
 
@@ -371,7 +377,13 @@ void __init kasan_init(void)
 	 */
 	memset(kasan_zero_page, 0, PAGE_SIZE);
 	for (i = 0; i < PTRS_PER_PTE; i++) {
-		pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO | _PAGE_ENC);
+		pte_t pte;
+		pgprot_t prot;
+
+		prot = __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC);
+		pgprot_val(prot) &= __default_kernel_pte_mask;
+
+		pte = __pte(__pa(kasan_zero_page) | pgprot_val(prot));
 		set_pte(&kasan_zero_pte[i], pte);
 	}
 	/* Flush TLBs again to be sure that write protection applied. */
diff -puN arch/x86/power/hibernate_64.c~x86-no-auto-massage arch/x86/power/hibernate_64.c
--- a/arch/x86/power/hibernate_64.c~x86-no-auto-massage	2018-03-23 09:57:34.809820374 -0700
+++ b/arch/x86/power/hibernate_64.c	2018-03-23 09:58:20.228820261 -0700
@@ -51,6 +51,12 @@ static int set_up_temporary_text_mapping
 	pmd_t *pmd;
 	pud_t *pud;
 	p4d_t *p4d = NULL;
+	pgprot_t pgtable_prot = __pgprot(_KERNPG_TABLE);
+	pgprot_t pmd_text_prot = __pgprot(__PAGE_KERNEL_LARGE_EXEC);
+
+	/* Filter out unsupported __PAGE_KERNEL* bits: */
+	pgprot_val(pmd_text_prot) &= __default_kernel_pte_mask;
+	pgprot_val(pgtable_prot)  &= __default_kernel_pte_mask;
 
 	/*
 	 * The new mapping only has to cover the page containing the image
@@ -81,15 +87,19 @@ static int set_up_temporary_text_mapping
 		return -ENOMEM;
 
 	set_pmd(pmd + pmd_index(restore_jump_address),
-		__pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
+		__pmd((jump_address_phys & PMD_MASK) | pgprot_val(pmd_text_prot)));
 	set_pud(pud + pud_index(restore_jump_address),
-		__pud(__pa(pmd) | _KERNPG_TABLE));
+		__pud(__pa(pmd) | pgprot_val(pgtable_prot)));
 	if (p4d) {
-		set_p4d(p4d + p4d_index(restore_jump_address), __p4d(__pa(pud) | _KERNPG_TABLE));
-		set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(p4d) | _KERNPG_TABLE));
+		p4d_t new_p4d = __p4d(__pa(pud) | pgprot_val(pgtable_prot));
+		pgd_t new_pgd = __pgd(__pa(p4d) | pgprot_val(pgtable_prot));
+
+		set_p4d(p4d + p4d_index(restore_jump_address), new_p4d);
+		set_pgd(pgd + pgd_index(restore_jump_address), new_pgd);
 	} else {
 		/* No p4d for 4-level paging: point the pgd to the pud page table */
-		set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(pud) | _KERNPG_TABLE));
+		pgd_t new_pgd = __pgd(__pa(pud) | pgprot_val(pgtable_prot));
+		set_pgd(pgd + pgd_index(restore_jump_address), new_pgd);
 	}
 
 	return 0;
diff -puN mm/early_ioremap.c~x86-no-auto-massage mm/early_ioremap.c
--- a/mm/early_ioremap.c~x86-no-auto-massage	2018-03-23 09:57:34.811820374 -0700
+++ b/mm/early_ioremap.c	2018-03-23 09:57:34.848820374 -0700
@@ -113,6 +113,9 @@ __early_ioremap(resource_size_t phys_add
 
 	WARN_ON(system_state >= SYSTEM_RUNNING);
 
+	/* Sanitize 'prot' against any unsupported bits: */
+	pgprot_val(prot) &= __default_kernel_pte_mask;
+
 	slot = -1;
 	for (i = 0; i < FIX_BTMAPS_SLOTS; i++) {
 		if (!prev_map[i]) {
_


* [PATCH 06/11] x86/mm: remove extra filtering in pageattr code
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (4 preceding siblings ...)
  2018-03-23 17:44 ` [PATCH 05/11] x86/mm: do not auto-massage page protections Dave Hansen
@ 2018-03-23 17:44 ` Dave Hansen
  2018-03-23 17:44 ` [PATCH 07/11] x86/mm: comment _PAGE_GLOBAL mystery Dave Hansen
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The pageattr code has a mode where it can set or clear PTE bits in
existing PTEs, so the page protections of the *new* PTEs come from
one of two places:
1. The set/clear masks: cpa->mask_clr / cpa->mask_set
2. The existing PTE

We filter ->mask_set/clr for supported PTE bits at entry to
__change_page_attr() so we never need to filter them again.

The only other place permissions can come from is an existing PTE,
and those presumably already have good bits.  We do not need to
filter them again.
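The two sources combine as a simple mask operation.  A minimal
userspace sketch of how the new protections are derived (field names
abbreviated from struct cpa_data; this is a model, not the kernel
function):

```c
#include <assert.h>
#include <stdint.h>

/* New protections = old PTE bits, minus mask_clr, plus mask_set.
 * If mask_set/mask_clr were filtered once at entry and the old PTE
 * only ever held valid bits, the result needs no further filtering
 * -- the point of this patch. */
static uint64_t apply_cpa_masks(uint64_t old_prot,
				uint64_t mask_set, uint64_t mask_clr)
{
	return (old_prot & ~mask_clr) | mask_set;
}
```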

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>

---

 b/arch/x86/mm/pageattr.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff -puN arch/x86/mm/pageattr.c~x86-pageattr-dont-filter-global arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~x86-pageattr-dont-filter-global	2018-03-21 16:31:59.191192315 -0700
+++ b/arch/x86/mm/pageattr.c	2018-03-21 16:31:59.195192315 -0700
@@ -598,7 +598,6 @@ try_preserve_large_page(pte_t *kpte, uns
 	req_prot = pgprot_clear_protnone_bits(req_prot);
         if (pgprot_val(req_prot) & _PAGE_PRESENT)
 		pgprot_val(req_prot) |= _PAGE_PSE;
-	req_prot = canon_pgprot(req_prot);
 
 	/*
 	 * old_pfn points to the large page base pfn. So we need
@@ -718,7 +717,7 @@ __split_large_page(struct cpa_data *cpa,
 	 */
 	pfn = ref_pfn;
 	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
-		set_pte(&pbase[i], pfn_pte(pfn, canon_pgprot(ref_prot)));
+		set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
 
 	if (virt_addr_valid(address)) {
 		unsigned long pfn = PFN_DOWN(__pa(address));
@@ -935,7 +934,6 @@ static void populate_pte(struct cpa_data
 	pte = pte_offset_kernel(pmd, start);
 
 	pgprot = pgprot_clear_protnone_bits(pgprot);
-	pgprot = canon_pgprot(pgprot);
 
 	while (num_pages-- && start < end) {
 		set_pte(pte, pfn_pte(cpa->pfn, pgprot));
@@ -1234,7 +1232,7 @@ repeat:
 		 * after all we're only going to change it's attributes
 		 * not the memory it points to
 		 */
-		new_pte = pfn_pte(pfn, canon_pgprot(new_prot));
+		new_pte = pfn_pte(pfn, new_prot);
 		cpa->pfn = pfn;
 		/*
 		 * Do we really change anything ?
_


* [PATCH 07/11] x86/mm: comment _PAGE_GLOBAL mystery
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (5 preceding siblings ...)
  2018-03-23 17:44 ` [PATCH 06/11] x86/mm: remove extra filtering in pageattr code Dave Hansen
@ 2018-03-23 17:44 ` Dave Hansen
  2018-03-23 17:44 ` [PATCH 08/11] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init Dave Hansen
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

I was mystified as to where the _PAGE_GLOBAL in the kernel page tables
for kernel text came from.  I audited all the places I could find, but
I missed one: head_64.S.

The page tables that we create in here live for a long time, and they
also have _PAGE_GLOBAL set, regardless of whether the processor
supports it.  It's harmless, and we got *lucky* that the pageattr code
accidentally clears it when we wipe it out of __supported_pte_mask and
then later try to mark kernel text read-only.

Comment some of these properties to make it easier to find and
understand in the future.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/kernel/head_64.S |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff -puN arch/x86/kernel/head_64.S~comment-global-page arch/x86/kernel/head_64.S
--- a/arch/x86/kernel/head_64.S~comment-global-page	2018-03-21 16:31:59.729192314 -0700
+++ b/arch/x86/kernel/head_64.S	2018-03-21 16:31:59.733192314 -0700
@@ -401,8 +401,13 @@ NEXT_PAGE(level3_ident_pgt)
 	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
 	.fill	511, 8, 0
 NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
+	/*
+	 * Since I easily can, map the first 1G.
 	 * Don't set NX because code runs from these pages.
+	 *
+	 * Note: This sets _PAGE_GLOBAL regardless of whether
+	 * the CPU supports it or it is enabled.  But,
+	 * the CPU should ignore the bit.
 	 */
 	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
 #else
@@ -433,6 +438,10 @@ NEXT_PAGE(level2_kernel_pgt)
 	 * (NOTE: at +512MB starts the module area, see MODULES_VADDR.
 	 *  If you want to increase this then increase MODULES_VADDR
 	 *  too.)
+	 *
+	 *  This table is eventually used by the kernel during normal
+	 *  runtime.  Care must be taken to clear out undesired bits
+	 *  later, like _PAGE_RW or _PAGE_GLOBAL in some cases.
 	 */
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
_


* [PATCH 08/11] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (6 preceding siblings ...)
  2018-03-23 17:44 ` [PATCH 07/11] x86/mm: comment _PAGE_GLOBAL mystery Dave Hansen
@ 2018-03-23 17:44 ` Dave Hansen
  2018-03-23 17:45 ` [PATCH 09/11] x86/pti: enable global pages for shared areas Dave Hansen
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, keescook, aarcange, luto, torvalds, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

__ro_after_init data gets stuck in the .rodata section.  That's normally
fine because the kernel itself manages the R/W properties.

But, if we run __change_page_attr() on an area which is __ro_after_init,
the .rodata checks will trigger and force the area to be immediately
read-only, even if it is early-ish in boot.  This caused problems when
trying to clear the _PAGE_GLOBAL bit for these areas in the PTI code:
it cleared _PAGE_GLOBAL like I asked, but also took it upon itself
to clear _PAGE_RW.  The kernel then oopsed the next time it wrote to
a __ro_after_init data structure.

To fix this, add the kernel_set_to_readonly check, just like we have
for kernel text, just a few lines below in this function.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/pageattr.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff -puN arch/x86/mm/pageattr.c~check-kernel_set_to_readonly arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~check-kernel_set_to_readonly	2018-03-21 16:32:00.263192312 -0700
+++ b/arch/x86/mm/pageattr.c	2018-03-21 16:32:00.267192312 -0700
@@ -298,9 +298,11 @@ static inline pgprot_t static_protection
 
 	/*
 	 * The .rodata section needs to be read-only. Using the pfn
-	 * catches all aliases.
+	 * catches all aliases.  This also includes __ro_after_init,
+	 * so do not enforce until kernel_set_to_readonly is true.
 	 */
-	if (within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
+	if (kernel_set_to_readonly &&
+	    within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
 		   __pa_symbol(__end_rodata) >> PAGE_SHIFT))
 		pgprot_val(forbidden) |= _PAGE_RW;
 
_


* [PATCH 09/11] x86/pti: enable global pages for shared areas
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (7 preceding siblings ...)
  2018-03-23 17:44 ` [PATCH 08/11] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init Dave Hansen
@ 2018-03-23 17:45 ` Dave Hansen
  2018-03-23 19:12   ` Nadav Amit
  2018-03-23 17:45 ` [PATCH 10/11] x86/pti: clear _PAGE_GLOBAL for kernel image Dave Hansen
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The entry/exit text and cpu_entry_area are mapped into userspace and
the kernel.  But, they are not _PAGE_GLOBAL.  This creates unnecessary
TLB misses.

Add the _PAGE_GLOBAL flag for these areas.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/cpu_entry_area.c |   10 +++++++++-
 b/arch/x86/mm/pti.c            |    9 ++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff -puN arch/x86/mm/cpu_entry_area.c~kpti-why-no-global arch/x86/mm/cpu_entry_area.c
--- a/arch/x86/mm/cpu_entry_area.c~kpti-why-no-global	2018-03-21 16:32:00.797192311 -0700
+++ b/arch/x86/mm/cpu_entry_area.c	2018-03-21 16:32:00.802192311 -0700
@@ -27,8 +27,16 @@ EXPORT_SYMBOL(get_cpu_entry_area);
 void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags)
 {
 	unsigned long va = (unsigned long) cea_vaddr;
+	pte_t pte = pfn_pte(pa >> PAGE_SHIFT, flags);
 
-	set_pte_vaddr(va, pfn_pte(pa >> PAGE_SHIFT, flags));
+	/*
+	 * The cpu_entry_area is shared between the user and kernel
+	 * page tables.  All of its ptes can safely be global.
+	 */
+	if (boot_cpu_has(X86_FEATURE_PGE))
+		pte = pte_set_flags(pte, _PAGE_GLOBAL);
+
+	set_pte_vaddr(va, pte);
 }
 
 static void __init
diff -puN arch/x86/mm/pti.c~kpti-why-no-global arch/x86/mm/pti.c
--- a/arch/x86/mm/pti.c~kpti-why-no-global	2018-03-21 16:32:00.799192311 -0700
+++ b/arch/x86/mm/pti.c	2018-03-21 16:32:00.803192311 -0700
@@ -300,6 +300,13 @@ pti_clone_pmds(unsigned long start, unsi
 			return;
 
 		/*
+		 * Setting 'target_pmd' below creates a mapping in both
+		 * the user and kernel page tables.  It is effectively
+		 * global, so set it as global in both copies.
+		 */
+		*pmd = pmd_set_flags(*pmd, _PAGE_GLOBAL);
+
+		/*
 		 * Copy the PMD.  That is, the kernelmode and usermode
 		 * tables will share the last-level page tables of this
 		 * address range
@@ -348,7 +355,7 @@ static void __init pti_clone_entry_text(
 {
 	pti_clone_pmds((unsigned long) __entry_text_start,
 			(unsigned long) __irqentry_text_end,
-		       _PAGE_RW | _PAGE_GLOBAL);
+		       _PAGE_RW);
 }
 
 /*
_


* [PATCH 10/11] x86/pti: clear _PAGE_GLOBAL for kernel image
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (8 preceding siblings ...)
  2018-03-23 17:45 ` [PATCH 09/11] x86/pti: enable global pages for shared areas Dave Hansen
@ 2018-03-23 17:45 ` Dave Hansen
  2018-03-23 17:45 ` [PATCH 11/11] x86/pti: leave kernel text global for !PCID Dave Hansen
  2018-03-23 18:26 ` [PATCH 00/11] Use global pages with PTI Linus Torvalds
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


From: Dave Hansen <dave.hansen@linux.intel.com>

The kernel page tables are inherited from head_64.S which rudely marks
them as _PAGE_GLOBAL.  For PTI, we have been relying on the grace of
$DEITY and some insane behavior in pageattr.c.  Now we do it properly.

First, stop filtering out "unsupported" bits from being cleared in the
pageattr code.  It's fine to filter out *setting* these bits but it
is insane to keep us from clearing them.

Then, *explicitly* go clear _PAGE_GLOBAL from the kernel identity map.
Do not rely on pageattr to do it magically.

After this patch, we can see that "GLB" shows up in each copy of the
page tables, that we have the same number of global entries in each
and that they are the *same* entries.

# grep -c GLB /sys/kernel/debug/page_tables/*
/sys/kernel/debug/page_tables/current_kernel:11
/sys/kernel/debug/page_tables/current_user:11
/sys/kernel/debug/page_tables/kernel:11

# for f in `ls /sys/kernel/debug/page_tables/`; do grep GLB /sys/kernel/debug/page_tables/$f > $f.GLB; done
# md5sum *.GLB
9caae8ad6a1fb53aca2407ec037f612d  current_kernel.GLB
9caae8ad6a1fb53aca2407ec037f612d  current_user.GLB
9caae8ad6a1fb53aca2407ec037f612d  kernel.GLB

A quick visual audit also shows that all the entries make sense.
0xfffffe0000000000 is the cpu_entry_area and 0xffffffff81c00000
is the entry/exit text:

# grep GLB /sys/kernel/debug/page_tables/current_user
0xfffffe0000000000-0xfffffe0000002000           8K     ro                 GLB NX pte
0xfffffe0000002000-0xfffffe0000003000           4K     RW                 GLB NX pte
0xfffffe0000003000-0xfffffe0000006000          12K     ro                 GLB NX pte
0xfffffe0000006000-0xfffffe0000007000           4K     ro                 GLB x  pte
0xfffffe0000007000-0xfffffe000000d000          24K     RW                 GLB NX pte
0xfffffe000002d000-0xfffffe000002e000           4K     ro                 GLB NX pte
0xfffffe000002e000-0xfffffe000002f000           4K     RW                 GLB NX pte
0xfffffe000002f000-0xfffffe0000032000          12K     ro                 GLB NX pte
0xfffffe0000032000-0xfffffe0000033000           4K     ro                 GLB x  pte
0xfffffe0000033000-0xfffffe0000039000          24K     RW                 GLB NX pte
0xffffffff81c00000-0xffffffff81e00000           2M     ro         PSE     GLB x  pmd

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/init.c     |    8 +-------
 b/arch/x86/mm/pageattr.c |   12 +++++++++---
 b/arch/x86/mm/pti.c      |   25 +++++++++++++++++++++++++
 3 files changed, 35 insertions(+), 10 deletions(-)

diff -puN arch/x86/mm/init.c~clear-global-for-pti arch/x86/mm/init.c
--- a/arch/x86/mm/init.c~clear-global-for-pti	2018-03-21 16:32:01.354192310 -0700
+++ b/arch/x86/mm/init.c	2018-03-21 16:32:01.361192310 -0700
@@ -161,12 +161,6 @@ struct map_range {
 
 static int page_size_mask;
 
-static void enable_global_pages(void)
-{
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		__supported_pte_mask |= _PAGE_GLOBAL;
-}
-
 static void __init probe_page_size_mask(void)
 {
 	/*
@@ -187,7 +181,7 @@ static void __init probe_page_size_mask(
 	__supported_pte_mask &= ~_PAGE_GLOBAL;
 	if (boot_cpu_has(X86_FEATURE_PGE)) {
 		cr4_set_bits_and_update_boot(X86_CR4_PGE);
-		enable_global_pages();
+		__supported_pte_mask |= _PAGE_GLOBAL;
 	}
 
 	/* By the default is everything supported: */
diff -puN arch/x86/mm/pageattr.c~clear-global-for-pti arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~clear-global-for-pti	2018-03-21 16:32:01.356192310 -0700
+++ b/arch/x86/mm/pageattr.c	2018-03-21 16:32:01.362192310 -0700
@@ -1411,11 +1411,11 @@ static int change_page_attr_set_clr(unsi
 	memset(&cpa, 0, sizeof(cpa));
 
 	/*
-	 * Check, if we are requested to change a not supported
-	 * feature:
+	 * Check, if we are requested to set a not supported
+	 * feature.  Clearing non-supported features is OK.
 	 */
 	mask_set = canon_pgprot(mask_set);
-	mask_clr = canon_pgprot(mask_clr);
+
 	if (!pgprot_val(mask_set) && !pgprot_val(mask_clr) && !force_split)
 		return 0;
 
@@ -1758,6 +1758,12 @@ int set_memory_4k(unsigned long addr, in
 					__pgprot(0), 1, 0, NULL);
 }
 
+int set_memory_nonglobal(unsigned long addr, int numpages)
+{
+	return change_page_attr_clear(&addr, numpages,
+				      __pgprot(_PAGE_GLOBAL), 0);
+}
+
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 {
 	struct cpa_data cpa;
diff -puN arch/x86/mm/pti.c~clear-global-for-pti arch/x86/mm/pti.c
--- a/arch/x86/mm/pti.c~clear-global-for-pti	2018-03-21 16:32:01.358192310 -0700
+++ b/arch/x86/mm/pti.c	2018-03-21 16:32:01.362192310 -0700
@@ -359,6 +359,27 @@ static void __init pti_clone_entry_text(
 }
 
 /*
+ * This is the only user for it and it is not arch-generic like
+ * the other set_memory.h functions.  Just extern it.
+ */
+extern int set_memory_nonglobal(unsigned long addr, int numpages);
+void pti_set_kernel_image_nonglobal(void)
+{
+	/*
+	 * The identity map is created with PMDs, regardless of the
+	 * actual length of the kernel.  We need to clear
+	 * _PAGE_GLOBAL up to a PMD boundary, not just to the end
+	 * of the image.
+	 */
+	unsigned long start = PFN_ALIGN(_text);
+	unsigned long end = ALIGN((unsigned long)_end, PMD_PAGE_SIZE);
+
+	pr_debug("set kernel image non-global\n");
+
+	set_memory_nonglobal(start, (end - start) >> PAGE_SHIFT);
+}
+
+/*
  * Initialize kernel page table isolation
  */
 void __init pti_init(void)
@@ -369,6 +390,10 @@ void __init pti_init(void)
 	pr_info("enabled\n");
 
 	pti_clone_user_shared();
+
+	/* Undo all global bits from the init pagetables in head_64.S: */
+	pti_set_kernel_image_nonglobal();
+	/* Replace some of the global bits just for shared entry text: */
 	pti_clone_entry_text();
 	pti_setup_espfix64();
 	pti_setup_vsyscall();
_


* [PATCH 11/11] x86/pti: leave kernel text global for !PCID
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (9 preceding siblings ...)
  2018-03-23 17:45 ` [PATCH 10/11] x86/pti: clear _PAGE_GLOBAL for kernel image Dave Hansen
@ 2018-03-23 17:45 ` Dave Hansen
  2018-03-23 18:26 ` [PATCH 00/11] Use global pages with PTI Linus Torvalds
  11 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 17:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Dave Hansen, aarcange, luto, torvalds, keescook, hughd,
	jgross, x86, namit


I'm sticking this at the end of the series because it's a bit weird.
It can be dropped and the rest of the series is still useful without
it.

Global pages are bad for hardening because they potentially let an
exploit read the kernel image via a Meltdown-style attack which
makes it easier to find gadgets.

But, global pages are good for performance because they reduce TLB
misses when making user/kernel transitions, especially when PCIDs
are not available, such as on older hardware, or where a hypervisor
has disabled them for some reason.

This patch implements a basic, sane policy: If you have PCIDs, you
only map a minimal amount of kernel text global.  If you do not have
PCIDs, you map all kernel text global.

This policy effectively makes PCIDs something that not only adds
performance but a little bit of hardening as well.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Nadav Amit <namit@vmware.com>
---

 b/arch/x86/mm/pti.c |   34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff -puN arch/x86/mm/pti.c~kpti-global-text-option arch/x86/mm/pti.c
--- a/arch/x86/mm/pti.c~kpti-global-text-option	2018-03-21 16:32:14.312192277 -0700
+++ b/arch/x86/mm/pti.c	2018-03-21 16:32:14.316192277 -0700
@@ -66,12 +66,22 @@ static void __init pti_print_if_secure(c
 		pr_info("%s\n", reason);
 }
 
+enum pti_mode {
+	PTI_AUTO = 0,
+	PTI_FORCE_OFF,
+	PTI_FORCE_ON
+} pti_mode;
+
 void __init pti_check_boottime_disable(void)
 {
 	char arg[5];
 	int ret;
 
+	/* Assume mode is auto unless overridden. */
+	pti_mode = PTI_AUTO;
+
 	if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
+		pti_mode = PTI_FORCE_OFF;
 		pti_print_if_insecure("disabled on XEN PV.");
 		return;
 	}
@@ -79,18 +89,23 @@ void __init pti_check_boottime_disable(v
 	ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
 	if (ret > 0)  {
 		if (ret == 3 && !strncmp(arg, "off", 3)) {
+			pti_mode = PTI_FORCE_OFF;
 			pti_print_if_insecure("disabled on command line.");
 			return;
 		}
 		if (ret == 2 && !strncmp(arg, "on", 2)) {
+			pti_mode = PTI_FORCE_ON;
 			pti_print_if_secure("force enabled on command line.");
 			goto enable;
 		}
-		if (ret == 4 && !strncmp(arg, "auto", 4))
+		if (ret == 4 && !strncmp(arg, "auto", 4)) {
+			pti_mode = PTI_AUTO;
 			goto autosel;
+		}
 	}
 
 	if (cmdline_find_option_bool(boot_command_line, "nopti")) {
+		pti_mode = PTI_FORCE_OFF;
 		pti_print_if_insecure("disabled on command line.");
 		return;
 	}
@@ -374,6 +389,23 @@ void pti_set_kernel_image_nonglobal(void
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long end = ALIGN((unsigned long)_end, PMD_PAGE_SIZE);
 
+	/*
+	 * Global pages and PCIDs are both ways to make kernel TLB
+	 * entries live longer, reduce TLB misses and improve kernel
+	 * performance.  But, leaving all kernel text Global makes
+	 * it potentially accessible to meltdown-style attacks which
+	 * make it trivial to find gadgets or defeat KASLR.
+	 *
+	 * Leave kernel text global, but only on systems that do not
+	 * have PCIDs and which have not explicitly enabled pti=on.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_PCID) &&
+	    (pti_mode == PTI_AUTO)) {
+		pr_debug("processor does not support PCIDs, leaving "
+			 "kernel image global\n");
+		return;
+	}
+
 	pr_debug("set kernel image non-global\n");
 
 	set_memory_nonglobal(start, (end - start) >> PAGE_SHIFT);
_


* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
                   ` (10 preceding siblings ...)
  2018-03-23 17:45 ` [PATCH 11/11] x86/pti: leave kernel text global for !PCID Dave Hansen
@ 2018-03-23 18:26 ` Linus Torvalds
  2018-03-24  0:40   ` Dave Hansen
  11 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2018-03-23 18:26 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linux Kernel Mailing List, linux-mm, Andrea Arcangeli,
	Andrew Lutomirski, Kees Cook, Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

On Fri, Mar 23, 2018 at 10:44 AM, Dave Hansen
<dave.hansen@linux.intel.com> wrote:
>
> This adds one major change from the last version of the patch set
> (present in the last patch).  It makes all kernel text global for non-
> PCID systems.  This keeps kernel data protected always, but means that
> it will be easier to find kernel gadgets via meltdown on old systems
> without PCIDs.  This heuristic is, I think, a reasonable one and it
> keeps us from having to create any new pti=foo options

Sounds sane.

The patches look reasonable, but I hate seeing a patch series like
this where the only ostensible reason is performance, and there are no
performance numbers anywhere..

             Linus


* Re: [PATCH 09/11] x86/pti: enable global pages for shared areas
  2018-03-23 17:45 ` [PATCH 09/11] x86/pti: enable global pages for shared areas Dave Hansen
@ 2018-03-23 19:12   ` Nadav Amit
  2018-03-23 19:36     ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Nadav Amit @ 2018-03-23 19:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, aarcange, luto, torvalds, keescook,
	hughd, jgross, x86

Dave Hansen <dave.hansen@linux.intel.com> wrote:

> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The entry/exit text and cpu_entry_area are mapped into userspace and
> the kernel.  But, they are not _PAGE_GLOBAL.  This creates unnecessary
> TLB misses.
> 
> Add the _PAGE_GLOBAL flag for these areas.
> 
> static void __init
> diff -puN arch/x86/mm/pti.c~kpti-why-no-global arch/x86/mm/pti.c
> --- a/arch/x86/mm/pti.c~kpti-why-no-global	2018-03-21 16:32:00.799192311 -0700
> +++ b/arch/x86/mm/pti.c	2018-03-21 16:32:00.803192311 -0700
> @@ -300,6 +300,13 @@ pti_clone_pmds(unsigned long start, unsi
> 			return;
> 
> 		/*
> +		 * Setting 'target_pmd' below creates a mapping in both
> +		 * the user and kernel page tables.  It is effectively
> +		 * global, so set it as global in both copies.
> +		 */
> +		*pmd = pmd_set_flags(*pmd, _PAGE_GLOBAL);
if (boot_cpu_has(X86_FEATURE_PGE)) ?


* Re: [PATCH 05/11] x86/mm: do not auto-massage page protections
  2018-03-23 17:44 ` [PATCH 05/11] x86/mm: do not auto-massage page protections Dave Hansen
@ 2018-03-23 19:15   ` Nadav Amit
  2018-03-23 19:26     ` Dave Hansen
  2018-03-24 15:10   ` kbuild test robot
  2018-03-24 15:21   ` kbuild test robot
  2 siblings, 1 reply; 38+ messages in thread
From: Nadav Amit @ 2018-03-23 19:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: LKML, open list:MEMORY MANAGEMENT, Andrea Arcangeli, luto,
	torvalds, keescook, hughd, jgross, x86

Dave Hansen <dave.hansen@linux.intel.com> wrote:

> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> A PTE is constructed from a physical address and a pgprotval_t.
> __PAGE_KERNEL, for instance, is a pgprot_t and must be converted
> into a pgprotval_t before it can be used to create a PTE.  This is
> done implicitly within functions like set_pte() by massage_pgprot().
> 
> However, this makes it very challenging to set bits (and keep them
> set) if your bit is being filtered out by massage_pgprot().
> 
> This moves the bit filtering out of set_pte() and friends.  For

I don’t see that set_pte() filters the bits, so I am confused by this
sentence...

> +static inline pgprotval_t check_pgprot(pgprot_t pgprot)
> +{
> +	pgprotval_t massaged_val = massage_pgprot(pgprot);
> +
> +	/* mmdebug.h can not be included here because of dependencies */
> +#ifdef CONFIG_DEBUG_VM
> +	WARN_ONCE(pgprot_val(pgprot) != massaged_val,
> +		  "attempted to set unsupported pgprot: %016lx "
> +		  "bits: %016lx supported: %016lx\n",
> +		  pgprot_val(pgprot),
> +		  pgprot_val(pgprot) ^ massaged_val,
> +		  __supported_pte_mask);
> +#endif
Why not to use VM_WARN_ON_ONCE() and avoid the ifdef?


* Re: [PATCH 05/11] x86/mm: do not auto-massage page protections
  2018-03-23 19:15   ` Nadav Amit
@ 2018-03-23 19:26     ` Dave Hansen
  2018-03-23 19:34       ` Nadav Amit
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 19:26 UTC (permalink / raw)
  To: Nadav Amit
  Cc: LKML, open list:MEMORY MANAGEMENT, Andrea Arcangeli, luto,
	torvalds, keescook, hughd, jgross, x86

On 03/23/2018 12:15 PM, Nadav Amit wrote:
>> A PTE is constructed from a physical address and a pgprotval_t.
>> __PAGE_KERNEL, for instance, is a pgprot_t and must be converted
>> into a pgprotval_t before it can be used to create a PTE.  This is
>> done implicitly within functions like set_pte() by massage_pgprot().
>>
>> However, this makes it very challenging to set bits (and keep them
>> set) if your bit is being filtered out by massage_pgprot().
>>
>> This moves the bit filtering out of set_pte() and friends.  For
> 
> I don’t see that set_pte() filters the bits, so I am confused by this
> sentence...

This was a typo/thinko.  It should be pfn_pte().

>> +static inline pgprotval_t check_pgprot(pgprot_t pgprot)
>> +{
>> +	pgprotval_t massaged_val = massage_pgprot(pgprot);
>> +
>> +	/* mmdebug.h can not be included here because of dependencies */
>> +#ifdef CONFIG_DEBUG_VM
>> +	WARN_ONCE(pgprot_val(pgprot) != massaged_val,
>> +		  "attempted to set unsupported pgprot: %016lx "
>> +		  "bits: %016lx supported: %016lx\n",
>> +		  pgprot_val(pgprot),
>> +		  pgprot_val(pgprot) ^ massaged_val,
>> +		  __supported_pte_mask);
>> +#endif
> Why not to use VM_WARN_ON_ONCE() and avoid the ifdef?

I wanted a message.  VM_WARN_ON_ONCE() doesn't let you give a message.


* Re: [PATCH 05/11] x86/mm: do not auto-massage page protections
  2018-03-23 19:26     ` Dave Hansen
@ 2018-03-23 19:34       ` Nadav Amit
  2018-03-23 19:38         ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Nadav Amit @ 2018-03-23 19:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: LKML, open list:MEMORY MANAGEMENT, Andrea Arcangeli, luto,
	torvalds, keescook, hughd, jgross, x86

Dave Hansen <dave.hansen@linux.intel.com> wrote:

> On 03/23/2018 12:15 PM, Nadav Amit wrote:
>>> A PTE is constructed from a physical address and a pgprotval_t.
>>> __PAGE_KERNEL, for instance, is a pgprot_t and must be converted
>>> into a pgprotval_t before it can be used to create a PTE.  This is
>>> done implicitly within functions like set_pte() by massage_pgprot().
>>> 
>>> However, this makes it very challenging to set bits (and keep them
>>> set) if your bit is being filtered out by massage_pgprot().
>>> 
>>> This moves the bit filtering out of set_pte() and friends.  For
>> 
>> I don’t see that set_pte() filters the bits, so I am confused by this
>> sentence...
> 
> This was a typo/thinko.  It should be pfn_pte().
> 
>>> +static inline pgprotval_t check_pgprot(pgprot_t pgprot)
>>> +{
>>> +	pgprotval_t massaged_val = massage_pgprot(pgprot);
>>> +
>>> +	/* mmdebug.h can not be included here because of dependencies */
>>> +#ifdef CONFIG_DEBUG_VM
>>> +	WARN_ONCE(pgprot_val(pgprot) != massaged_val,
>>> +		  "attempted to set unsupported pgprot: %016lx "
>>> +		  "bits: %016lx supported: %016lx\n",
>>> +		  pgprot_val(pgprot),
>>> +		  pgprot_val(pgprot) ^ massaged_val,
>>> +		  __supported_pte_mask);
>>> +#endif
>> Why not to use VM_WARN_ON_ONCE() and avoid the ifdef?
> 
> I wanted a message.  VM_WARN_ON_ONCE() doesn't let you give a message.

Right (my bad). But VM_WARN_ONCE() lets you.


* Re: [PATCH 09/11] x86/pti: enable global pages for shared areas
  2018-03-23 19:12   ` Nadav Amit
@ 2018-03-23 19:36     ` Dave Hansen
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 19:36 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-kernel, linux-mm, aarcange, luto, torvalds, keescook,
	hughd, jgross, x86

On 03/23/2018 12:12 PM, Nadav Amit wrote:
>> 		/*
>> +		 * Setting 'target_pmd' below creates a mapping in both
>> +		 * the user and kernel page tables.  It is effectively
>> +		 * global, so set it as global in both copies.
>> +		 */
>> +		*pmd = pmd_set_flags(*pmd, _PAGE_GLOBAL);
> if (boot_cpu_has(X86_FEATURE_PGE)) ?

Good catch.  I'll update that.


* Re: [PATCH 05/11] x86/mm: do not auto-massage page protections
  2018-03-23 19:34       ` Nadav Amit
@ 2018-03-23 19:38         ` Dave Hansen
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-23 19:38 UTC (permalink / raw)
  To: Nadav Amit
  Cc: LKML, open list:MEMORY MANAGEMENT, Andrea Arcangeli, luto,
	torvalds, keescook, hughd, jgross, x86

On 03/23/2018 12:34 PM, Nadav Amit wrote:
>>>> +	/* mmdebug.h can not be included here because of dependencies */
>>>> +#ifdef CONFIG_DEBUG_VM
>>>> +	WARN_ONCE(pgprot_val(pgprot) != massaged_val,
>>>> +		  "attempted to set unsupported pgprot: %016lx "
>>>> +		  "bits: %016lx supported: %016lx\n",
>>>> +		  pgprot_val(pgprot),
>>>> +		  pgprot_val(pgprot) ^ massaged_val,
>>>> +		  __supported_pte_mask);
>>>> +#endif
>>> Why not to use VM_WARN_ON_ONCE() and avoid the ifdef?
>> I wanted a message.  VM_WARN_ON_ONCE() doesn't let you give a message.
> Right (my bad). But VM_WARN_ONCE() lets you.

I put a comment in up there about this ^^.  #including mmdebug.h caused
dependency problems, so I basically just reimplemented it using
WARN_ONCE() and an #ifdef.


* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-23 18:26 ` [PATCH 00/11] Use global pages with PTI Linus Torvalds
@ 2018-03-24  0:40   ` Dave Hansen
  2018-03-24  0:46     ` Linus Torvalds
                       ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-24  0:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linux-mm, Andrea Arcangeli,
	Andrew Lutomirski, Kees Cook, Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

On 03/23/2018 11:26 AM, Linus Torvalds wrote:
> On Fri, Mar 23, 2018 at 10:44 AM, Dave Hansen
> <dave.hansen@linux.intel.com> wrote:
>>
>> This adds one major change from the last version of the patch set
>> (present in the last patch).  It makes all kernel text global for non-
>> PCID systems.  This keeps kernel data protected always, but means that
>> it will be easier to find kernel gadgets via meltdown on old systems
>> without PCIDs.  This heuristic is, I think, a reasonable one and it
>> keeps us from having to create any new pti=foo options
> 
> Sounds sane.
> 
> The patches look reasonable, but I hate seeing a patch series like
> this where the only ostensible reason is performance, and there are no
> performance numbers anywhere..

Well, rats.  This somehow makes things slower with PCIDs on.  I thought
I might have reversed the numbers, but I actually do a "grep -c GLB
/sys/kernel/debug/page_tables/kernel" and record that in my logs right
next to the output of time(1), so it's awfully hard to screw up.

This is time doing a modestly-sized kernel compile on a 4-core Skylake
desktop.

                        User Time       Kernel Time     Clock Elapsed
Baseline ( 0 GLB PTEs)  803.79          67.77           237.30
w/series (28 GLB PTEs)  807.70 (+0.7%)  68.07 (+0.7%)   238.07 (+0.3%)

Without PCIDs, it behaves the way I would expect.

I'll ask around, but I'm open to any ideas about what the heck might be
causing this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-24  0:40   ` Dave Hansen
@ 2018-03-24  0:46     ` Linus Torvalds
  2018-03-24  0:54       ` Linus Torvalds
  2018-03-24 11:05     ` Ingo Molnar
  2018-03-27 13:36     ` Thomas Gleixner
  2 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2018-03-24  0:46 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linux Kernel Mailing List, linux-mm, Andrea Arcangeli,
	Andrew Lutomirski, Kees Cook, Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

On Fri, Mar 23, 2018 at 5:40 PM, Dave Hansen
<dave.hansen@linux.intel.com> wrote:
>
> Well, rats.  This somehow makes things slower with PCIDs on.

.. what happens when you enable global pages with PCID? You disabled
them explicitly because you thought they wouldn't matter..

Even with PCID, a global TLB entry for the shared pages would make
sense, because it's now just *one* entry in the TLB rather than "one
per PCID and one for the kernel mapping".

So even if in theory the lifetime of the TLB entry is the same, when
you have capacity misses it most definitely isn't.

And for process tear-down and build-up the per-PCID TLB entry does
nothing at all. While for a true global entry, it gets shared even
across process creation/deletion. So even ignoring TLB capacity
issues, with lots of shortlived processes global TLB entries are much
better.

It is, of course, possible that I misunderstood what you actually
benchmarked. But I assume the above benchmark numbers are with the
whole "don't even do global entries if you have PCID".

               Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-24  0:46     ` Linus Torvalds
@ 2018-03-24  0:54       ` Linus Torvalds
  0 siblings, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2018-03-24  0:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linux Kernel Mailing List, linux-mm, Andrea Arcangeli,
	Andrew Lutomirski, Kees Cook, Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

On Fri, Mar 23, 2018 at 5:46 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It is, of course, possible that I misunderstood what you actually
> benchmarked. But I assume the above benchmark numbers are with the
> whole "don't even do global entries if you have PCID".

Oh, I went back and read your description, and realized that I _had_
misunderstood what you did.

I thought you didn't bother with global pages at all when you had PCID.

But that's not what you meant. You always do global for the actual
user-mapped kernel pages, but when you don't have PCID you do *all*
kernel text as global, whether shared or not.

So I entirely misread what the latest change was.

                     Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-24  0:40   ` Dave Hansen
  2018-03-24  0:46     ` Linus Torvalds
@ 2018-03-24 11:05     ` Ingo Molnar
  2018-03-27 13:36     ` Thomas Gleixner
  2 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2018-03-24 11:05 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit


* Dave Hansen <dave.hansen@linux.intel.com> wrote:

> This is time doing a modestly-sized kernel compile on a 4-core Skylake
> desktop.
> 
>                         User Time       Kernel Time     Clock Elapsed
> Baseline ( 0 GLB PTEs)  803.79          67.77           237.30
> w/series (28 GLB PTEs)  807.70 (+0.7%)  68.07 (+0.7%)   238.07 (+0.3%)
> 
> Without PCIDs, it behaves the way I would expect.
>
> I'll ask around, but I'm open to any ideas about what the heck might be
> causing this.

Hm, it's a bit weird that while user time and kernel time both increased by 
about 0.7%, elapsed time only increased by 0.3%. Kernel builds are typically 
parallel enough that elapsed time should track CPU time more closely, so maybe 
there's some noise in the measurement?

Before spending too much time on the global-TLB patch angle I'd suggest investing 
a bit of time into making sure that the regression you are seeing is actually 
real:

You haven't described how you measured kernel build times, and "+0.7% 
regression" might turn out to be the real number, but sub-1%-accuracy kernel build 
times are *awfully* susceptible to:

 - various sources of noise

 - systematic statistical errors which don't show up as
   measurement-to-measurement noise but which skew the results,
   such as the boot-to-boot memory layout of the source code and
   object files.

 - cpufreq artifacts

Even repeated builds with 'make clean' inbetween can be misleading because the 
exact layout of key include files and binaries which get accessed the most often 
during a build are set into stone once they've been read into the page cache for 
the first time after bootup. Automated reboots between measurements can be 
misleading as well, if the file layout after bootup is too deterministic.

So here's a pretty reliable way to measure kernel build time, which tries to avoid 
the various pitfalls of caching.

First I make sure that cpufreq is set to 'performance':

  for ((cpu=0; cpu<120; cpu++)); do
    G=/sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor
    [ -f $G ] && echo performance > $G
  done

[ ... because it can be *really* annoying to discover that an ostensible 
  performance regression was a cpufreq artifact ... again. ;-) ]

Then I copy a kernel tree to /tmp (ramfs) as root:

	cd /tmp
	rm -rf linux
	git clone ~/linux linux
	cd linux
	make defconfig >/dev/null
	
... and then we can build the kernel in such a loop (as root again):

  perf stat --repeat 10 --null --pre			'\
	cp -a kernel ../kernel.copy.$(date +%s);	 \
	rm -rf *;					 \
	git checkout .;					 \
	echo 1 > /proc/sys/vm/drop_caches;		 \
	find ../kernel* -type f | xargs cat >/dev/null;  \
	make -j kernel >/dev/null;			 \
	make clean >/dev/null 2>&1;			 \
	sync						'\
							 \
	make -j16 >/dev/null

( I have tested these by pasting them into a terminal. Adjust the ~/linux source 
  git tree and the '-j16' to your system. )

Notes:

 - the 'pre' script portion is not timed by 'perf stat', only the raw build times

 - we flush all caches via drop_caches and re-establish everything again, but:

 - we also introduce an intentional memory leak by slowly filling up ramfs with 
   copies of 'kernel/', thus continuously changing the layout of free memory, 
   cached data such as compiler binaries and the source code hierarchy. (Note 
   that the leak is about 8MB per iteration, so it isn't massive.)

With 10 iterations, this is the statistical stability I get on a big box:

 Performance counter stats for 'make -j128 kernel' (10 runs):

      26.346436425 seconds time elapsed    (+- 0.19%)

... which, despite a high iteration count of 10, is still surprisingly noisy, 
right?

A 0.2% stddev is probably not enough to call a 0.7% regression with good 
confidence, so I had to use *30* iterations to push measurement noise roughly an 
order of magnitude below the effect I'm trying to measure:

 Performance counter stats for 'make -j128' (30 runs):

      26.334767571 seconds time elapsed    (+- 0.09% )

i.e. "26.334 +- 0.023" seconds is a number we can have pretty high confidence in, 
on this system.

And just to demonstrate that it's all real, I repeated the whole 30-iteration 
measurement again:

 Performance counter stats for 'make -j128' (30 runs):

      26.311166142 seconds time elapsed    (+- 0.07%)

Even if in the end you get a similar result, close to the +0.7% overhead you 
already measured, we should have more confidence in blaming global TLBs for the 
performance regression.

But YMMV.

Thanks,

	Ingo

[*] Note that even this doesn't eliminate certain sources of measurement error: 
    such as the boot-to-boot variance in the layout of certain key kernel data
    structures - but kernel builds are mostly user-space dominated, so drop_caches 
    should be good enough.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/11] x86/mm: do not auto-massage page protections
  2018-03-23 17:44 ` [PATCH 05/11] x86/mm: do not auto-massage page protections Dave Hansen
  2018-03-23 19:15   ` Nadav Amit
@ 2018-03-24 15:10   ` kbuild test robot
  2018-03-24 15:21   ` kbuild test robot
  2 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2018-03-24 15:10 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kbuild-all, linux-kernel, linux-mm, Dave Hansen, aarcange, luto,
	torvalds, keescook, hughd, jgross, x86, namit

[-- Attachment #1: Type: text/plain, Size: 3444 bytes --]

Hi Dave,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on next-20180323]
[cannot apply to tip/x86/core v4.16-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dave-Hansen/Use-global-pages-with-PTI/20180324-205009
config: arm-at91_dt_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   mm/early_ioremap.c: In function '__early_ioremap':
>> mm/early_ioremap.c:117:22: error: '__default_kernel_pte_mask' undeclared (first use in this function); did you mean '__old_kernel_stat'?
     pgprot_val(prot) &= __default_kernel_pte_mask;
                         ^~~~~~~~~~~~~~~~~~~~~~~~~
                         __old_kernel_stat
   mm/early_ioremap.c:117:22: note: each undeclared identifier is reported only once for each function it appears in

vim +117 mm/early_ioremap.c

   104	
   105	static void __init __iomem *
   106	__early_ioremap(resource_size_t phys_addr, unsigned long size, pgprot_t prot)
   107	{
   108		unsigned long offset;
   109		resource_size_t last_addr;
   110		unsigned int nrpages;
   111		enum fixed_addresses idx;
   112		int i, slot;
   113	
   114		WARN_ON(system_state >= SYSTEM_RUNNING);
   115	
   116		/* Sanitize 'prot' against any unsupported bits: */
 > 117		pgprot_val(prot) &= __default_kernel_pte_mask;
   118	
   119		slot = -1;
   120		for (i = 0; i < FIX_BTMAPS_SLOTS; i++) {
   121			if (!prev_map[i]) {
   122				slot = i;
   123				break;
   124			}
   125		}
   126	
   127		if (WARN(slot < 0, "%s(%08llx, %08lx) not found slot\n",
   128			 __func__, (u64)phys_addr, size))
   129			return NULL;
   130	
   131		/* Don't allow wraparound or zero size */
   132		last_addr = phys_addr + size - 1;
   133		if (WARN_ON(!size || last_addr < phys_addr))
   134			return NULL;
   135	
   136		prev_size[slot] = size;
   137		/*
   138		 * Mappings have to be page-aligned
   139		 */
   140		offset = offset_in_page(phys_addr);
   141		phys_addr &= PAGE_MASK;
   142		size = PAGE_ALIGN(last_addr + 1) - phys_addr;
   143	
   144		/*
   145		 * Mappings have to fit in the FIX_BTMAP area.
   146		 */
   147		nrpages = size >> PAGE_SHIFT;
   148		if (WARN_ON(nrpages > NR_FIX_BTMAPS))
   149			return NULL;
   150	
   151		/*
   152		 * Ok, go for it..
   153		 */
   154		idx = FIX_BTMAP_BEGIN - NR_FIX_BTMAPS*slot;
   155		while (nrpages > 0) {
   156			if (after_paging_init)
   157				__late_set_fixmap(idx, phys_addr, prot);
   158			else
   159				__early_set_fixmap(idx, phys_addr, prot);
   160			phys_addr += PAGE_SIZE;
   161			--idx;
   162			--nrpages;
   163		}
   164		WARN(early_ioremap_debug, "%s(%08llx, %08lx) [%d] => %08lx + %08lx\n",
   165		     __func__, (u64)phys_addr, size, slot, offset, slot_virt[slot]);
   166	
   167		prev_map[slot] = (void __iomem *)(offset + slot_virt[slot]);
   168		return prev_map[slot];
   169	}
   170	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23722 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/11] x86/mm: do not auto-massage page protections
  2018-03-23 17:44 ` [PATCH 05/11] x86/mm: do not auto-massage page protections Dave Hansen
  2018-03-23 19:15   ` Nadav Amit
  2018-03-24 15:10   ` kbuild test robot
@ 2018-03-24 15:21   ` kbuild test robot
  2 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2018-03-24 15:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kbuild-all, linux-kernel, linux-mm, Dave Hansen, aarcange, luto,
	torvalds, keescook, hughd, jgross, x86, namit

[-- Attachment #1: Type: text/plain, Size: 3451 bytes --]

Hi Dave,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on next-20180323]
[cannot apply to tip/x86/core v4.16-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dave-Hansen/Use-global-pages-with-PTI/20180324-205009
config: arm-gemini_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   mm/early_ioremap.c: In function '__early_ioremap':
>> mm/early_ioremap.c:117:22: error: '__default_kernel_pte_mask' undeclared (first use in this function); did you mean '__current_kernel_time'?
     pgprot_val(prot) &= __default_kernel_pte_mask;
                         ^~~~~~~~~~~~~~~~~~~~~~~~~
                         __current_kernel_time
   mm/early_ioremap.c:117:22: note: each undeclared identifier is reported only once for each function it appears in

vim +117 mm/early_ioremap.c

   104	
   105	static void __init __iomem *
   106	__early_ioremap(resource_size_t phys_addr, unsigned long size, pgprot_t prot)
   107	{
   108		unsigned long offset;
   109		resource_size_t last_addr;
   110		unsigned int nrpages;
   111		enum fixed_addresses idx;
   112		int i, slot;
   113	
   114		WARN_ON(system_state >= SYSTEM_RUNNING);
   115	
   116		/* Sanitize 'prot' against any unsupported bits: */
 > 117		pgprot_val(prot) &= __default_kernel_pte_mask;
   118	
   119		slot = -1;
   120		for (i = 0; i < FIX_BTMAPS_SLOTS; i++) {
   121			if (!prev_map[i]) {
   122				slot = i;
   123				break;
   124			}
   125		}
   126	
   127		if (WARN(slot < 0, "%s(%08llx, %08lx) not found slot\n",
   128			 __func__, (u64)phys_addr, size))
   129			return NULL;
   130	
   131		/* Don't allow wraparound or zero size */
   132		last_addr = phys_addr + size - 1;
   133		if (WARN_ON(!size || last_addr < phys_addr))
   134			return NULL;
   135	
   136		prev_size[slot] = size;
   137		/*
   138		 * Mappings have to be page-aligned
   139		 */
   140		offset = offset_in_page(phys_addr);
   141		phys_addr &= PAGE_MASK;
   142		size = PAGE_ALIGN(last_addr + 1) - phys_addr;
   143	
   144		/*
   145		 * Mappings have to fit in the FIX_BTMAP area.
   146		 */
   147		nrpages = size >> PAGE_SHIFT;
   148		if (WARN_ON(nrpages > NR_FIX_BTMAPS))
   149			return NULL;
   150	
   151		/*
   152		 * Ok, go for it..
   153		 */
   154		idx = FIX_BTMAP_BEGIN - NR_FIX_BTMAPS*slot;
   155		while (nrpages > 0) {
   156			if (after_paging_init)
   157				__late_set_fixmap(idx, phys_addr, prot);
   158			else
   159				__early_set_fixmap(idx, phys_addr, prot);
   160			phys_addr += PAGE_SIZE;
   161			--idx;
   162			--nrpages;
   163		}
   164		WARN(early_ioremap_debug, "%s(%08llx, %08lx) [%d] => %08lx + %08lx\n",
   165		     __func__, (u64)phys_addr, size, slot, offset, slot_virt[slot]);
   166	
   167		prev_map[slot] = (void __iomem *)(offset + slot_virt[slot]);
   168		return prev_map[slot];
   169	}
   170	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 12382 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-24  0:40   ` Dave Hansen
  2018-03-24  0:46     ` Linus Torvalds
  2018-03-24 11:05     ` Ingo Molnar
@ 2018-03-27 13:36     ` Thomas Gleixner
  2018-03-27 16:32       ` Dave Hansen
  2 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2018-03-27 13:36 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit

On Fri, 23 Mar 2018, Dave Hansen wrote:
> On 03/23/2018 11:26 AM, Linus Torvalds wrote:
> > On Fri, Mar 23, 2018 at 10:44 AM, Dave Hansen
> > <dave.hansen@linux.intel.com> wrote:
> >>
> >> This adds one major change from the last version of the patch set
> >> (present in the last patch).  It makes all kernel text global for non-
> >> PCID systems.  This keeps kernel data protected always, but means that
> >> it will be easier to find kernel gadgets via meltdown on old systems
> >> without PCIDs.  This heuristic is, I think, a reasonable one and it
> >> keeps us from having to create any new pti=foo options
> > 
> > Sounds sane.
> > 
> > The patches look reasonable, but I hate seeing a patch series like
> > this where the only ostensible reason is performance, and there are no
> > performance numbers anywhere..
> 
> Well, rats.  This somehow makes things slower with PCIDs on.  I thought
> I might have reversed the numbers, but I actually do a "grep -c GLB
> /sys/kernel/debug/page_tables/kernel" and record that in my logs right
> next to the output of time(1), so it's awfully hard to screw up.
> 
> This is time doing a modestly-sized kernel compile on a 4-core Skylake
> desktop.
> 
>                         User Time       Kernel Time     Clock Elapsed
> Baseline ( 0 GLB PTEs)  803.79          67.77           237.30
> w/series (28 GLB PTEs)  807.70 (+0.7%)  68.07 (+0.7%)   238.07 (+0.3%)
> 
> Without PCIDs, it behaves the way I would expect.

What's the performance benefit on !PCID systems? And I mean systems which
actually do not have PCID, not a PCID system with 'nopcid' on the command
line.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-27 13:36     ` Thomas Gleixner
@ 2018-03-27 16:32       ` Dave Hansen
  2018-03-27 17:51         ` Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2018-03-27 16:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit

On 03/27/2018 06:36 AM, Thomas Gleixner wrote:
>>                         User Time       Kernel Time     Clock Elapsed
>> Baseline ( 0 GLB PTEs)  803.79          67.77           237.30
>> w/series (28 GLB PTEs)  807.70 (+0.7%)  68.07 (+0.7%)   238.07 (+0.3%)
>>
>> Without PCIDs, it behaves the way I would expect.
> What's the performance benefit on !PCID systems? And I mean systems which
> actually do not have PCID, not a PCID system with 'nopcid' on the command
> line.

Do you have something in mind for this?  Basically *all* of the servers
that I have access to have PCID because they are newer than ~7 years old.

That leaves *some* Ivybridge and earlier desktops, Atoms and AMD
systems.  Atoms are going to be the easiest thing to get my hands on,
but I tend to shy away from them for performance work.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-27 16:32       ` Dave Hansen
@ 2018-03-27 17:51         ` Thomas Gleixner
  2018-03-27 20:07           ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2018-03-27 17:51 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit

On Tue, 27 Mar 2018, Dave Hansen wrote:

> On 03/27/2018 06:36 AM, Thomas Gleixner wrote:
> >>                         User Time       Kernel Time     Clock Elapsed
> >> Baseline ( 0 GLB PTEs)  803.79          67.77           237.30
> >> w/series (28 GLB PTEs)  807.70 (+0.7%)  68.07 (+0.7%)   238.07 (+0.3%)
> >>
> >> Without PCIDs, it behaves the way I would expect.
> > What's the performance benefit on !PCID systems? And I mean systems which
> > actually do not have PCID, not a PCID system with 'nopcid' on the command
> > line.
> 
> Do you have something in mind for this?  Basically *all* of the servers
> that I have access to have PCID because they are newer than ~7 years old.
> 
> That leaves *some* Ivybridge and earlier desktops, Atoms and AMD

AMD is not interesting as it's not PTI and uses GLOBAL anyway.

> systems.  Atoms are going to be the easiest thing to get my hands on,
> but I tend to shy away from them for performance work.

What I have in mind is that I wonder whether the whole circus is worth it
when there is no performance advantage on PCID systems.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-27 17:51         ` Thomas Gleixner
@ 2018-03-27 20:07           ` Ingo Molnar
  2018-03-27 20:19             ` Dave Hansen
  2018-03-29  0:17             ` Dave Hansen
  0 siblings, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2018-03-27 20:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit


* Thomas Gleixner <tglx@linutronix.de> wrote:

> > systems.  Atoms are going to be the easiest thing to get my hands on,
> > but I tend to shy away from them for performance work.
> 
> What I have in mind is that I wonder whether the whole circus is worth it
> when there is no performance advantage on PCID systems.

I'd still love to:

 - To see at minimum stddev numbers, to make sure we are not looking at some weird
   statistical artifact. (I also outlined a more robust measurement method.)

 - If the numbers are right, a CPU engineer should have a look if possible, 
   because frankly this effect is not expected and is not intuitive. Where global 
   pages can be used safely they are almost always an unconditional win.
   Maybe we are missing some limitation or some interaction with PCID.

Since we'll be using PCID even on Meltdown-fixed hardware, maybe the same negative 
performance effect already exists on non-PTI kernels as well, and we just never 
noticed?

I.e. there are multiple grounds to get to the bottom of this.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-27 20:07           ` Ingo Molnar
@ 2018-03-27 20:19             ` Dave Hansen
  2018-03-29  0:17             ` Dave Hansen
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-27 20:19 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit

On 03/27/2018 01:07 PM, Ingo Molnar wrote:
>  - To see at minimum stddev numbers, to make sure we are not looking at some weird
>    statistical artifact. (I also outlined a more robust measurement method.)
> 
>  - If the numbers are right, a CPU engineer should have a look if possible, 
>    because frankly this effect is not expected and is not intuitive. Where global 
>    pages can be used safely they are almost always an unconditional win.
>    Maybe we are missing some limitation or some interaction with PCID.
> 
> Since we'll be using PCID even on Meltdown-fixed hardware, maybe the same negative 
> performance effect already exists on non-PTI kernels as well, we just never 
> noticed?

Yep, totally agree.  I'll do the more robust collection and also explore
on "real" !PCID hardware.  I also know the right CPU folks to go ask
about this, I just want to do the second round of robust data collection
before I bug them.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-27 20:07           ` Ingo Molnar
  2018-03-27 20:19             ` Dave Hansen
@ 2018-03-29  0:17             ` Dave Hansen
  2018-03-30 12:09               ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2018-03-29  0:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit

On 03/27/2018 01:07 PM, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@linutronix.de> wrote:
>>> systems.  Atoms are going to be the easiest thing to get my hands on,
>>> but I tend to shy away from them for performance work.
>> What I have in mind is that I wonder whether the whole circus is worth it
>> when there is no performance advantage on PCID systems.

I was waiting on trying to find a relatively recent Atom system (they
actually come in reasonably sized servers [1]), but I'm hitting a snag
there, so I figured I'd just share a kernel compile using Ingo's
perf-based methodology on a Skylake desktop system with PCIDs.  Here's
the kernel compile:

No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
                             -1.195 seconds (-0.64%)

Lower is better here, obviously.

I also re-checked everything using will-it-scale's llseek1 test[2] which
is basically a microbenchmark of a halfway reasonable syscall.  Higher
here is better.

No Global pages (baseline): 15783951 lseeks/sec
28 Global pages (this set): 16054688 lseeks/sec
			     +270737 lseeks/sec (+1.71%)

So, both the kernel compile and the microbenchmark got measurably faster.

1.
https://ark.intel.com/products/97933/Intel-Atom-Processor-C3955-16M-Cache-up-to-2_40-GHz
2.
https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-29  0:17             ` Dave Hansen
@ 2018-03-30 12:09               ` Ingo Molnar
  2018-03-30 12:17                 ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2018-03-30 12:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List,
	linux-mm, Andrea Arcangeli, Andrew Lutomirski, Kees Cook,
	Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit


* Dave Hansen <dave.hansen@linux.intel.com> wrote:

> On 03/27/2018 01:07 PM, Ingo Molnar wrote:
> > * Thomas Gleixner <tglx@linutronix.de> wrote:
> >>> systems.  Atoms are going to be the easiest thing to get my hands on,
> >>> but I tend to shy away from them for performance work.
> >> What I have in mind is that I wonder whether the whole circus is worth it
> >> when there is no performance advantage on PCID systems.
> 
> I was waiting on trying to find a relatively recent Atom system (they
> actually come in reasonably sized servers [1]), but I'm hitting a snag
> there, so I figured I'd just share a kernel compile using Ingo's
> perf-based methodology on a Skylake desktop system with PCIDs.
>
> Here's the kernel compile:
> 
> No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
> 28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
>                              -1.195 seconds (-0.64%)
> 
> Lower is better here, obviously.
> 
> I also re-checked everything using will-it-scale's llseek1 test[2] which
> is basically a microbenchmark of a halfway reasonable syscall.  Higher
> here is better.
> 
> No Global pages (baseline): 15783951 lseeks/sec
> 28 Global pages (this set): 16054688 lseeks/sec
> 			     +270737 lseeks/sec (+1.71%)
> 
> So, both the kernel compile and the microbenchmark got measurably faster.

Ok, cool, this is much better!

Mind re-sending the patch-set against latest -tip so it can be merged?

At this point !PCID Intel hardware is not a primary concern, if something bad 
happens on them with global pages we can quirk global pages off on them in some 
way, or so.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-30 12:09               ` Ingo Molnar
@ 2018-03-30 12:17                 ` Ingo Molnar
  2018-03-30 20:26                   ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2018-03-30 12:17 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List,
	linux-mm, Andrea Arcangeli, Andrew Lutomirski, Kees Cook,
	Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit


* Ingo Molnar <mingo@kernel.org> wrote:

> > No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
> > 28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
> >                              -1.195 seconds (-0.64%)
> > 
> > Lower is better here, obviously.
> > 
> > I also re-checked everything using will-it-scale's llseek1 test[2] which
> > is basically a microbenchmark of a halfway reasonable syscall.  Higher
> > here is better.
> > 
> > No Global pages (baseline): 15783951 lseeks/sec
> > 28 Global pages (this set): 16054688 lseeks/sec
> > 			     +270737 lseeks/sec (+1.71%)
> > 
> > So, both the kernel compile and the microbenchmark got measurably faster.
> 
> Ok, cool, this is much better!
> 
> Mind re-sending the patch-set against latest -tip so it can be merged?
> 
> At this point !PCID Intel hardware is not a primary concern, if something bad 
> happens on them with global pages we can quirk global pages off on them in some 
> way, or so.

BTW., the expectation on !PCID Intel hardware would be for global pages to help 
even more than the 0.6% and 1.7% you measured on PCID hardware: PCID already 
_reduces_ the cost of TLB flushes - so if there's not even PCID then global pages 
should help even more.

In theory at least. Would still be nice to measure it.

Thanks,

	Ingo


* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-30 12:17                 ` Ingo Molnar
@ 2018-03-30 20:26                   ` Dave Hansen
  2018-03-30 20:32                     ` Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2018-03-30 20:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List,
	linux-mm, Andrea Arcangeli, Andrew Lutomirski, Kees Cook,
	Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

On 03/30/2018 05:17 AM, Ingo Molnar wrote:
> BTW., the expectation on !PCID Intel hardware would be for global pages to help 
> even more than the 0.6% and 1.7% you measured on PCID hardware: PCID already 
> _reduces_ the cost of TLB flushes - so if there's not even PCID then global pages 
> should help even more.
> 
> In theory at least. Would still be nice to measure it.

I did the lseek test on a modern, non-PCID system:

No Global pages (baseline): 6077741 lseeks/sec
94 Global pages (this set): 8433111 lseeks/sec
			   +2355370 lseeks/sec (+38.8%)


* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-30 20:26                   ` Dave Hansen
@ 2018-03-30 20:32                     ` Thomas Gleixner
  2018-03-30 21:40                       ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2018-03-30 20:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ingo Molnar, Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit

On Fri, 30 Mar 2018, Dave Hansen wrote:

> On 03/30/2018 05:17 AM, Ingo Molnar wrote:
> > BTW., the expectation on !PCID Intel hardware would be for global pages to help 
> > even more than the 0.6% and 1.7% you measured on PCID hardware: PCID already 
> > _reduces_ the cost of TLB flushes - so if there's not even PCID then global pages 
> > should help even more.
> > 
> > In theory at least. Would still be nice to measure it.
> 
> I did the lseek test on a modern, non-PCID system:
> 
> No Global pages (baseline): 6077741 lseeks/sec
> 94 Global pages (this set): 8433111 lseeks/sec
> 			   +2355370 lseeks/sec (+38.8%)

That's all kernel text, right? What's the result for the case where global
is only set for all user/kernel shared pages?

Thanks,

	tglx


* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-30 20:32                     ` Thomas Gleixner
@ 2018-03-30 21:40                       ` Dave Hansen
  2018-03-31  5:39                         ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2018-03-30 21:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Linus Torvalds, Linux Kernel Mailing List, linux-mm,
	Andrea Arcangeli, Andrew Lutomirski, Kees Cook, Hugh Dickins,
	Jürgen Groß,
	the arch/x86 maintainers, namit

On 03/30/2018 01:32 PM, Thomas Gleixner wrote:
> On Fri, 30 Mar 2018, Dave Hansen wrote:
> 
>> On 03/30/2018 05:17 AM, Ingo Molnar wrote:
>>> BTW., the expectation on !PCID Intel hardware would be for global pages to help 
>>> even more than the 0.6% and 1.7% you measured on PCID hardware: PCID already 
>>> _reduces_ the cost of TLB flushes - so if there's not even PCID then global pages 
>>> should help even more.
>>>
>>> In theory at least. Would still be nice to measure it.
>>
>> I did the lseek test on a modern, non-PCID system:
>>
>> No Global pages (baseline): 6077741 lseeks/sec
>> 94 Global pages (this set): 8433111 lseeks/sec
>> 			   +2355370 lseeks/sec (+38.8%)
> 
> That's all kernel text, right? What's the result for the case where global
> is only set for all user/kernel shared pages?

Yes, that's all kernel text (94 global entries).  Here's the number with
just the entry data/text set global (88 global entries on this system):

No Global pages (baseline): 6077741 lseeks/sec
88 Global Pages (kentry  ): 7528609 lseeks/sec (+23.9%)
94 Global pages (this set): 8433111 lseeks/sec (+38.8%)


* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-30 21:40                       ` Dave Hansen
@ 2018-03-31  5:39                         ` Ingo Molnar
  2018-03-31 18:19                           ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2018-03-31  5:39 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List,
	linux-mm, Andrea Arcangeli, Andrew Lutomirski, Kees Cook,
	Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit


* Dave Hansen <dave.hansen@linux.intel.com> wrote:

> On 03/30/2018 01:32 PM, Thomas Gleixner wrote:
> > On Fri, 30 Mar 2018, Dave Hansen wrote:
> > 
> >> On 03/30/2018 05:17 AM, Ingo Molnar wrote:
> >>> BTW., the expectation on !PCID Intel hardware would be for global pages to help 
> >>> even more than the 0.6% and 1.7% you measured on PCID hardware: PCID already 
> >>> _reduces_ the cost of TLB flushes - so if there's not even PCID then global pages 
> >>> should help even more.
> >>>
> >>> In theory at least. Would still be nice to measure it.
> >>
> >> I did the lseek test on a modern, non-PCID system:
> >>
> >> No Global pages (baseline): 6077741 lseeks/sec
> >> 94 Global pages (this set): 8433111 lseeks/sec
> >> 			   +2355370 lseeks/sec (+38.8%)
> > 
> > That's all kernel text, right? What's the result for the case where global
> > is only set for all user/kernel shared pages?
> 
> Yes, that's all kernel text (94 global entries).  Here's the number with
> just the entry data/text set global (88 global entries on this system):
> 
> No Global pages (baseline): 6077741 lseeks/sec
> 88 Global Pages (kentry  ): 7528609 lseeks/sec (+23.9%)
> 94 Global pages (this set): 8433111 lseeks/sec (+38.8%)

Very impressive!

Please incorporate the performance numbers in patches #9 and #11.

There were a couple of valid review comments which need to be addressed as well, 
but other than that it all looks good to me and I plan to apply the next 
iteration.

In fact I think I'll try to put it into the backporting tree: as PGE was really 
the pre-PTI status quo and thus we should expect few quirks/bugs in this area, 
plus we still want to share as much core PTI logic with the -stable kernels as 
possible. The performance plus doesn't hurt either ... after so much lost 
performance.

Thanks,

	Ingo


* Re: [PATCH 00/11] Use global pages with PTI
  2018-03-31  5:39                         ` Ingo Molnar
@ 2018-03-31 18:19                           ` Dave Hansen
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2018-03-31 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List,
	linux-mm, Andrea Arcangeli, Andrew Lutomirski, Kees Cook,
	Hugh Dickins, Jürgen Groß,
	the arch/x86 maintainers, namit

On 03/30/2018 10:39 PM, Ingo Molnar wrote:
> There were a couple of valid review comments which need to be addressed as well, 
> but other than that it all looks good to me and I plan to apply the next 
> iteration.

Testing on that non-PCID system showed an oddity with parts of the
kernel image that are modified later in boot (when we set the kernel
image read-only).  We split a few of the PMD entries and the old
(early boot) values were being used for userspace.

I don't think this is a big deal.  The most annoying thing is that it
makes it harder to quickly validate that all of the things we set to
global *should* be global.  I'll put some examples of how this looks in
the patch when I repost.


end of thread, other threads:[~2018-03-31 18:19 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-23 17:44 [PATCH 00/11] Use global pages with PTI Dave Hansen
2018-03-23 17:44 ` [PATCH 01/11] x86/mm: factor out pageattr _PAGE_GLOBAL setting Dave Hansen
2018-03-23 17:44 ` [PATCH 02/11] x86/mm: undo double _PAGE_PSE clearing Dave Hansen
2018-03-23 17:44 ` [PATCH 03/11] x86/mm: introduce "default" kernel PTE mask Dave Hansen
2018-03-23 17:44 ` [PATCH 04/11] x86/espfix: document use of _PAGE_GLOBAL Dave Hansen
2018-03-23 17:44 ` [PATCH 05/11] x86/mm: do not auto-massage page protections Dave Hansen
2018-03-23 19:15   ` Nadav Amit
2018-03-23 19:26     ` Dave Hansen
2018-03-23 19:34       ` Nadav Amit
2018-03-23 19:38         ` Dave Hansen
2018-03-24 15:10   ` kbuild test robot
2018-03-24 15:21   ` kbuild test robot
2018-03-23 17:44 ` [PATCH 06/11] x86/mm: remove extra filtering in pageattr code Dave Hansen
2018-03-23 17:44 ` [PATCH 07/11] x86/mm: comment _PAGE_GLOBAL mystery Dave Hansen
2018-03-23 17:44 ` [PATCH 08/11] x86/mm: do not forbid _PAGE_RW before init for __ro_after_init Dave Hansen
2018-03-23 17:45 ` [PATCH 09/11] x86/pti: enable global pages for shared areas Dave Hansen
2018-03-23 19:12   ` Nadav Amit
2018-03-23 19:36     ` Dave Hansen
2018-03-23 17:45 ` [PATCH 10/11] x86/pti: clear _PAGE_GLOBAL for kernel image Dave Hansen
2018-03-23 17:45 ` [PATCH 11/11] x86/pti: leave kernel text global for !PCID Dave Hansen
2018-03-23 18:26 ` [PATCH 00/11] Use global pages with PTI Linus Torvalds
2018-03-24  0:40   ` Dave Hansen
2018-03-24  0:46     ` Linus Torvalds
2018-03-24  0:54       ` Linus Torvalds
2018-03-24 11:05     ` Ingo Molnar
2018-03-27 13:36     ` Thomas Gleixner
2018-03-27 16:32       ` Dave Hansen
2018-03-27 17:51         ` Thomas Gleixner
2018-03-27 20:07           ` Ingo Molnar
2018-03-27 20:19             ` Dave Hansen
2018-03-29  0:17             ` Dave Hansen
2018-03-30 12:09               ` Ingo Molnar
2018-03-30 12:17                 ` Ingo Molnar
2018-03-30 20:26                   ` Dave Hansen
2018-03-30 20:32                     ` Thomas Gleixner
2018-03-30 21:40                       ` Dave Hansen
2018-03-31  5:39                         ` Ingo Molnar
2018-03-31 18:19                           ` Dave Hansen
