linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping
@ 2018-08-02 22:58 Dave Hansen
  2018-08-02 22:58 ` [PATCH 1/7] x86/mm/pti: clear Global bit more aggressively Dave Hansen
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Dave Hansen @ 2018-08-02 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak


The fixes for the problem Hugh reported took a bit more surgery
than I would have liked, but they do appear to work.  Note that
the last two patches are unnecessary cleanups that could be removed
from backports.

Changes from v1:
 * Modify set_memory_np() to avoid messing with the direct map
   by limiting its changes to the high kernel image map.

--

This applies to 4.17 and 4.18.

Thanks to Hugh Dickins for initially finding the r/w kernel text
issue and coming up with an initial fix.  I found the "unused
hole" part and came up with a different approach for fixing the
mess.

--

Background:

Process Context IDentifiers (PCIDs) are a hardware feature that
allows TLB entries to survive page table switches (CR3 writes).
As an optimization, the PTI code currently allows the kernel image
to be Global when running on hardware without PCIDs.  This results
in fewer TLB misses, especially upon entry.

The downside is that these Global areas are theoretically
susceptible to Meltdown.  The logic is that there are no secrets
in the kernel image, so why pay the cost of TLB misses?

Problem:

The current PTI code leaves the entire area of the kernel binary
between '_text' and '_end' as Global (on non-PCID hardware).
However, that range contains both read-write kernel data, and two
"unused" holes in addition to text.  The areas which are not text
or read-only might contain secrets once they are freed back into
the allocator.

This issue affects systems which are susceptible to Meltdown, do
not have PCIDs, and are using the default PTI_AUTO mode (no
pti=on/off on the cmdline).

PCIDs became generally available for servers in ~2010 (Westmere)
and desktop (client) parts in roughly 2011 (Sandybridge).  This
is not expected to affect anything newer than that.

Solution:

The solution for the read-write area is to clear the global bit
for the area (patch #1).

The "unused" holes need more work since we free them in a
somewhat ad-hoc way; patches 2-5 fix this up.

Cc: Kees Cook <keescook@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/7] x86/mm/pti: clear Global bit more aggressively
  2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
@ 2018-08-02 22:58 ` Dave Hansen
  2018-08-05 20:30   ` [tip:x86/pti] x86/mm/pti: Clear " tip-bot for Dave Hansen
  2018-08-02 22:58 ` [PATCH 2/7] mm: allow non-direct-map arguments to free_reserved_area() Dave Hansen
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2018-08-02 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, hughd, keescook, tglx, mingo, aarcange, jgross,
	jpoimboe, gregkh, peterz, torvalds, bp, luto, ak


From: Dave Hansen <dave.hansen@linux.intel.com>

The kernel image starts out with the Global bit set across the entire
image.  The bit is cleared with set_memory_nonglobal() in
configurations with PCIDs, where we do not need the performance
benefits of the Global bit.

However, this is fragile.  It means that we are stuck opting *out* of
the less-secure (Global bit set) configuration, which seems backwards.
Let's start more secure (Global bit clear) and then let things opt
back in if they want performance, or are truly mapping common data
between kernel and userspace.

This fixes a bug.  Before this patch, there are areas that are
unmapped from the user page tables (like everything above
0xffffffff82600000 in the example below).  These have the hallmark of
being a wrong Global area: they are not identical in the
'current_kernel' and 'current_user' page table dumps.  They are also
read-write, which means they're much more likely to contain secrets.

Before this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                 GLB NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE     GLB NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                 GLB NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE     GLB NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

 current_user:---[ High Kernel Mapping ]---
 current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
 current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
 current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
 current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                 GLB NX pte
 current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
 current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

After this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
  current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
---

 b/arch/x86/mm/pageattr.c |    6 ++++++
 b/arch/x86/mm/pti.c      |   34 ++++++++++++++++++++++++----------
 2 files changed, 30 insertions(+), 10 deletions(-)

diff -puN arch/x86/mm/pageattr.c~pti-non-pcid-clear-more-global-in-kernel-mapping arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~pti-non-pcid-clear-more-global-in-kernel-mapping	2018-08-02 14:42:35.905479118 -0700
+++ b/arch/x86/mm/pageattr.c	2018-08-02 14:42:35.912479118 -0700
@@ -1784,6 +1784,12 @@ int set_memory_nonglobal(unsigned long a
 				      __pgprot(_PAGE_GLOBAL), 0);
 }
 
+int set_memory_global(unsigned long addr, int numpages)
+{
+	return change_page_attr_set(&addr, numpages,
+				    __pgprot(_PAGE_GLOBAL), 0);
+}
+
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 {
 	struct cpa_data cpa;
diff -puN arch/x86/mm/pti.c~pti-non-pcid-clear-more-global-in-kernel-mapping arch/x86/mm/pti.c
--- a/arch/x86/mm/pti.c~pti-non-pcid-clear-more-global-in-kernel-mapping	2018-08-02 14:42:35.909479118 -0700
+++ b/arch/x86/mm/pti.c	2018-08-02 14:42:35.913479118 -0700
@@ -435,6 +435,13 @@ static inline bool pti_kernel_image_glob
 }
 
 /*
+ * This is the only user for these and it is not arch-generic
+ * like the other set_memory.h functions.  Just extern them.
+ */
+extern int set_memory_nonglobal(unsigned long addr, int numpages);
+extern int set_memory_global(unsigned long addr, int numpages);
+
+/*
  * For some configurations, map all of kernel text into the user page
  * tables.  This reduces TLB misses, especially on non-PCID systems.
  */
@@ -446,7 +453,8 @@ void pti_clone_kernel_text(void)
 	 * clone the areas past rodata, they might contain secrets.
 	 */
 	unsigned long start = PFN_ALIGN(_text);
-	unsigned long end = (unsigned long)__end_rodata_hpage_align;
+	unsigned long end_clone  = (unsigned long)__end_rodata_hpage_align;
+	unsigned long end_global = PFN_ALIGN((unsigned long)__stop___ex_table);
 
 	if (!pti_kernel_image_global_ok())
 		return;
@@ -458,14 +466,18 @@ void pti_clone_kernel_text(void)
 	 * pti_set_kernel_image_nonglobal() did to clear the
 	 * global bit.
 	 */
-	pti_clone_pmds(start, end, _PAGE_RW);
+	pti_clone_pmds(start, end_clone, _PAGE_RW);
+
+	/*
+	 * pti_clone_pmds() will set the global bit in any PMDs
+	 * that it clones, but we also need to get any PTEs in
+	 * the last level for areas that are not huge-page-aligned.
+	 */
+
+	/* Set the global bit for normal non-__init kernel text: */
+	set_memory_global(start, (end_global - start) >> PAGE_SHIFT);
 }
 
-/*
- * This is the only user for it and it is not arch-generic like
- * the other set_memory.h functions.  Just extern it.
- */
-extern int set_memory_nonglobal(unsigned long addr, int numpages);
 void pti_set_kernel_image_nonglobal(void)
 {
 	/*
@@ -477,9 +489,11 @@ void pti_set_kernel_image_nonglobal(void
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long end = ALIGN((unsigned long)_end, PMD_PAGE_SIZE);
 
-	if (pti_kernel_image_global_ok())
-		return;
-
+	/*
+	 * This clears _PAGE_GLOBAL from the entire kernel image.
+	 * pti_clone_kernel_text() may put _PAGE_GLOBAL back for
+	 * areas that are mapped to userspace.
+	 */
 	set_memory_nonglobal(start, (end - start) >> PAGE_SHIFT);
 }
 
_


* [PATCH 2/7] mm: allow non-direct-map arguments to free_reserved_area()
  2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
  2018-08-02 22:58 ` [PATCH 1/7] x86/mm/pti: clear Global bit more aggressively Dave Hansen
@ 2018-08-02 22:58 ` Dave Hansen
  2018-08-05 20:31   ` [tip:x86/pti] mm: Allow " tip-bot for Dave Hansen
  2018-08-02 22:58 ` [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages() Dave Hansen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2018-08-02 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak


From: Dave Hansen <dave.hansen@linux.intel.com>

free_reserved_area() takes pointers as arguments to show which
addresses should be freed.  However, it does this in a
somewhat ambiguous way.  If it gets a kernel direct map address,
it always works.  However, if it gets an address that is
part of the kernel image alias mapping, it can fail.

It fails if all of the following happen:
 * The specified address is part of the kernel image alias
 * Poisoning is requested (forcing a memset())
 * The address is in a read-only portion of the kernel image

The memset() fails on the read-only mapping, of course.
free_reserved_area() *is* called both on the direct map and
on kernel image alias addresses.  We've just lucked out thus
far that the kernel image alias areas it gets used on are
read-write.  I'm fairly sure this has been just a happy
accident.

It is quite easy to make free_reserved_area() work for all
cases: just convert the address to a direct map address before
doing the memset(), and do this unconditionally.  There is
little chance of a regression here because we previously
did a virt_to_page() on the address for the memset, so we
know these are not highmem pages for which virt_to_page()
would fail.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
---

 b/mm/page_alloc.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff -puN mm/page_alloc.c~x86-mm-init-handle-non-linear-map-ranges-free_init_pages mm/page_alloc.c
--- a/mm/page_alloc.c~x86-mm-init-handle-non-linear-map-ranges-free_init_pages	2018-08-02 14:14:47.860483278 -0700
+++ b/mm/page_alloc.c	2018-08-02 14:14:47.865483278 -0700
@@ -6939,9 +6939,21 @@ unsigned long free_reserved_area(void *s
 	start = (void *)PAGE_ALIGN((unsigned long)start);
 	end = (void *)((unsigned long)end & PAGE_MASK);
 	for (pos = start; pos < end; pos += PAGE_SIZE, pages++) {
+		struct page *page = virt_to_page(pos);
+		void *direct_map_addr;
+
+		/*
+		 * 'direct_map_addr' might be different from 'pos'
+		 * because some architectures' virt_to_page()
+		 * work with aliases.  Getting the direct map
+		 * address ensures that we get a _writeable_
+		 * alias for the memset().
+		 */
+		direct_map_addr = page_address(page);
 		if ((unsigned int)poison <= 0xFF)
-			memset(pos, poison, PAGE_SIZE);
-		free_reserved_page(virt_to_page(pos));
+			memset(direct_map_addr, poison, PAGE_SIZE);
+
+		free_reserved_page(page);
 	}
 
 	if (pages && s)
_


* [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()
  2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
  2018-08-02 22:58 ` [PATCH 1/7] x86/mm/pti: clear Global bit more aggressively Dave Hansen
  2018-08-02 22:58 ` [PATCH 2/7] mm: allow non-direct-map arguments to free_reserved_area() Dave Hansen
@ 2018-08-02 22:58 ` Dave Hansen
  2018-08-04  0:18   ` Hugh Dickins
  2018-08-05 20:31   ` [tip:x86/pti] x86/mm/init: Pass " tip-bot for Dave Hansen
  2018-08-02 22:58 ` [PATCH 4/7] x86/mm/init: add helper for freeing kernel image pages Dave Hansen
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Dave Hansen @ 2018-08-02 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak


From: Dave Hansen <dave.hansen@linux.intel.com>

The x86 code has several places where it frees parts of kernel image:

 1. Unused SMP alternative
 2. __init code
 3. The hole between text and rodata
 4. The hole between rodata and data

We call free_init_pages() to do this.  Strangely, we convert the
symbol addresses to kernel direct map addresses in some cases
(#3, #4) but not others (#1, #2).

The virt_to_page() and the other code in free_reserved_area() now
work fine for symbol addresses on x86, so don't bother
converting the addresses to direct map addresses before freeing
them.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
---

 b/arch/x86/mm/init_64.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff -puN arch/x86/mm/init_64.c~x86-init-do-not-convert-symbol-addresses arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c~x86-init-do-not-convert-symbol-addresses	2018-08-02 14:14:48.380483277 -0700
+++ b/arch/x86/mm/init_64.c	2018-08-02 14:14:48.383483277 -0700
@@ -1283,12 +1283,8 @@ void mark_rodata_ro(void)
 	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif
 
-	free_init_pages("unused kernel",
-			(unsigned long) __va(__pa_symbol(text_end)),
-			(unsigned long) __va(__pa_symbol(rodata_start)));
-	free_init_pages("unused kernel",
-			(unsigned long) __va(__pa_symbol(rodata_end)),
-			(unsigned long) __va(__pa_symbol(_sdata)));
+	free_init_pages("unused kernel", text_end, rodata_start);
+	free_init_pages("unused kernel", rodata_end, _sdata);
 
 	debug_checkwx();
 
_


* [PATCH 4/7] x86/mm/init: add helper for freeing kernel image pages
  2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
                   ` (2 preceding siblings ...)
  2018-08-02 22:58 ` [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages() Dave Hansen
@ 2018-08-02 22:58 ` Dave Hansen
  2018-08-05 20:32   ` [tip:x86/pti] x86/mm/init: Add " tip-bot for Dave Hansen
  2018-08-02 22:58 ` [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping Dave Hansen
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2018-08-02 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak


From: Dave Hansen <dave.hansen@linux.intel.com>

When chunks of the kernel image are freed, free_init_pages()
is used directly.  Consolidate the three sites that do this.
Also update the string to give an incrementally better
description of that memory versus what was there before.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
---

 b/arch/x86/include/asm/processor.h |    1 +
 b/arch/x86/mm/init.c               |   15 ++++++++++++---
 b/arch/x86/mm/init_64.c            |    4 ++--
 3 files changed, 15 insertions(+), 5 deletions(-)

diff -puN arch/x86/include/asm/processor.h~x86-mm-init-common-kernel-image-free-helper arch/x86/include/asm/processor.h
--- a/arch/x86/include/asm/processor.h~x86-mm-init-common-kernel-image-free-helper	2018-08-02 14:14:48.891483276 -0700
+++ b/arch/x86/include/asm/processor.h	2018-08-02 14:14:48.901483276 -0700
@@ -966,6 +966,7 @@ static inline uint32_t hypervisor_cpuid_
 
 extern unsigned long arch_align_stack(unsigned long sp);
 extern void free_init_pages(char *what, unsigned long begin, unsigned long end);
+extern void free_kernel_image_pages(void *begin, void *end);
 
 void default_idle(void);
 #ifdef	CONFIG_XEN
diff -puN arch/x86/mm/init_64.c~x86-mm-init-common-kernel-image-free-helper arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c~x86-mm-init-common-kernel-image-free-helper	2018-08-02 14:14:48.895483276 -0700
+++ b/arch/x86/mm/init_64.c	2018-08-02 14:14:48.902483276 -0700
@@ -1283,8 +1283,8 @@ void mark_rodata_ro(void)
 	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif
 
-	free_init_pages("unused kernel", text_end, rodata_start);
-	free_init_pages("unused kernel", rodata_end, _sdata);
+	free_kernel_image_pages((void *)text_end, (void *)rodata_start);
+	free_kernel_image_pages((void *)rodata_end, (void *)_sdata);
 
 	debug_checkwx();
 
diff -puN arch/x86/mm/init.c~x86-mm-init-common-kernel-image-free-helper arch/x86/mm/init.c
--- a/arch/x86/mm/init.c~x86-mm-init-common-kernel-image-free-helper	2018-08-02 14:14:48.896483276 -0700
+++ b/arch/x86/mm/init.c	2018-08-02 14:14:48.902483276 -0700
@@ -773,13 +773,22 @@ void free_init_pages(char *what, unsigne
 	}
 }
 
+/*
+ * begin/end can be in the direct map or the "high kernel mapping"
+ * used for the kernel image only.  free_init_pages() will do the
+ * right thing for either kind of address.
+ */
+void free_kernel_image_pages(void *begin, void *end)
+{
+	free_init_pages("unused kernel image",
+			(unsigned long)begin, (unsigned long)end);
+}
+
 void __ref free_initmem(void)
 {
 	e820__reallocate_tables();
 
-	free_init_pages("unused kernel",
-			(unsigned long)(&__init_begin),
-			(unsigned long)(&__init_end));
+	free_kernel_image_pages(&__init_begin, &__init_end);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
_


* [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping
  2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
                   ` (3 preceding siblings ...)
  2018-08-02 22:58 ` [PATCH 4/7] x86/mm/init: add helper for freeing kernel image pages Dave Hansen
@ 2018-08-02 22:58 ` Dave Hansen
  2018-08-04  0:35   ` Hugh Dickins
                     ` (3 more replies)
  2018-08-02 22:58 ` [PATCH 6/7] x86/mm/pageattr: pass named flag instead of 0/1 Dave Hansen
  2018-08-02 22:58 ` [PATCH 7/7] x86/mm/pageattr: Remove implicit NX behavior Dave Hansen
  6 siblings, 4 replies; 23+ messages in thread
From: Dave Hansen @ 2018-08-02 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak


From: Dave Hansen <dave.hansen@linux.intel.com>

The kernel image is mapped into two places in the virtual address
space (addresses without KASLR, of course):

	1. The kernel direct map (0xffff880000000000)
	2. The "high kernel map" (0xffffffff81000000)

We actually execute out of #2.  If we get the address of a kernel
symbol, it points to #2, but almost all physical-to-virtual
translations point to #1.

Parts of the "high kernel map" alias are mapped in the userspace
page tables with the Global bit for performance reasons.  The
parts that we map to userspace do not (er, should not) have
secrets.

This is fine, except that some areas in the kernel image that
are adjacent to the non-secret-containing areas are unused holes.
We free these holes back into the normal page allocator and
reuse them as normal kernel memory.  The memory will, of course,
get *used* via the normal map, but the alias mapping is kept.

This otherwise unused alias mapping of the holes will, by default,
keep the Global bit, be mapped out to userspace, and be
vulnerable to Meltdown.

Remove the alias mapping of these pages entirely.  This is likely
to fracture the 2M page mapping the kernel image near these areas,
but this should affect a minority of the area.

The pageattr code changes *all* aliases mapping the physical pages
that it operates on (by default).  We only want to modify a single
alias, so we need to tweak its behavior.

This unmapping behavior is currently dependent on PTI being in
place.  Going forward, we should at least consider doing this for
all configurations.  Having an extra read-write alias for memory
is not exactly ideal for debugging things like random memory
corruption and this does undercut features like DEBUG_PAGEALLOC
or future work like eXclusive Page Frame Ownership (XPFO).

Before this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
  current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd


After this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
current_kernel-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
current_kernel-0xffffffff82488000-0xffffffff82600000        1504K                               pte
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82c0d000          52K     RW                     NX pte
current_kernel-0xffffffff82c0d000-0xffffffff82dc0000        1740K                               pte

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
  current_user-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
  current_user-0xffffffff82488000-0xffffffff82600000        1504K                               pte
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
---

 b/arch/x86/include/asm/set_memory.h |    1 +
 b/arch/x86/mm/init.c                |   25 +++++++++++++++++++++++--
 b/arch/x86/mm/pageattr.c            |   13 +++++++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff -puN arch/x86/include/asm/set_memory.h~x86-unmap-freed-areas-from-kernel-image arch/x86/include/asm/set_memory.h
--- a/arch/x86/include/asm/set_memory.h~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.462483274 -0700
+++ b/arch/x86/include/asm/set_memory.h	2018-08-02 14:14:49.471483274 -0700
@@ -46,6 +46,7 @@ int set_memory_np(unsigned long addr, in
 int set_memory_4k(unsigned long addr, int numpages);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
+int set_memory_np_noalias(unsigned long addr, int numpages);
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff -puN arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/init.c
--- a/arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.463483274 -0700
+++ b/arch/x86/mm/init.c	2018-08-02 14:14:49.471483274 -0700
@@ -780,8 +780,29 @@ void free_init_pages(char *what, unsigne
  */
 void free_kernel_image_pages(void *begin, void *end)
 {
-	free_init_pages("unused kernel image",
-			(unsigned long)begin, (unsigned long)end);
+	unsigned long begin_ul = (unsigned long)begin;
+	unsigned long end_ul = (unsigned long)end;
+	unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
+
+
+	free_init_pages("unused kernel image", begin_ul, end_ul);
+
+	/*
+	 * PTI maps some of the kernel into userspace.  For
+	 * performance, this includes some kernel areas that
+	 * do not contain secrets.  Those areas might be
+	 * adjacent to the parts of the kernel image being
+	 * freed, which may contain secrets.  Remove the
+	 * "high kernel image mapping" for these freed areas,
+	 * ensuring they are not even potentially vulnerable
+	 * to Meltdown regardless of the specific optimizations
+	 * PTI is currently using.
+	 *
+	 * The "noalias" prevents unmapping the direct map
+	 * alias which is needed to access the freed pages.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_PTI))
+		set_memory_np_noalias(begin_ul, len_pages);
 }
 
 void __ref free_initmem(void)
diff -puN arch/x86/mm/pageattr.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.466483274 -0700
+++ b/arch/x86/mm/pageattr.c	2018-08-02 14:14:49.472483274 -0700
@@ -53,6 +53,7 @@ static DEFINE_SPINLOCK(cpa_lock);
 #define CPA_FLUSHTLB 1
 #define CPA_ARRAY 2
 #define CPA_PAGES_ARRAY 4
+#define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
 
 #ifdef CONFIG_PROC_FS
 static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -1486,6 +1487,9 @@ static int change_page_attr_set_clr(unsi
 
 	/* No alias checking for _NX bit modifications */
 	checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
+	/* Never check aliases if the caller asks for it explicitly: */
+	if (checkalias && (in_flag & CPA_NO_CHECK_ALIAS))
+		checkalias = 0;
 
 	ret = __change_page_attr_set_clr(&cpa, checkalias);
 
@@ -1772,6 +1776,15 @@ int set_memory_np(unsigned long addr, in
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
 }
 
+int set_memory_np_noalias(unsigned long addr, int numpages)
+{
+	int cpa_flags = CPA_NO_CHECK_ALIAS;
+
+	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
+					__pgprot(_PAGE_PRESENT), 0,
+					cpa_flags, NULL);
+}
+
 int set_memory_4k(unsigned long addr, int numpages)
 {
 	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
_

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 6/7] x86/mm/pageattr: pass named flag instead of 0/1
  2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
                   ` (4 preceding siblings ...)
  2018-08-02 22:58 ` [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping Dave Hansen
@ 2018-08-02 22:58 ` Dave Hansen
  2018-08-05 20:09   ` Thomas Gleixner
  2018-08-02 22:58 ` [PATCH 7/7] x86/mm/pageattr: Remove implicit NX behavior Dave Hansen
  6 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2018-08-02 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak


From: Dave Hansen <dave.hansen@linux.intel.com>

This is a cleanup.  There should be functional changes in this patch.

change_page_attr_set/clear() take a 0/1 argument to indicate whether
CPA_ARRAY should be passed down to change_page_attr_set_clr().
Rather than having a 0/1 argument to turn a single flag on/off, just
pass down the flag itself.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/mm/pageattr.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff -puN arch/x86/mm/pageattr.c~x86-pageattr-pass-flags arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~x86-pageattr-pass-flags	2018-08-02 14:14:50.037483273 -0700
+++ b/arch/x86/mm/pageattr.c	2018-08-02 14:14:50.041483273 -0700
@@ -1525,17 +1525,17 @@ out:
 }
 
 static inline int change_page_attr_set(unsigned long *addr, int numpages,
-				       pgprot_t mask, int array)
+				       pgprot_t mask, int flags)
 {
 	return change_page_attr_set_clr(addr, numpages, mask, __pgprot(0), 0,
-		(array ? CPA_ARRAY : 0), NULL);
+		flags, NULL);
 }
 
 static inline int change_page_attr_clear(unsigned long *addr, int numpages,
-					 pgprot_t mask, int array)
+					 pgprot_t mask, int flags)
 {
 	return change_page_attr_set_clr(addr, numpages, __pgprot(0), mask, 0,
-		(array ? CPA_ARRAY : 0), NULL);
+		flags, NULL);
 }
 
 static inline int cpa_set_pages_array(struct page **pages, int numpages,
@@ -1609,7 +1609,8 @@ static int _set_memory_array(unsigned lo
 				_PAGE_CACHE_MODE_UC_MINUS : new_type;
 
 	ret = change_page_attr_set(addr, addrinarray,
-				   cachemode2pgprot(set_type), 1);
+				   cachemode2pgprot(set_type),
+				   CPA_ARRAY);
 
 	if (!ret && new_type == _PAGE_CACHE_MODE_WC)
 		ret = change_page_attr_set_clr(addr, addrinarray,
@@ -1732,7 +1733,8 @@ int set_memory_array_wb(unsigned long *a
 
 	/* WB cache mode is hard wired to all cache attribute bits being 0 */
 	ret = change_page_attr_clear(addr, addrinarray,
-				      __pgprot(_PAGE_CACHE_MASK), 1);
+				      __pgprot(_PAGE_CACHE_MASK),
+				      CPA_ARRAY);
 	if (ret)
 		return ret;
 
_


* [PATCH 7/7] x86/mm/pageattr: Remove implicit NX behavior
  2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
                   ` (5 preceding siblings ...)
  2018-08-02 22:58 ` [PATCH 6/7] x86/mm/pageattr: pass named flag instead of 0/1 Dave Hansen
@ 2018-08-02 22:58 ` Dave Hansen
  6 siblings, 0 replies; 23+ messages in thread
From: Dave Hansen @ 2018-08-02 22:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak


From: Dave Hansen <dave.hansen@linux.intel.com>

This is a cleanup.  There should be no functional changes in this patch.

The pageattr code has the ability to find and change alias
mappings.  It does this for all requests by default unless the
page protections being modified contain only the NX bit.

But, this behavior is rather obscure and buried very deep
within the infrastructure.  Rather than doing it implicitly
from NX, use the new CPA_NO_CHECK_ALIAS to do it more
explicitly from the call site where NX is set.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
---

 b/arch/x86/mm/pageattr.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff -puN arch/x86/mm/pageattr.c~x86-pageattr-nx arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c~x86-pageattr-nx	2018-08-02 15:04:48.032475796 -0700
+++ b/arch/x86/mm/pageattr.c	2018-08-02 15:04:48.036475796 -0700
@@ -1485,8 +1485,6 @@ static int change_page_attr_set_clr(unsi
 	if (in_flag & (CPA_ARRAY | CPA_PAGES_ARRAY))
 		cpa.flags |= in_flag;
 
-	/* No alias checking for _NX bit modifications */
-	checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
 	/* Never check aliases if the caller asks for it explicitly: */
 	if (checkalias && (in_flag & CPA_NO_CHECK_ALIAS))
 		checkalias = 0;
@@ -1750,7 +1748,9 @@ int set_memory_x(unsigned long addr, int
 	if (!(__supported_pte_mask & _PAGE_NX))
 		return 0;
 
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX), 0);
+	/* NX is not required to be consistent across aliases. */
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX),
+				      CPA_NO_CHECK_ALIAS);
 }
 EXPORT_SYMBOL(set_memory_x);
 
@@ -1759,7 +1759,9 @@ int set_memory_nx(unsigned long addr, in
 	if (!(__supported_pte_mask & _PAGE_NX))
 		return 0;
 
-	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_NX), 0);
+	/* NX is not required to be consistent across aliases. */
+	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_NX),
+				    CPA_NO_CHECK_ALIAS);
 }
 EXPORT_SYMBOL(set_memory_nx);
 
_


* Re: [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()
  2018-08-02 22:58 ` [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages() Dave Hansen
@ 2018-08-04  0:18   ` Hugh Dickins
  2018-08-04 17:31     ` Linus Torvalds
  2018-08-05 20:31   ` [tip:x86/pti] x86/mm/init: Pass " tip-bot for Dave Hansen
  1 sibling, 1 reply; 23+ messages in thread
From: Hugh Dickins @ 2018-08-04  0:18 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak

On Thu, 2 Aug 2018, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The x86 code has several places where it frees parts of kernel image:
> 
>  1. Unused SMP alternative
>  2. __init code
>  3. The hole between text and rodata
>  4. The hole between rodata and data
> 
> We call free_init_pages() to do this.  Strangely, we convert the
> symbol addresses to kernel direct map addresses in some cases
> (#3, #4) but not others (#1, #2).
> 
> The virt_to_page() and the other code in free_reserved_area() now
> works fine for symbol addresses on x86, so don't bother
> converting the addresses to direct map addresses before freeing
> them.
> 

Thanks Dave, this series works for me.

But in trying to review it, I am feeling very stupid about this 3/7
(and/or the 2/7 before it: though I don't think that it's wrong).

I simply do not understand how this change works, and how #1 and #2
work. I thought that virt_to_page() only works on virtual addresses
in the direct map: the __va(__pa_symbol()) you remove below was a
good conversion from high kernel mapping to direct map. Without it,
how does virt_to_page() then find the right page?

Is it that there's a guaranteed common alignment between the direct
map and the high kernel mapping, and the ffffffff8....... addresses
pass through the arithmetic in such a way that virt_to_page() comes
up with the right answer; and that is well-known and relied upon by
x86 developers? Or are you now adding a dependency on a coincidence?
Or does it not work at all, and the pages are actually not freed?

Hugh

p.s. if there were to be a respin (I hope not), I notice that in
both 1/7 and 2/7 commit comments, you've said " no " for " not ";
which always makes us pause and wonder if you meant " now ".

> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Kees Cook <keescook@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> ---
> 
>  b/arch/x86/mm/init_64.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff -puN arch/x86/mm/init_64.c~x86-init-do-not-convert-symbol-addresses arch/x86/mm/init_64.c
> --- a/arch/x86/mm/init_64.c~x86-init-do-not-convert-symbol-addresses	2018-08-02 14:14:48.380483277 -0700
> +++ b/arch/x86/mm/init_64.c	2018-08-02 14:14:48.383483277 -0700
> @@ -1283,12 +1283,8 @@ void mark_rodata_ro(void)
>  	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>  #endif
>  
> -	free_init_pages("unused kernel",
> -			(unsigned long) __va(__pa_symbol(text_end)),
> -			(unsigned long) __va(__pa_symbol(rodata_start)));
> -	free_init_pages("unused kernel",
> -			(unsigned long) __va(__pa_symbol(rodata_end)),
> -			(unsigned long) __va(__pa_symbol(_sdata)));
> +	free_init_pages("unused kernel", text_end, rodata_start);
> +	free_init_pages("unused kernel", rodata_end, _sdata);
>  
>  	debug_checkwx();
>  
> _


* Re: [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping
  2018-08-02 22:58 ` [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping Dave Hansen
@ 2018-08-04  0:35   ` Hugh Dickins
  2018-08-04 21:38   ` Andy Lutomirski
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Hugh Dickins @ 2018-08-04  0:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, keescook, tglx, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak

On Thu, 2 Aug 2018, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The kernel image is mapped into two places in the virtual address
> space (addresses without KASLR, of course):
> 
> 	1. The kernel direct map (0xffff880000000000)
> 	2. The "high kernel map" (0xffffffff81000000)
> 
> We actually execute out of #2.  If we get the address of a kernel
> symbol, it points to #2, but almost all physical-to-virtual
> translations point to #1.
> 
> Parts of the "high kernel map" alias are mapped in the userspace
> page tables with the Global bit for performance reasons.  The
> parts that we map to userspace do not (er, should not) have
> secrets.

If respun, I think it would be worth reminding us here that with PTI,
the Global bit is not usually set on the high kernel map at all: it's
just used to compensate for poor performance when PCIDs are not available.

> 
> This is fine, except that some areas in the kernel image that
> are adjacent to the non-secret-containing areas are unused holes.
> We free these holes back into the normal page allocator and
> reuse them as normal kernel memory.  The memory will, of course,
> get *used* via the normal map, but the alias mapping is kept.
> 
> This otherwise unused alias mapping of the holes will, by default
> keep the Global bit, be mapped out to userspace, and be
> vulnerable to Meltdown.
> 
> Remove the alias mapping of these pages entirely.  This is likely
> to fracture the 2M page mapping the kernel image near these areas,
> but this should affect a minority of the area.
> 
> The pageattr code changes *all* aliases mapping the physical pages
> that it operates on (by default).  We only want to modify a single
> alias, so we need to tweak its behavior.
> 
> This unmapping behavior is currently dependent on PTI being in
> place.  Going forward, we should at least consider doing this for
> all configurations.  Having an extra read-write alias for memory
> is not exactly ideal for debugging things like random memory
> corruption and this does undercut features like DEBUG_PAGEALLOC
> or future work like eXclusive Page Frame Ownership (XPFO).
> 
> Before this patch:
> 
> current_kernel:---[ High Kernel Mapping ]---
> current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
> current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
> current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
> current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
> current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
> current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
> current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
> current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd
> 
>   current_user:---[ High Kernel Mapping ]---
>   current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
>   current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
>   current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
>   current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
>   current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
>   current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
> 
> 
> After this patch:
> 
> current_kernel:---[ High Kernel Mapping ]---
> current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
> current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
> current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
> current_kernel-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
> current_kernel-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
> current_kernel-0xffffffff82488000-0xffffffff82600000        1504K                               pte
> current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
> current_kernel-0xffffffff82c00000-0xffffffff82c0d000          52K     RW                     NX pte
> current_kernel-0xffffffff82c0d000-0xffffffff82dc0000        1740K                               pte
> 
>   current_user:---[ High Kernel Mapping ]---
>   current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
>   current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
>   current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
>   current_user-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
>   current_user-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
>   current_user-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte

That entry correctly matches the current-kernel entry, but I would expect
both of them to be marked "GLB".  Probably of very little importance, but
it might indicate that something (my brain, perhaps) is not working quite
as you would hope.

Hugh

>   current_user-0xffffffff82488000-0xffffffff82600000        1504K                               pte
>   current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Kees Cook <keescook@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> ---
> 
>  b/arch/x86/include/asm/set_memory.h |    1 +
>  b/arch/x86/mm/init.c                |   25 +++++++++++++++++++++++--
>  b/arch/x86/mm/pageattr.c            |   13 +++++++++++++
>  3 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff -puN arch/x86/include/asm/set_memory.h~x86-unmap-freed-areas-from-kernel-image arch/x86/include/asm/set_memory.h
> --- a/arch/x86/include/asm/set_memory.h~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.462483274 -0700
> +++ b/arch/x86/include/asm/set_memory.h	2018-08-02 14:14:49.471483274 -0700
> @@ -46,6 +46,7 @@ int set_memory_np(unsigned long addr, in
>  int set_memory_4k(unsigned long addr, int numpages);
>  int set_memory_encrypted(unsigned long addr, int numpages);
>  int set_memory_decrypted(unsigned long addr, int numpages);
> +int set_memory_np_noalias(unsigned long addr, int numpages);
>  
>  int set_memory_array_uc(unsigned long *addr, int addrinarray);
>  int set_memory_array_wc(unsigned long *addr, int addrinarray);
> diff -puN arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/init.c
> --- a/arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.463483274 -0700
> +++ b/arch/x86/mm/init.c	2018-08-02 14:14:49.471483274 -0700
> @@ -780,8 +780,29 @@ void free_init_pages(char *what, unsigne
>   */
>  void free_kernel_image_pages(void *begin, void *end)
>  {
> -	free_init_pages("unused kernel image",
> -			(unsigned long)begin, (unsigned long)end);
> +	unsigned long begin_ul = (unsigned long)begin;
> +	unsigned long end_ul = (unsigned long)end;
> +	unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
> +
> +
> +	free_init_pages("unused kernel image", begin_ul, end_ul);
> +
> +	/*
> +	 * PTI maps some of the kernel into userspace.  For
> +	 * performance, this includes some kernel areas that
> +	 * do not contain secrets.  Those areas might be
> +	 * adjacent to the parts of the kernel image being
> +	 * freed, which may contain secrets.  Remove the
> +	 * "high kernel image mapping" for these freed areas,
> +	 * ensuring they are not even potentially vulnerable
> +	 * to Meltdown regardless of the specific optimizations
> +	 * PTI is currently using.
> +	 *
> +	 * The "noalias" prevents unmapping the direct map
> +	 * alias which is needed to access the freed pages.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_PTI))
> +		set_memory_np_noalias(begin_ul, len_pages);
>  }
>  
>  void __ref free_initmem(void)
> diff -puN arch/x86/mm/pageattr.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/pageattr.c
> --- a/arch/x86/mm/pageattr.c~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.466483274 -0700
> +++ b/arch/x86/mm/pageattr.c	2018-08-02 14:14:49.472483274 -0700
> @@ -53,6 +53,7 @@ static DEFINE_SPINLOCK(cpa_lock);
>  #define CPA_FLUSHTLB 1
>  #define CPA_ARRAY 2
>  #define CPA_PAGES_ARRAY 4
> +#define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
>  
>  #ifdef CONFIG_PROC_FS
>  static unsigned long direct_pages_count[PG_LEVEL_NUM];
> @@ -1486,6 +1487,9 @@ static int change_page_attr_set_clr(unsi
>  
>  	/* No alias checking for _NX bit modifications */
>  	checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
> +	/* Never check aliases if the caller asks for it explicitly: */
> +	if (checkalias && (in_flag & CPA_NO_CHECK_ALIAS))
> +		checkalias = 0;
>  
>  	ret = __change_page_attr_set_clr(&cpa, checkalias);
>  
> @@ -1772,6 +1776,15 @@ int set_memory_np(unsigned long addr, in
>  	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
>  }
>  
> +int set_memory_np_noalias(unsigned long addr, int numpages)
> +{
> +	int cpa_flags = CPA_NO_CHECK_ALIAS;
> +
> +	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
> +					__pgprot(_PAGE_PRESENT), 0,
> +					cpa_flags, NULL);
> +}
> +
>  int set_memory_4k(unsigned long addr, int numpages)
>  {
>  	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
> _
> 


* Re: [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()
  2018-08-04  0:18   ` Hugh Dickins
@ 2018-08-04 17:31     ` Linus Torvalds
  2018-08-04 18:23       ` Hugh Dickins
  2018-08-05  6:11       ` Andi Kleen
  0 siblings, 2 replies; 23+ messages in thread
From: Linus Torvalds @ 2018-08-04 17:31 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Hansen, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, Ingo Molnar, Andrea Arcangeli,
	Jürgen Groß,
	Josh Poimboeuf, Greg Kroah-Hartman, Peter Zijlstra,
	Borislav Petkov, Andrew Lutomirski, Andi Kleen

On Fri, Aug 3, 2018 at 5:19 PM Hugh Dickins <hughd@google.com> wrote:
>
> I thought that virt_to_page() only works on virtual addresses
> in the direct map

You're right that virt_to_page() does not work on any _actual_ virtual
mappings (ie no user pages, and no vmalloc() pages etc). It does not
follow page tables at all.

And on 32-bit, it literally ends up doing (see __phys_addr_nodebug()) a simple

    #define __phys_addr_nodebug(x)  ((x) - PAGE_OFFSET)

However, on x86-64, we have *two* cases of direct mappings: we have
the one at __START_KERNEL_map, and we have the one at PAGE_OFFSET.

And virt_to_page() handles both of those direct mappings.

Annoying? Yes. And it has caused bugs in the past. And I entirely
forget why we needed it on x86-64.

[ Goes around and rummages ]

Oh, never mind, looking around reminded me why: we want to map the
kernel text in the top 31 bits, so that we can use the faster
-mcmodel=kernel because all symbols fit in sign-extended 32 bits.

Maybe there was some other reason too, but I think that's it.

                   Linus


* Re: [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()
  2018-08-04 17:31     ` Linus Torvalds
@ 2018-08-04 18:23       ` Hugh Dickins
  2018-08-05  6:11       ` Andi Kleen
  1 sibling, 0 replies; 23+ messages in thread
From: Hugh Dickins @ 2018-08-04 18:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Dave Hansen, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, Ingo Molnar, Andrea Arcangeli,
	Jürgen Groß,
	Josh Poimboeuf, Greg Kroah-Hartman, Peter Zijlstra,
	Borislav Petkov, Andrew Lutomirski, Andi Kleen

On Sat, 4 Aug 2018, Linus Torvalds wrote:
> On Fri, Aug 3, 2018 at 5:19 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > I thought that virt_to_page() only works on virtual addresses
> > in the direct map
> 
> You're right that virt_to_page() does not work on any _actual_ virtual
> mappings (ie no user pages, and no vmalloc() pages etc). It does not
> follow page tables at all.
> 
> And on 32-bit, it literally ends up doing (see __phys_addr_nodebug()) a simple
> 
>     #define __phys_addr_nodebug(x)  ((x) - PAGE_OFFSET)
> 
> However, on x86-64, we have *two* cases of direct mappings: we have
> the one at __START_KERNEL_map, and we have the one at PAGE_OFFSET.
> 
> And virt_to_page() handles both of those direct mappings.
> 
> Annoying? Yes. And it has caused bugs in the past. And I entirely
> forget why we needed it on x86-64.
> 
> [ Goes around and rummages ]
> 
> Oh, never mind, looking around reminded me why: we want to map the
> kernel text in the top 31 bits, so that we can use the faster
> -mcmodel=kernel because all symbols fit in sign-extended 32 bits.
> 
> Maybe there was some other reason too, but I think that's it.

Thanks a lot for writing that up. You shamed me into grepping
a little harder than I did yesterday, when all I could find were
"- PAGE_OFFSET" conversions (maybe I got lost in 32-bit-land).
I had missed __phys_addr_nodebug(), where the __START_KERNEL_map
alternative is handled.

Hugh


* Re: [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping
  2018-08-02 22:58 ` [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping Dave Hansen
  2018-08-04  0:35   ` Hugh Dickins
@ 2018-08-04 21:38   ` Andy Lutomirski
  2018-08-06 15:17     ` Dave Hansen
  2018-08-05 20:32   ` [tip:x86/pti] x86/mm/init: Remove " tip-bot for Dave Hansen
  2018-08-06 20:21   ` [tip:x86/pti-urgent] " tip-bot for Dave Hansen
  3 siblings, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2018-08-04 21:38 UTC (permalink / raw)
  To: Dave Hansen
  Cc: LKML, Kees Cook, Thomas Gleixner, Ingo Molnar, Andrea Arcangeli,
	Juergen Gross, Josh Poimboeuf, Greg KH, Peter Zijlstra,
	Hugh Dickins, Linus Torvalds, Borislav Petkov, Andrew Lutomirski,
	Andi Kleen

On Thu, Aug 2, 2018 at 3:58 PM, Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> The kernel image is mapped into two places in the virtual address
> space (addresses without KASLR, of course):
>
>         1. The kernel direct map (0xffff880000000000)
>         2. The "high kernel map" (0xffffffff81000000)
>
> We actually execute out of #2.  If we get the address of a kernel
> symbol, it points to #2, but almost all physical-to-virtual
> translations point to #1.
>
> Parts of the "high kernel map" alias are mapped in the userspace
> page tables with the Global bit for performance reasons.  The
> parts that we map to userspace do not (er, should not) have
> secrets.
>
> This is fine, except that some areas in the kernel image that
> are adjacent to the non-secret-containing areas are unused holes.
> We free these holes back into the normal page allocator and
> reuse them as normal kernel memory.  The memory will, of course,
> get *used* via the normal map, but the alias mapping is kept.
>
> This otherwise unused alias mapping of the holes will, by default
> keep the Global bit, be mapped out to userspace, and be
> vulnerable to Meltdown.
>
> Remove the alias mapping of these pages entirely.  This is likely
> to fracture the 2M page mapping the kernel image near these areas,
> but this should affect a minority of the area.
>
> The pageattr code changes *all* aliases mapping the physical pages
> that it operates on (by default).  We only want to modify a single
> alias, so we need to tweak its behavior.
>
> This unmapping behavior is currently dependent on PTI being in
> place.  Going forward, we should at least consider doing this for
> all configurations.  Having an extra read-write alias for memory
> is not exactly ideal for debugging things like random memory
> corruption and this does undercut features like DEBUG_PAGEALLOC
> or future work like eXclusive Page Frame Ownership (XPFO).
>

I like this patch, and I tend to think we should (eventually) enable
it regardless of PTI.  Cleaning up the memory map is generally a good
thing.

Also, just to make sure I fully understand: the kernel text is aliased
in both the direct map and the high map, right?  This means that we
should be able to make the high kernel mapping have proper RO
permissions very early without breaking text_poke() at the minor cost
of needing to force a serializing instruction at the end of each big
block of text pokes.  I think this would be worthwhile, although I
suspect we'll uncover *tons* of bugs in the process.


* Re: [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()
  2018-08-04 17:31     ` Linus Torvalds
  2018-08-04 18:23       ` Hugh Dickins
@ 2018-08-05  6:11       ` Andi Kleen
  1 sibling, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2018-08-05  6:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Dave Hansen, Linux Kernel Mailing List, Kees Cook,
	Thomas Gleixner, Ingo Molnar, Andrea Arcangeli,
	Jürgen Groß,
	Josh Poimboeuf, Greg Kroah-Hartman, Peter Zijlstra,
	Borislav Petkov, Andrew Lutomirski

> [ Goes around and rummages ]
> 
> Oh, never mind, looking around reminded me why: we want to map the
> kernel text in the top 31 bits, so that we can use the faster
> -mcmodel=kernel because all symbols fit in sign-extended 32 bits.
> 
> Maybe there was some other reason too, but I think that's it.

No that was the only reason.

Large code model would be extremely expensive, and PIC linked
kernel also had some issues. So we ended up with this set up.

-Andi


* Re: [PATCH 6/7] x86/mm/pageattr: pass named flag instead of 0/1
  2018-08-02 22:58 ` [PATCH 6/7] x86/mm/pageattr: pass named flag instead of 0/1 Dave Hansen
@ 2018-08-05 20:09   ` Thomas Gleixner
  2018-08-06 15:09     ` Dave Hansen
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2018-08-05 20:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, keescook, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak



On Thu, 2 Aug 2018, Dave Hansen wrote:

> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> This is a cleanup.  There should be functional changes in this patch.

Missing 'no' after 'be', right?

Thanks,

	tglx


* [tip:x86/pti] x86/mm/pti: Clear Global bit more aggressively
  2018-08-02 22:58 ` [PATCH 1/7] x86/mm/pti: clear Global bit more aggressively Dave Hansen
@ 2018-08-05 20:30   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Dave Hansen @ 2018-08-05 20:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, dave.hansen, aarcange, peterz, ak, keescook, hughd,
	mingo, torvalds, gregkh, luto, tglx, jpoimboe, bp, hpa, jgross

Commit-ID:  eac7073aa69aa1cac819aa712146284f53f642b1
Gitweb:     https://git.kernel.org/tip/eac7073aa69aa1cac819aa712146284f53f642b1
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Thu, 2 Aug 2018 15:58:25 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 Aug 2018 22:21:02 +0200

x86/mm/pti: Clear Global bit more aggressively

The kernel image starts out with the Global bit set across the entire
kernel image.  The bit is cleared with set_memory_nonglobal() in the
configurations with PCIDs where the performance benefits of the Global bit
are not needed.

However, this is fragile.  It means that we are stuck opting *out* of the
less-secure (Global bit set) configuration, which seems backwards.  Let's
start more secure (Global bit clear) and then let things opt back in if
they want performance, or are truly mapping common data between kernel and
userspace.

This fixes a bug.  Before this patch, there are areas that are unmapped
from the user page tables (like everything above 0xffffffff82600000 in
the example below).  These have the hallmark of being a wrong Global area:
they are not identical in the 'current_kernel' and 'current_user' page
table dumps.  They are also read-write, which means they're much more
likely to contain secrets.

Before this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                 GLB NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE     GLB NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                 GLB NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE     GLB NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

 current_user:---[ High Kernel Mapping ]---
 current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
 current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
 current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
 current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                 GLB NX pte
 current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
 current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

After this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
  current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

Fixes: 0f561fce4d69 ("x86/pti: Enable global pages for shared areas")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225825.A100C071@viggo.jf.intel.com

---
 arch/x86/mm/pageattr.c |  6 ++++++
 arch/x86/mm/pti.c      | 34 ++++++++++++++++++++++++----------
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 3bded76e8d5c..c04153796f61 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1784,6 +1784,12 @@ int set_memory_nonglobal(unsigned long addr, int numpages)
 				      __pgprot(_PAGE_GLOBAL), 0);
 }
 
+int set_memory_global(unsigned long addr, int numpages)
+{
+	return change_page_attr_set(&addr, numpages,
+				    __pgprot(_PAGE_GLOBAL), 0);
+}
+
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 {
 	struct cpa_data cpa;
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 4d418e705878..8d88d067b3d7 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -434,6 +434,13 @@ static inline bool pti_kernel_image_global_ok(void)
 	return true;
 }
 
+/*
+ * This is the only user for these and it is not arch-generic
+ * like the other set_memory.h functions.  Just extern them.
+ */
+extern int set_memory_nonglobal(unsigned long addr, int numpages);
+extern int set_memory_global(unsigned long addr, int numpages);
+
 /*
  * For some configurations, map all of kernel text into the user page
  * tables.  This reduces TLB misses, especially on non-PCID systems.
@@ -446,7 +453,8 @@ void pti_clone_kernel_text(void)
 	 * clone the areas past rodata, they might contain secrets.
 	 */
 	unsigned long start = PFN_ALIGN(_text);
-	unsigned long end = (unsigned long)__end_rodata_hpage_align;
+	unsigned long end_clone  = (unsigned long)__end_rodata_hpage_align;
+	unsigned long end_global = PFN_ALIGN((unsigned long)__stop___ex_table);
 
 	if (!pti_kernel_image_global_ok())
 		return;
@@ -458,14 +466,18 @@ void pti_clone_kernel_text(void)
 	 * pti_set_kernel_image_nonglobal() did to clear the
 	 * global bit.
 	 */
-	pti_clone_pmds(start, end, _PAGE_RW);
+	pti_clone_pmds(start, end_clone, _PAGE_RW);
+
+	/*
+	 * pti_clone_pmds() will set the global bit in any PMDs
+	 * that it clones, but we also need to get any PTEs in
+	 * the last level for areas that are not huge-page-aligned.
+	 */
+
+	/* Set the global bit for normal non-__init kernel text: */
+	set_memory_global(start, (end_global - start) >> PAGE_SHIFT);
 }
 
-/*
- * This is the only user for it and it is not arch-generic like
- * the other set_memory.h functions.  Just extern it.
- */
-extern int set_memory_nonglobal(unsigned long addr, int numpages);
 void pti_set_kernel_image_nonglobal(void)
 {
 	/*
@@ -477,9 +489,11 @@ void pti_set_kernel_image_nonglobal(void)
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long end = ALIGN((unsigned long)_end, PMD_PAGE_SIZE);
 
-	if (pti_kernel_image_global_ok())
-		return;
-
+	/*
+	 * This clears _PAGE_GLOBAL from the entire kernel image.
+	 * pti_clone_kernel_text() may put _PAGE_GLOBAL back for
+	 * areas that are mapped to userspace.
+	 */
 	set_memory_nonglobal(start, (end - start) >> PAGE_SHIFT);
 }
 


* [tip:x86/pti] mm: Allow non-direct-map arguments to free_reserved_area()
  2018-08-02 22:58 ` [PATCH 2/7] mm: allow non-direct-map arguments to free_reserved_area() Dave Hansen
@ 2018-08-05 20:31   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Dave Hansen @ 2018-08-05 20:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, hughd, hpa, bp, aarcange, peterz, tglx, jpoimboe, gregkh,
	torvalds, linux-kernel, ak, mingo, dave.hansen, jgross, keescook

Commit-ID:  0d83432811f26871295a9bc24d3c387924da6071
Gitweb:     https://git.kernel.org/tip/0d83432811f26871295a9bc24d3c387924da6071
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Thu, 2 Aug 2018 15:58:26 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 Aug 2018 22:21:02 +0200

mm: Allow non-direct-map arguments to free_reserved_area()

free_reserved_area() takes pointers as arguments to show which addresses
should be freed.  However, it does this in a somewhat ambiguous way.  If it
gets a kernel direct map address, it always works.  However, if it gets an
address that is part of the kernel image alias mapping, it can fail.

It fails if all of the following happen:
 * The specified address is part of the kernel image alias
 * Poisoning is requested (forcing a memset())
 * The address is in a read-only portion of the kernel image

The memset() fails on the read-only mapping, of course.
free_reserved_area() *is* called both on the direct map and on kernel image
alias addresses.  We've just lucked out thus far that the kernel image
alias areas it gets used on are read-write.  I'm fairly sure this has been
just a happy accident.

It is quite easy to make free_reserved_area() work for all cases: just
convert the address to a direct map address before doing the memset(), and
do this unconditionally.  There is little chance of a regression here
because we previously did a virt_to_page() on the address for the memset,
so we know these are not highmem pages for which virt_to_page() would fail.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: hughd@google.com
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225826.1287AE3E@viggo.jf.intel.com

---
 mm/page_alloc.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a790ef4be74e..3222193c46c6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6939,9 +6939,21 @@ unsigned long free_reserved_area(void *start, void *end, int poison, char *s)
 	start = (void *)PAGE_ALIGN((unsigned long)start);
 	end = (void *)((unsigned long)end & PAGE_MASK);
 	for (pos = start; pos < end; pos += PAGE_SIZE, pages++) {
+		struct page *page = virt_to_page(pos);
+		void *direct_map_addr;
+
+		/*
+		 * 'direct_map_addr' might be different from 'pos'
+		 * because some architectures' virt_to_page()
+		 * work with aliases.  Getting the direct map
+		 * address ensures that we get a _writeable_
+		 * alias for the memset().
+		 */
+		direct_map_addr = page_address(page);
 		if ((unsigned int)poison <= 0xFF)
-			memset(pos, poison, PAGE_SIZE);
-		free_reserved_page(virt_to_page(pos));
+			memset(direct_map_addr, poison, PAGE_SIZE);
+
+		free_reserved_page(page);
 	}
 
 	if (pages && s)


* [tip:x86/pti] x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
  2018-08-02 22:58 ` [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages() Dave Hansen
  2018-08-04  0:18   ` Hugh Dickins
@ 2018-08-05 20:31   ` tip-bot for Dave Hansen
  1 sibling, 0 replies; 23+ messages in thread
From: tip-bot for Dave Hansen @ 2018-08-05 20:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, aarcange, keescook, linux-kernel, mingo, gregkh, luto,
	hughd, torvalds, dave.hansen, peterz, jgross, jpoimboe, tglx, ak,
	bp

Commit-ID:  9f515cdb411ef34f1aaf4c40bb0c932cf6db5de1
Gitweb:     https://git.kernel.org/tip/9f515cdb411ef34f1aaf4c40bb0c932cf6db5de1
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Thu, 2 Aug 2018 15:58:28 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 Aug 2018 22:21:02 +0200

x86/mm/init: Pass unconverted symbol addresses to free_init_pages()

The x86 code has several places where it frees parts of kernel image:

 1. Unused SMP alternative
 2. __init code
 3. The hole between text and rodata
 4. The hole between rodata and data

We call free_init_pages() to do this.  Strangely, we convert the symbol
addresses to kernel direct map addresses in some cases (#3, #4) but not
others (#1, #2).

The virt_to_page() and the other code in free_reserved_area() now work
fine for symbol addresses on x86, so don't bother converting the
addresses to direct map addresses before freeing them.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: hughd@google.com
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225828.89B2D0E2@viggo.jf.intel.com

---
 arch/x86/mm/init_64.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a688617c727e..73158655c4a7 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1283,12 +1283,8 @@ void mark_rodata_ro(void)
 	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif
 
-	free_init_pages("unused kernel",
-			(unsigned long) __va(__pa_symbol(text_end)),
-			(unsigned long) __va(__pa_symbol(rodata_start)));
-	free_init_pages("unused kernel",
-			(unsigned long) __va(__pa_symbol(rodata_end)),
-			(unsigned long) __va(__pa_symbol(_sdata)));
+	free_init_pages("unused kernel", text_end, rodata_start);
+	free_init_pages("unused kernel", rodata_end, _sdata);
 
 	debug_checkwx();
 


* [tip:x86/pti] x86/mm/init: Add helper for freeing kernel image pages
  2018-08-02 22:58 ` [PATCH 4/7] x86/mm/init: add helper for freeing kernel image pages Dave Hansen
@ 2018-08-05 20:32   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Dave Hansen @ 2018-08-05 20:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, hpa, aarcange, gregkh, jgross, torvalds, peterz, hughd,
	linux-kernel, tglx, bp, keescook, dave.hansen, mingo, jpoimboe,
	luto

Commit-ID:  6ea2738e0ca0e626c75202fb051c1e88d7a950fa
Gitweb:     https://git.kernel.org/tip/6ea2738e0ca0e626c75202fb051c1e88d7a950fa
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Thu, 2 Aug 2018 15:58:29 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 Aug 2018 22:21:03 +0200

x86/mm/init: Add helper for freeing kernel image pages

When chunks of the kernel image are freed, free_init_pages() is used
directly.  Consolidate the three sites that do this.  Also update the
string to give an incrementally better description of that memory versus
what was there before.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: hughd@google.com
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225829.FE0E32EA@viggo.jf.intel.com

---
 arch/x86/include/asm/processor.h |  1 +
 arch/x86/mm/init.c               | 15 ++++++++++++---
 arch/x86/mm/init_64.c            |  4 ++--
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cfd29ee8c3da..59663c08c949 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -966,6 +966,7 @@ static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
 
 extern unsigned long arch_align_stack(unsigned long sp);
 extern void free_init_pages(char *what, unsigned long begin, unsigned long end);
+extern void free_kernel_image_pages(void *begin, void *end);
 
 void default_idle(void);
 #ifdef	CONFIG_XEN
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index cee58a972cb2..bc11dedffc45 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -773,13 +773,22 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
 	}
 }
 
+/*
+ * begin/end can be in the direct map or the "high kernel mapping"
+ * used for the kernel image only.  free_init_pages() will do the
+ * right thing for either kind of address.
+ */
+void free_kernel_image_pages(void *begin, void *end)
+{
+	free_init_pages("unused kernel image",
+			(unsigned long)begin, (unsigned long)end);
+}
+
 void __ref free_initmem(void)
 {
 	e820__reallocate_tables();
 
-	free_init_pages("unused kernel",
-			(unsigned long)(&__init_begin),
-			(unsigned long)(&__init_end));
+	free_kernel_image_pages(&__init_begin, &__init_end);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 73158655c4a7..68c292cb1ebf 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1283,8 +1283,8 @@ void mark_rodata_ro(void)
 	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif
 
-	free_init_pages("unused kernel", text_end, rodata_start);
-	free_init_pages("unused kernel", rodata_end, _sdata);
+	free_kernel_image_pages((void *)text_end, (void *)rodata_start);
+	free_kernel_image_pages((void *)rodata_end, (void *)_sdata);
 
 	debug_checkwx();
 


* [tip:x86/pti] x86/mm/init: Remove freed kernel image areas from alias mapping
  2018-08-02 22:58 ` [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping Dave Hansen
  2018-08-04  0:35   ` Hugh Dickins
  2018-08-04 21:38   ` Andy Lutomirski
@ 2018-08-05 20:32   ` tip-bot for Dave Hansen
  2018-08-06 20:21   ` [tip:x86/pti-urgent] " tip-bot for Dave Hansen
  3 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Dave Hansen @ 2018-08-05 20:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, luto, torvalds, dave.hansen, jgross, keescook, ak,
	bp, hpa, peterz, hughd, gregkh, tglx, jpoimboe, aarcange, mingo

Commit-ID:  2140e26f3d98e615c60c5f6a97d8421a077d61eb
Gitweb:     https://git.kernel.org/tip/2140e26f3d98e615c60c5f6a97d8421a077d61eb
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Thu, 2 Aug 2018 15:58:31 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 Aug 2018 22:21:03 +0200

x86/mm/init: Remove freed kernel image areas from alias mapping

The kernel image is mapped into two places in the virtual address space
(addresses without KASLR, of course):

	1. The kernel direct map (0xffff880000000000)
	2. The "high kernel map" (0xffffffff81000000)

We actually execute out of #2.  If we get the address of a kernel symbol,
it points to #2, but almost all physical-to-virtual translations point to
#1.

Parts of the "high kernel map" alias are mapped in the userspace page
tables with the Global bit for performance reasons.  The parts that we map
to userspace do not (er, should not) have secrets.  When PTI is enabled,
the Global bit is usually not set in the high mapping; it is just used to
compensate for poor performance on systems which lack PCID.

This is fine, except that some areas in the kernel image that are adjacent
to the non-secret-containing areas are unused holes.  We free these holes
back into the normal page allocator and reuse them as normal kernel memory.
The memory will, of course, get *used* via the normal map, but the alias
mapping is kept.

This otherwise unused alias mapping of the holes will, by default, keep the
Global bit, be mapped out to userspace, and be vulnerable to Meltdown.

Remove the alias mapping of these pages entirely.  This is likely to
fracture the 2M page mapping the kernel image near these areas, but this
should affect a minority of the area.

The pageattr code changes *all* aliases mapping the physical pages that it
operates on (by default).  We only want to modify a single alias, so we
need to tweak its behavior.

This unmapping behavior is currently dependent on PTI being in place.
Going forward, we should at least consider doing this for all
configurations.  Having an extra read-write alias for memory is not exactly
ideal for debugging things like random memory corruption and this does
undercut features like DEBUG_PAGEALLOC or future work like eXclusive Page
Frame Ownership (XPFO).

Before this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
  current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

After this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
current_kernel-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
current_kernel-0xffffffff82488000-0xffffffff82600000        1504K                               pte
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82c0d000          52K     RW                     NX pte
current_kernel-0xffffffff82c0d000-0xffffffff82dc0000        1740K                               pte

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
  current_user-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
  current_user-0xffffffff82488000-0xffffffff82600000        1504K                               pte
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

Fixes: 0f561fce4d69 ("x86/pti: Enable global pages for shared areas")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: hughd@google.com
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225831.5F6A2BFC@viggo.jf.intel.com

---
 arch/x86/include/asm/set_memory.h |  1 +
 arch/x86/mm/init.c                | 25 +++++++++++++++++++++++--
 arch/x86/mm/pageattr.c            | 13 +++++++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index bd090367236c..34cffcef7375 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -46,6 +46,7 @@ int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
+int set_memory_np_noalias(unsigned long addr, int numpages);
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bc11dedffc45..2f1005c3fb90 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -780,8 +780,29 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
  */
 void free_kernel_image_pages(void *begin, void *end)
 {
-	free_init_pages("unused kernel image",
-			(unsigned long)begin, (unsigned long)end);
+	unsigned long begin_ul = (unsigned long)begin;
+	unsigned long end_ul = (unsigned long)end;
+	unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
+
+
+	free_init_pages("unused kernel image", begin_ul, end_ul);
+
+	/*
+	 * PTI maps some of the kernel into userspace.  For
+	 * performance, this includes some kernel areas that
+	 * do not contain secrets.  Those areas might be
+	 * adjacent to the parts of the kernel image being
+	 * freed, which may contain secrets.  Remove the
+	 * "high kernel image mapping" for these freed areas,
+	 * ensuring they are not even potentially vulnerable
+	 * to Meltdown regardless of the specific optimizations
+	 * PTI is currently using.
+	 *
+	 * The "noalias" prevents unmapping the direct map
+	 * alias which is needed to access the freed pages.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_PTI))
+		set_memory_np_noalias(begin_ul, len_pages);
 }
 
 void __ref free_initmem(void)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index c04153796f61..e4f9448ad7f1 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -53,6 +53,7 @@ static DEFINE_SPINLOCK(cpa_lock);
 #define CPA_FLUSHTLB 1
 #define CPA_ARRAY 2
 #define CPA_PAGES_ARRAY 4
+#define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
 
 #ifdef CONFIG_PROC_FS
 static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -1486,6 +1487,9 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 
 	/* No alias checking for _NX bit modifications */
 	checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
+	/* Never check aliases if the caller asks for it explicitly: */
+	if (checkalias && (in_flag & CPA_NO_CHECK_ALIAS))
+		checkalias = 0;
 
 	ret = __change_page_attr_set_clr(&cpa, checkalias);
 
@@ -1772,6 +1776,15 @@ int set_memory_np(unsigned long addr, int numpages)
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
 }
 
+int set_memory_np_noalias(unsigned long addr, int numpages)
+{
+	int cpa_flags = CPA_NO_CHECK_ALIAS;
+
+	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
+					__pgprot(_PAGE_PRESENT), 0,
+					cpa_flags, NULL);
+}
+
 int set_memory_4k(unsigned long addr, int numpages)
 {
 	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),


* Re: [PATCH 6/7] x86/mm/pageattr: pass named flag instead of 0/1
  2018-08-05 20:09   ` Thomas Gleixner
@ 2018-08-06 15:09     ` Dave Hansen
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Hansen @ 2018-08-06 15:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, keescook, mingo, aarcange, jgross, jpoimboe,
	gregkh, peterz, hughd, torvalds, bp, luto, ak

On 08/05/2018 01:09 PM, Thomas Gleixner wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>>
>> This is a cleanup.  There should be functional changes in this patch.
> Missing 'no' after 'be', right?

Correct.  Should be:

"This is a cleanup.  There should be no functional changes in this patch."


* Re: [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping
  2018-08-04 21:38   ` Andy Lutomirski
@ 2018-08-06 15:17     ` Dave Hansen
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Hansen @ 2018-08-06 15:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Kees Cook, Thomas Gleixner, Ingo Molnar, Andrea Arcangeli,
	Juergen Gross, Josh Poimboeuf, Greg KH, Peter Zijlstra,
	Hugh Dickins, Linus Torvalds, Borislav Petkov, Andi Kleen

On 08/04/2018 02:38 PM, Andy Lutomirski wrote:
> On Thu, Aug 2, 2018 at 3:58 PM, Dave Hansen <dave.hansen@linux.intel.com> wrote:
>> This otherwise unused alias mapping of the holes will, by default
>> keep the Global bit, be mapped out to userspace, and be
>> vulnerable to Meltdown.
>>
>> Remove the alias mapping of these pages entirely.  This is likely
>> to fracture the 2M page mapping the kernel image near these areas,
>> but this should affect a minority of the area.
...
> 
> I like this patch, and I tend to think we should (eventually) enable
> it regardless of PTI.  Cleaning up the memory map is generally a good
> thing.
> 
> Also, just to make sure I fully understand: the kernel text is aliased
> in both the direct map and the high map, right?

Yes.  I don't think the double mapping was because of anything that we
really intentionally designed, though.  I think it was just easiest to
leave it in place and it didn't hurt anything.

> This means that we should be able to make the high kernel mapping
> have proper RO permissions very early without breaking text_poke() at
> the minor cost of needing to force a serializing instruction at the
> end of each big block of text pokes.  I think this would be
> worthwhile, although I suspect we'll uncover *tons* of bugs in the
> process.

Yeah, this could easily happen much earlier.


* [tip:x86/pti-urgent] x86/mm/init: Remove freed kernel image areas from alias mapping
  2018-08-02 22:58 ` [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping Dave Hansen
                     ` (2 preceding siblings ...)
  2018-08-05 20:32   ` [tip:x86/pti] x86/mm/init: Remove " tip-bot for Dave Hansen
@ 2018-08-06 20:21   ` tip-bot for Dave Hansen
  3 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Dave Hansen @ 2018-08-06 20:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, luto, bp, gregkh, dave.hansen, torvalds, aarcange,
	hughd, jgross, jpoimboe, mingo, ak, hpa, peterz, tglx, jroedel,
	keescook

Commit-ID:  c40a56a7818cfe735fc93a69e1875f8bba834483
Gitweb:     https://git.kernel.org/tip/c40a56a7818cfe735fc93a69e1875f8bba834483
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Thu, 2 Aug 2018 15:58:31 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 6 Aug 2018 20:54:16 +0200

x86/mm/init: Remove freed kernel image areas from alias mapping

The kernel image is mapped into two places in the virtual address space
(addresses without KASLR, of course):

	1. The kernel direct map (0xffff880000000000)
	2. The "high kernel map" (0xffffffff81000000)

We actually execute out of #2.  If we get the address of a kernel symbol,
it points to #2, but almost all physical-to-virtual translations point to

Parts of the "high kernel map" alias are mapped in the userspace page
tables with the Global bit for performance reasons.  The parts that we map
to userspace do not (er, should not) have secrets.  When PTI is enabled,
the Global bit is usually not set in the high mapping; it is just used to
compensate for poor performance on systems which lack PCID.

This is fine, except that some areas in the kernel image that are adjacent
to the non-secret-containing areas are unused holes.  We free these holes
back into the normal page allocator and reuse them as normal kernel memory.
The memory will, of course, get *used* via the normal map, but the alias
mapping is kept.

This otherwise unused alias mapping of the holes will, by default, keep the
Global bit, be mapped out to userspace, and be vulnerable to Meltdown.

Remove the alias mapping of these pages entirely.  This is likely to
fracture the 2M page mapping the kernel image near these areas, but this
should affect a minority of the area.

The pageattr code changes *all* aliases mapping the physical pages that it
operates on (by default).  We only want to modify a single alias, so we
need to tweak its behavior.

This unmapping behavior is currently dependent on PTI being in place.
Going forward, we should at least consider doing this for all
configurations.  Having an extra read-write alias for memory is not exactly
ideal for debugging things like random memory corruption and this does
undercut features like DEBUG_PAGEALLOC or future work like eXclusive Page
Frame Ownership (XPFO).

Before this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
  current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

After this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
current_kernel-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
current_kernel-0xffffffff82488000-0xffffffff82600000        1504K                               pte
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82c0d000          52K     RW                     NX pte
current_kernel-0xffffffff82c0d000-0xffffffff82dc0000        1740K                               pte

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
  current_user-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
  current_user-0xffffffff82488000-0xffffffff82600000        1504K                               pte
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

[ tglx: Do not unmap on 32bit as there is only one mapping ]

Fixes: 0f561fce4d69 ("x86/pti: Enable global pages for shared areas")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20180802225831.5F6A2BFC@viggo.jf.intel.com
---
 arch/x86/include/asm/set_memory.h |  1 +
 arch/x86/mm/init.c                | 26 ++++++++++++++++++++++++--
 arch/x86/mm/pageattr.c            | 13 +++++++++++++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index bd090367236c..34cffcef7375 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -46,6 +46,7 @@ int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
+int set_memory_np_noalias(unsigned long addr, int numpages);
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bc11dedffc45..74b157ac078d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -780,8 +780,30 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
  */
 void free_kernel_image_pages(void *begin, void *end)
 {
-	free_init_pages("unused kernel image",
-			(unsigned long)begin, (unsigned long)end);
+	unsigned long begin_ul = (unsigned long)begin;
+	unsigned long end_ul = (unsigned long)end;
+	unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
+
+
+	free_init_pages("unused kernel image", begin_ul, end_ul);
+
+	/*
+	 * PTI maps some of the kernel into userspace.  For performance,
+	 * this includes some kernel areas that do not contain secrets.
+	 * Those areas might be adjacent to the parts of the kernel image
+	 * being freed, which may contain secrets.  Remove the "high kernel
+	 * image mapping" for these freed areas, ensuring they are not even
+	 * potentially vulnerable to Meltdown regardless of the specific
+	 * optimizations PTI is currently using.
+	 *
+	 * The "noalias" prevents unmapping the direct map alias which is
+	 * needed to access the freed pages.
+	 *
+	 * This is only valid for 64bit kernels. 32bit has only one mapping
+	 * which can't be treated in this way for obvious reasons.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64) && cpu_feature_enabled(X86_FEATURE_PTI))
+		set_memory_np_noalias(begin_ul, len_pages);
 }
 
 void __ref free_initmem(void)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index c04153796f61..0a74996a1149 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -53,6 +53,7 @@ static DEFINE_SPINLOCK(cpa_lock);
 #define CPA_FLUSHTLB 1
 #define CPA_ARRAY 2
 #define CPA_PAGES_ARRAY 4
+#define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
 
 #ifdef CONFIG_PROC_FS
 static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -1486,6 +1487,9 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 
 	/* No alias checking for _NX bit modifications */
 	checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
+	/* Did the caller explicitly disable alias checking? */
+	if (in_flag & CPA_NO_CHECK_ALIAS)
+		checkalias = 0;
 
 	ret = __change_page_attr_set_clr(&cpa, checkalias);
 
@@ -1772,6 +1776,15 @@ int set_memory_np(unsigned long addr, int numpages)
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
 }
 
+int set_memory_np_noalias(unsigned long addr, int numpages)
+{
+	int cpa_flags = CPA_NO_CHECK_ALIAS;
+
+	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
+					__pgprot(_PAGE_PRESENT), 0,
+					cpa_flags, NULL);
+}
+
 int set_memory_4k(unsigned long addr, int numpages)
 {
 	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),


end of thread, other threads:[~2018-08-06 20:23 UTC | newest]

Thread overview: 23+ messages
2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
2018-08-02 22:58 ` [PATCH 1/7] x86/mm/pti: clear Global bit more aggressively Dave Hansen
2018-08-05 20:30   ` [tip:x86/pti] x86/mm/pti: Clear " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 2/7] mm: allow non-direct-map arguments to free_reserved_area() Dave Hansen
2018-08-05 20:31   ` [tip:x86/pti] mm: Allow " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages() Dave Hansen
2018-08-04  0:18   ` Hugh Dickins
2018-08-04 17:31     ` Linus Torvalds
2018-08-04 18:23       ` Hugh Dickins
2018-08-05  6:11       ` Andi Kleen
2018-08-05 20:31   ` [tip:x86/pti] x86/mm/init: Pass " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 4/7] x86/mm/init: add helper for freeing kernel image pages Dave Hansen
2018-08-05 20:32   ` [tip:x86/pti] x86/mm/init: Add " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping Dave Hansen
2018-08-04  0:35   ` Hugh Dickins
2018-08-04 21:38   ` Andy Lutomirski
2018-08-06 15:17     ` Dave Hansen
2018-08-05 20:32   ` [tip:x86/pti] x86/mm/init: Remove " tip-bot for Dave Hansen
2018-08-06 20:21   ` [tip:x86/pti-urgent] " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 6/7] x86/mm/pageattr: pass named flag instead of 0/1 Dave Hansen
2018-08-05 20:09   ` Thomas Gleixner
2018-08-06 15:09     ` Dave Hansen
2018-08-02 22:58 ` [PATCH 7/7] x86/mm/pageattr: Remove implicit NX behavior Dave Hansen
