linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@linux.intel.com,
	moritz.lipp@iaik.tugraz.at, daniel.gruss@iaik.tugraz.at,
	michael.schwarz@iaik.tugraz.at,
	richard.fellner@student.tugraz.at, luto@kernel.org,
	torvalds@linux-foundation.org, keescook@google.com,
	hughd@google.com, x86@kernel.org
Subject: [PATCH 09/30] x86, kaiser: only populate shadow page tables for userspace
Date: Wed, 08 Nov 2017 11:47:03 -0800	[thread overview]
Message-ID: <20171108194703.45704117@viggo.jf.intel.com> (raw)
In-Reply-To: <20171108194646.907A1942@viggo.jf.intel.com>


From: Dave Hansen <dave.hansen@linux.intel.com>

KAISER has two copies of the page tables: one for the kernel and
one for when we are running in userspace.  There is also a kernel
portion of each of the page tables: the part that *maps* the
kernel.

The kernel portion is relatively static and uses pre-populated
PGDs.  Nobody ever calls set_pgd() on the kernel portion during
normal operation.

The userspace portion of the page tables is updated frequently as
userspace pages are mapped and we demand-allocate page table
pages.  These updates of the userspace *portion* of the tables
need to be reflected into both the kernel and user/shadow copies.

The original KAISER patches did this by effectively looking at
the address that we are updating *for*.  If it is <PAGE_OFFSET,
we are doing an update for the userspace portion of the page
tables and must make an entry in the shadow.  We also make the
kernel copy if this new entry unusable for userspace.

However, this has a wrinkle: we have a few places where we use
low addresses in supervisor (kernel) mode.  When we make EFI
calls, we they use traditionaly user addresses in supervisor mode
and trip over these checks.  The trampoline code that we use for
booting secondary CPUs has a similar issue.

Remember, we need to do two things for a userspace PGD: populate
the shadow and sabotage the kernel PGD so it can not be used in
userspace.  This patch fixes the wrinkle by only doing those two
things when we are dealing with a user address *and* the PGD has
_PAGE_USER set.  That way, we do not accidentally sabotage the
in-kernel users of low addresses that are typically used only for
userspace.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Cc: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: x86@kernel.org
---

 b/arch/x86/include/asm/pgtable_64.h |   94 +++++++++++++++++++++++-------------
 1 file changed, 61 insertions(+), 33 deletions(-)

diff -puN arch/x86/include/asm/pgtable_64.h~kaiser-set-pgd-careful-plus-NX arch/x86/include/asm/pgtable_64.h
--- a/arch/x86/include/asm/pgtable_64.h~kaiser-set-pgd-careful-plus-NX	2017-11-08 10:45:30.776681391 -0800
+++ b/arch/x86/include/asm/pgtable_64.h	2017-11-08 10:45:30.779681391 -0800
@@ -177,38 +177,76 @@ static inline p4d_t *native_get_normal_p
 /*
  * Page table pages are page-aligned.  The lower half of the top
  * level is used for userspace and the top half for the kernel.
- * This returns true for user pages that need to get copied into
- * both the user and kernel copies of the page tables, and false
- * for kernel pages that should only be in the kernel copy.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
  */
-static inline bool is_userspace_pgd(void *__ptr)
+static inline bool pgdp_maps_userspace(void *__ptr)
 {
 	unsigned long ptr = (unsigned long)__ptr;
 
 	return ((ptr % PAGE_SIZE) < (PAGE_SIZE / 2));
 }
 
+/*
+ * Does this PGD allow access via userspace?
+ */
+static inline bool pgd_userspace_access(pgd_t pgd)
+{
+	return (pgd.pgd & _PAGE_USER);
+}
+
+/*
+ * Returns the pgd_t that the kernel should use in its page tables.
+ */
+static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+#ifdef CONFIG_KAISER
+	if (pgd_userspace_access(pgd)) {
+		if (pgdp_maps_userspace(pgdp)) {
+			/*
+			 * The user/shadow page tables get the full
+			 * PGD, accessible to userspace:
+			 */
+			native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+			/*
+			 * For the copy of the pgd that the kernel
+			 * uses, make it unusable to userspace.  This
+			 * ensures if we get out to userspace with the
+			 * wrong CR3 value, userspace will crash
+			 * instead of running.
+			 */
+			pgd.pgd |= _PAGE_NX;
+		}
+	} else if (!pgd.pgd) {
+		/*
+		 * We are clearing the PGD and can not check  _PAGE_USER
+		 * in the zero'd PGD.  We never do this on the
+		 * pre-populated kernel PGDs, except for pgd_bad().
+		 */
+		if (pgdp_maps_userspace(pgdp)) {
+			native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+		} else {
+			/*
+			 * Uh, we are very confused.  We have been
+			 * asked to clear a PGD that is in the kernel
+			 * part of the address space.  We preallocated
+			 * all the KAISER PGDs, so this should never
+			 * happen.
+			 */
+			WARN_ON_ONCE(1);
+		}
+	}
+#endif
+	/* return the copy of the PGD we want the kernel to use: */
+	return pgd;
+}
+
+
 static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
 #if defined(CONFIG_KAISER) && !defined(CONFIG_X86_5LEVEL)
-	/*
-	 * set_pgd() does not get called when we are running
-	 * CONFIG_X86_5LEVEL=y.  So, just hack around it.  We
-	 * know here that we have a p4d but that it is really at
-	 * the top level of the page tables; it is really just a
-	 * pgd.
-	 */
-	/* Do we need to also populate the shadow p4d? */
-	if (is_userspace_pgd(p4dp))
-		native_get_shadow_p4d(p4dp)->pgd = p4d.pgd;
-	/*
-	 * Even if the entry is *mapping* userspace, ensure
-	 * that userspace can not use it.  This way, if we
-	 * get out to userspace with the wrong CR3 value,
-	 * userspace will crash instead of running.
-	 */
-	if (!p4d.pgd.pgd)
-		p4dp->pgd.pgd = p4d.pgd.pgd | _PAGE_NX;
+	p4dp->pgd = kaiser_set_shadow_pgd(&p4dp->pgd, p4d.pgd);
 #else /* CONFIG_KAISER */
 	*p4dp = p4d;
 #endif
@@ -226,17 +264,7 @@ static inline void native_p4d_clear(p4d_
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
 #ifdef CONFIG_KAISER
-	/* Do we need to also populate the shadow pgd? */
-	if (is_userspace_pgd(pgdp))
-		native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
-	/*
-	 * Even if the entry is mapping userspace, ensure
-	 * that it is unusable for userspace.  This way,
-	 * if we get out to userspace with the wrong CR3
-	 * value, userspace will crash instead of running.
-	 */
-	if (!pgd_none(pgd))
-		pgdp->pgd = pgd.pgd | _PAGE_NX;
+	*pgdp = kaiser_set_shadow_pgd(pgdp, pgd);
 #else /* CONFIG_KAISER */
 	*pgdp = pgd;
 #endif
_

  parent reply	other threads:[~2017-11-08 19:47 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-08 19:46 [PATCH 00/30] [v2] KAISER: unmap most of the kernel from userspace page tables Dave Hansen
2017-11-08 19:46 ` [PATCH 01/30] x86, mm: do not set _PAGE_USER for init_mm " Dave Hansen
2017-11-08 19:52   ` Linus Torvalds
2017-11-08 20:11     ` Dave Hansen
2017-11-09 10:29   ` Borislav Petkov
2017-11-08 19:46 ` [PATCH 02/30] x86, tlb: make CR4-based TLB flushes more robust Dave Hansen
2017-11-09 10:48   ` Borislav Petkov
2017-11-09 10:51     ` Thomas Gleixner
2017-11-09 11:02       ` Borislav Petkov
2017-11-08 19:46 ` [PATCH 03/30] x86, mm: document X86_CR4_PGE toggling behavior Dave Hansen
2017-11-09 12:21   ` Borislav Petkov
2017-11-08 19:46 ` [PATCH 04/30] x86, kaiser: disable global pages by default with KAISER Dave Hansen
2017-11-09 12:51   ` Borislav Petkov
2017-11-09 22:19   ` Thomas Gleixner
2017-11-08 19:46 ` [PATCH 05/30] x86, kaiser: prepare assembly for entry/exit CR3 switching Dave Hansen
2017-11-09 13:20   ` Borislav Petkov
2017-11-09 15:34     ` Dave Hansen
2017-11-09 15:59       ` Borislav Petkov
2017-11-08 19:46 ` [PATCH 06/30] x86, kaiser: introduce user-mapped percpu areas Dave Hansen
2017-11-08 19:46 ` [PATCH 07/30] x86, kaiser: mark percpu data structures required for entry/exit Dave Hansen
2017-11-08 19:47 ` [PATCH 08/30] x86, kaiser: unmap kernel from userspace page tables (core patch) Dave Hansen
2017-11-10 12:57   ` Ingo Molnar
2017-11-08 19:47 ` Dave Hansen [this message]
2017-11-08 19:47 ` [PATCH 10/30] x86, kaiser: allow NX to be set in p4d/pgd Dave Hansen
2017-11-08 19:47 ` [PATCH 11/30] x86, kaiser: make sure static PGDs are 8k in size Dave Hansen
2017-11-08 19:47 ` [PATCH 12/30] x86, kaiser: map GDT into user page tables Dave Hansen
2017-11-08 19:47 ` [PATCH 13/30] x86, kaiser: map dynamically-allocated LDTs Dave Hansen
2017-11-08 19:47 ` [PATCH 14/30] x86, kaiser: map espfix structures Dave Hansen
2017-11-08 19:47 ` [PATCH 15/30] x86, kaiser: map entry stack variables Dave Hansen
2017-11-08 19:47 ` [PATCH 16/30] x86, kaiser: map trace interrupt entry Dave Hansen
2017-11-08 19:47 ` [PATCH 17/30] x86, kaiser: map debug IDT tables Dave Hansen
2017-11-08 19:47 ` [PATCH 18/30] x86, kaiser: map virtually-addressed performance monitoring buffers Dave Hansen
2017-11-10 12:17   ` Peter Zijlstra
2017-11-08 19:47 ` [PATCH 19/30] x86, mm: Move CR3 construction functions Dave Hansen
2017-11-08 19:47 ` [PATCH 20/30] x86, mm: remove hard-coded ASID limit checks Dave Hansen
2017-11-10 12:20   ` Peter Zijlstra
2017-11-10 18:41     ` Dave Hansen
2017-11-08 19:47 ` [PATCH 21/30] x86, mm: put mmu-to-h/w ASID translation in one place Dave Hansen
2017-11-08 19:47 ` [PATCH 22/30] x86, pcid, kaiser: allow flushing for future ASID switches Dave Hansen
2017-11-10 12:25   ` Peter Zijlstra
2017-11-08 19:47 ` [PATCH 23/30] x86, kaiser: use PCID feature to make user and kernel switches faster Dave Hansen
2017-11-08 19:47 ` [PATCH 24/30] x86, kaiser: disable native VSYSCALL Dave Hansen
2017-11-09 19:04   ` Andy Lutomirski
2017-11-09 19:26     ` Dave Hansen
2017-11-10  0:53       ` Andy Lutomirski
2017-11-10  0:57         ` Dave Hansen
2017-11-10  1:04           ` Andy Lutomirski
2017-11-10  1:22             ` Dave Hansen
2017-11-10  2:25               ` Andy Lutomirski
2017-11-10  6:31                 ` Dave Hansen
2017-11-10 22:06                   ` Andy Lutomirski
2017-11-10 23:04                     ` Dave Hansen
2017-11-13  3:52                       ` Andy Lutomirski
2017-11-13 21:07                         ` Dave Hansen
2017-11-14  2:15                           ` Andy Lutomirski
2017-11-08 19:47 ` [PATCH 25/30] x86, kaiser: add debugfs file to turn KAISER on/off at runtime Dave Hansen
2017-11-08 19:47 ` [PATCH 26/30] x86, kaiser: add a function to check for KAISER being enabled Dave Hansen
2017-11-08 19:47 ` [PATCH 27/30] x86, kaiser: un-poison PGDs at runtime Dave Hansen
2017-11-08 19:47 ` [PATCH 28/30] x86, kaiser: allow KAISER to be enabled/disabled " Dave Hansen
2017-11-08 19:47 ` [PATCH 29/30] x86, kaiser: add Kconfig Dave Hansen
2017-11-08 19:47 ` [PATCH 30/30] x86, kaiser, xen: Dynamically disable KAISER when running under Xen PV Dave Hansen
2017-11-09 15:01   ` Juergen Gross
2017-11-10 19:30 [PATCH 00/30] [v3] KAISER: unmap most of the kernel from userspace page tables Dave Hansen
2017-11-10 19:31 ` [PATCH 09/30] x86, kaiser: only populate shadow page tables for userspace Dave Hansen
2017-11-20 20:12   ` Thomas Gleixner
2017-11-21  7:05     ` Ingo Molnar
2017-11-21 22:09     ` Dave Hansen
2017-11-22  3:44       ` Andy Lutomirski
2017-11-22 23:30         ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171108194703.45704117@viggo.jf.intel.com \
    --to=dave.hansen@linux.intel.com \
    --cc=daniel.gruss@iaik.tugraz.at \
    --cc=hughd@google.com \
    --cc=keescook@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=michael.schwarz@iaik.tugraz.at \
    --cc=moritz.lipp@iaik.tugraz.at \
    --cc=richard.fellner@student.tugraz.at \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).