From: Dave Hansen <dave.hansen@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@linux.intel.com,
	moritz.lipp@iaik.tugraz.at, daniel.gruss@iaik.tugraz.at,
	michael.schwarz@iaik.tugraz.at,
	richard.fellner@student.tugraz.at, luto@kernel.org,
	torvalds@linux-foundation.org, keescook@google.com,
	hughd@google.com, x86@kernel.org
Subject: [PATCH 09/30] x86, kaiser: only populate shadow page tables for userspace
Date: Fri, 10 Nov 2017 11:31:13 -0800
Message-ID: <20171110193113.E35BC3BF@viggo.jf.intel.com>
In-Reply-To: <20171110193058.BECA7D88@viggo.jf.intel.com>


From: Dave Hansen <dave.hansen@linux.intel.com>

KAISER has two copies of the page tables: one for the kernel and
one for when running in userspace.  There is also a kernel
portion of each of the page tables: the part that *maps* the
kernel.

The kernel portion is relatively static and uses pre-populated
PGDs.  Nobody ever calls set_pgd() on the kernel portion during
normal operation.

The userspace portion of the page tables is updated frequently as
userspace pages are mapped and page table pages are allocated.
These updates of the userspace *portion* of the tables need to be
reflected into both the kernel and user/shadow copies.

The original KAISER patches did this by effectively looking at
the address that is being updated.  If it is <PAGE_OFFSET, the
update is considered to be for the userspace portion of the page
tables, and an entry must also be made in the shadow.
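
In the code touched by this patch, that check is made on the position
of the entry within the top-level page rather than on a virtual
address: entries in the lower half of the page map userspace.  A
minimal standalone sketch of that test follows.  The SKETCH_/sketch_
names, the 4096-byte page and the 8-byte entry size are assumptions
made for illustration, and this is userspace C, not kernel code.

```c
#include <assert.h>   /* only needed for the usage checks described below */
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages holding 512 8-byte entries. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Mirrors the logic of pgdp_maps_userspace() in the diff: an entry
 * pointer in the lower half of a page-aligned PGD page maps userspace,
 * one in the upper half maps the kernel.
 */
static bool sketch_pgdp_maps_userspace(uintptr_t pgdp)
{
	return (pgdp % SKETCH_PAGE_SIZE) < (SKETCH_PAGE_SIZE / 2);
}

/* Hypothetical helper: address of entry 'index' in a PGD page. */
static uintptr_t sketch_pgd_entry_addr(uintptr_t pgd_page, unsigned int index)
{
	return pgd_page + index * sizeof(uint64_t);
}
```

With these assumptions, entries 0 through 255 of a PGD page are
reported as mapping userspace and entries 256 through 511 are not.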

However, this has a wrinkle: there are a few places where low
addresses are used in supervisor (kernel) mode.  When EFI calls
are made, they use what are traditionally user addresses in
supervisor mode and trip over these checks.  The trampoline code
that is used for booting secondary CPUs has a similar issue.

Remember, there are two things that KAISER needs performed on a
userspace PGD:

 1. Populate the shadow itself.
 2. Poison the kernel PGD so it cannot be used by userspace.

This patch only performs these actions when dealing with a user
address *and* the PGD has _PAGE_USER set.  That way, in-kernel
users of low addresses typically used by userspace are not
accidentally poisoned.
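
The resulting decision can be sketched as a small standalone model.
This is not the kernel function: struct sketch_result and the
SKETCH_/sketch_ names are hypothetical, and the caller passes in
whether the entry sits in the userspace half of the PGD.  The bit
positions are the x86-64 U/S (bit 2) and NX (bit 63) bits.

```c
#include <assert.h>   /* only needed for the usage checks described below */
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_USER (1ULL << 2)	/* x86-64 U/S bit */
#define SKETCH_PAGE_NX   (1ULL << 63)	/* x86-64 NX bit */

/* Hypothetical record of what one set_pgd() call would do. */
struct sketch_result {
	uint64_t kernel_pgd;	/* value stored in the kernel copy */
	bool shadow_written;	/* was the user/shadow copy updated? */
	uint64_t shadow_pgd;	/* value stored in the shadow, if written */
	bool warn;		/* unexpected clear of a kernel-half entry */
};

static struct sketch_result sketch_set_shadow_pgd(bool maps_userspace,
						  uint64_t pgd)
{
	struct sketch_result r = { .kernel_pgd = pgd };

	if (pgd & SKETCH_PAGE_USER) {
		if (maps_userspace) {
			/* Shadow gets the full entry ... */
			r.shadow_written = true;
			r.shadow_pgd = pgd;
			/* ... and the kernel copy is poisoned with NX. */
			r.kernel_pgd = pgd | SKETCH_PAGE_NX;
		}
	} else if (pgd == 0) {
		/* Clearing: _PAGE_USER cannot be checked in a zero entry. */
		if (maps_userspace) {
			r.shadow_written = true;
			r.shadow_pgd = 0;
		} else {
			r.warn = true;
		}
	}
	return r;
}
```

In this model, a low-address entry without _PAGE_USER (as described
above for EFI calls or the trampoline) leaves the shadow untouched and
is stored in the kernel copy unmodified, while a normal user entry is
copied to the shadow and gets NX in the kernel copy.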

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Cc: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: x86@kernel.org
---

 b/arch/x86/include/asm/pgtable_64.h |   94 +++++++++++++++++++++++-------------
 1 file changed, 61 insertions(+), 33 deletions(-)

diff -puN arch/x86/include/asm/pgtable_64.h~kaiser-set-pgd-careful-plus-NX arch/x86/include/asm/pgtable_64.h
--- a/arch/x86/include/asm/pgtable_64.h~kaiser-set-pgd-careful-plus-NX	2017-11-10 11:22:09.932244947 -0800
+++ b/arch/x86/include/asm/pgtable_64.h	2017-11-10 11:22:09.935244947 -0800
@@ -177,38 +177,76 @@ static inline p4d_t *native_get_normal_p
 /*
  * Page table pages are page-aligned.  The lower half of the top
  * level is used for userspace and the top half for the kernel.
- * This returns true for user pages that need to get copied into
- * both the user and kernel copies of the page tables, and false
- * for kernel pages that should only be in the kernel copy.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
  */
-static inline bool is_userspace_pgd(void *__ptr)
+static inline bool pgdp_maps_userspace(void *__ptr)
 {
 	unsigned long ptr = (unsigned long)__ptr;
 
 	return ((ptr % PAGE_SIZE) < (PAGE_SIZE / 2));
 }
 
+/*
+ * Does this PGD allow access via userspace?
+ */
+static inline bool pgd_userspace_access(pgd_t pgd)
+{
+	return (pgd.pgd & _PAGE_USER);
+}
+
+/*
+ * Returns the pgd_t that the kernel should use in its page tables.
+ */
+static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+#ifdef CONFIG_KAISER
+	if (pgd_userspace_access(pgd)) {
+		if (pgdp_maps_userspace(pgdp)) {
+			/*
+			 * The user/shadow page tables get the full
+			 * PGD, accessible to userspace:
+			 */
+			native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+			/*
+			 * For the copy of the pgd that the kernel
+			 * uses, make it unusable to userspace.  This
+			 * ensures if we get out to userspace with the
+			 * wrong CR3 value, userspace will crash
+			 * instead of running.
+			 */
+			pgd.pgd |= _PAGE_NX;
+		}
+	} else if (!pgd.pgd) {
+		/*
+		 * We are clearing the PGD and can not check _PAGE_USER
+		 * in the zero'd PGD.  We never do this on the
+		 * pre-populated kernel PGDs, except for pgd_bad().
+		 */
+		if (pgdp_maps_userspace(pgdp)) {
+			native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+		} else {
+			/*
+			 * Uh, we are very confused.  We have been
+			 * asked to clear a PGD that is in the kernel
+			 * part of the address space.  We preallocated
+			 * all the KAISER PGDs, so this should never
+			 * happen.
+			 */
+			WARN_ON_ONCE(1);
+		}
+	}
+#endif
+	/* return the copy of the PGD we want the kernel to use: */
+	return pgd;
+}
+
+
 static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
 #if defined(CONFIG_KAISER) && !defined(CONFIG_X86_5LEVEL)
-	/*
-	 * set_pgd() does not get called when we are running
-	 * CONFIG_X86_5LEVEL=y.  So, just hack around it.  We
-	 * know here that we have a p4d but that it is really at
-	 * the top level of the page tables; it is really just a
-	 * pgd.
-	 */
-	/* Do we need to also populate the shadow p4d? */
-	if (is_userspace_pgd(p4dp))
-		native_get_shadow_p4d(p4dp)->pgd = p4d.pgd;
-	/*
-	 * Even if the entry is *mapping* userspace, ensure
-	 * that userspace can not use it.  This way, if we
-	 * get out to userspace with the wrong CR3 value,
-	 * userspace will crash instead of running.
-	 */
-	if (!p4d.pgd.pgd)
-		p4dp->pgd.pgd = p4d.pgd.pgd | _PAGE_NX;
+	p4dp->pgd = kaiser_set_shadow_pgd(&p4dp->pgd, p4d.pgd);
 #else /* CONFIG_KAISER */
 	*p4dp = p4d;
 #endif
@@ -226,17 +264,7 @@ static inline void native_p4d_clear(p4d_
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
 #ifdef CONFIG_KAISER
-	/* Do we need to also populate the shadow pgd? */
-	if (is_userspace_pgd(pgdp))
-		native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
-	/*
-	 * Even if the entry is mapping userspace, ensure
-	 * that it is unusable for userspace.  This way,
-	 * if we get out to userspace with the wrong CR3
-	 * value, userspace will crash instead of running.
-	 */
-	if (!pgd_none(pgd))
-		pgdp->pgd = pgd.pgd | _PAGE_NX;
+	*pgdp = kaiser_set_shadow_pgd(pgdp, pgd);
 #else /* CONFIG_KAISER */
 	*pgdp = pgd;
 #endif
_
