linux-kernel.vger.kernel.org archive mirror
From: Dave Hansen <dave.hansen@linux.intel.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Kees Cook <keescook@google.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables
Date: Wed, 1 Nov 2017 10:31:50 -0700
Message-ID: <8bacac66-7d3e-b15d-a73b-92c55c0b1908@linux.intel.com>
In-Reply-To: <CA+55aFypdyt+3-JyD3U1da5EqznncxKZZKPGn4ykkD=4Q4rdvw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2141 bytes --]

On 11/01/2017 09:08 AM, Linus Torvalds wrote:
> On Tue, Oct 31, 2017 at 4:44 PM, Dave Hansen
> <dave.hansen@linux.intel.com> wrote:
>> On 10/31/2017 04:27 PM, Linus Torvalds wrote:
>>>  (c) am I reading the code correctly, and the shadow page tables are
>>> *completely* duplicated?
>>>
>>>      That seems insane. Why isn't only the top level shadowed, and
>>> then lower levels are shared between the shadowed and the "kernel"
>>> page tables?
>>
>> There are obviously two PGDs.  The userspace half of the PGD is an exact
>> copy so all the lower levels are shared.  The userspace copying is
>> done via the code we add to native_set_pgd().
> 
> So the thing that made me think you do all levels was that confusing
> kaiser_pagetable_walk() code (and to a lesser degree
> get_pa_from_mapping()).
> 
> That code definitely walks and allocates all levels.
> 
> So it really doesn't seem to be just sharing the top page table entry.

Yeah, they're quite lightly commented and badly named now that I go look
at them.

get_pa_from_mapping() should be called something like
get_pa_from_kernel_map().  Its job is to look at the main (kernel) page
tables and go get an address from there.  It's only ever called on
kernel addresses.
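To make that split concrete, here is a minimal user-space sketch of the kind of lookup get_pa_from_kernel_map() does. It is a toy model, not the kernel code: it assumes four 9-bit levels over 4k pages (x86-64 4-level paging with the p4d folded away) and uses a made-up struct toy_entry instead of the real pgd_t/pud_t/pmd_t/pte_t accessors. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not the kernel's page table API. */
#define TOY_ENTRIES    512   /* 9 index bits per level */
#define TOY_PAGE_SHIFT 12    /* 4k base pages */
#define TOY_LEVELS     4     /* PGD, PUD, PMD, PTE */

struct toy_table;

struct toy_entry {
	int present;
	int large;               /* leaf above the PTE level (1GB or 2MB) */
	uint64_t pa;             /* physical base address if this is a leaf */
	struct toy_table *next;  /* lower-level table otherwise */
};

struct toy_table {
	struct toy_entry e[TOY_ENTRIES];
};

/* Index into the table at 'level' (0 = PGD ... 3 = PTE). */
static unsigned toy_index(uint64_t vaddr, int level)
{
	int shift = TOY_PAGE_SHIFT + 9 * (TOY_LEVELS - 1 - level);
	return (vaddr >> shift) & (TOY_ENTRIES - 1);
}

/*
 * Walk the main (kernel) tables and return the physical address that
 * backs vaddr, or (uint64_t)-1 if it is not mapped.  Read-only: it
 * never allocates, and it accepts large-page leaves at any level.
 */
static uint64_t toy_get_pa_from_kernel_map(struct toy_table *pgd, uint64_t vaddr)
{
	struct toy_table *t = pgd;

	for (int level = 0; level < TOY_LEVELS; level++) {
		struct toy_entry *ent = &t->e[toy_index(vaddr, level)];
		int shift = TOY_PAGE_SHIFT + 9 * (TOY_LEVELS - 1 - level);

		if (!ent->present)
			return (uint64_t)-1;
		if (ent->large || level == TOY_LEVELS - 1) {
			/* Leaf: add the offset bits below this level. */
			uint64_t mask = ((uint64_t)1 << shift) - 1;
			return ent->pa + (vaddr & mask);
		}
		t = ent->next;
		assert(t != NULL);  /* present non-leaf entries point at a table */
	}
	return (uint64_t)-1;        /* not reached */
}
```

The point of the sketch is that this side only reads the kernel's own tables, which is why it can be pointed at any kernel address regardless of the page size backing it.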

kaiser_pagetable_walk() should probably be
kaiser_shadow_pagetable_walk().  Its job is to walk the shadow copy and
find the location of a 4k PTE.  You can then populate that PTE with the
address you got from get_pa_from_mapping() (or clear it in the remove
mapping case).
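The shadow-side walk can be sketched the same way, again as a hypothetical toy model rather than the patch's code. It walks the shadow tables down to a 4k PTE slot, allocating missing intermediate tables on the way unless TOY_WALK_ATOMIC is set (a stand-in for KAISER_WALK_ATOMIC, which the remove path uses so it never allocates). It does not handle large pages, and toy_add_user_page() assumes the physical address was already obtained from the kernel-side lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not the kernel's types or API. */
#define TOY_ENTRIES     512
#define TOY_PAGE_SHIFT  12
#define TOY_LEVELS      4
#define TOY_WALK_ATOMIC 0x1   /* like KAISER_WALK_ATOMIC: never allocate */

struct toy_table;

struct toy_entry {
	int present;
	uint64_t pa;              /* used only at the PTE level */
	struct toy_table *next;   /* used at the upper levels */
};

struct toy_table {
	struct toy_entry e[TOY_ENTRIES];
};

static unsigned toy_index(uint64_t vaddr, int level)
{
	int shift = TOY_PAGE_SHIFT + 9 * (TOY_LEVELS - 1 - level);
	return (vaddr >> shift) & (TOY_ENTRIES - 1);
}

/*
 * Return a pointer to the 4k PTE slot for vaddr in the shadow tables,
 * or NULL on allocation failure or (in atomic mode) a missing table.
 */
static struct toy_entry *toy_shadow_walk(struct toy_table *shadow_pgd,
					 uint64_t vaddr, unsigned long flags)
{
	struct toy_table *t = shadow_pgd;

	for (int level = 0; level < TOY_LEVELS - 1; level++) {
		struct toy_entry *ent = &t->e[toy_index(vaddr, level)];

		if (!ent->present) {
			if (flags & TOY_WALK_ATOMIC)
				return NULL;
			ent->next = calloc(1, sizeof(struct toy_table));
			if (!ent->next)
				return NULL;
			ent->present = 1;
		}
		t = ent->next;
	}
	return &t->e[toy_index(vaddr, TOY_LEVELS - 1)];
}

/* Map one kernel page, already translated to 'pa', into the shadow. */
static int toy_add_user_page(struct toy_table *shadow_pgd, uint64_t vaddr,
			     uint64_t pa)
{
	struct toy_entry *pte = toy_shadow_walk(shadow_pgd, vaddr, 0);

	if (!pte)
		return -1;        /* toy convention for failure */
	assert((pa & ((1u << TOY_PAGE_SHIFT) - 1)) == 0);  /* 4k aligned */
	pte->pa = pa;
	pte->present = 1;
	return 0;
}
```

For the remove-mapping case, the same walk would be done with TOY_WALK_ATOMIC and the returned slot cleared instead of populated. Tables only ever get created here for the kernel ranges that are explicitly added, not for userspace.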

I've attached an update to the core patch and Documentation that should
help clear this up.

> And that worries me because that seems to be a very fundamental coherency issue.
> 
> I'm assuming that this is about mapping only the individual kernel
> parts, but I'd like to get comments and clarification about that.

I assume that you're really worried about having to go to two places to do
one thing, like clearing a dirty bit, or unmapping a PTE, especially
when we have to do that for userspace.  Thankfully, the sharing of the
page tables (under the PGD) for userspace gets rid of most of this
nastiness.
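A toy sketch of why the userspace side stays coherent: only the top level is duplicated. This is not the native_set_pgd() code. It assumes, hypothetically, that the lower 256 PGD slots cover userspace, and it collapses everything below the PGD into a single struct toy_lower so the sharing is easy to see.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the kernel's native_set_pgd(). */
#define TOY_PGD_ENTRIES 512
#define TOY_USER_PGDS   256   /* assumption: lower half of the PGD is userspace */

/* Stand-in for every level below the PGD (PUD, PMD, PTE pages). */
struct toy_lower {
	int accessed;
	int dirty;
};

struct toy_pgd_entry {
	struct toy_lower *next;   /* NULL means "not present" */
};

/* One mm with its normal (kernel) PGD and its shadow (user) PGD. */
struct toy_mm {
	struct toy_pgd_entry kernel_pgd[TOY_PGD_ENTRIES];
	struct toy_pgd_entry shadow_pgd[TOY_PGD_ENTRIES];
};

static void toy_set_pgd(struct toy_mm *mm, unsigned idx, struct toy_lower *lower)
{
	assert(idx < TOY_PGD_ENTRIES);
	mm->kernel_pgd[idx].next = lower;
	/* Userspace entries are mirrored: same pointer, shared lower levels. */
	if (idx < TOY_USER_PGDS)
		mm->shadow_pgd[idx].next = lower;
	/* Kernel-half entries are left out of the shadow in this toy. */
}
```

Setting the dirty bit through the kernel PGD is immediately visible through the shadow PGD, because both top-level entries point at the same lower-level structure and there is only one copy of the bit to update.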

I hope that's more clear now.

[-- Attachment #2: kaiser-core-update1.patch --]
[-- Type: text/x-patch, Size: 4884 bytes --]

diff --git a/Documentation/x86/kaiser.txt b/Documentation/x86/kaiser.txt
index 67a70d2..5b5e9c4 100644
--- a/Documentation/x86/kaiser.txt
+++ b/Documentation/x86/kaiser.txt
@@ -1,3 +1,6 @@
+Overview
+========
+
 KAISER is a countermeasure against attacks on kernel address
 information.  There are at least three existing, published,
 approaches using the shared user/kernel mapping and hardware features
@@ -18,6 +21,35 @@ This helps ensure that side-channel attacks that leverage the
 paging structures do not function when KAISER is enabled.  It
 can be enabled by setting CONFIG_KAISER=y
 
+Page Table Management
+=====================
+
+KAISER logically keeps a "copy" of the page tables which unmap
+the kernel while in userspace.  The kernel manages the page
+tables as normal, but the "copying" is done with a few tricks
+that mean that we do not have to manage two full copies.
+
+The first trick is that for any new kernel mapping, we
+presume that we do not want it mapped to userspace.  That means
+we normally have no copying to do.  We only copy the kernel
+entries over to the shadow in response to a kaiser_add_*()
+call, which is rare.
+
+For a new userspace mapping, the kernel makes the entries in
+its page tables like normal.  The only difference is when the
+kernel makes entries in the top (PGD) level.  In addition to
+setting the entry in the main kernel PGD, a copy of the entry
+is made in the shadow PGD.
+
+PGD entries always point to another page table.  Two PGD
+entries pointing to the same thing gives us shared page tables
+for all the lower entries.  This leaves a single, shared set of
+userspace page tables to manage.  One PTE to lock, one set
+of accessed bits, dirty bits, etc.
+
+Overhead
+========
+
 Protection against side-channel attacks is important.  But,
 this protection comes at a cost:
 
diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
index 57f7637..cde9014 100644
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -49,9 +49,21 @@
 static DEFINE_SPINLOCK(shadow_table_allocation_lock);
 
 /*
+ * This is a generic page table walker used only for walking kernel
+ * addresses.  We use it to help recreate the "shadow" page tables
+ * which are used while we are in userspace.
+ *
+ * This can be called on any kernel memory addresses and will work
+ * with any page sizes and any types: normal linear map memory,
+ * vmalloc(), even kmap().
+ *
+ * Note: this is only used when mapping new *kernel* entries into
+ * the user/shadow page tables.  It is never used for userspace
+ * addresses.
+ *
  * Returns -1 on error.
  */
-static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
+static inline unsigned long get_pa_from_kernel_map(unsigned long vaddr)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
@@ -59,6 +71,8 @@ static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
 	pmd_t *pmd;
 	pte_t *pte;
 
+	WARN_ON_ONCE(vaddr < PAGE_OFFSET);
+
 	pgd = pgd_offset_k(vaddr);
 	/*
 	 * We made all the kernel PGDs present in kaiser_init().
@@ -111,13 +125,19 @@ static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
 }
 
 /*
- * This is a relatively normal page table walk, except that it
- * also tries to allocate page tables pages along the way.
+ * Walk the shadow copy of the page tables (optionally) trying to
+ * allocate page table pages on the way down.  Does not support
+ * large pages since the data we are mapping is (generally) not
+ * large enough or aligned to 2MB.
+ *
+ * Note: this is only used when mapping *new* kernel data into the
+ * user/shadow page tables.  It is never used for userspace data.
  *
  * Returns a pointer to a PTE on success, or NULL on failure.
  */
 #define KAISER_WALK_ATOMIC  0x1
-static pte_t *kaiser_pagetable_walk(unsigned long address, unsigned long flags)
+static pte_t *kaiser_shadow_pagetable_walk(unsigned long address,
+					   unsigned long flags)
 {
 	pmd_t *pmd;
 	pud_t *pud;
@@ -207,11 +227,11 @@ int kaiser_add_user_map(const void *__start_addr, unsigned long size,
 	unsigned long target_address;
 
 	for (; address < end_addr; address += PAGE_SIZE) {
-		target_address = get_pa_from_mapping(address);
+		target_address = get_pa_from_kernel_map(address);
 		if (target_address == -1)
 			return -EIO;
 
-		pte = kaiser_pagetable_walk(address, false);
+		pte = kaiser_shadow_pagetable_walk(address, false);
 		/*
 		 * Errors come from either -ENOMEM for a page
 		 * table page, or something screwy that did a
@@ -348,7 +368,7 @@ void kaiser_remove_mapping(unsigned long start, unsigned long size)
 		 * context.  This should not do any allocations because we
 		 * should only be walking things that are known to be mapped.
 		 */
-		pte_t *pte = kaiser_pagetable_walk(addr, KAISER_WALK_ATOMIC);
+		pte_t *pte = kaiser_shadow_pagetable_walk(addr, KAISER_WALK_ATOMIC);
 
 		/*
 		 * We are removing a mapping that shoud


Thread overview: 102+ messages
2017-10-31 22:31 [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables Dave Hansen
2017-10-31 22:31 ` [PATCH 01/23] x86, kaiser: prepare assembly for entry/exit CR3 switching Dave Hansen
2017-11-01  0:43   ` Brian Gerst
2017-11-01  1:08     ` Dave Hansen
2017-11-01 18:18   ` Borislav Petkov
2017-11-01 18:27     ` Dave Hansen
2017-11-01 20:42       ` Borislav Petkov
2017-11-01 21:01   ` Thomas Gleixner
2017-11-01 22:58     ` Dave Hansen
2017-10-31 22:31 ` [PATCH 02/23] x86, kaiser: do not set _PAGE_USER for init_mm page tables Dave Hansen
2017-11-01 21:11   ` Thomas Gleixner
2017-11-01 21:24     ` Andy Lutomirski
2017-11-01 21:28       ` Thomas Gleixner
2017-11-01 21:52         ` Dave Hansen
2017-11-01 22:11           ` Thomas Gleixner
2017-11-01 22:12           ` Linus Torvalds
2017-11-01 22:20             ` Thomas Gleixner
2017-11-01 22:45               ` Kees Cook
2017-11-02  7:10               ` Andy Lutomirski
2017-11-02 11:33                 ` Thomas Gleixner
2017-11-02 11:59                   ` Andy Lutomirski
2017-11-02 12:56                     ` Thomas Gleixner
2017-11-02 16:38                   ` Dave Hansen
2017-11-02 18:19                     ` Andy Lutomirski
2017-11-02 18:24                       ` Thomas Gleixner
2017-11-02 18:24                       ` Linus Torvalds
2017-11-02 18:40                         ` Thomas Gleixner
2017-11-02 18:57                           ` Linus Torvalds
2017-11-02 21:41                             ` Thomas Gleixner
2017-11-02  7:07         ` Andy Lutomirski
2017-11-02 11:21           ` Thomas Gleixner
2017-10-31 22:31 ` [PATCH 03/23] x86, kaiser: disable global pages Dave Hansen
2017-11-01 21:18   ` Thomas Gleixner
2017-11-01 22:12     ` Dave Hansen
2017-11-01 22:28       ` Thomas Gleixner
2017-10-31 22:31 ` [PATCH 04/23] x86, tlb: make CR4-based TLB flushes more robust Dave Hansen
2017-11-01  8:01   ` Andy Lutomirski
2017-11-01 10:11     ` Kirill A. Shutemov
2017-11-01 10:38       ` Andy Lutomirski
2017-11-01 10:56         ` Kirill A. Shutemov
2017-11-01 11:18           ` Andy Lutomirski
2017-11-01 22:21             ` Dave Hansen
2017-11-01 21:25   ` Thomas Gleixner
2017-11-01 22:24     ` Dave Hansen
2017-11-01 22:30       ` Thomas Gleixner
2017-10-31 22:31 ` [PATCH 05/23] x86, mm: document X86_CR4_PGE toggling behavior Dave Hansen
2017-10-31 23:31   ` Kees Cook
2017-10-31 22:31 ` [PATCH 06/23] x86, kaiser: introduce user-mapped percpu areas Dave Hansen
2017-11-01 21:47   ` Thomas Gleixner
2017-10-31 22:31 ` [PATCH 07/23] x86, kaiser: unmap kernel from userspace page tables (core patch) Dave Hansen
2017-10-31 22:32 ` [PATCH 08/23] x86, kaiser: only populate shadow page tables for userspace Dave Hansen
2017-10-31 23:35   ` Kees Cook
2017-10-31 22:32 ` [PATCH 09/23] x86, kaiser: allow NX to be set in p4d/pgd Dave Hansen
2017-10-31 22:32 ` [PATCH 10/23] x86, kaiser: make sure static PGDs are 8k in size Dave Hansen
2017-10-31 22:32 ` [PATCH 11/23] x86, kaiser: map GDT into user page tables Dave Hansen
2017-10-31 22:32 ` [PATCH 12/23] x86, kaiser: map dynamically-allocated LDTs Dave Hansen
2017-11-01  8:00   ` Andy Lutomirski
2017-11-01  8:06     ` Ingo Molnar
2017-10-31 22:32 ` [PATCH 13/23] x86, kaiser: map espfix structures Dave Hansen
2017-10-31 22:32 ` [PATCH 14/23] x86, kaiser: map entry stack variables Dave Hansen
2017-10-31 22:32 ` [PATCH 15/23] x86, kaiser: map trace interrupt entry Dave Hansen
2017-10-31 22:32 ` [PATCH 16/23] x86, kaiser: map debug IDT tables Dave Hansen
2017-10-31 22:32 ` [PATCH 17/23] x86, kaiser: map virtually-addressed performance monitoring buffers Dave Hansen
2017-10-31 22:32 ` [PATCH 18/23] x86, mm: Move CR3 construction functions Dave Hansen
2017-10-31 22:32 ` [PATCH 19/23] x86, mm: remove hard-coded ASID limit checks Dave Hansen
2017-10-31 22:32 ` [PATCH 20/23] x86, mm: put mmu-to-h/w ASID translation in one place Dave Hansen
2017-10-31 22:32 ` [PATCH 21/23] x86, pcid, kaiser: allow flushing for future ASID switches Dave Hansen
2017-11-01  8:03   ` Andy Lutomirski
2017-11-01 14:17     ` Dave Hansen
2017-11-01 20:31       ` Andy Lutomirski
2017-11-01 20:59         ` Dave Hansen
2017-11-01 21:04           ` Andy Lutomirski
2017-11-01 21:06             ` Dave Hansen
2017-10-31 22:32 ` [PATCH 22/23] x86, kaiser: use PCID feature to make user and kernel switches faster Dave Hansen
2017-10-31 22:32 ` [PATCH 23/23] x86, kaiser: add Kconfig Dave Hansen
2017-10-31 23:59   ` Kees Cook
2017-11-01  9:07     ` Borislav Petkov
2017-10-31 23:27 ` [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables Linus Torvalds
2017-10-31 23:44   ` Dave Hansen
2017-11-01  0:21     ` Dave Hansen
2017-11-01  7:59     ` Andy Lutomirski
2017-11-01 16:08     ` Linus Torvalds
2017-11-01 17:31       ` Dave Hansen [this message]
2017-11-01 17:58         ` Randy Dunlap
2017-11-01 18:27         ` Linus Torvalds
2017-11-01 18:46           ` Dave Hansen
2017-11-01 19:05             ` Linus Torvalds
2017-11-01 20:33               ` Andy Lutomirski
2017-11-02  7:32                 ` Andy Lutomirski
2017-11-02  7:54                   ` Andy Lutomirski
2017-11-01 15:53   ` Dave Hansen
2017-11-01  8:54 ` Ingo Molnar
2017-11-01 14:09   ` Thomas Gleixner
2017-11-01 22:14   ` Dave Hansen
2017-11-01 22:28     ` Linus Torvalds
2017-11-02  8:03     ` Peter Zijlstra
2017-11-03 11:07     ` Kirill A. Shutemov
2017-11-02 19:01 ` Will Deacon
2017-11-02 19:38   ` Dave Hansen
2017-11-03 13:41     ` Will Deacon
2017-11-22 16:19 ` Pavel Machek
2017-11-23 10:47   ` Pavel Machek