From: Dave Hansen <dave.hansen@linux.intel.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>, Kees Cook <keescook@google.com>,
Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables
Date: Wed, 1 Nov 2017 10:31:50 -0700 [thread overview]
Message-ID: <8bacac66-7d3e-b15d-a73b-92c55c0b1908@linux.intel.com> (raw)
In-Reply-To: <CA+55aFypdyt+3-JyD3U1da5EqznncxKZZKPGn4ykkD=4Q4rdvw@mail.gmail.com>
On 11/01/2017 09:08 AM, Linus Torvalds wrote:
> On Tue, Oct 31, 2017 at 4:44 PM, Dave Hansen
> <dave.hansen@linux.intel.com> wrote:
>> On 10/31/2017 04:27 PM, Linus Torvalds wrote:
>>> (c) am I reading the code correctly, and the shadow page tables are
>>> *completely* duplicated?
>>>
>>> That seems insane. Why isn't only the top level shadowed, and
>>> then lower levels are shared between the shadowed and the "kernel"
>>> page tables?
>>
>> There are obviously two PGDs. The userspace half of the PGD is an exact
>> copy so all the lower levels are shared. The userspace copying is
>> done via the code we add to native_set_pgd().
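To make the mirroring concrete, here is a minimal userspace sketch of the idea, not the actual kernel code: the names, the flat-array "page tables", and the constants are simplifications for illustration only. Every write to a userspace slot of the kernel PGD is mirrored into the shadow PGD, so both top-level tables point at the same lower-level tables, while kernel slots are left out of the shadow.

```c
#include <assert.h>

#define PGD_ENTRIES      512   /* entries per top-level table on x86-64 */
#define USER_PGD_ENTRIES 256   /* lower half covers userspace addresses */

/* Toy model: two top-level tables; lower levels are shared by value. */
static unsigned long kernel_pgd[PGD_ENTRIES];
static unsigned long shadow_pgd[PGD_ENTRIES];

/*
 * Sketch of the hook added to native_set_pgd(): userspace PGD slots
 * are mirrored into the shadow PGD, so all lower-level page tables
 * for userspace are shared between the two.  Kernel slots are not
 * mirrored, which is what keeps the kernel unmapped while the shadow
 * tables are in use.
 */
static void set_pgd(unsigned int index, unsigned long entry)
{
	kernel_pgd[index] = entry;
	if (index < USER_PGD_ENTRIES)
		shadow_pgd[index] = entry;
}
```

Because the mirrored entries are pointers to the same lower-level tables, there is only one set of PTEs underneath for userspace, which is why no ongoing synchronization below the PGD is needed.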
>
> So the thing that made me think you do all levels was that confusing
> kaiser_pagetable_walk() code (and to a lesser degree
> get_pa_from_mapping()).
>
> That code definitely walks and allocates all levels.
>
> So it really doesn't seem to be just sharing the top page table entry.
Yeah, they're quite lightly commented and badly named now that I go look
at them.
get_pa_from_mapping() should be called something like
get_pa_from_kernel_map(). Its job is to look at the main (kernel) page
tables and go get an address from there. It's only ever called on
kernel addresses.
kaiser_pagetable_walk() should probably be
kaiser_shadow_pagetable_walk(). Its job is to walk the shadow copy and
find the location of a 4k PTE. You can then populate that PTE with the
address you got from get_pa_from_mapping() (or clear it in the remove
mapping case).
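The way the two helpers combine can be sketched with a toy userspace model; this is an illustration of the lookup-then-populate flow, not the kernel implementation, and the flat arrays stand in for real multi-level page tables.

```c
#include <assert.h>

#define NPAGES  16
#define PA_NONE (~0UL)

/* Toy model: one flat "page table" per address space, indexed by
 * virtual page number (vpn) and holding a physical address. */
static unsigned long kernel_map[NPAGES];
static unsigned long shadow_map[NPAGES];

/* Model of get_pa_from_kernel_map(): consult the kernel tables only. */
static unsigned long get_pa_from_kernel_map(unsigned int vpn)
{
	return kernel_map[vpn] ? kernel_map[vpn] : PA_NONE;
}

/* Model of kaiser_shadow_pagetable_walk(): find the shadow PTE slot
 * (the real walker also allocates intermediate tables on the way). */
static unsigned long *shadow_pagetable_walk(unsigned int vpn)
{
	return &shadow_map[vpn];
}

/* Model of the kaiser_add_user_map() loop body: look up the physical
 * address in the kernel map, then populate the shadow PTE with it. */
static int add_shadow_mapping(unsigned int vpn)
{
	unsigned long pa = get_pa_from_kernel_map(vpn);
	unsigned long *pte;

	if (pa == PA_NONE)
		return -1;
	pte = shadow_pagetable_walk(vpn);
	*pte = pa;
	return 0;
}
```

The remove-mapping case is the same walk, except the located PTE is cleared instead of populated.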
I've attached an update to the core patch and Documentation that should
help clear this up.
> And that worries me because that seems to be a very fundamental coherency issue.
>
> I'm assuming that this is about mapping only the individual kernel
> parts, but I'd like to get comments and clarification about that.
I assume that you're really worried about having to go two places to do
one thing, like clearing a dirty bit, or unmapping a PTE, especially
when we have to do that for userspace. Thankfully, the sharing of the
page tables (under the PGD) for userspace gets rid of most of this
nastiness.
I hope that's more clear now.
[-- Attachment #2: kaiser-core-update1.patch --]
[-- Type: text/x-patch, Size: 4884 bytes --]
diff --git a/Documentation/x86/kaiser.txt b/Documentation/x86/kaiser.txt
index 67a70d2..5b5e9c4 100644
--- a/Documentation/x86/kaiser.txt
+++ b/Documentation/x86/kaiser.txt
@@ -1,3 +1,6 @@
+Overview
+========
+
KAISER is a countermeasure against attacks on kernel address
information. There are at least three existing, published,
approaches using the shared user/kernel mapping and hardware features
@@ -18,6 +21,35 @@ This helps ensure that side-channel attacks that leverage the
paging structures do not function when KAISER is enabled. It
can be enabled by setting CONFIG_KAISER=y
+Page Table Management
+=====================
+
+KAISER logically keeps a "copy" of the page tables which unmap
+the kernel while in userspace. The kernel manages the page
+tables as normal, but the "copying" is done with a few tricks
+that mean that we do not have to manage two full copies.
+
+The first trick is that for any new kernel mapping, we
+presume that we do not want it mapped to userspace. That means
+we normally have no copying to do. We only copy the kernel
+entries over to the shadow in response to a kaiser_add_*()
+call which is rare.
+
+For a new userspace mapping, the kernel makes the entries in
+its page tables like normal. The only difference is when the
+kernel makes entries in the top (PGD) level. In addition to
+setting the entry in the main kernel PGD, a copy of the entry
+is made in the shadow PGD.
+
+PGD entries always point to another page table. Two PGD
+entries pointing to the same thing gives us shared page tables
+for all the lower entries. This leaves a single, shared set of
+userspace page tables to manage. One PTE to lock, one set
+of accessed bits, dirty bits, etc...
+
+Overhead
+========
+
Protection against side-channel attacks is important. But,
this protection comes at a cost:
diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
index 57f7637..cde9014 100644
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -49,9 +49,21 @@
static DEFINE_SPINLOCK(shadow_table_allocation_lock);
/*
+ * This is a generic page table walker used only for walking kernel
+ * addresses. We use it to help recreate the "shadow" page tables
+ * which are used while we are in userspace.
+ *
+ * This can be called on any kernel memory addresses and will work
+ * with any page sizes and any types: normal linear map memory,
+ * vmalloc(), even kmap().
+ *
+ * Note: this is only used when mapping new *kernel* entries into
+ * the user/shadow page tables. It is never used for userspace
+ * addresses.
+ *
* Returns -1 on error.
*/
-static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
+static inline unsigned long get_pa_from_kernel_map(unsigned long vaddr)
{
pgd_t *pgd;
p4d_t *p4d;
@@ -59,6 +71,8 @@ static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
pmd_t *pmd;
pte_t *pte;
+ WARN_ON_ONCE(vaddr < PAGE_OFFSET);
+
pgd = pgd_offset_k(vaddr);
/*
* We made all the kernel PGDs present in kaiser_init().
@@ -111,13 +125,19 @@ static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
}
/*
- * This is a relatively normal page table walk, except that it
- * also tries to allocate page tables pages along the way.
+ * Walk the shadow copy of the page tables (optionally) trying to
+ * allocate page table pages on the way down. Does not support
+ * large pages since the data we are mapping is (generally) not
+ * large enough or aligned to 2MB.
+ *
+ * Note: this is only used when mapping *new* kernel data into the
+ * user/shadow page tables. It is never used for userspace data.
*
* Returns a pointer to a PTE on success, or NULL on failure.
*/
#define KAISER_WALK_ATOMIC 0x1
-static pte_t *kaiser_pagetable_walk(unsigned long address, unsigned long flags)
+static pte_t *kaiser_shadow_pagetable_walk(unsigned long address,
+ unsigned long flags)
{
pmd_t *pmd;
pud_t *pud;
@@ -207,11 +227,11 @@ int kaiser_add_user_map(const void *__start_addr, unsigned long size,
unsigned long target_address;
for (; address < end_addr; address += PAGE_SIZE) {
- target_address = get_pa_from_mapping(address);
+ target_address = get_pa_from_kernel_map(address);
if (target_address == -1)
return -EIO;
- pte = kaiser_pagetable_walk(address, false);
+ pte = kaiser_shadow_pagetable_walk(address, false);
/*
* Errors come from either -ENOMEM for a page
* table page, or something screwy that did a
@@ -348,7 +368,7 @@ void kaiser_remove_mapping(unsigned long start, unsigned long size)
* context. This should not do any allocations because we
* should only be walking things that are known to be mapped.
*/
- pte_t *pte = kaiser_pagetable_walk(addr, KAISER_WALK_ATOMIC);
+ pte_t *pte = kaiser_shadow_pagetable_walk(addr, KAISER_WALK_ATOMIC);
/*
* We are removing a mapping that shoud