linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@linux.intel.com
Subject: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables
Date: Tue, 31 Oct 2017 15:31:46 -0700	[thread overview]
Message-ID: <20171031223146.6B47C861@viggo.jf.intel.com> (raw)

tl;dr:

KAISER makes it harder to defeat KASLR, but makes syscalls and
interrupts slower.  These patches are based on work from a team at
Graz University of Technology posted here[1].  The major addition is
support for Intel PCIDs which builds on top of Andy Lutomorski's PCID
work merged for 4.14.  PCIDs make KAISER's overhead very reasonable
for a wide variety of use cases.

Full Description:

KAISER is a countermeasure against attacks on kernel address
information.  There are at least three existing, published,
approaches using the shared user/kernel mapping and hardware features
to defeat KASLR.  One approach referenced in the paper locates the
kernel by observing differences in page fault timing between
present-but-inaccessable kernel pages and non-present pages.

KAISER addresses this by unmapping (most of) the kernel when
userspace runs.  It leaves the existing page tables largely alone and
refers to them as "kernel page tables".  For running userspace, a new
"shadow" copy of the page tables is allocated for each process.  The
shadow page tables map all the same user memory as the "kernel" copy,
but only maps a minimal set of kernel memory.

When we enter the kernel via syscalls, interrupts or exceptions,
page tables are switched to the full "kernel" copy.  When the system
switches back to user mode, the "shadow" copy is used.  Process
Context IDentifiers (PCIDs) are used to to ensure that the TLB is not
flushed when switching between page tables, which makes syscalls
roughly 2x faster than without it.  PCIDs are usable on Haswell and
newer CPUs (the ones with "v4", or called fourth-generation Core).

The minimal kernel page tables try to map only what is needed to
enter/exit the kernel such as the entry/exit functions, interrupt
descriptors (IDT) and the kernel stacks.  This minimal set of data
can still reveal the kernel's ASLR base address.  But, this minimal
kernel data is all trusted, which makes it harder to exploit than
data in the kernel direct map which contains loads of user-controlled
data.

KAISER will affect performance for anything that does system calls or
interrupts: everything.  Just the new instructions (CR3 manipulation)
add a few hundred cycles to a syscall or interrupt.  Most workloads
that we have run show single-digit regressions.  5% is a good round
number for what is typical.  The worst we have seen is a roughly 30%
regression on a loopback networking test that did a ton of syscalls
and context switches.  More details about possible performance
impacts are in the new Documentation/ file.

This code is based on a version I downloaded from
(https://github.com/IAIK/KAISER).  It has been heavily modified.

The approach is described in detail in a paper[2].  However, there is
some incorrect and information in the paper, both on how Linux and
the hardware works.  For instance, I do not share the opinion that
KAISER has "runtime overhead of only 0.28%".  Please rely on this
patch series as the canonical source of information about this
submission.

Here is one example of how the kernel image grow with CONFIG_KAISER
on and off.  Most of the size increase is presumably from additional
alignment requirements for mapping entry/exit code and structures.

    text    data     bss      dec filename
11786064 7356724 2928640 22071428 vmlinux-nokaiser
11798203 7371704 2928640 22098547 vmlinux-kaiser
  +12139  +14980       0   +27119

To give folks an idea what the performance impact is like, I took
the following test and ran it single-threaded:

	https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c

It's a pretty quick syscall so this shows how much KAISER slows
down syscalls (and how much PCIDs help).  The units here are
lseeks/second:

        no kaiser: 5.2M
    kaiser+  pcid: 3.0M
    kaiser+nopcid: 2.2M

"nopcid" is literally with the "nopcid" command-line option which
turns PCIDs off entirely.

Thanks to:
The original KAISER team at Graz University of Technology.
Andy Lutomirski for all the help with the entry code.
Kirill Shutemov for a helpful review of the code.

1. https://github.com/IAIK/KAISER
2. https://gruss.cc/files/kaiser.pdf

--

The code is available here:

	https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/

 Documentation/x86/kaiser.txt                | 128 ++++++
 arch/x86/Kconfig                            |   4 +
 arch/x86/entry/calling.h                    |  77 ++++
 arch/x86/entry/entry_64.S                   |  34 +-
 arch/x86/entry/entry_64_compat.S            |  13 +
 arch/x86/events/intel/ds.c                  |  57 ++-
 arch/x86/include/asm/cpufeatures.h          |   1 +
 arch/x86/include/asm/desc.h                 |   2 +-
 arch/x86/include/asm/hw_irq.h               |   2 +-
 arch/x86/include/asm/kaiser.h               |  59 +++
 arch/x86/include/asm/mmu_context.h          |  29 +-
 arch/x86/include/asm/pgalloc.h              |  32 +-
 arch/x86/include/asm/pgtable.h              |  20 +-
 arch/x86/include/asm/pgtable_64.h           | 121 ++++++
 arch/x86/include/asm/pgtable_types.h        |  16 +
 arch/x86/include/asm/processor.h            |   2 +-
 arch/x86/include/asm/tlbflush.h             | 230 +++++++++--
 arch/x86/include/uapi/asm/processor-flags.h |   3 +-
 arch/x86/kernel/cpu/common.c                |  21 +-
 arch/x86/kernel/espfix_64.c                 |  22 +-
 arch/x86/kernel/head_64.S                   |  30 +-
 arch/x86/kernel/irqinit.c                   |   2 +-
 arch/x86/kernel/ldt.c                       |  25 +-
 arch/x86/kernel/process.c                   |   2 +-
 arch/x86/kernel/process_64.c                |   2 +-
 arch/x86/kvm/x86.c                          |   3 +-
 arch/x86/mm/Makefile                        |   1 +
 arch/x86/mm/init.c                          |  75 ++--
 arch/x86/mm/kaiser.c                        | 416 ++++++++++++++++++++
 arch/x86/mm/pageattr.c                      |  63 ++-
 arch/x86/mm/pgtable.c                       |  16 +-
 arch/x86/mm/tlb.c                           | 105 ++++-
 include/asm-generic/vmlinux.lds.h           |  17 +
 include/linux/kaiser.h                      |  34 ++
 include/linux/percpu-defs.h                 |  32 +-
 init/main.c                                 |   2 +
 kernel/fork.c                               |   6 +
 security/Kconfig                            |  10 +
 38 files changed, 1565 insertions(+), 149 deletions(-)

             reply	other threads:[~2017-10-31 22:31 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-31 22:31 Dave Hansen [this message]
2017-10-31 22:31 ` [PATCH 01/23] x86, kaiser: prepare assembly for entry/exit CR3 switching Dave Hansen
2017-11-01  0:43   ` Brian Gerst
2017-11-01  1:08     ` Dave Hansen
2017-11-01 18:18   ` Borislav Petkov
2017-11-01 18:27     ` Dave Hansen
2017-11-01 20:42       ` Borislav Petkov
2017-11-01 21:01   ` Thomas Gleixner
2017-11-01 22:58     ` Dave Hansen
2017-10-31 22:31 ` [PATCH 02/23] x86, kaiser: do not set _PAGE_USER for init_mm page tables Dave Hansen
2017-11-01 21:11   ` Thomas Gleixner
2017-11-01 21:24     ` Andy Lutomirski
2017-11-01 21:28       ` Thomas Gleixner
2017-11-01 21:52         ` Dave Hansen
2017-11-01 22:11           ` Thomas Gleixner
2017-11-01 22:12           ` Linus Torvalds
2017-11-01 22:20             ` Thomas Gleixner
2017-11-01 22:45               ` Kees Cook
2017-11-02  7:10               ` Andy Lutomirski
2017-11-02 11:33                 ` Thomas Gleixner
2017-11-02 11:59                   ` Andy Lutomirski
2017-11-02 12:56                     ` Thomas Gleixner
2017-11-02 16:38                   ` Dave Hansen
2017-11-02 18:19                     ` Andy Lutomirski
2017-11-02 18:24                       ` Thomas Gleixner
2017-11-02 18:24                       ` Linus Torvalds
2017-11-02 18:40                         ` Thomas Gleixner
2017-11-02 18:57                           ` Linus Torvalds
2017-11-02 21:41                             ` Thomas Gleixner
2017-11-02  7:07         ` Andy Lutomirski
2017-11-02 11:21           ` Thomas Gleixner
2017-10-31 22:31 ` [PATCH 03/23] x86, kaiser: disable global pages Dave Hansen
2017-11-01 21:18   ` Thomas Gleixner
2017-11-01 22:12     ` Dave Hansen
2017-11-01 22:28       ` Thomas Gleixner
2017-10-31 22:31 ` [PATCH 04/23] x86, tlb: make CR4-based TLB flushes more robust Dave Hansen
2017-11-01  8:01   ` Andy Lutomirski
2017-11-01 10:11     ` Kirill A. Shutemov
2017-11-01 10:38       ` Andy Lutomirski
2017-11-01 10:56         ` Kirill A. Shutemov
2017-11-01 11:18           ` Andy Lutomirski
2017-11-01 22:21             ` Dave Hansen
2017-11-01 21:25   ` Thomas Gleixner
2017-11-01 22:24     ` Dave Hansen
2017-11-01 22:30       ` Thomas Gleixner
2017-10-31 22:31 ` [PATCH 05/23] x86, mm: document X86_CR4_PGE toggling behavior Dave Hansen
2017-10-31 23:31   ` Kees Cook
2017-10-31 22:31 ` [PATCH 06/23] x86, kaiser: introduce user-mapped percpu areas Dave Hansen
2017-11-01 21:47   ` Thomas Gleixner
2017-10-31 22:31 ` [PATCH 07/23] x86, kaiser: unmap kernel from userspace page tables (core patch) Dave Hansen
2017-10-31 22:32 ` [PATCH 08/23] x86, kaiser: only populate shadow page tables for userspace Dave Hansen
2017-10-31 23:35   ` Kees Cook
2017-10-31 22:32 ` [PATCH 09/23] x86, kaiser: allow NX to be set in p4d/pgd Dave Hansen
2017-10-31 22:32 ` [PATCH 10/23] x86, kaiser: make sure static PGDs are 8k in size Dave Hansen
2017-10-31 22:32 ` [PATCH 11/23] x86, kaiser: map GDT into user page tables Dave Hansen
2017-10-31 22:32 ` [PATCH 12/23] x86, kaiser: map dynamically-allocated LDTs Dave Hansen
2017-11-01  8:00   ` Andy Lutomirski
2017-11-01  8:06     ` Ingo Molnar
2017-10-31 22:32 ` [PATCH 13/23] x86, kaiser: map espfix structures Dave Hansen
2017-10-31 22:32 ` [PATCH 14/23] x86, kaiser: map entry stack variables Dave Hansen
2017-10-31 22:32 ` [PATCH 15/23] x86, kaiser: map trace interrupt entry Dave Hansen
2017-10-31 22:32 ` [PATCH 16/23] x86, kaiser: map debug IDT tables Dave Hansen
2017-10-31 22:32 ` [PATCH 17/23] x86, kaiser: map virtually-addressed performance monitoring buffers Dave Hansen
2017-10-31 22:32 ` [PATCH 18/23] x86, mm: Move CR3 construction functions Dave Hansen
2017-10-31 22:32 ` [PATCH 19/23] x86, mm: remove hard-coded ASID limit checks Dave Hansen
2017-10-31 22:32 ` [PATCH 20/23] x86, mm: put mmu-to-h/w ASID translation in one place Dave Hansen
2017-10-31 22:32 ` [PATCH 21/23] x86, pcid, kaiser: allow flushing for future ASID switches Dave Hansen
2017-11-01  8:03   ` Andy Lutomirski
2017-11-01 14:17     ` Dave Hansen
2017-11-01 20:31       ` Andy Lutomirski
2017-11-01 20:59         ` Dave Hansen
2017-11-01 21:04           ` Andy Lutomirski
2017-11-01 21:06             ` Dave Hansen
2017-10-31 22:32 ` [PATCH 22/23] x86, kaiser: use PCID feature to make user and kernel switches faster Dave Hansen
2017-10-31 22:32 ` [PATCH 23/23] x86, kaiser: add Kconfig Dave Hansen
2017-10-31 23:59   ` Kees Cook
2017-11-01  9:07     ` Borislav Petkov
2017-10-31 23:27 ` [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables Linus Torvalds
2017-10-31 23:44   ` Dave Hansen
2017-11-01  0:21     ` Dave Hansen
2017-11-01  7:59     ` Andy Lutomirski
2017-11-01 16:08     ` Linus Torvalds
2017-11-01 17:31       ` Dave Hansen
2017-11-01 17:58         ` Randy Dunlap
2017-11-01 18:27         ` Linus Torvalds
2017-11-01 18:46           ` Dave Hansen
2017-11-01 19:05             ` Linus Torvalds
2017-11-01 20:33               ` Andy Lutomirski
2017-11-02  7:32                 ` Andy Lutomirski
2017-11-02  7:54                   ` Andy Lutomirski
2017-11-01 15:53   ` Dave Hansen
2017-11-01  8:54 ` Ingo Molnar
2017-11-01 14:09   ` Thomas Gleixner
2017-11-01 22:14   ` Dave Hansen
2017-11-01 22:28     ` Linus Torvalds
2017-11-02  8:03     ` Peter Zijlstra
2017-11-03 11:07     ` Kirill A. Shutemov
2017-11-02 19:01 ` Will Deacon
2017-11-02 19:38   ` Dave Hansen
2017-11-03 13:41     ` Will Deacon
2017-11-22 16:19 ` Pavel Machek
2017-11-23 10:47   ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171031223146.6B47C861@viggo.jf.intel.com \
    --to=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).