From: Alexandre Chartre <alexandre.chartre@oracle.com>
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	hpa@zytor.com, x86@kernel.org, dave.hansen@linux.intel.com,
	luto@kernel.org, peterz@infradead.org,
	linux-kernel@vger.kernel.org, thomas.lendacky@amd.com,
	jroedel@suse.de
Cc: konrad.wilk@oracle.com, jan.setjeeilers@oracle.com,
	junaids@google.com, oweisse@google.com, rppt@linux.vnet.ibm.com,
	graf@amazon.de, mgross@linux.intel.com, kuzuno@gmail.com,
	alexandre.chartre@oracle.com
Subject: [RFC][PATCH v2 00/21] x86/pti: Defer CR3 switch to C code
Date: Mon, 16 Nov 2020 15:47:36 +0100
Message-ID: <20201116144757.1920077-1-alexandre.chartre@oracle.com>

Version 2 addressing comments from Andy:

- paranoid_entry/exit is back to assembly code. This avoids having
  a C version of SWAPGS and the need to disable stack-protector
  (patches 8, 9 and 21 from v1 are removed).

- SAVE_AND_SWITCH_TO_KERNEL_CR3 and RESTORE_CR3 are removed from
  paranoid_entry/exit and moved to C (patch 19).

- __per_cpu_offset is mapped into the user page-table (patch 11)
  so that paranoid_entry can update GS before CR3 is switched.

- Use a different stack canary with the user and kernel page-tables.
  This is a new patch in v2, so that the kernel stack canary is not
  leaked through the user page-table (patch 21).

Patches are now based on v5.10-rc4.

----

With Page Table Isolation (PTI), syscalls as well as interrupts and
exceptions occurring in userspace enter the kernel with a user
page-table. The kernel entry code will then switch the page-table
from the user page-table to the kernel page-table by updating the
CR3 control register. This CR3 switch is currently done early in
the kernel entry sequence using assembly code.
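
For readers less familiar with PTI, the switch described above can be
modelled in plain C outside the kernel. This is only a minimal sketch and
not code from this patchset: it assumes the usual x86-64 layout in which
the user page-table root sits in the 4 KiB page right after the kernel
one, so the two CR3 values differ only in bit 12. The names
USER_PGTABLE_BIT, to_kernel_cr3(), to_user_cr3() and is_user_cr3() are
hypothetical, and PCID handling is left out.

```c
#include <assert.h>	/* only needed for the checks in the usage note */
#include <stdint.h>

/*
 * Assumption: the user page-table root is located 4 KiB above the
 * kernel page-table root, so bit 12 of CR3 selects between them.
 */
#define USER_PGTABLE_BIT	12
#define USER_PGTABLE_MASK	((uint64_t)1 << USER_PGTABLE_BIT)

/* Hypothetical helper: CR3 value selecting the kernel page-table. */
static inline uint64_t to_kernel_cr3(uint64_t cr3)
{
	return cr3 & ~USER_PGTABLE_MASK;
}

/* Hypothetical helper: CR3 value selecting the user page-table. */
static inline uint64_t to_user_cr3(uint64_t cr3)
{
	return cr3 | USER_PGTABLE_MASK;
}

/* Hypothetical helper: does this CR3 value point to a user page-table? */
static inline int is_user_cr3(uint64_t cr3)
{
	return (cr3 & USER_PGTABLE_MASK) != 0;
}
```

With a kernel root at 0x2000, to_user_cr3() yields 0x3000 and
to_kernel_cr3() maps it back, e.g.
assert(to_kernel_cr3(to_user_cr3(0x2000)) == 0x2000).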

This RFC proposes to defer the PTI CR3 switch until we reach C code.
The benefit is that this simplifies the assembly entry code, and makes
the PTI CR3 switch code easier to understand. This also paves the way
for further possible projects, such as easier integration of Address
Space Isolation (ASI), or the possibility to execute some selected
syscall or interrupt handlers without switching to the kernel page-table
(and thus avoid the PTI page-table switch overhead).

Deferring the CR3 switch to C code means that we need to run more of the
kernel entry code with the user page-table. To do so, we need to:

 - map more syscall, interrupt and exception entry code into the user
   page-table (map all noinstr code);

 - map additional data used in the entry code (such as stack canary);

 - run more entry code on the trampoline stack (which is mapped both
   in the kernel and in the user page-table) until we switch to the
   kernel page-table and then switch to the kernel stack;

 - have a per-task trampoline stack instead of a per-cpu trampoline
   stack, so the task can be scheduled out while it hasn't switched
   to the kernel stack.

Note that, for now, the CR3 switch can only be deferred for as long as
interrupts remain disabled in the entry code. This is because the decision
to switch CR3 is based on the privilege level in the CS value saved in the
interrupt frame. I plan to fix this but that's some extra complication
(need to track if the user page-table is used or not).
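
To make the limitation above concrete, here is a minimal stand-alone C
sketch of a privilege-level check on the saved CS. It is an illustration
and not code from this patchset: struct irq_frame and frame_from_user()
are hypothetical names, and the selector values used in the usage note
(0x33 for user code, 0x10 for kernel code) are the usual x86-64 Linux
ones. The check only looks at the low two bits of CS, so if entry code
running in kernel mode but still on the user page-table were interrupted,
the saved CS would say "kernel" and the check alone could not tell that
CR3 still needs switching.

```c
#include <assert.h>	/* only needed for the checks in the usage note */
#include <stdint.h>

/* Hypothetical, simplified stand-in for the hardware interrupt frame. */
struct irq_frame {
	uint64_t ip;
	uint64_t cs;
	uint64_t flags;
	uint64_t sp;
	uint64_t ss;
};

#define SEGMENT_RPL_MASK	0x3	/* low two bits of a selector */
#define USER_RPL		0x3	/* ring 3 */

/*
 * Hypothetical helper: returns non-zero if the interrupted context was
 * running at user privilege level, judging only by the saved CS.
 */
static inline int frame_from_user(const struct irq_frame *f)
{
	return (f->cs & SEGMENT_RPL_MASK) == USER_RPL;
}
```

For example, a frame with cs = 0x33 is reported as coming from userspace,
while a frame with cs = 0x10 is reported as coming from the kernel,
whichever page-table was actually loaded at that time.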

The proposed patchset is in RFC state to get early feedback about this
proposal.

The code survives running a kernel build and LTP. Note that changes are
only for 64-bit at the moment. I haven't looked at 32-bit yet but I will
definitely check it.

Thanks,

alex.

-----

Alexandre Chartre (21):
  x86/syscall: Add wrapper for invoking syscall function
  x86/entry: Update asm_call_on_stack to support more function arguments
  x86/entry: Consolidate IST entry from userspace
  x86/sev-es: Define a setup stack function for the VC idtentry
  x86/entry: Implement ret_from_fork body with C code
  x86/pti: Provide C variants of PTI switch CR3 macros
  x86/entry: Fill ESPFIX stack using C code
  x86/pti: Introduce per-task PTI trampoline stack
  x86/pti: Function to clone page-table entries from a specified mm
  x86/pti: Function to map per-cpu page-table entry
  x86/pti: Extend PTI user mappings
  x86/pti: Use PTI stack instead of trampoline stack
  x86/pti: Execute syscall functions on the kernel stack
  x86/pti: Execute IDT handlers on the kernel stack
  x86/pti: Execute IDT handlers with error code on the kernel stack
  x86/pti: Execute system vector handlers on the kernel stack
  x86/pti: Execute page fault handler on the kernel stack
  x86/pti: Execute NMI handler on the kernel stack
  x86/pti: Defer CR3 switch to C code for IST entries
  x86/pti: Defer CR3 switch to C code for non-IST and syscall entries
  x86/pti: Use a different stack canary with the user and kernel
    page-table

 arch/x86/entry/common.c               |  58 ++++-
 arch/x86/entry/entry_64.S             | 346 +++++++++++---------------
 arch/x86/entry/entry_64_compat.S      |  22 --
 arch/x86/include/asm/entry-common.h   | 194 +++++++++++++++
 arch/x86/include/asm/idtentry.h       | 130 +++++++++-
 arch/x86/include/asm/irq_stack.h      |  11 +
 arch/x86/include/asm/page_64_types.h  |  36 ++-
 arch/x86/include/asm/processor.h      |   3 +
 arch/x86/include/asm/pti.h            |  18 ++
 arch/x86/include/asm/stackprotector.h |  35 ++-
 arch/x86/include/asm/switch_to.h      |   7 +-
 arch/x86/include/asm/traps.h          |   2 +-
 arch/x86/kernel/cpu/mce/core.c        |   7 +-
 arch/x86/kernel/espfix_64.c           |  41 +++
 arch/x86/kernel/nmi.c                 |  34 ++-
 arch/x86/kernel/sev-es.c              |  63 +++++
 arch/x86/kernel/traps.c               |  61 +++--
 arch/x86/mm/fault.c                   |  11 +-
 arch/x86/mm/pti.c                     |  76 ++++--
 include/linux/sched.h                 |   8 +
 kernel/fork.c                         |  25 ++
 21 files changed, 874 insertions(+), 314 deletions(-)

-- 
2.18.4

