linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 00/24] x86/pti: Defer CR3 switch to C code
@ 2020-11-09 14:44 Alexandre Chartre
From: Alexandre Chartre @ 2020-11-09 14:44 UTC
  To: tglx, mingo, bp, hpa, x86, dave.hansen, luto, peterz,
	linux-kernel, thomas.lendacky, jroedel
  Cc: konrad.wilk, jan.setjeeilers, junaids, oweisse, rppt, graf,
	mgross, kuzuno, alexandre.chartre

[Resending without messing up email addresses (hopefully!).
 Please reply using this email thread to have correct emails.
 Sorry for the noise.]

With Page Table Isolation (PTI), syscalls as well as interrupts and
exceptions occurring in userspace enter the kernel with a user
page-table. The kernel entry code will then switch the page-table
from the user page-table to the kernel page-table by updating the
CR3 control register. This CR3 switch is currently done early in
the kernel entry sequence using assembly code.
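
For illustration only (this is not code from the series): with PTI, the
kernel and user page-tables of a process are understood to be allocated as
an adjacent pair of pages, so switching between them amounts to flipping
one bit of the CR3 value. Below is a minimal user-space sketch of that bit
manipulation. The helper names are hypothetical, PCID handling is ignored,
and nothing here touches the real CR3 register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: the user PGD is the second 4 KiB page of an 8 KiB-aligned
 * pair, so bit 12 of the CR3 value selects it.
 */
#define PTI_USER_PGTABLE_BIT   12
#define PTI_USER_PGTABLE_MASK  (UINT64_C(1) << PTI_USER_PGTABLE_BIT)

static_assert(PTI_USER_PGTABLE_MASK == 0x1000,
	      "bit 12 selects the user page-table");

/* Hypothetical helpers working on a plain CR3 value (no hardware access). */
static inline uint64_t cr3_to_kernel(uint64_t cr3)
{
	/* Clearing the bit selects the kernel page-table. */
	return cr3 & ~PTI_USER_PGTABLE_MASK;
}

static inline uint64_t cr3_to_user(uint64_t cr3)
{
	/* Setting the bit selects the user page-table. */
	return cr3 | PTI_USER_PGTABLE_MASK;
}

static inline int cr3_is_user(uint64_t cr3)
{
	return (cr3 & PTI_USER_PGTABLE_MASK) != 0;
}
```

In this sketch, entering the kernel corresponds to cr3_to_kernel() and
returning to userspace to cr3_to_user(); the real entry code also has to
deal with PCID and TLB flushing, which are left out here.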

This RFC proposes to defer the PTI CR3 switch until we reach C code.
The benefit is that this simplifies the assembly entry code, and makes
the PTI CR3 switch code easier to understand. This also paves the way
for further possible projects such as an easier integration of Address
Space Isolation (ASI), or the possibility to execute some selected
syscall or interrupt handlers without switching to the kernel page-table
(and thus avoid the PTI page-table switch overhead).

Deferring the CR3 switch to C code means that we need to run more of the
kernel entry code with the user page-table. To do so, we need to:

 - map more syscall, interrupt and exception entry code into the user
   page-table (map all noinstr code);

 - map additional data used in the entry code (such as stack canary);

 - run more entry code on the trampoline stack (which is mapped both
   in the kernel and in the user page-table) until we switch to the
   kernel page-table and then switch to the kernel stack;

 - have a per-task trampoline stack instead of a per-cpu trampoline
   stack, so the task can be scheduled out while it hasn't switched
   to the kernel stack.
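
The ordering constraint in the list above (stay on the trampoline stack
until the kernel page-table is active, and only then move to the kernel
stack) can be modelled with a small user-space sketch. This is an
illustration under simplifying assumptions, not the series' code: all
names are hypothetical and the "stack" and "page-table" are just fields
in a struct.

```c
#include <assert.h>
#include <stdbool.h>

enum stack_kind { STACK_TRAMPOLINE, STACK_KERNEL };

/* Hypothetical model of the state of one kernel entry. */
struct entry_state {
	bool user_pgtable;      /* true while the user page-table is active */
	enum stack_kind stack;  /* stack the entry code is running on */
};

/* Entry from userspace: user page-table, trampoline stack. */
static inline struct entry_state enter_from_user(void)
{
	struct entry_state s = {
		.user_pgtable = true,
		.stack = STACK_TRAMPOLINE,
	};
	return s;
}

/* Models the deferred CR3 switch done from C code. */
static inline void switch_to_kernel_pgtable(struct entry_state *s)
{
	assert(s);
	s->user_pgtable = false;
}

/*
 * The kernel stack is assumed not to be mapped in the user page-table,
 * so the switch is refused (state unchanged) while that table is active.
 */
static inline bool switch_to_kernel_stack(struct entry_state *s)
{
	assert(s);
	if (s->user_pgtable)
		return false;
	s->stack = STACK_KERNEL;
	return true;
}
```

In this model, calling switch_to_kernel_stack() right after
enter_from_user() fails, while calling it after
switch_to_kernel_pgtable() succeeds.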

Note that, for now, the CR3 switch can only be deferred for as long as
interrupts remain disabled in the entry code. This is because the decision
to switch CR3 is based on the privilege level in the CS value saved in the
interrupt frame. I plan to fix this, but it adds some complication (we need
to track whether the user page-table is in use or not).
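
To make the limitation concrete, here is a small user-space sketch of the
CS-based check and of the case where it gives the wrong answer. It is an
illustration only: the struct and function names are hypothetical, and
the selector values used in the note below (0x33 for user code, 0x10 for
kernel code) are the usual x86-64 Linux values, assumed here rather than
taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal stand-in for the saved interrupt frame. */
struct intr_frame {
	uint64_t cs;  /* saved code-segment selector */
};

/* The low two bits of a selector hold the privilege level (3 = user). */
#define SEGMENT_RPL_MASK  0x3
#define USER_RPL          0x3

/* CS-based check: was the interrupted context running in user mode? */
static inline bool frame_from_user(const struct intr_frame *f)
{
	assert(f);
	return (f->cs & SEGMENT_RPL_MASK) == USER_RPL;
}

/*
 * The check assumes "user mode" and "user page-table active" are the
 * same thing. This returns true when that assumption does not hold,
 * given the page-table that was really active (a tracked flag in this
 * model).
 */
static inline bool cs_heuristic_is_wrong(const struct intr_frame *f,
					 bool user_pgtable_active)
{
	return frame_from_user(f) != user_pgtable_active;
}
```

With a frame whose CS is 0x10 (kernel mode) and the user page-table still
active, cs_heuristic_is_wrong() returns true: this models an interrupt
arriving after entry but before the deferred CR3 switch, which is why the
switch has to happen while interrupts are still disabled unless the
page-table state is tracked separately.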

The proposed patchset is in RFC state to get early feedback about this
proposal.

The code survives running a kernel build and LTP. Note that changes are
only for 64-bit at the moment; I haven't looked at 32-bit yet but I will
definitely check it.

Code is based on v5.10-rc3.

Thanks,

alex.

-----

Alexandre Chartre (24):
  x86/syscall: Add wrapper for invoking syscall function
  x86/entry: Update asm_call_on_stack to support more function arguments
  x86/entry: Consolidate IST entry from userspace
  x86/sev-es: Define a setup stack function for the VC idtentry
  x86/entry: Implement ret_from_fork body with C code
  x86/pti: Provide C variants of PTI switch CR3 macros
  x86/entry: Fill ESPFIX stack using C code
  x86/entry: Add C version of SWAPGS and SWAPGS_UNSAFE_STACK
  x86/entry: Add C version of paranoid_entry/exit
  x86/pti: Introduce per-task PTI trampoline stack
  x86/pti: Function to clone page-table entries from a specified mm
  x86/pti: Function to map per-cpu page-table entry
  x86/pti: Extend PTI user mappings
  x86/pti: Use PTI stack instead of trampoline stack
  x86/pti: Execute syscall functions on the kernel stack
  x86/pti: Execute IDT handlers on the kernel stack
  x86/pti: Execute IDT handlers with error code on the kernel stack
  x86/pti: Execute system vector handlers on the kernel stack
  x86/pti: Execute page fault handler on the kernel stack
  x86/pti: Execute NMI handler on the kernel stack
  x86/entry: Disable stack-protector for IST entry C handlers
  x86/entry: Defer paranoid entry/exit to C code
  x86/entry: Remove paranoid_entry and paranoid_exit
  x86/pti: Defer CR3 switch to C code for non-IST and syscall entries

 arch/x86/entry/common.c               | 259 ++++++++++++-
 arch/x86/entry/entry_64.S             | 513 ++++++++------------------
 arch/x86/entry/entry_64_compat.S      |  22 --
 arch/x86/include/asm/entry-common.h   | 108 ++++++
 arch/x86/include/asm/idtentry.h       | 153 +++++++-
 arch/x86/include/asm/irq_stack.h      |  11 +
 arch/x86/include/asm/page_64_types.h  |  36 +-
 arch/x86/include/asm/paravirt.h       |  15 +
 arch/x86/include/asm/paravirt_types.h |  17 +-
 arch/x86/include/asm/processor.h      |   3 +
 arch/x86/include/asm/pti.h            |  18 +
 arch/x86/include/asm/switch_to.h      |   7 +-
 arch/x86/include/asm/traps.h          |   2 +-
 arch/x86/kernel/cpu/mce/core.c        |   7 +-
 arch/x86/kernel/espfix_64.c           |  41 ++
 arch/x86/kernel/nmi.c                 |  34 +-
 arch/x86/kernel/sev-es.c              |  52 +++
 arch/x86/kernel/traps.c               |  61 +--
 arch/x86/mm/fault.c                   |  11 +-
 arch/x86/mm/pti.c                     |  71 ++--
 kernel/fork.c                         |  22 ++
 21 files changed, 1002 insertions(+), 461 deletions(-)

-- 
2.18.4


* [RFC][PATCH 00/24] x86/pti: Defer CR3 switch to C code
@ 2020-11-09 11:22 Alexandre Chartre
From: Alexandre Chartre @ 2020-11-09 11:22 UTC
  To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	hpa@zytor.com, x86@kernel.org, dave.hansen@linux.intel.com,
	luto@kernel.org, peterz@infradead.org,
	linux-kernel@vger.kernel.org, thomas.lendacky@amd.com,
	jroedel@suse.de
  Cc: konrad.wilk@oracle.com, jan.setjeeilers@oracle.com,
	junaids@google.com, oweisse@google.com, rppt@linux.vnet.ibm.com,
	graf@amazon.de, mgross@linux.intel.com, kuzuno@gmail.com,
	alexandre.chartre@oracle.com



Thread overview: 34+ messages
2020-11-09 14:44 [RFC][PATCH 00/24] x86/pti: Defer CR3 switch to C code Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 01/24] x86/syscall: Add wrapper for invoking syscall function Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 02/24] x86/entry: Update asm_call_on_stack to support more function arguments Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 03/24] x86/entry: Consolidate IST entry from userspace Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 04/24] x86/sev-es: Define a setup stack function for the VC idtentry Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 05/24] x86/entry: Implement ret_from_fork body with C code Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 06/24] x86/pti: Provide C variants of PTI switch CR3 macros Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 07/24] x86/entry: Fill ESPFIX stack using C code Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 08/24] x86/entry: Add C version of SWAPGS and SWAPGS_UNSAFE_STACK Alexandre Chartre
2020-11-09 19:55   ` Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 09/24] x86/entry: Add C version of paranoid_entry/exit Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 10/24] x86/pti: Introduce per-task PTI trampoline stack Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 11/24] x86/pti: Function to clone page-table entries from a specified mm Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 12/24] x86/pti: Function to map per-cpu page-table entry Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 13/24] x86/pti: Extend PTI user mappings Alexandre Chartre
2020-11-09 19:56   ` Alexandre Chartre
2020-11-10 23:39     ` Andy Lutomirski
2020-11-11  8:55       ` Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 14/24] x86/pti: Use PTI stack instead of trampoline stack Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 15/24] x86/pti: Execute syscall functions on the kernel stack Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 16/24] x86/pti: Execute IDT handlers " Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 17/24] x86/pti: Execute IDT handlers with error code " Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 18/24] x86/pti: Execute system vector handlers " Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 19/24] x86/pti: Execute page fault handler " Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 20/24] x86/pti: Execute NMI " Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 21/24] x86/entry: Disable stack-protector for IST entry C handlers Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 22/24] x86/entry: Defer paranoid entry/exit to C code Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 23/24] x86/entry: Remove paranoid_entry and paranoid_exit Alexandre Chartre
2020-11-09 14:44 ` [RFC][PATCH 24/24] x86/pti: Defer CR3 switch to C code for non-IST and syscall entries Alexandre Chartre
2020-11-09 19:35 ` [RFC][PATCH 00/24] x86/pti: Defer CR3 switch to C code Dave Hansen
2020-11-09 19:53   ` Alexandre Chartre
  -- strict thread matches above, loose matches on Subject: below --
2020-11-09 11:22 Alexandre Chartre
2020-11-09 11:22 ` [RFC][PATCH 01/24] x86/syscall: Add wrapper for invoking syscall function Alexandre Chartre
2020-11-09 17:25   ` Andy Lutomirski
2020-11-09 17:45     ` Alexandre Chartre
