linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/35] Shadow stacks for userspace
@ 2022-01-30 21:18 Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 01/35] Documentation/x86: Add CET description Rick Edgecombe
                   ` (36 more replies)
  0 siblings, 37 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

Hi,

This is a slight reboot of the userspace CET series. I will be taking over the 
series from Yu-cheng. Per some internal recommendations, I’ve reset the version
number and am calling it a new series. Hopefully, it doesn’t cause confusion.

The new plan is to upstream only userspace Shadow Stack support at this point. 
IBT can follow later, but for now I’ll focus solely on the most in-demand and
widely available (with the feature on AMD CPUs now) part of CET.

As part of this reset, I thought it might be useful to more fully write up the 
design and summarize the history of the previous CET series. So this slightly
long cover letter does that. The "Updates" section has the changes, if anyone
doesn't want the history.


Why is Shadow Stack Wanted
==========================
The main use case for userspace shadow stack is providing protection against 
return-oriented programming (ROP) attacks. Fedora and Ubuntu already have many/most 
packages enabled for shadow stack. The main missing piece is Linux kernel 
support, and there seems to be a great deal of interest in the ecosystem in
getting this feature supported. Besides security, Google has also done some
work on using shadow stack to improve performance and reliability of tracing.


Userspace Shadow Stack Implementation
=====================================
Shadow stack works by maintaining a secondary (shadow) stack that cannot be 
directly modified by applications. When executing a CALL instruction, the 
processor pushes the return address to both the normal stack and to the specially 
permissioned shadow stack. Upon RET, the processor pops the shadow stack copy 
and compares it to the normal stack copy. If the two differ, the processor 
raises a control protection fault. This implementation supports shadow stack on 
64-bit kernels only; 32-bit applications are supported only via IA32 emulation.

	Shadow Stack Memory
	-------------------
	The majority of this series deals with changes for handling the special 
	shadow stack memory permissions. This memory is specified by the 
	Dirty+RO PTE bits. A tricky aspect of this is that this combination was 
	previously used to specify COW memory. So Linux needs to handle COW 
	differently when shadow stack is in use. The solution is to use a 
	software PTE bit to denote COW memory, and take care to clear the dirty
	bit when setting the memory RO.
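
	As a rough illustration, here is a minimal sketch of how the two states 
	can be told apart once the software COW bit exists. The helper names 
	below are invented for this example; the real changes are in the 
	"x86/mm: Introduce _PAGE_COW" and related patches.

		/* Shadow stack PTE: hardware Write=0 and Dirty=1 */
		static inline bool pte_is_shstk(pte_t pte)
		{
			return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) ==
			       _PAGE_DIRTY;
		}

		/* COW PTE: Write=0 with the software _PAGE_COW bit set instead
		 * of the hardware dirty bit.
		 */
		static inline bool pte_is_cow(pte_t pte)
		{
			return (pte_flags(pte) & (_PAGE_RW | _PAGE_COW)) ==
			       _PAGE_COW;
		}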

	Setup and Upkeep of HW Registers
	--------------------------------
	Using userspace CET requires a CR4 bit set, and also the manipulation 
	of two xsave managed MSRs. The kernel needs to modify these registers 
	during various operations like clone and signal handling. These 
	operations may happen when the registers are restored to the CPU, or 
	saved in an xsave buffer. Since the recent AMX triggered FPU overhaul 
	removed direct access to the xsave buffer, this series adds an 
	interface to operate on the supervisor xstate.

	New ABIs
	--------
	This series introduces some new ABIs. The primary one is the shadow 
	stack itself. Since it is readable and the shadow stack pointer is 
	exposed to user space, applications can easily read and process the 
	shadow stack. And in fact the tracing usages plan to do exactly that.

	Most of the shadow stack contents are written by HW, but some of the 
	entries are added by the kernel. The main place for this is signals. As 
	part of handling the signal the kernel does some manual adjustment of 
	the shadow stack that userspace depends on.

	In addition to the contents of the shadow stack there is also user 
	visible behavior around when new shadow stacks are created and set in 
	the shadow stack pointer (SSP) register. This is relatively 
	straightforward – shadow stacks are created when new stacks are created 
	(thread creation, fork, etc). It is more or less what is required to 
	keep apps working.

	For situations when userspace creates a new stack (e.g. makecontext(), 
	fibers, etc), a new syscall is provided for creating shadow stack 
	memory. To make the shadow stack usable, it needs to have a restore 
	token written to the protected memory. So the syscall provides a way to 
	specify that this should be done by the kernel.
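
	As a rough usage sketch (the exact prototype and flag name are defined 
	in the "x86/cet/shstk: Introduce map_shadow_stack syscall" patch; 
	__NR_map_shadow_stack and SHADOW_STACK_SET_TOKEN below are assumptions 
	for illustration only):

		#include <unistd.h>
		#include <sys/syscall.h>

		static void *alloc_shadow_stack(size_t size)
		{
			/* The kernel writes the restore token, so userspace can
			 * later pivot to this stack (e.g. with RSTORSSP).
			 */
			return (void *)syscall(__NR_map_shadow_stack, size,
					       SHADOW_STACK_SET_TOKEN);
		}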

	When a shadow stack violation happens (when the return address on the 
	normal stack does not match the return address on the shadow stack), a 
	segfault is generated with a new si_code specific to CET violations.
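
	For reference, a minimal sketch of inspecting that si_code from a 
	SIGSEGV handler (SEGV_CPERR is the value added by the 
	control-protection fault patch in this series):

		#include <signal.h>
		#include <unistd.h>

		static void segv_handler(int sig, siginfo_t *info, void *ucontext)
		{
			if (info->si_code == SEGV_CPERR) {
				static const char msg[] = "control protection fault\n";

				write(2, msg, sizeof(msg) - 1);
			}
			_exit(128 + sig);
		}

		/* Installed with sigaction() and SA_SIGINFO:
		 *	struct sigaction sa = { .sa_sigaction = segv_handler,
		 *				.sa_flags = SA_SIGINFO };
		 *	sigaction(SIGSEGV, &sa, NULL);
		 */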

	Lastly, a new arch_prctl interface is created for controlling the 
	enablement of CET-like features. It is intended to also be used for 
	LAM. It operates on the feature status per-thread, so for process-wide 
	enabling it is intended to be used early, in things like the dynamic 
	linker/loader. However, it can be used later for per-thread enablement 
	of features like WRSS.
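
	A sketch of how an early loader might use it, assuming the uapi headers 
	from this series (the ARCH_X86_FEATURE_* names are in the documentation 
	patch below; the shadow stack feature bit constant here is a 
	placeholder, see the prctl uapi header in this series for the real 
	definition):

		#include <sys/syscall.h>
		#include <unistd.h>

		#define PLACEHOLDER_FEATURE_SHSTK	(1UL << 0)	/* assumed name/value */

		static long enable_shstk_early(void)
		{
			/* Must run before any function that was entered with
			 * shadow stack disabled returns, or the first RET past
			 * this point will fault.
			 */
			return syscall(SYS_arch_prctl, ARCH_X86_FEATURE_ENABLE,
				       PLACEHOLDER_FEATURE_SHSTK);
		}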

	WRSS
	----
	WRSS is an instruction that can write to shadow stacks. The HW provides 
	a way to enable this instruction for userspace use. Since shadow 
	stacks are created initially protected, enabling WRSS gives apps 
	that want to do unusual things with their stacks a way to 
	weaken the protection and make things more flexible. A new feature bit 
	is defined to control enabling/disabling of WRSS.
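
	For example, with WRSS enabled a thread could overwrite one of its own 
	shadow stack entries directly. A sketch (wrssq needs binutils >= 2.31, 
	and the write faults unless WRSS has been enabled via the arch_prctl 
	above):

		/* Write an 8-byte value to a shadow stack slot with WRSS. */
		static inline void wrss_write(unsigned long *ssp_slot,
					      unsigned long val)
		{
			asm volatile("wrssq %[val], (%[addr])"
				     : : [val] "r" (val), [addr] "r" (ssp_slot)
				     : "memory");
		}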


History
=======
The branding “CET” really consists of two features: “Shadow Stack” and 
“Indirect Branch Tracking”. They both restrict previously allowed, but rarely 
valid behaviors and require userspace to change to avoid these behaviors before 
enabling the protection. These raw HW features need to be assembled into a 
software solution across userspace and kernel in order to add security value.
The kernel part of this solution has evolved iteratively starting with a lengthy
RFC period. 

Until now, the enabling effort was trying to support both Shadow Stack and IBT. 
This history will focus on a few areas of the shadow stack development history 
that I thought stood out.

	Signals
	-------
	Originally signals placed the location of the shadow stack restore 
	token inside the saved state on the stack. This was problematic from a 
	past ABI promises perspective. So the restore location was instead just 
	assumed from the shadow stack pointer. This works because in normal 
	allowed cases of calling sigreturn, the shadow stack pointer should be 
	right at the restore token at that time. There is no alternate shadow 
	stack support. If an alt shadow stack is added later we would need to 
	find a place to store the regular shadow stack token location. Options 
	could be to push something on the alt shadow stack, or to keep 
	something on the kernel side. So the current design keeps things simple 
	while slightly kicking the can down the road if alt shadow stacks 
	become a thing later. Siglongjmp is handled in glibc, using the incssp 
	instruction to unwind the shadow stack over the token.

	Shadow Stack Allocation
	-----------------------
	makecontext() implementations need a way to create new shadow stacks 
	with restore tokens such that they can be pivoted to from userspace. 
	The first interface to do this was an arch_prctl(). It created a shadow 
	stack with a restore token pre-setup, since the kernel has an 
	instruction that can write to user shadow stacks. However, this 
	interface was abandoned for being strange.

	The next version created PROT_SHADOW_STACK. This interface had two 
	problems. One, it left no option but for userspace to create writable 
	memory, write a restore token, then mprotect() it PROT_SHADOW_STACK. 
	The writable window left the shadow stack exposed, weakening the 
	security. Second, it caused problems with the guard pages. Since the 
	memory was initially created writable it did not have a guard page, but 
	then was mprotected later to a type of memory that should have one. 
	This resulted in missing guard pages and confused rb_subtree_gap values.

	This version introduces a new syscall that behaves similarly to the 
	initial arch_prctl() interface in that it has the kernel write the 
	restore token.

	Enabling Interface
	------------------
	For the entire history of the original CET series, the design was to 
	enable shadow stack automatically if the feature bit was detected in 
	the elf header. Then it was userspace's responsibility to turn it off 
	via an arch_prctl() if it was not desired, and this was handled by the 
	glibc dynamic loader. Glibc's standard behavior (when CET is configured) 
	is to leave shadow stack enabled if the executable and all linked 
	libraries are marked with shadow stack.

	Many distros (Fedora and others) have binaries already marked with 
	shadow stack, waiting for kernel support. Unfortunately their glibc 
	binaries expect the original arch_prctl() interface for allocating 
	shadow stacks, as those changes were pushed ahead of kernel support. 
	The net result is that, when updating to a kernel with shadow stack, 
	these binaries would suddenly get shadow stack enabled and expect the 
	arch_prctl() interface to be there. Calls to makecontext() would then 
	fail, resulting in visible breakage. This series deals with this 
	problem as described below in "Updates".


Updates
=======
These updates were mostly driven by public comments, but a lot of the design 
elements are new. I would like some extra scrutiny on the updates.

	New syscall for Shadow Stack Allocation
	---------------------------------------
	A new syscall is added for allocating shadow stacks, to replace 
	PROT_SHADOW_STACK. Several options were considered, as described in the 
	"x86/cet/shstk: Introduce map_shadow_stack syscall" patch.

	Xsave Managed Supervisor State Modifications
	--------------------------------------------
	The shadow stack feature requires the kernel to modify xsaves-managed 
	state. On one of the last versions of Yu-cheng's series, Boris 
	commented that the pattern it used to do this was not necessarily 
	ideal. The pattern was to force a restore to the registers and always 
	do the modification there. Then Thomas did an overhaul of the fpu code, 
	part of which consisted of making raw access to the xsave buffer 
	private to the fpu code. So this series tries to expose access again, 
	and in a way that addresses Boris' comments.

	The method is to provide functions like wrmsrl/rdmsrl, but that can 
	direct the operation to the correct location (registers or buffer), 
	while giving proper notice to the fpu subsystem so things don't get 
	clobbered or corrupted.
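
	To illustrate the intended shape of the interface (the function names 
	below are a paraphrase for this cover letter; the real names and 
	signatures are in the "x86/fpu: Add helpers for modifying supervisor 
	xstate" patch):

		static void example_adjust_user_ssp(void)
		{
			/* The fpu core decides whether this targets the live
			 * registers or the task's xsave buffer.
			 */
			void *xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
			u64 ssp;

			xsave_rdmsrl(xstate, MSR_IA32_PL3_SSP, &ssp);
			xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, ssp - 8);
			end_update_xsave_msrs();
		}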

	In the past a solution like this was discussed as part of the PASID 
	series, and Thomas was not in favor. In CET's case there is more 
	logic around the CET MSRs than in PASID's, and wrapping this logic 
	minimizes the nearly identical open-coded logic needed to do this more 
	efficiently. In addition it resolves the above described problem of 
	having no access to the xsave buffer. So it is being put forward here 
	under the supposition that CET's usage may lead to a different 
	conclusion, not to try to ignore past direction.

	The user interrupt series has similar needs as CET, and will also use
	this internal interface if it’s found acceptable.

	Support for WRSS
	----------------
	Andy Lutomirski had asked that, if we change the shadow stack allocation 
	API such that userspace cannot create arbitrary shadow stacks, we look 
	at exposing an interface to enable the WRSS instruction for userspace. 
	This way apps that want to do unexpected things with shadow stacks 
	would still have the option to create shadow stacks with arbitrary 
	data.

	Switch Enabling Interface
	-------------------------
	As described above there is a problem with userspace binaries waiting 
	to break as soon as the kernel supports CET. This needs to be prevented 
	by changing the interface such that the old binaries will not enable 
	shadow stack AND behave as if shadow stack is not enabled. They should 
	run normally without shadow stack protection. Creating a new feature 
	(SHSTK2) for shadow stack was explored. SHSTK would never be supported 
	by the kernel, and all the userspace build tools would be updated to 
	target SHSTK2 instead of SHSTK. So old SHSTK binaries would be cleanly
	disabled.

	But there are existing downsides to automatic elf-header-based 
	enabling. The elf header feature spec is not defined by the 
	kernel and there are proposals to expand it to describe additional 
	logic. A simpler interface, where the kernel is simply told what to 
	enable and leaves all the decision making to userspace, is more 
	flexible for userspace and simpler for the kernel. There also already 
	needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and 
	likely LAM will use it too), so it avoids there being two ways to turn 
	on these types of features. The only tricky part for shadow stack is 
	that it has to be enabled very early. Once shadow stack is enabled, the 
	app cannot return past that point, otherwise there will be a shadow 
	stack violation. It turns out glibc can enable shadow stack this early, 
	so it works nicely. So not automatically enabling any features based on 
	the elf header cleanly disables all old binaries, which expect the 
	kernel to enable CET features automatically. Then after the kernel 
	changes are upstream, glibc can be updated to use the new interface. 
	This is the solution implemented in this series.

	Expand Commit Logs
	------------------
	As part of spinning up on this series, I found some of the commit logs 
	did not describe the changes in enough detail for me to understand 
	their purpose. I tried to expand the logs and comments where I had to 
	go digging. Hopefully it's useful.
	
	Limit to only Intel Processors
	------------------------------
	Shadow stack is supported on some AMD processors, but this revision 
	(with expanded HW usage and xsaves changes) has only been tested on 
	Intel ones. So this series has a patch to limit shadow stack support to 
	Intel processors. Ideally the patch would not even make it to mainline, 
	and should be dropped as soon as this testing is done. It's included 
	just in case.


Future Work
===========
Even though this is now exclusively a shadow stack series, there is still some 
remaining shadow stack work to be done.

	Ptrace
	------
	Early in the series, there was a patch to allow IA32_U_CET and
	IA32_PL3_SSP to be set. This patch was dropped and planned as a follow
	up to basic support, and it remains the plan. It will be needed for
	in-progress gdb support.

	CRIU Support
	------------
	In the past there was some speculation on the mailing list about 
	whether CRIU would need to be taught about CET. It turns out it does. 
	The first issue hit is that CRIU calls sigreturn directly from the 
	"parasite code" that it injects into the dumper process. This violates 
	this shadow stack implementation's protection, which is intended to 
	prevent attackers from doing exactly that.

	With so many packages already enabled with shadow stack, there is 
	probably a desire to make it work seamlessly. But in the meantime, if 
	distros want to support shadow stack and CRIU, users could manually 
	disable shadow stack via "GLIBC_TUNABLES=glibc.cpu.x86_shstk=off" for 
	a process they want to dump. It's not ideal.

	I’d like to hear what people think about having shadow stack in the 
	kernel without this resolved. Nothing would change for any users until 
	they enable shadow stack in the kernel and update to a glibc configured
	with CET. Should CRIU userspace be solved before kernel support?

	Selftests
	---------
	There are some CET selftests being worked on and they are not included
	here.

Thanks,

Rick

Rick Edgecombe (7):
  x86/mm: Prevent VM_WRITE shadow stacks
  x86/fpu: Add helpers for modifying supervisor xstate
  x86/fpu: Add unsafe xsave buffer helpers
  x86/cet/shstk: Introduce map_shadow_stack syscall
  selftests/x86: Add map_shadow_stack syscall test
  x86/cet/shstk: Support wrss for userspace
  x86/cpufeatures: Limit shadow stack to Intel CPUs

Yu-cheng Yu (28):
  Documentation/x86: Add CET description
  x86/cet/shstk: Add Kconfig option for Shadow Stack
  x86/cpufeatures: Add CET CPU feature flags for Control-flow
    Enforcement Technology (CET)
  x86/cpufeatures: Introduce CPU setup and option parsing for CET
  x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  x86/cet: Add control-protection fault handler
  x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  x86/mm: Move pmd_write(), pud_write() up in the file
  x86/mm: Introduce _PAGE_COW
  drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  x86/mm: Update pte_modify for _PAGE_COW
  x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for
    transition from _PAGE_DIRTY to _PAGE_COW
  mm: Move VM_UFFD_MINOR_BIT from 37 to 38
  mm: Introduce VM_SHADOW_STACK for shadow stack memory
  x86/mm: Check Shadow Stack page fault errors
  x86/mm: Update maybe_mkwrite() for shadow stack
  mm: Fixup places that call pte_mkwrite() directly
  mm: Add guard pages around a shadow stack.
  mm/mmap: Add shadow stack pages to memory accounting
  mm: Update can_follow_write_pte() for shadow stack
  mm/mprotect: Exclude shadow stack from preserve_write
  mm: Re-introduce vm_flags to do_mmap()
  x86/cet/shstk: Add user-mode shadow stack support
  x86/process: Change copy_thread() argument 'arg' to 'stack_size'
  x86/cet/shstk: Handle thread shadow stack
  x86/cet/shstk: Introduce shadow stack token setup/verify routines
  x86/cet/shstk: Handle signals for shadow stack
  x86/cet/shstk: Add arch_prctl elf feature functions

 .../admin-guide/kernel-parameters.txt         |   4 +
 Documentation/filesystems/proc.rst            |   1 +
 Documentation/x86/cet.rst                     | 145 ++++++
 Documentation/x86/index.rst                   |   1 +
 arch/arm/kernel/signal.c                      |   2 +-
 arch/arm64/kernel/signal.c                    |   2 +-
 arch/arm64/kernel/signal32.c                  |   2 +-
 arch/sparc/kernel/signal32.c                  |   2 +-
 arch/sparc/kernel/signal_64.c                 |   2 +-
 arch/x86/Kconfig                              |  22 +
 arch/x86/Kconfig.assembler                    |   5 +
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/x86/ia32/ia32_signal.c                   |  25 +-
 arch/x86/include/asm/cet.h                    |  54 +++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/fpu/api.h                |   8 +
 arch/x86/include/asm/fpu/types.h              |  23 +-
 arch/x86/include/asm/fpu/xstate.h             |   6 +-
 arch/x86/include/asm/idtentry.h               |   4 +
 arch/x86/include/asm/mman.h                   |  24 +
 arch/x86/include/asm/mmu_context.h            |   2 +
 arch/x86/include/asm/msr-index.h              |  20 +
 arch/x86/include/asm/page_types.h             |   7 +
 arch/x86/include/asm/pgtable.h                | 302 ++++++++++--
 arch/x86/include/asm/pgtable_types.h          |  48 +-
 arch/x86/include/asm/processor.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  30 ++
 arch/x86/include/asm/trap_pf.h                |   2 +
 arch/x86/include/uapi/asm/mman.h              |   8 +-
 arch/x86/include/uapi/asm/prctl.h             |  10 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  20 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/elf_feature_prctl.c           |  72 +++
 arch/x86/kernel/fpu/xstate.c                  | 167 ++++++-
 arch/x86/kernel/idt.c                         |   4 +
 arch/x86/kernel/process.c                     |  17 +-
 arch/x86/kernel/process_64.c                  |   2 +
 arch/x86/kernel/shstk.c                       | 446 ++++++++++++++++++
 arch/x86/kernel/signal.c                      |  13 +
 arch/x86/kernel/signal_compat.c               |   2 +-
 arch/x86/kernel/traps.c                       |  62 +++
 arch/x86/mm/fault.c                           |  19 +
 arch/x86/mm/mmap.c                            |  48 ++
 arch/x86/mm/pat/set_memory.c                  |   2 +-
 arch/x86/mm/pgtable.c                         |  25 +
 drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
 fs/aio.c                                      |   2 +-
 fs/proc/task_mmu.c                            |   3 +
 include/linux/mm.h                            |  19 +-
 include/linux/pgtable.h                       |   8 +
 include/linux/syscalls.h                      |   1 +
 include/uapi/asm-generic/siginfo.h            |   3 +-
 include/uapi/asm-generic/unistd.h             |   2 +-
 ipc/shm.c                                     |   2 +-
 kernel/sys_ni.c                               |   1 +
 mm/gup.c                                      |  16 +-
 mm/huge_memory.c                              |  27 +-
 mm/memory.c                                   |   5 +-
 mm/migrate.c                                  |   3 +-
 mm/mmap.c                                     |  15 +-
 mm/mprotect.c                                 |   9 +-
 mm/nommu.c                                    |   4 +-
 mm/util.c                                     |   2 +-
 tools/testing/selftests/x86/Makefile          |   9 +-
 .../selftests/x86/test_map_shadow_stack.c     |  75 +++
 69 files changed, 1797 insertions(+), 92 deletions(-)
 create mode 100644 Documentation/x86/cet.rst
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/include/asm/mman.h
 create mode 100644 arch/x86/kernel/elf_feature_prctl.c
 create mode 100644 arch/x86/kernel/shstk.c
 create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c


base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
-- 
2.17.1


* [PATCH 01/35] Documentation/x86: Add CET description
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack Rick Edgecombe
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Introduce a new document on Control-flow Enforcement Technology (CET).

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>

---

v1:
 - Update and clarify the docs.
 - Moved kernel parameters documentation to other patch.

 Documentation/x86/cet.rst   | 145 ++++++++++++++++++++++++++++++++++++
 Documentation/x86/index.rst |   1 +
 2 files changed, 146 insertions(+)
 create mode 100644 Documentation/x86/cet.rst

diff --git a/Documentation/x86/cet.rst b/Documentation/x86/cet.rst
new file mode 100644
index 000000000000..ff0f9a148959
--- /dev/null
+++ b/Documentation/x86/cet.rst
@@ -0,0 +1,145 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Control-flow Enforcement Technology (CET)
+=========================================
+
+[1] Overview
+============
+
+Control-flow Enforcement Technology (CET) is a term referring to several
+related x86 processor features that provide protection against control-flow
+hijacking attacks. The HW feature itself can be set up to protect
+both applications and the kernel. Only user-mode protection is implemented
+in the 64-bit kernel, including shadow stack support for running legacy
+32-bit applications.
+
+CET introduces Shadow Stack and Indirect Branch Tracking. Shadow stack is
+a secondary stack allocated from memory that cannot be directly modified by
+applications. When executing a CALL instruction, the processor pushes the
+return address to both the normal stack and the shadow stack. Upon
+function return, the processor pops the shadow stack copy and compares it
+to the normal stack copy. If the two differ, the processor raises a
+control-protection fault. Indirect branch tracking verifies that indirect
+CALL/JMP targets are intended, as marked by the compiler with 'ENDBR'
+opcodes. Not all CPUs have both Shadow Stack and Indirect Branch Tracking,
+and only Shadow Stack is currently supported in the kernel.
+
+The Kconfig option is X86_SHADOW_STACK, and the feature can be disabled
+with the no_user_shstk kernel parameter.
+
+To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or LLVM v10.0.1
+or later are required. To build a CET-enabled application, GLIBC v2.28 or
+later is also required.
+
+At run time, /proc/cpuinfo shows CET features if the processor supports
+CET.
+
+[2] Application Enabling
+========================
+
+An application's CET capability is marked in its ELF header and can be
+verified from readelf/llvm-readelf output:
+
+    readelf -n <application> | grep -a SHSTK
+        properties: x86 feature: SHSTK
+
+The kernel does not process these markings directly. Applications must
+enable the features using the interface described in section 4. Typically
+this would be done in the dynamic loader or static runtime objects, as is
+the case in glibc.
+
+[3] Backward Compatibility
+==========================
+
+GLIBC provides a few CET tunables via the GLIBC_TUNABLES environment
+variable:
+
+GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-WRSS
+    Turn off SHSTK/WRSS.
+
+GLIBC_TUNABLES=glibc.tune.x86_shstk=<on, permissive>
+    This controls how dlopen() handles SHSTK legacy libraries::
+
+        on         - continue with SHSTK enabled;
+        permissive - continue with SHSTK off.
+
+Details can be found in the GLIBC manual pages.
+
+[4] CET arch_prctl()'s
+======================
+
+ELF features are enabled using the arch_prctl()s described below.
+
+arch_prctl(ARCH_X86_FEATURE_STATUS, u64 *args)
+    Get feature status.
+
+    The parameter 'args' is a pointer to a user buffer. The kernel returns
+    the following information:
+
+    *args = shadow stack/IBT status
+    *(args + 1) = shadow stack base address
+    *(args + 2) = shadow stack size
+
+    32-bit binaries use the same interface, but only lower 32-bits of each
+    item.
+
+arch_prctl(ARCH_X86_FEATURE_DISABLE, unsigned int features)
+    Disable features specified in 'features'. Return -EPERM if any of the
+    passed features are locked. Return -ECANCELED if any of the features
+    failed to disable. In this case call ARCH_X86_FEATURE_STATUS to find
+    out which features are still enabled.
+
+arch_prctl(ARCH_X86_FEATURE_ENABLE, unsigned int features)
+    Enable features specified in 'features'. Return -EPERM if any of the
+    passed features are locked. Return -ECANCELED if any of the features
+    failed to enable. In this case call ARCH_X86_FEATURE_STATUS to find
+    out which features were enabled.
+
+arch_prctl(ARCH_X86_FEATURE_LOCK, unsigned int features)
+    Lock in all features at their current enabled or disabled status.
+
+
+Currently shadow stack and WRSS are supported via this interface. WRSS
+can only be enabled with shadow stack, and is automatically disabled
+if shadow stack is disabled.
+
+[5] The implementation of the Shadow Stack
+==========================================
+
+Shadow Stack size
+-----------------
+
+A task's shadow stack is allocated from memory with a fixed size of
+MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
+the maximum size of the normal stack, but capped to 4 GB. However, a
+compat-mode application's address space is smaller, so each of its
+threads' shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB).
+
+Signal
+------
+
+The main program and its signal handlers use the same shadow stack.
+Because the shadow stack stores only return addresses, a large shadow
+stack covers the condition that both the program stack and the signal
+alternate stack run out.
+
+The kernel creates a restore token for the shadow stack and pushes the
+restorer address to the shadow stack. It then verifies that token when
+restoring from the signal handler.
+
+Fork
+----
+
+The shadow stack's vma has the VM_SHADOW_STACK flag set; its PTEs are
+required to be read-only and dirty. When a shadow stack PTE is not RO and
+dirty, a shadow access triggers a page fault with the shadow stack access
+bit set in the page fault error code.
+
+When a task forks a child, its shadow stack PTEs are copied and both the
+parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+Upon the next shadow stack access, the resulting shadow stack page fault
+is handled by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new shadow stack
+for the new thread.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f498f1d36cd3..b5f083a61eab 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -21,6 +21,7 @@ x86-specific Documentation
    tlb
    mtrr
    pat
+   cet
    intel-iommu
    intel_txt
    amd-memory-encryption
-- 
2.17.1



* [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 01/35] Documentation/x86: Add CET description Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-07 22:39   ` Dave Hansen
  2022-02-08  8:39   ` Thomas Gleixner
  2022-01-30 21:18 ` [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Rick Edgecombe
                   ` (34 subsequent siblings)
  36 siblings, 2 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Shadow Stack provides protection against function return address
corruption.  It is active when the processor supports it, the kernel has
CONFIG_X86_SHADOW_STACK enabled, and the application is built for the
feature.  This is only implemented for the 64-bit kernel.  When it is
enabled, legacy non-Shadow Stack applications continue to work, but without
protection.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

Yu-cheng v25:
 - Remove X86_CET and use X86_SHADOW_STACK directly.

Yu-cheng v24:
 - Update for the splitting X86_CET to X86_SHADOW_STACK and X86_IBT.

 arch/x86/Kconfig           | 22 ++++++++++++++++++++++
 arch/x86/Kconfig.assembler |  5 +++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ebe8fc76949a..b9efa0fd906d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,6 +26,7 @@ config X86_64
 	depends on 64BIT
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
+	select ARCH_HAS_SHADOW_STACK
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
@@ -1940,6 +1941,27 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config ARCH_HAS_SHADOW_STACK
+	def_bool n
+
+config X86_SHADOW_STACK
+	prompt "Intel Shadow Stack"
+	def_bool n
+	depends on AS_WRUSS
+	depends on ARCH_HAS_SHADOW_STACK
+	select ARCH_USES_HIGH_VMA_FLAGS
+	help
+	  Shadow Stack protection is a hardware feature that detects function
+	  return address corruption.  This helps mitigate ROP attacks.
+	  Applications must be enabled to use it, and old userspace does not
+	  get protection "for free".
+	  Support for this feature is present on the Tiger Lake family of
+	  processors released in 2020 or later.  Enabling this feature
+	  increases kernel text size by 3.7 KB.
+	  See Documentation/x86/cet.rst for more information.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 26b8c08e2fc4..00c79dd93651 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -19,3 +19,8 @@ config AS_TPAUSE
 	def_bool $(as-instr,tpause %ecx)
 	help
 	  Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7
+
+config AS_WRUSS
+	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
+	help
+	  Supported by binutils >= 2.31 and LLVM integrated assembler
-- 
2.17.1



* [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET)
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 01/35] Documentation/x86: Add CET description Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-07 22:45   ` Dave Hansen
  2022-02-09  1:10   ` Kees Cook
  2022-01-30 21:18 ` [PATCH 04/35] x86/cpufeatures: Introduce CPU setup and option parsing for CET Rick Edgecombe
                   ` (33 subsequent siblings)
  36 siblings, 2 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Add CPU feature flags for Control-flow Enforcement Technology (CET).

CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect Branch Tracking

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

v1:
 - Remove IBT, can be added in a follow on IBT series.

Yu-cheng v25:
 - Make X86_FEATURE_IBT depend on X86_FEATURE_SHSTK.

Yu-cheng v24:
 - Update for splitting CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK and
   CONFIG_X86_IBT.
 - Move DISABLE_IBT definition to the IBT series.

 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6db4e2932b3d..c3eb94b13fef 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -355,6 +355,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* Shadow Stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..b7728f7afb2b 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -65,6 +65,12 @@
 # define DISABLE_SGX	(1 << (X86_FEATURE_SGX & 31))
 #endif
 
+#ifdef CONFIG_X86_SHADOW_STACK
+#define DISABLE_SHSTK	0
+#else
+#define DISABLE_SHSTK	(1 << (X86_FEATURE_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -85,7 +91,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK19	0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..bf1b55a1ba21 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{}
 };
 
-- 
2.17.1



* [PATCH 04/35] x86/cpufeatures: Introduce CPU setup and option parsing for CET
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (2 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-07 22:49   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Rick Edgecombe
                   ` (32 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Introduce CPU setup and boot option parsing for CET features.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>

---

v1:
 - Moved kernel-parameters.txt changes here from patch 1.

Yu-cheng v25:
 - Remove software-defined X86_FEATURE_CET.

Yu-cheng v24:
 - Update #ifdef placement to reflect Kconfig changes of splitting shadow stack
   and ibt.

 Documentation/admin-guide/kernel-parameters.txt |  4 ++++
 arch/x86/include/uapi/asm/processor-flags.h     |  2 ++
 arch/x86/kernel/cpu/common.c                    | 12 ++++++++++++
 3 files changed, 18 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9..6c5456c56dbf 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3389,6 +3389,10 @@
 			noexec=on: enable non-executable mappings (default)
 			noexec=off: disable non-executable mappings
 
+	no_user_shstk	[X86-64] Disable Shadow Stack for user-mode
+			applications.  Disabling shadow stack also disables
+			IBT.
+
 	nosmap		[X86,PPC]
 			Disable SMAP (Supervisor Mode Access Prevention)
 			even if it is supported by processor.
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index bcba3c643e63..a8df907e8017 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -130,6 +130,8 @@
 #define X86_CR4_SMAP		_BITUL(X86_CR4_SMAP_BIT)
 #define X86_CR4_PKE_BIT		22 /* enable Protection Keys support */
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
+#define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement */
+#define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7b8382c11788..9ee339f5b8ca 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -515,6 +515,14 @@ static __init int setup_disable_pku(char *arg)
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return;
+
+	cr4_set_bits(X86_CR4_CET);
+}
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
@@ -1261,6 +1269,9 @@ static void __init cpu_parse_early_param(void)
 	if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
 		setup_clear_cpu_cap(X86_FEATURE_XSAVES);
 
+	if (cmdline_find_option_bool(boot_command_line, "no_user_shstk"))
+		setup_clear_cpu_cap(X86_FEATURE_SHSTK);
+
 	arglen = cmdline_find_option(boot_command_line, "clearcpuid", arg, sizeof(arg));
 	if (arglen <= 0)
 		return;
@@ -1632,6 +1643,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 
 	x86_init_rdrand(c);
 	setup_pku(c);
+	setup_cet(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
-- 
2.17.1



* [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (3 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 04/35] x86/cpufeatures: Introduce CPU setup and option parsing for CET Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-07 23:28   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 06/35] x86/cet: Add control-protection fault handler Rick Edgecombe
                   ` (31 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Control-flow Enforcement Technology (CET) introduces these MSRs:

    MSR_IA32_U_CET (user-mode CET settings),
    MSR_IA32_PL3_SSP (user-mode shadow stack pointer),

    MSR_IA32_PL0_SSP (kernel-mode shadow stack pointer),
    MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack pointer),
    MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack pointer),
    MSR_IA32_S_CET (kernel-mode CET settings),
    MSR_IA32_INT_SSP_TAB (exception shadow stack table).

The two user-mode MSRs belong to XFEATURE_CET_USER.  The first three
kernel-mode MSRs belong to XFEATURE_CET_KERNEL.  Both XSAVES states are
supervisor states.  This means that there is no direct, unprivileged access
to these states, making it harder for an attacker to subvert CET.

For future ptrace() support, the shadow stack address and MSR reserved bits
are checked before being written to the supervisor states.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

v1:
 - Remove outdated reference to sigreturn checks on msr's.

Yu-cheng v29:
 - Move CET MSR definition up in msr-index.h.

Yu-cheng v28:
 - Add XFEATURE_MASK_CET_USER to XFEATURES_INIT_FPSTATE_HANDLED.

Yu-cheng v25:
 - Update xsave_cpuid_features[].  Now CET XSAVES features depend on
   X86_FEATURE_SHSTK (vs. the software-defined X86_FEATURE_CET).

 arch/x86/include/asm/fpu/types.h  | 23 +++++++++++++++++++++--
 arch/x86/include/asm/fpu/xstate.h |  6 ++++--
 arch/x86/include/asm/msr-index.h  | 20 ++++++++++++++++++++
 arch/x86/kernel/fpu/xstate.c      | 13 ++++++++++++-
 4 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index eb7cd1139d97..e2b21197661c 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -138,6 +138,8 @@ enum xfeature {
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -252,6 +254,23 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	u64 user_cet;			/* user control-flow settings */
+	u64 user_ssp;			/* user shadow stack pointer */
+};
+
+/*
+ * State component 12 is Control-flow Enforcement kernel states
+ */
+struct cet_kernel_state {
+	u64 kernel_ssp;			/* kernel shadow stack */
+	u64 pl1_ssp;			/* privilege level 1 shadow stack */
+	u64 pl2_ssp;			/* privilege level 2 shadow stack */
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cd3dd170e23a..d4427b88ee12 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,7 +50,8 @@
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_CET_USER)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -77,7 +78,8 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+					      XFEATURE_MASK_CET_KERNEL)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3faf0f97edb1..0ee77ce4c753 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -362,6 +362,26 @@
 
 
 #define MSR_CORE_PERF_LIMIT_REASONS	0x00000690
+
+/* Control-flow Enforcement Technology MSRs */
+#define MSR_IA32_U_CET			0x000006a0 /* user mode cet setting */
+#define MSR_IA32_S_CET			0x000006a2 /* kernel mode cet setting */
+#define CET_SHSTK_EN			BIT_ULL(0)
+#define CET_WRSS_EN			BIT_ULL(1)
+#define CET_ENDBR_EN			BIT_ULL(2)
+#define CET_LEG_IW_EN			BIT_ULL(3)
+#define CET_NO_TRACK_EN			BIT_ULL(4)
+#define CET_SUPPRESS_DISABLE		BIT_ULL(5)
+#define CET_RESERVED			(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
+#define CET_SUPPRESS			BIT_ULL(10)
+#define CET_WAIT_ENDBR			BIT_ULL(11)
+
+#define MSR_IA32_PL0_SSP		0x000006a4 /* kernel shadow stack pointer */
+#define MSR_IA32_PL1_SSP		0x000006a5 /* ring-1 shadow stack pointer */
+#define MSR_IA32_PL2_SSP		0x000006a6 /* ring-2 shadow stack pointer */
+#define MSR_IA32_PL3_SSP		0x000006a7 /* user shadow stack pointer */
+#define MSR_IA32_INT_SSP_TAB		0x000006a8 /* exception shadow stack table */
+
 #define MSR_GFX_PERF_LIMIT_REASONS	0x000006B0
 #define MSR_RING_PERF_LIMIT_REASONS	0x000006B1
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 02b3ddaf4f75..44397202762b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -50,6 +50,8 @@ static const char *xfeature_names[] =
 	"Processor Trace (unused)"	,
 	"Protection Keys User registers",
 	"PASID state",
+	"Control-flow User registers"	,
+	"Control-flow Kernel registers"	,
 	"unknown xstate feature"	,
 	"unknown xstate feature"	,
 	"unknown xstate feature"	,
@@ -73,6 +75,8 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
+	[XFEATURE_CET_KERNEL]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -250,6 +254,8 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
+	print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
 	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
 	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
@@ -405,6 +411,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
 	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_CET_USER |		\
 	 XFEATURE_MASK_XTILE)
 
 /*
@@ -621,6 +628,8 @@ static bool __init check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
 	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
+	XCHECK_SZ(sz, nr, XFEATURE_CET_USER,   struct cet_user_state);
+	XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct cet_kernel_state);
 
 	/* The tile data size varies between implementations. */
 	if (nr == XFEATURE_XTILE_DATA)
@@ -634,7 +643,9 @@ static bool __init check_xstate_against_struct(int nr)
 	if ((nr < XFEATURE_YMM) ||
 	    (nr >= XFEATURE_MAX) ||
 	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+	    (nr == XFEATURE_RSRVD_COMP_13) ||
+	    (nr == XFEATURE_RSRVD_COMP_14) ||
+	    (nr == XFEATURE_RSRVD_COMP_16)) {
 		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
 		return false;
-- 
2.17.1



* [PATCH 06/35] x86/cet: Add control-protection fault handler
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (4 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-07 23:56   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 07/35] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Rick Edgecombe
                   ` (30 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu, Michael Kerrisk

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

A control-protection fault is triggered when a control-flow transfer
attempt violates Shadow Stack or Indirect Branch Tracking constraints.
For example, the return address for a RET instruction differs from the copy
on the shadow stack; or an indirect JMP instruction, without the NOTRACK
prefix, arrives at a non-ENDBR opcode.

The control-protection fault handler works in a similar way to the general
protection fault handler.  It provides the si_code SEGV_CPERR to the signal
handler.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
---

v1:
 - Update static asserts for NSIGSEGV

Yu-cheng v29:
 - Remove pr_emerg() since it is followed by die().
 - Change boot_cpu_has() to cpu_feature_enabled().

Yu-cheng v25:
 - Change CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK.
 - Change X86_FEATURE_CET to X86_FEATURE_SHSTK.
 
 arch/arm/kernel/signal.c           |  2 +-
 arch/arm64/kernel/signal.c         |  2 +-
 arch/arm64/kernel/signal32.c       |  2 +-
 arch/sparc/kernel/signal32.c       |  2 +-
 arch/sparc/kernel/signal_64.c      |  2 +-
 arch/x86/include/asm/idtentry.h    |  4 ++
 arch/x86/kernel/idt.c              |  4 ++
 arch/x86/kernel/signal_compat.c    |  2 +-
 arch/x86/kernel/traps.c            | 62 ++++++++++++++++++++++++++++++
 include/uapi/asm-generic/siginfo.h |  3 +-
 10 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index c532a6041066..59aaadce9d52 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs)
  */
 static_assert(NSIGILL	== 11);
 static_assert(NSIGFPE	== 15);
-static_assert(NSIGSEGV	== 9);
+static_assert(NSIGSEGV	== 10);
 static_assert(NSIGBUS	== 5);
 static_assert(NSIGTRAP	== 6);
 static_assert(NSIGCHLD	== 6);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index d8aaf4b6f432..d2da57c415b8 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -983,7 +983,7 @@ void __init minsigstksz_setup(void)
  */
 static_assert(NSIGILL	== 11);
 static_assert(NSIGFPE	== 15);
-static_assert(NSIGSEGV	== 9);
+static_assert(NSIGSEGV	== 10);
 static_assert(NSIGBUS	== 5);
 static_assert(NSIGTRAP	== 6);
 static_assert(NSIGCHLD	== 6);
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index d984282b979f..8776a34c6444 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs)
  */
 static_assert(NSIGILL	== 11);
 static_assert(NSIGFPE	== 15);
-static_assert(NSIGSEGV	== 9);
+static_assert(NSIGSEGV	== 10);
 static_assert(NSIGBUS	== 5);
 static_assert(NSIGTRAP	== 6);
 static_assert(NSIGCHLD	== 6);
diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
index 6cc124a3bb98..dc50b2a78692 100644
--- a/arch/sparc/kernel/signal32.c
+++ b/arch/sparc/kernel/signal32.c
@@ -752,7 +752,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp)
  */
 static_assert(NSIGILL	== 11);
 static_assert(NSIGFPE	== 15);
-static_assert(NSIGSEGV	== 9);
+static_assert(NSIGSEGV	== 10);
 static_assert(NSIGBUS	== 5);
 static_assert(NSIGTRAP	== 6);
 static_assert(NSIGCHLD	== 6);
diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
index 2a78d2af1265..7fe2bd37bd1a 100644
--- a/arch/sparc/kernel/signal_64.c
+++ b/arch/sparc/kernel/signal_64.c
@@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long
  */
 static_assert(NSIGILL	== 11);
 static_assert(NSIGFPE	== 15);
-static_assert(NSIGSEGV	== 9);
+static_assert(NSIGSEGV	== 10);
 static_assert(NSIGBUS	== 5);
 static_assert(NSIGTRAP	== 6);
 static_assert(NSIGCHLD	== 6);
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..a90791433152 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -562,6 +562,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS,	exc_stack_segment);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP,	exc_general_protection);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC,	exc_alignment_check);
 
+#ifdef CONFIG_X86_SHADOW_STACK
+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
+#endif
+
 /* Raw exception entries which need extra work */
 DECLARE_IDTENTRY_RAW(X86_TRAP_UD,		exc_invalid_op);
 DECLARE_IDTENTRY_RAW(X86_TRAP_BP,		exc_int3);
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index df0fa695bb09..9f1bdaabc246 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -113,6 +113,10 @@ static const __initconst struct idt_data def_idts[] = {
 #elif defined(CONFIG_X86_32)
 	SYSG(IA32_SYSCALL_VECTOR,	entry_INT80_32),
 #endif
+
+#ifdef CONFIG_X86_SHADOW_STACK
+	INTG(X86_TRAP_CP,		asm_exc_control_protection),
+#endif
 };
 
 /*
diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index b52407c56000..ff50cd978ea5 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
 	 */
 	BUILD_BUG_ON(NSIGILL  != 11);
 	BUILD_BUG_ON(NSIGFPE  != 15);
-	BUILD_BUG_ON(NSIGSEGV != 9);
+	BUILD_BUG_ON(NSIGSEGV != 10);
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 6);
 	BUILD_BUG_ON(NSIGCHLD != 6);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c9d566dcf89a..54b7a146fd5e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -39,6 +39,7 @@
 #include <linux/io.h>
 #include <linux/hardirq.h>
 #include <linux/atomic.h>
+#include <linux/nospec.h>
 
 #include <asm/stacktrace.h>
 #include <asm/processor.h>
@@ -641,6 +642,67 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	cond_local_irq_disable(regs);
 }
 
+#ifdef CONFIG_X86_SHADOW_STACK
+static const char * const control_protection_err[] = {
+	"unknown",
+	"near-ret",
+	"far-ret/iret",
+	"endbranch",
+	"rstorssp",
+	"setssbsy",
+	"unknown",
+};
+
+static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL,
+			      DEFAULT_RATELIMIT_BURST);
+
+/*
+ * When a control protection exception occurs, send a signal to the responsible
+ * application.  Currently, control protection is only enabled for user mode.
+ * This exception should not come from kernel mode.
+ */
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+	struct task_struct *tsk;
+
+	if (!user_mode(regs)) {
+		die("kernel control protection fault", regs, error_code);
+		panic("Unexpected kernel control protection fault.  Machine halted.");
+	}
+
+	cond_local_irq_enable(regs);
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
+
+	tsk = current;
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_CP;
+
+	/*
+	 * Ratelimit to prevent log spamming.
+	 */
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
+	    __ratelimit(&cpf_rate)) {
+		unsigned long ssp;
+		int cpf_type;
+
+		cpf_type = array_index_nospec(error_code, ARRAY_SIZE(control_protection_err));
+
+		rdmsrl(MSR_IA32_PL3_SSP, ssp);
+		pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)",
+			 tsk->comm, task_pid_nr(tsk),
+			 regs->ip, regs->sp, ssp, error_code,
+			 control_protection_err[cpf_type]);
+		print_vma_addr(KERN_CONT " in ", regs->ip);
+		pr_cont("\n");
+	}
+
+	force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0);
+	cond_local_irq_disable(regs);
+}
+#endif
+
 static bool do_int3(struct pt_regs *regs)
 {
 	int res;
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index 3ba180f550d7..081f4b37d22c 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -240,7 +240,8 @@ typedef struct siginfo {
 #define SEGV_ADIPERR	7	/* Precise MCD exception */
 #define SEGV_MTEAERR	8	/* Asynchronous ARM MTE error */
 #define SEGV_MTESERR	9	/* Synchronous ARM MTE exception */
-#define NSIGSEGV	9
+#define SEGV_CPERR	10	/* Control protection fault */
+#define NSIGSEGV	10
 
 /*
  * SIGBUS si_codes
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 07/35] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (5 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 06/35] x86/cet: Add control-protection fault handler Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-08  0:13   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 08/35] x86/mm: Move pmd_write(), pud_write() up in the file Rick Edgecombe
                   ` (29 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu, Christoph Hellwig

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The x86 family of processors does not directly create read-only and Dirty
PTEs.  These PTEs are created by software.  One such case is that kernel
read-only pages have historically been set up as Dirty.

New processors that support Shadow Stack regard read-only and Dirty PTEs as
shadow stack pages.  This results in ambiguity between shadow stack and
kernel read-only pages.  To resolve this, remove Dirty from kernel read-only
pages.
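
To illustrate the ambiguity, the standalone model below (not kernel code; it
simply assumes the x86 bit positions Write=bit 1 and Dirty=bit 6) shows that
the shadow stack encoding "Write=0, Dirty=1" matches the old kernel
read-only protections but not the new ones:

  #include <stdio.h>
  #include <stdint.h>

  #define _PAGE_RW	(1ULL << 1)
  #define _PAGE_DIRTY	(1ULL << 6)

  /* Shadow-stack-capable CPUs treat Write=0, Dirty=1 as shadow stack. */
  static int looks_like_shstk(uint64_t pte_flags)
  {
          return (pte_flags & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
  }

  int main(void)
  {
          uint64_t old_kernel_ro = _PAGE_DIRTY;	/* RO + Dirty, before this patch */
          uint64_t new_kernel_ro = 0;		/* RO, Dirty removed by this patch */

          printf("old RO looks like shadow stack: %d\n", looks_like_shstk(old_kernel_ro));
          printf("new RO looks like shadow stack: %d\n", looks_like_shstk(new_kernel_ro));
          return 0;
  }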

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/pgtable_types.h | 6 +++---
 arch/x86/mm/pat/set_memory.c         | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..3781a79b6388 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -190,10 +190,10 @@ enum page_cache_mode {
 #define _KERNPG_TABLE		 (__PP|__RW|   0|___A|   0|___D|   0|   0| _ENC)
 #define _PAGE_TABLE_NOENC	 (__PP|__RW|_USR|___A|   0|___D|   0|   0)
 #define _PAGE_TABLE		 (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
-#define __PAGE_KERNEL_RO	 (__PP|   0|   0|___A|__NX|___D|   0|___G)
-#define __PAGE_KERNEL_ROX	 (__PP|   0|   0|___A|   0|___D|   0|___G)
+#define __PAGE_KERNEL_RO	 (__PP|   0|   0|___A|__NX|   0|   0|___G)
+#define __PAGE_KERNEL_ROX	 (__PP|   0|   0|___A|   0|   0|   0|___G)
 #define __PAGE_KERNEL_NOCACHE	 (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
-#define __PAGE_KERNEL_VVAR	 (__PP|   0|_USR|___A|__NX|___D|   0|___G)
+#define __PAGE_KERNEL_VVAR	 (__PP|   0|_USR|___A|__NX|   0|   0|___G)
 #define __PAGE_KERNEL_LARGE	 (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
 #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW|   0|___A|   0|___D|_PSE|___G)
 #define __PAGE_KERNEL_WP	 (__PP|__RW|   0|___A|__NX|___D|   0|___G| __WP)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index b4072115c8ef..844bb30280b7 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1943,7 +1943,7 @@ int set_memory_nx(unsigned long addr, int numpages)
 
 int set_memory_ro(unsigned long addr, int numpages)
 {
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0);
 }
 
 int set_memory_rw(unsigned long addr, int numpages)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 08/35] x86/mm: Move pmd_write(), pud_write() up in the file
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (6 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 07/35] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 09/35] x86/mm: Introduce _PAGE_COW Rick Edgecombe
                   ` (28 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

To prepare the introduction of _PAGE_COW, move pmd_write() and
pud_write() up in the file, so that they can be used by other
helpers below.  No functional changes.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 8a9432fb3802..aff5e489ff17 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -158,6 +158,18 @@ static inline int pte_write(pte_t pte)
 	return pte_flags(pte) & _PAGE_RW;
 }
 
+#define pmd_write pmd_write
+static inline int pmd_write(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_RW;
+}
+
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+	return pud_flags(pud) & _PAGE_RW;
+}
+
 static inline int pte_huge(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_PSE;
@@ -1116,12 +1128,6 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
 				  unsigned long address, pmd_t *pmdp);
 
 
-#define pmd_write pmd_write
-static inline int pmd_write(pmd_t pmd)
-{
-	return pmd_flags(pmd) & _PAGE_RW;
-}
-
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pmd_t *pmdp)
@@ -1151,12 +1157,6 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
-#define pud_write pud_write
-static inline int pud_write(pud_t pud)
-{
-	return pud_flags(pud) & _PAGE_RW;
-}
-
 #ifndef pmdp_establish
 #define pmdp_establish pmdp_establish
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 09/35] x86/mm: Introduce _PAGE_COW
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (7 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 08/35] x86/mm: Move pmd_write(), pud_write() up in the file Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-08  1:05   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Rick Edgecombe
                   ` (27 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

There is essentially no room left in the x86 hardware PTEs on some OSes
(not Linux).  That left the hardware architects looking for a way to
represent a new memory type (shadow stack) within the existing bits.
They chose to repurpose a lightly-used state: Write=0, Dirty=1.

The reason it's lightly used is that Dirty=1 is normally set by hardware,
but hardware will not normally set Dirty=1 on a Write=0 PTE.  Software must
normally be involved to create one of these PTEs, so software can simply
opt not to create them.

In places where Linux normally creates Write=0, Dirty=1, it can use the
software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY.  In other
words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1.  This
clearly separates shadow stack from other data, and results in the
following:

(a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
(b) A R/O page that has been COW'ed: (Write=0, Cow=1)
    The user page is in a R/O VMA, and get_user_pages() needs a writable
    copy.  The page fault handler creates a copy of the page and sets
    the new copy's PTE as Write=0 and Cow=1.
(c) A shadow stack PTE: (Write=0, Dirty=1)
(d) A shared shadow stack PTE: (Write=0, Cow=1)
    When a shadow stack page is being shared among processes (this happens
    at fork()), its PTE is made Dirty=0, so the next shadow stack access
    causes a fault, and the page is duplicated and Dirty=1 is set again.
    This is the COW equivalent for shadow stack pages, even though it's
    copy-on-access rather than copy-on-write.
(e) A page where the processor observed a Write=1 PTE, started a write, set
    Dirty=1, but then observed a Write=0 PTE.  That's possible today, but
    will not happen on processors that support shadow stack.

Define _PAGE_COW and update pte_*() helpers and apply the same changes to
pmd and pud.

After this, there are six free bits left in the 64-bit PTE.  There are no
more free bits in the 32-bit PTE (except for PAE), and Shadow Stack is not
implemented for the 32-bit kernel.
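
The state transitions above can also be modeled outside the kernel.  The
sketch below is a standalone illustration (not kernel code) of the encoding
and of the pte_wrprotect() behaviour introduced here, assuming Write=bit 1,
Dirty=bit 6 and the new Cow bit at position 58 (_PAGE_BIT_SOFTW5):

  #include <stdio.h>
  #include <stdint.h>

  #define _PAGE_RW	(1ULL << 1)
  #define _PAGE_DIRTY	(1ULL << 6)
  #define _PAGE_COW	(1ULL << 58)	/* _PAGE_BIT_SOFTW5 in this patch */

  /* Mirrors pte_wrprotect(): clear Write, move a hardware Dirty to Cow. */
  static uint64_t wrprotect(uint64_t pte)
  {
          pte &= ~_PAGE_RW;
          if (pte & _PAGE_DIRTY) {
                  pte &= ~_PAGE_DIRTY;
                  pte |= _PAGE_COW;
          }
          return pte;
  }

  static const char *kind(uint64_t pte)
  {
          if ((pte & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
                  return "shadow stack";
          if (pte & _PAGE_COW)
                  return "copy-on-write data";
          return "other";
  }

  int main(void)
  {
          uint64_t dirty_data = _PAGE_RW | _PAGE_DIRTY;

          printf("wrprotected dirty data: %s\n", kind(wrprotect(dirty_data)));
          printf("shadow stack PTE:       %s\n", kind(_PAGE_DIRTY));
          return 0;
  }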

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/pgtable.h       | 196 ++++++++++++++++++++++++---
 arch/x86/include/asm/pgtable_types.h |  42 +++++-
 2 files changed, 217 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index aff5e489ff17..a4a75e78a934 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -123,9 +123,20 @@ extern pmdval_t early_pmd_flags;
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
-static inline int pte_dirty(pte_t pte)
+static inline bool pte_dirty(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_DIRTY;
+	/*
+	 * A dirty PTE has Dirty=1 or Cow=1.
+	 */
+	return pte_flags(pte) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pte_shstk(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return false;
+
+	return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
 }
 
 static inline int pte_young(pte_t pte)
@@ -133,9 +144,20 @@ static inline int pte_young(pte_t pte)
 	return pte_flags(pte) & _PAGE_ACCESSED;
 }
 
-static inline int pmd_dirty(pmd_t pmd)
+static inline bool pmd_dirty(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_DIRTY;
+	/*
+	 * A dirty PMD has Dirty=1 or Cow=1.
+	 */
+	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pmd_shstk(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return false;
+
+	return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
 }
 
 static inline int pmd_young(pmd_t pmd)
@@ -143,9 +165,12 @@ static inline int pmd_young(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_ACCESSED;
 }
 
-static inline int pud_dirty(pud_t pud)
+static inline bool pud_dirty(pud_t pud)
 {
-	return pud_flags(pud) & _PAGE_DIRTY;
+	/*
+	 * A dirty PUD has Dirty=1 or Cow=1.
+	 */
+	return pud_flags(pud) & _PAGE_DIRTY_BITS;
 }
 
 static inline int pud_young(pud_t pud)
@@ -155,13 +180,23 @@ static inline int pud_young(pud_t pud)
 
 static inline int pte_write(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are always writable - but not by normal
+	 * instructions, and only by shadow stack operations.  Therefore,
+	 * the W=0,D=1 test with pte_shstk().
+	 */
+	return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte);
 }
 
 #define pmd_write pmd_write
 static inline int pmd_write(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are always writable - but not by normal
+	 * instructions, and only by shadow stack operations.  Therefore,
+	 * the W=0,D=1 test with pmd_shstk().
+	 */
+	return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd);
 }
 
 #define pud_write pud_write
@@ -299,6 +334,24 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
 	return native_make_pte(v & ~clear);
 }
 
+static inline pte_t pte_mkcow(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pte;
+
+	pte = pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_set_flags(pte, _PAGE_COW);
+}
+
+static inline pte_t pte_clear_cow(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pte;
+
+	pte = pte_set_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_COW);
+}
+
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pte_uffd_wp(pte_t pte)
 {
@@ -318,7 +371,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
 
 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
 }
 
 static inline pte_t pte_mkold(pte_t pte)
@@ -328,7 +381,16 @@ static inline pte_t pte_mkold(pte_t pte)
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_RW);
+	pte = pte_clear_flags(pte, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PTE (RW=0, Dirty=1).  Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pte_dirty(pte))
+		pte = pte_mkcow(pte);
+	return pte;
 }
 
 static inline pte_t pte_mkexec(pte_t pte)
@@ -338,7 +400,18 @@ static inline pte_t pte_mkexec(pte_t pte)
 
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pteval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PTEs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte))
+		dirty = _PAGE_COW;
+
+	return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	return pte_clear_cow(pte);
 }
 
 static inline pte_t pte_mkyoung(pte_t pte)
@@ -348,7 +421,12 @@ static inline pte_t pte_mkyoung(pte_t pte)
 
 static inline pte_t pte_mkwrite(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_RW);
+	pte = pte_set_flags(pte, _PAGE_RW);
+
+	if (pte_dirty(pte))
+		pte = pte_clear_cow(pte);
+
+	return pte;
 }
 
 static inline pte_t pte_mkhuge(pte_t pte)
@@ -395,6 +473,24 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
 	return native_make_pmd(v & ~clear);
 }
 
+static inline pmd_t pmd_mkcow(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pmd;
+
+	pmd = pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_set_flags(pmd, _PAGE_COW);
+}
+
+static inline pmd_t pmd_clear_cow(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pmd;
+
+	pmd = pmd_set_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_COW);
+}
+
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pmd_uffd_wp(pmd_t pmd)
 {
@@ -419,17 +515,36 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
 
 static inline pmd_t pmd_mkclean(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
 }
 
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_RW);
+	pmd = pmd_clear_flags(pmd, _PAGE_RW);
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PMD (RW=0, Dirty=1).  Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pmd_dirty(pmd))
+		pmd = pmd_mkcow(pmd);
+	return pmd;
 }
 
 static inline pmd_t pmd_mkdirty(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pmdval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PMDs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pmd_write(pmd))
+		dirty = _PAGE_COW;
+
+	return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
+{
+	return pmd_clear_cow(pmd);
 }
 
 static inline pmd_t pmd_mkdevmap(pmd_t pmd)
@@ -449,7 +564,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 
 static inline pmd_t pmd_mkwrite(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_RW);
+	pmd = pmd_set_flags(pmd, _PAGE_RW);
+
+	if (pmd_dirty(pmd))
+		pmd = pmd_clear_cow(pmd);
+	return pmd;
 }
 
 static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
@@ -466,6 +585,24 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear)
 	return native_make_pud(v & ~clear);
 }
 
+static inline pud_t pud_mkcow(pud_t pud)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pud;
+
+	pud = pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_set_flags(pud, _PAGE_COW);
+}
+
+static inline pud_t pud_clear_cow(pud_t pud)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pud;
+
+	pud = pud_set_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_COW);
+}
+
 static inline pud_t pud_mkold(pud_t pud)
 {
 	return pud_clear_flags(pud, _PAGE_ACCESSED);
@@ -473,17 +610,32 @@ static inline pud_t pud_mkold(pud_t pud)
 
 static inline pud_t pud_mkclean(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
 }
 
 static inline pud_t pud_wrprotect(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_RW);
+	pud = pud_clear_flags(pud, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PUD (RW=0, Dirty=1).  Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pud_dirty(pud))
+		pud = pud_mkcow(pud);
+	return pud;
 }
 
 static inline pud_t pud_mkdirty(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pudval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PUDs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pud_write(pud))
+		dirty = _PAGE_COW;
+
+	return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
 }
 
 static inline pud_t pud_mkdevmap(pud_t pud)
@@ -503,7 +655,11 @@ static inline pud_t pud_mkyoung(pud_t pud)
 
 static inline pud_t pud_mkwrite(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_RW);
+	pud = pud_set_flags(pud, _PAGE_RW);
+
+	if (pud_dirty(pud))
+		pud = pud_clear_cow(pud);
+	return pud;
 }
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 3781a79b6388..1bfab70ff9ac 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -21,7 +21,8 @@
 #define _PAGE_BIT_SOFTW2	10	/* " */
 #define _PAGE_BIT_SOFTW3	11	/* " */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
-#define _PAGE_BIT_SOFTW4	58	/* available for programmer */
+#define _PAGE_BIT_SOFTW4	57	/* available for programmer */
+#define _PAGE_BIT_SOFTW5	58	/* available for programmer */
 #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
 #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
 #define _PAGE_BIT_PKEY_BIT2	61	/* Protection Keys, bit 3/4 */
@@ -34,6 +35,15 @@
 #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
 #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4
 
+/*
+ * Indicates a copy-on-write page.
+ */
+#ifdef CONFIG_X86_SHADOW_STACK
+#define _PAGE_BIT_COW		_PAGE_BIT_SOFTW5 /* copy-on-write */
+#else
+#define _PAGE_BIT_COW		0
+#endif
+
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
 #define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
@@ -115,6 +125,36 @@
 #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
 #endif
 
+/*
+ * The hardware requires shadow stack to be read-only and Dirty.
+ * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs
+ * from shadow stack PTEs:
+ * (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
+ * (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
+ *     The user page is in a R/O VMA, and get_user_pages() needs a
+ *     writable copy.  The page fault handler creates a copy of the page
+ *     and sets the new copy's PTE as Write=0, Cow=1.
+ * (c) A shadow stack PTE: (Write=0, Dirty=1)
+ * (d) A shared (copy-on-access) shadow stack PTE: (Write=0, Cow=1)
+ *     When a shadow stack page is being shared among processes (this
+ *     happens at fork()), its PTE is cleared of _PAGE_DIRTY, so the next
+ *     shadow stack access causes a fault, and the page is duplicated and
+ *     _PAGE_DIRTY is set again.  This is the COW equivalent for shadow
+ *     stack pages, even though it's copy-on-access rather than
+ *     copy-on-write.
+ * (e) A page where the processor observed a Write=1 PTE, started a write,
+ *     set Dirty=1, but then observed a Write=0 PTE (changed by another
+ *     thread).  That's possible today, but will not happen on processors
+ *     that support shadow stack.
+ */
+#ifdef CONFIG_X86_SHADOW_STACK
+#define _PAGE_COW	(_AT(pteval_t, 1) << _PAGE_BIT_COW)
+#else
+#define _PAGE_COW	(_AT(pteval_t, 0))
+#endif
+
+#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW)
+
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
 /*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (8 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 09/35] x86/mm: Introduce _PAGE_COW Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 16:58   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 11/35] x86/mm: Update pte_modify for _PAGE_COW Rick Edgecombe
                   ` (26 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu, David Airlie, Joonas Lahtinen,
	Jani Nikula, Daniel Vetter, Rodrigo Vivi, Zhenyu Wang, Zhi Wang

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

After the introduction of _PAGE_COW, a modified page's PTE can have either
_PAGE_DIRTY or _PAGE_COW.  Change _PAGE_DIRTY to _PAGE_DIRTY_BITS.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 99d1781fa5f0..75ce4e823902 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1210,7 +1210,7 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
 	}
 
 	/* Clear dirty field. */
-	se->val64 &= ~_PAGE_DIRTY;
+	se->val64 &= ~_PAGE_DIRTY_BITS;
 
 	ops->clear_pse(se);
 	ops->clear_ips(se);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 11/35] x86/mm: Update pte_modify for _PAGE_COW
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (9 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 18:00   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 12/35] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Rick Edgecombe
                   ` (25 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The read-only and Dirty PTE has been used to indicate copy-on-write pages.
However, newer x86 processors also regard a read-only and Dirty PTE as a
shadow stack page.  In order to separate the two, the software-defined
_PAGE_COW is created to replace _PAGE_DIRTY for the copy-on-write case, and
pte_*() are updated.

pte_modify() changes a PTE to 'newprot', but it doesn't use the pte_*()
helpers.  Introduce fixup_dirty_pte(), which, based on _PAGE_RW, marks a
dirty PTE with either _PAGE_DIRTY or _PAGE_COW.

Apply the same changes to pmd_modify().
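
For context, one user-visible sequence that exercises this path is
write-protecting already-dirty anonymous memory: mprotect() reaches
pte_modify() through change_protection(), which is the call path assumed in
the minimal userspace sketch below (illustrative only):

  #include <sys/mman.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          size_t len = 4096;
          char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          if (p == MAP_FAILED)
                  return 1;

          memset(p, 0xaa, len);		/* fault in and dirty the page */

          /* RW=1, Dirty=1 becomes read-only; with this patch the Dirty bit
           * is re-expressed as Cow rather than left as Write=0, Dirty=1.
           */
          if (mprotect(p, len, PROT_READ))
                  return 1;

          printf("first byte after mprotect: 0x%x\n", (unsigned char)p[0]);
          return 0;
  }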

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/pgtable.h | 37 ++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a4a75e78a934..5c3886f6ccda 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -773,6 +773,23 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 
 static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
 
+static inline pteval_t fixup_dirty_pte(pteval_t pteval)
+{
+	pte_t pte = __pte(pteval);
+
+	/*
+	 * Fix up potential shadow stack page flags because the RO, Dirty
+	 * PTE is special.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pte_dirty(pte)) {
+			pte = pte_mkclean(pte);
+			pte = pte_mkdirty(pte);
+		}
+	}
+	return pte_val(pte);
+}
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pteval_t val = pte_val(pte), oldval = val;
@@ -783,16 +800,36 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	 */
 	val &= _PAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
+	val = fixup_dirty_pte(val);
 	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
 	return __pte(val);
 }
 
+static inline int pmd_write(pmd_t pmd);
+static inline pmdval_t fixup_dirty_pmd(pmdval_t pmdval)
+{
+	pmd_t pmd = __pmd(pmdval);
+
+	/*
+	 * Fix up potential shadow stack page flags because the RO, Dirty
+	 * PMD is special.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pmd_dirty(pmd)) {
+			pmd = pmd_mkclean(pmd);
+			pmd = pmd_mkdirty(pmd);
+		}
+	}
+	return pmd_val(pmd);
+}
+
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
 	pmdval_t val = pmd_val(pmd), oldval = val;
 
 	val &= _HPAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val = fixup_dirty_pmd(val);
 	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
 	return __pmd(val);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 12/35] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (10 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 11/35] x86/mm: Update pte_modify for _PAGE_COW Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 18:30   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 13/35] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Rick Edgecombe
                   ` (24 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

When Shadow Stack is introduced, the [R/O + _PAGE_DIRTY] PTE combination is
reserved for shadow stack.  Copy-on-write PTEs have [R/O + _PAGE_COW].

When a PTE goes from [R/W + _PAGE_DIRTY] to [R/O + _PAGE_COW], it could
become a transient shadow stack PTE in two cases:

The first case is that some processors can start a write but end up seeing
a read-only PTE by the time they get to the Dirty bit, creating a transient
shadow stack PTE.  However, this will not occur on processors supporting
Shadow Stack, and a TLB flush is not necessary.

The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non-
atomically, a transient shadow stack PTE can be created as a result.
Thus, prevent that with cmpxchg.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights to the issue.  Jann Horn provided the cmpxchg solution.
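
The retry loop can also be modeled with plain C11 atomics.  The standalone
sketch below (not kernel code; it assumes the illustrative bit positions
Write=bit 1, Dirty=bit 6, Cow=bit 58) shows why publishing the whole
old->new transition with one compare-and-exchange never exposes a transient
Write=0, Dirty=1 value:

  #include <stdatomic.h>
  #include <stdint.h>
  #include <stdio.h>

  #define _PAGE_RW	(1ULL << 1)
  #define _PAGE_DIRTY	(1ULL << 6)
  #define _PAGE_COW	(1ULL << 58)

  static uint64_t wrprotect(uint64_t pte)
  {
          pte &= ~_PAGE_RW;
          if (pte & _PAGE_DIRTY)
                  pte = (pte & ~_PAGE_DIRTY) | _PAGE_COW;
          return pte;
  }

  /* The old->new transition is published atomically, so another CPU can
   * never observe a transient Write=0, Dirty=1 (shadow stack) value.
   */
  static void set_wrprotect(_Atomic uint64_t *ptep)
  {
          uint64_t old = atomic_load(ptep);

          while (!atomic_compare_exchange_weak(ptep, &old, wrprotect(old)))
                  ;	/* old was refreshed; recompute and retry */
  }

  int main(void)
  {
          _Atomic uint64_t pte = _PAGE_RW | _PAGE_DIRTY;

          set_wrprotect(&pte);
          printf("pte after wrprotect: %#llx\n",
                 (unsigned long long)atomic_load(&pte));
          return 0;
  }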

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

Yu-cheng v30:
 - Replace (pmdval_t) cast with CONFIG_PGTABLE_LEVELS > 2 (Borislav Petkov).
 
 arch/x86/include/asm/pgtable.h | 38 ++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5c3886f6ccda..e1061b9cba6a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1295,6 +1295,24 @@ static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+	/*
+	 * If Shadow Stack is enabled, pte_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pte_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PTE and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PTE is RW=1, Dirty=1 now.  Use try_cmpxchg() to detect
+	 * PTE changes and update old_pte, then try again.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pte_t old_pte, new_pte;
+
+		old_pte = READ_ONCE(*ptep);
+		do {
+			new_pte = pte_wrprotect(old_pte);
+		} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+		return;
+	}
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }
 
@@ -1347,6 +1365,26 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+#if CONFIG_PGTABLE_LEVELS > 2
+	/*
+	 * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pmd_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PMD and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PMD is RW=1, Dirty=1 now.  Use try_cmpxchg() to detect
+	 * PMD changes and update old_pmd, then try again.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pmd_t old_pmd, new_pmd;
+
+		old_pmd = READ_ONCE(*pmdp);
+		do {
+			new_pmd = pmd_wrprotect(old_pmd);
+		} while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd));
+
+		return;
+	}
+#endif
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 13/35] mm: Move VM_UFFD_MINOR_BIT from 37 to 38
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (11 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 12/35] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 14/35] mm: Introduce VM_SHADOW_STACK for shadow stack memory Rick Edgecombe
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu, Peter Xu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

To introduce VM_SHADOW_STACK as VM_HIGH_ARCH_BIT_5 (37), and to keep all
the VM_HIGH_ARCH_BITs together, move VM_UFFD_MINOR_BIT from 37 to 38.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e1a84b1e6787..2e74c0ab6d25 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -359,7 +359,7 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-# define VM_UFFD_MINOR_BIT	37
+# define VM_UFFD_MINOR_BIT	38
 # define VM_UFFD_MINOR		BIT(VM_UFFD_MINOR_BIT)	/* UFFD minor faults */
 #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 # define VM_UFFD_MINOR		VM_NONE
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 14/35] mm: Introduce VM_SHADOW_STACK for shadow stack memory
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (12 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 13/35] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 21:55   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 15/35] x86/mm: Check Shadow Stack page fault errors Rick Edgecombe
                   ` (22 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

A shadow stack PTE must be read-only and have _PAGE_DIRTY set.  However,
read-only and Dirty PTEs also exist for copy-on-write (COW) pages.  These
two cases are handled differently for page faults.  Introduce
VM_SHADOW_STACK to track shadow stack VMAs.
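
With the "[shadow stack]" name added to arch_vma_name() below, a process
that has shadow stack enabled could locate its own shadow stack mapping
from userspace.  A minimal sketch (it prints nothing when no such VMA
exists):

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          char line[512];
          FILE *f = fopen("/proc/self/maps", "r");

          if (!f)
                  return 1;
          while (fgets(line, sizeof(line), f)) {
                  if (strstr(line, "[shadow stack]"))
                          fputs(line, stdout);	/* VMA named by arch_vma_name() */
          }
          fclose(f);
          return 0;
  }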

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---
 Documentation/filesystems/proc.rst | 1 +
 arch/x86/mm/mmap.c                 | 2 ++
 fs/proc/task_mmu.c                 | 3 +++
 include/linux/mm.h                 | 8 ++++++++
 4 files changed, 14 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 061744c436d9..3f8c0fbb9cb3 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -555,6 +555,7 @@ encoded manner. The codes are the following:
     mt    arm64 MTE allocation tags are enabled
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
+    ss    shadow stack page
     ==    =======================================
 
 Note that there is no guarantee that every flag and associated mnemonic will
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c90c20904a60..f3f52c5e2fd6 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -165,6 +165,8 @@ unsigned long get_mmap_base(int is_legacy)
 
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		return "[shadow stack]";
 	return NULL;
 }
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 18f8c3acbb85..78d9b0fd2aee 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -679,6 +679,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_ARCH_HAS_SHADOW_STACK
+		[ilog2(VM_SHADOW_STACK)] = "ss",
+#endif
 	};
 	size_t i;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2e74c0ab6d25..311c6018d503 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -308,11 +308,13 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -328,6 +330,12 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
+#ifdef CONFIG_X86_SHADOW_STACK
+# define VM_SHADOW_STACK	VM_HIGH_ARCH_5
+#else
+# define VM_SHADOW_STACK	VM_NONE
+#endif
+
 #if defined(CONFIG_X86)
 # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 15/35] x86/mm: Check Shadow Stack page fault errors
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (13 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 14/35] mm: Introduce VM_SHADOW_STACK for shadow stack memory Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 19:06   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 16/35] x86/mm: Update maybe_mkwrite() for shadow stack Rick Edgecombe
                   ` (21 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Shadow stack accesses are those that are performed by the CPU where it
expects to encounter a shadow stack mapping.  These accesses are performed
implicitly by CALL/RET at the site of the shadow stack pointer, and
explicitly by shadow stack management instructions like WRUSSQ.

Shadow stack accesses to shadow stack mappings can see faults in normal,
valid operation, just like regular accesses to regular mappings.  Shadow
stacks need some of the same features, like delayed allocation, swap and
copy-on-write.

Shadow stack accesses can also result in errors, such as when a shadow
stack overflows, or if a shadow stack access occurs to a non-shadow-stack
mapping.

In handling a shadow stack page fault, verify it occurs within a shadow
stack mapping.  It is always an error otherwise.  For valid shadow stack
accesses, set FAULT_FLAG_WRITE to effect copy-on-write.  Because clearing
_PAGE_DIRTY (vs. _PAGE_RW) is used to trigger the fault, shadow stack read
fault and shadow stack write fault are not differentiated and both are
handled as a write access.
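
For reference, the fault error code bits involved can be decoded as in the
standalone sketch below (the bit values mirror trap_pf.h as modified here;
this is illustrative only, not kernel code):

  #include <stdio.h>

  #define X86_PF_WRITE	(1 << 1)
  #define X86_PF_USER	(1 << 2)
  #define X86_PF_INSTR	(1 << 4)
  #define X86_PF_SHSTK	(1 << 6)	/* new: shadow stack access fault */

  static void decode_pf_error(unsigned long error_code)
  {
          printf("%s %s access%s%s\n",
                 (error_code & X86_PF_USER)  ? "user" : "kernel",
                 (error_code & X86_PF_WRITE) ? "write" : "read",
                 (error_code & X86_PF_INSTR) ? " (instruction fetch)" : "",
                 (error_code & X86_PF_SHSTK) ? " (shadow stack)" : "");
  }

  int main(void)
  {
          decode_pf_error(X86_PF_USER | X86_PF_SHSTK);
          return 0;
  }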

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

Yu-cheng v30:
 - Update Subject line and add a verb.
 
 arch/x86/include/asm/trap_pf.h |  2 ++
 arch/x86/mm/fault.c            | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..afa524325e55 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -11,6 +11,7 @@
  *   bit 3 ==				1: use of reserved bit detected
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
+ *   bit 6 ==				1: shadow stack access fault
  *   bit 15 ==				1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@ enum x86_pf_error_code {
 	X86_PF_RSVD	=		1 << 3,
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
+	X86_PF_SHSTK	=		1 << 6,
 	X86_PF_SGX	=		1 << 15,
 };
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d0074c6ed31a..6769134986ec 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1107,6 +1107,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
+	/*
+	 * Verify a shadow stack access is within a shadow stack VMA.
+	 * It is always an error otherwise.  Normal data access to a
+	 * shadow stack area is checked in the case followed.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (!(vma->vm_flags & VM_SHADOW_STACK))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
@@ -1300,6 +1311,14 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	/*
+	 * Clearing _PAGE_DIRTY is used to detect shadow stack access.
+	 * This method cannot distinguish shadow stack read vs. write.
+	 * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect
+	 * copy-on-write.
+	 */
+	if (error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 16/35] x86/mm: Update maybe_mkwrite() for shadow stack
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (14 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 15/35] x86/mm: Check Shadow Stack page fault errors Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 21:16   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 17/35] mm: Fixup places that call pte_mkwrite() directly Rick Edgecombe
                   ` (20 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

When serving a page fault, maybe_mkwrite() makes a PTE writable if its vma
has VM_WRITE.

A shadow stack vma has VM_SHADOW_STACK.  Its PTEs have _PAGE_DIRTY, but not
_PAGE_RW.  In fork(), _PAGE_DIRTY is cleared to cause copy-on-write,
and in the page fault handler, _PAGE_DIRTY is restored and the shadow stack
page is writable again.

Introduce an x86 version of maybe_mkwrite(), which sets proper PTE bits
according to VM flags.

Apply the same changes to maybe_pmd_mkwrite().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

Yu-cheng v29:
 - Remove likely()'s.

 arch/x86/include/asm/pgtable.h |  6 ++++++
 arch/x86/mm/pgtable.c          | 20 ++++++++++++++++++++
 include/linux/mm.h             |  2 ++
 mm/huge_memory.c               |  2 ++
 4 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e1061b9cba6a..36166bdd0b98 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -282,6 +282,9 @@ static inline int pmd_trans_huge(pmd_t pmd)
 	return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
 }
 
+#define maybe_pmd_mkwrite maybe_pmd_mkwrite
+extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
+
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 static inline int pud_trans_huge(pud_t pud)
 {
@@ -1660,6 +1663,9 @@ static inline bool arch_faults_on_old_pte(void)
 	return false;
 }
 
+#define maybe_mkwrite maybe_mkwrite
+extern pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma);
+
 #endif	/* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_H */
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3481b35cb4ec..c22c8e9c37e8 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -610,6 +610,26 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 }
 #endif
 
+pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_WRITE)
+		pte = pte_mkwrite(pte);
+	else if (vma->vm_flags & VM_SHADOW_STACK)
+		pte = pte_mkwrite_shstk(pte);
+	return pte;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_WRITE)
+		pmd = pmd_mkwrite(pmd);
+	else if (vma->vm_flags & VM_SHADOW_STACK)
+		pmd = pmd_mkwrite_shstk(pmd);
+	return pmd;
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 /**
  * reserve_top_address - reserves a hole in the top of kernel address space
  * @reserve - size of hole to reserve
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311c6018d503..b3cb3a17037b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -955,12 +955,14 @@ void free_compound_page(struct page *page);
  * pte_mkwrite.  But get_user_pages can cause write faults for mappings
  * that do not have writing enabled, when used by access_process_vm.
  */
+#ifndef maybe_mkwrite
 static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
 		pte = pte_mkwrite(pte);
 	return pte;
 }
+#endif
 
 vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
 void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 406a3c28c026..2adedcfca00b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -491,12 +491,14 @@ static int __init setup_transparent_hugepage(char *str)
 }
 __setup("transparent_hugepage=", setup_transparent_hugepage);
 
+#ifndef maybe_pmd_mkwrite
 pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
 		pmd = pmd_mkwrite(pmd);
 	return pmd;
 }
+#endif
 
 #ifdef CONFIG_MEMCG
 static inline struct deferred_split *get_deferred_split_queue(struct page *page)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 17/35] mm: Fixup places that call pte_mkwrite() directly
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (15 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 16/35] x86/mm: Update maybe_mkwrite() for shadow stack Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 21:51   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 18/35] mm: Add guard pages around a shadow stack Rick Edgecombe
                   ` (19 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

When serving a page fault, maybe_mkwrite() makes a PTE writable if it is in
a writable vma.  A shadow stack vma is writable, but its PTEs need
_PAGE_DIRTY to be set to become writable.  For this reason, maybe_mkwrite()
has been updated.

There are a few places that call pte_mkwrite() directly but produce the
same result as maybe_mkwrite().  These sites need to be updated for
shadow stack as well.  Thus, change them to maybe_mkwrite():

- do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly
  and call pte_mkwrite(), which is the same as maybe_mkwrite().  Change
  them to maybe_mkwrite().

- In do_numa_page(), if the numa entry was writable, then pte_mkwrite()
  is called directly.  Fix it by doing maybe_mkwrite().  Make the same
  changes to do_huge_pmd_numa_page().

- In change_pte_range(), pte_mkwrite() is called directly.  Replace it with
  maybe_mkwrite().
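
(For reference, a sketch of the shadow stack aware x86 maybe_mkwrite() these
sites end up going through -- mirroring the pmd variant from the previous
patch, and assuming the pte_mkwrite_shstk() helper introduced earlier in the
series:)

	pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
	{
		if (likely(vma->vm_flags & VM_WRITE))
			pte = pte_mkwrite(pte);
		else if (vma->vm_flags & VM_SHADOW_STACK)
			pte = pte_mkwrite_shstk(pte);
		return pte;
	}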

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

Yu-cheng v25:
 - Apply same changes to do_huge_pmd_numa_page() as to do_numa_page().

 mm/huge_memory.c | 2 +-
 mm/memory.c      | 5 ++---
 mm/migrate.c     | 3 +--
 mm/mprotect.c    | 2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2adedcfca00b..3588e9fefbe0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1489,7 +1489,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	pmd = pmd_modify(oldpmd, vma->vm_page_prot);
 	pmd = pmd_mkyoung(pmd);
 	if (was_writable)
-		pmd = pmd_mkwrite(pmd);
+		pmd = maybe_pmd_mkwrite(pmd, vma);
 	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
 	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 	spin_unlock(vmf->ptl);
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..c79444603d5d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3793,8 +3793,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	entry = mk_pte(page, vma->vm_page_prot);
 	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry));
+	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
@@ -4428,7 +4427,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	pte = pte_modify(old_pte, vma->vm_page_prot);
 	pte = pte_mkyoung(pte);
 	if (was_writable)
-		pte = pte_mkwrite(pte);
+		pte = maybe_mkwrite(pte, vma);
 	ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte);
 	update_mmu_cache(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index c7da064b4781..438f1e21b9c7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2697,8 +2697,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 		}
 	} else {
 		entry = mk_pte(page, vma->vm_page_prot);
-		if (vma->vm_flags & VM_WRITE)
-			entry = pte_mkwrite(pte_mkdirty(entry));
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	}
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0138dfcdb1d8..b0012c13a00e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -135,7 +135,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			if (dirty_accountable && pte_dirty(ptent) &&
 					(pte_soft_dirty(ptent) ||
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
-				ptent = pte_mkwrite(ptent);
+				ptent = maybe_mkwrite(ptent, vma);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (16 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 17/35] mm: Fixup places that call pte_mkwrite() directly Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 22:23   ` Dave Hansen
  2022-02-10 22:43   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 19/35] mm/mmap: Add shadow stack pages to memory accounting Rick Edgecombe
                   ` (18 subsequent siblings)
  36 siblings, 2 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

INCSSP(Q/D) increments the shadow stack pointer and 'pops and discards' the
first and the last elements in the range, effectively touching those memory
areas.

The maximum distance INCSSPQ can move the shadow stack pointer is
255 * 8 = 2040 bytes (255 * 4 = 1020 bytes for INCSSPD).  Both are well
below PAGE_SIZE.  Thus, putting a guard page at both ends of a shadow stack
prevents INCSSP, CALL, and RET from going beyond.
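
To make the arithmetic concrete, a minimal standalone illustration (not part
of the patch; assumes 4 KB pages):

	#include <assert.h>

	#define PAGE_SIZE	4096UL
	#define INCSSPQ_MAX	(255UL * 8)	/* 2040 bytes */
	#define INCSSPD_MAX	(255UL * 4)	/* 1020 bytes */

	/*
	 * Neither instruction can move the shadow stack pointer past a
	 * 4 KB guard page without touching (and faulting on) it first.
	 */
	static_assert(INCSSPQ_MAX < PAGE_SIZE, "one guard page suffices");
	static_assert(INCSSPD_MAX < PAGE_SIZE, "one guard page suffices");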

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

Yu-cheng v25:
 - Move SHADOW_STACK_GUARD_GAP to arch/x86/mm/mmap.c.

Yu-cheng v24:
 - Instead changing vm_*_gap(), create x86-specific versions.

 arch/x86/include/asm/page_types.h |  7 +++++
 arch/x86/mm/mmap.c                | 46 +++++++++++++++++++++++++++++++
 include/linux/mm.h                |  4 +++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index a506a411474d..e1533fdc08b4 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -73,6 +73,13 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
 
 extern void initmem_init(void);
 
+#define vm_start_gap vm_start_gap
+struct vm_area_struct;
+extern unsigned long vm_start_gap(struct vm_area_struct *vma);
+
+#define vm_end_gap vm_end_gap
+extern unsigned long vm_end_gap(struct vm_area_struct *vma);
+
 #endif	/* !__ASSEMBLY__ */
 
 #endif	/* _ASM_X86_PAGE_DEFS_H */
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index f3f52c5e2fd6..81f9325084d3 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -250,3 +250,49 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
 		return false;
 	return true;
 }
+
+/*
+ * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D).  INCSSPQ
+ * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and
+ * touches the first and the last element in the range, which triggers a
+ * page fault if the range is not in a shadow stack.  Because of this,
+ * creating 4-KB guard pages around a shadow stack prevents these
+ * instructions from going beyond.
+ */
+#define SHADOW_STACK_GUARD_GAP PAGE_SIZE
+
+unsigned long vm_start_gap(struct vm_area_struct *vma)
+{
+	unsigned long vm_start = vma->vm_start;
+	unsigned long gap = 0;
+
+	if (vma->vm_flags & VM_GROWSDOWN)
+		gap = stack_guard_gap;
+	else if (vma->vm_flags & VM_SHADOW_STACK)
+		gap = SHADOW_STACK_GUARD_GAP;
+
+	if (gap != 0) {
+		vm_start -= gap;
+		if (vm_start > vma->vm_start)
+			vm_start = 0;
+	}
+	return vm_start;
+}
+
+unsigned long vm_end_gap(struct vm_area_struct *vma)
+{
+	unsigned long vm_end = vma->vm_end;
+	unsigned long gap = 0;
+
+	if (vma->vm_flags & VM_GROWSUP)
+		gap = stack_guard_gap;
+	else if (vma->vm_flags & VM_SHADOW_STACK)
+		gap = SHADOW_STACK_GUARD_GAP;
+
+	if (gap != 0) {
+		vm_end += gap;
+		if (vm_end < vma->vm_end)
+			vm_end = -PAGE_SIZE;
+	}
+	return vm_end;
+}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b3cb3a17037b..e125358d7f75 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2797,6 +2797,7 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr)
 	return vma;
 }
 
+#ifndef vm_start_gap
 static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 {
 	unsigned long vm_start = vma->vm_start;
@@ -2808,7 +2809,9 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 	}
 	return vm_start;
 }
+#endif
 
+#ifndef vm_end_gap
 static inline unsigned long vm_end_gap(struct vm_area_struct *vma)
 {
 	unsigned long vm_end = vma->vm_end;
@@ -2820,6 +2823,7 @@ static inline unsigned long vm_end_gap(struct vm_area_struct *vma)
 	}
 	return vm_end;
 }
+#endif
 
 static inline unsigned long vma_pages(struct vm_area_struct *vma)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 19/35] mm/mmap: Add shadow stack pages to memory accounting
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (17 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 18/35] mm: Add guard pages around a shadow stack Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 22:27   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack Rick Edgecombe
                   ` (17 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Account shadow stack pages to stack memory.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

Yu-cheng v26:
 - Remove redundant #ifdef CONFIG_MMU.

Yu-cheng v25:
 - Remove #ifdef CONFIG_ARCH_HAS_SHADOW_STACK for is_shadow_stack_mapping().

Yu-cheng v24:
 - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().
 - Change VM_SHSTK to VM_SHADOW_STACK.

 arch/x86/include/asm/pgtable.h | 3 +++
 arch/x86/mm/pgtable.c          | 5 +++++
 include/linux/pgtable.h        | 8 ++++++++
 mm/mmap.c                      | 5 +++++
 4 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 36166bdd0b98..55641498485c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1666,6 +1666,9 @@ static inline bool arch_faults_on_old_pte(void)
 #define maybe_mkwrite maybe_mkwrite
 extern pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma);
 
+#define is_shadow_stack_mapping is_shadow_stack_mapping
+extern bool is_shadow_stack_mapping(vm_flags_t vm_flags);
+
 #endif	/* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_H */
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c22c8e9c37e8..61a364b9ae0a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -884,3 +884,8 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #endif /* CONFIG_X86_64 */
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
+bool is_shadow_stack_mapping(vm_flags_t vm_flags)
+{
+	return vm_flags & VM_SHADOW_STACK;
+}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index bc8713a76e03..21fdb1273571 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -911,6 +911,14 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
 	__ptep_modify_prot_commit(vma, addr, ptep, pte);
 }
 #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
+
+#ifndef is_shadow_stack_mapping
+static inline bool is_shadow_stack_mapping(vm_flags_t vm_flags)
+{
+	return false;
+}
+#endif
+
 #endif /* CONFIG_MMU */
 
 /*
diff --git a/mm/mmap.c b/mm/mmap.c
index 1e8fdb0b51ed..9bab326332af 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1716,6 +1716,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 	if (file && is_file_hugepages(file))
 		return 0;
 
+	if (is_shadow_stack_mapping(vm_flags))
+		return 1;
+
 	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
 }
 
@@ -3345,6 +3348,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
 		mm->stack_vm += npages;
 	else if (is_data_mapping(flags))
 		mm->data_vm += npages;
+	else if (is_shadow_stack_mapping(flags))
+		mm->stack_vm += npages;
 }
 
 static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (18 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 19/35] mm/mmap: Add shadow stack pages to memory accounting Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-09 22:50   ` Dave Hansen
  2022-02-09 22:52   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 21/35] mm/mprotect: Exclude shadow stack from preserve_write Rick Edgecombe
                   ` (16 subsequent siblings)
  36 siblings, 2 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Can_follow_write_pte() ensures a read-only page is COWed by checking the
FOLL_COW flag, and uses pte_dirty() to validate the flag is still valid.

Like a writable data page, a shadow stack page is writable, and becomes
read-only during copy-on-write, but it is always dirty.  Thus, in the
can_follow_write_pte() check, it belongs to the writable page case and
should be excluded from the read-only page pte_dirty() check.  Apply
the same changes to can_follow_write_pmd().

While at it, also split the long line into smaller ones.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

Yu-cheng v26:
 - Instead of passing vm_flags, pass down vma pointer to can_follow_write_*().

Yu-cheng v25:
 - Split long line into smaller ones.

Yu-cheng v24:
 - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().

 mm/gup.c         | 16 ++++++++++++----
 mm/huge_memory.c | 16 ++++++++++++----
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index f0af462ac1e2..95b7d1084c44 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -464,10 +464,18 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
  * FOLL_FORCE can write to even unwritable pte's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
+static inline bool can_follow_write_pte(pte_t pte, unsigned int flags,
+					struct vm_area_struct *vma)
 {
-	return pte_write(pte) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+	if (pte_write(pte))
+		return true;
+	if ((flags & (FOLL_FORCE | FOLL_COW)) != (FOLL_FORCE | FOLL_COW))
+		return false;
+	if (!pte_dirty(pte))
+		return false;
+	if (is_shadow_stack_mapping(vma->vm_flags))
+		return false;
+	return true;
 }
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -510,7 +518,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 	if ((flags & FOLL_NUMA) && pte_protnone(pte))
 		goto no_page;
-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
+	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, vma)) {
 		pte_unmap_unlock(ptep, ptl);
 		return NULL;
 	}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3588e9fefbe0..1c7167e6f223 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1346,10 +1346,18 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
  * FOLL_FORCE can write to even unwritable pmd's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags,
+					struct vm_area_struct *vma)
 {
-	return pmd_write(pmd) ||
-	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+	if (pmd_write(pmd))
+		return true;
+	if ((flags & (FOLL_FORCE | FOLL_COW)) != (FOLL_FORCE | FOLL_COW))
+		return false;
+	if (!pmd_dirty(pmd))
+		return false;
+	if (is_shadow_stack_mapping(vma->vm_flags))
+		return false;
+	return true;
 }
 
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
@@ -1362,7 +1370,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
+	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, vma))
 		goto out;
 
 	/* Avoid dumping huge zero page */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 21/35] mm/mprotect: Exclude shadow stack from preserve_write
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (19 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-10 19:27   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks Rick Edgecombe
                   ` (15 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

In change_pte_range(), when a PTE is changed for prot_numa, _PAGE_RW is
preserved to avoid the additional write fault after the NUMA hinting fault.
However, pte_write() now covers both normal writable and shadow stack
(RW=0, Dirty=1) PTEs, and the latter does not have _PAGE_RW, so there is
nothing to preserve.

Exclude shadow stack from preserve_write test, and apply the same change to
change_huge_pmd().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v25:
 - Move is_shadow_stack_mapping() to a separate line.

v24:
 - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().

 mm/huge_memory.c | 7 +++++++
 mm/mprotect.c    | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1c7167e6f223..01375e39b52b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1750,6 +1750,13 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		return 0;
 
 	preserve_write = prot_numa && pmd_write(*pmd);
+
+	/*
+	 * Preserve only normal writable huge PMD, but not shadow
+	 * stack (RW=0, Dirty=1).
+	 */
+	if (is_shadow_stack_mapping(vma->vm_flags))
+		preserve_write = false;
 	ret = 1;
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b0012c13a00e..faac710f0891 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -77,6 +77,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			pte_t ptent;
 			bool preserve_write = prot_numa && pte_write(oldpte);
 
+			/*
+			 * Preserve only normal writable PTE, but not shadow
+			 * stack (RW=0, Dirty=1).
+			 */
+			if (is_shadow_stack_mapping(vma->vm_flags))
+				preserve_write = false;
+
 			/*
 			 * Avoid trapping faults against the zero or KSM
 			 * pages. See similar comment in change_huge_pmd.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (20 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 21/35] mm/mprotect: Exclude shadow stack from preserve_write Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-11 22:19   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate Rick Edgecombe
                   ` (14 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

Shadow stack accesses are writes from handle_mm_fault()'s perspective. So to
generate the correct PTE, maybe_mkwrite() will rely on the presence of
VM_SHADOW_STACK or VM_WRITE in the vma.

In future patches, when VM_SHADOW_STACK is actually creatable by
userspace, a problem could arise if a user calls
mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The fault
handling code would then be confused by shadow stack accesses and create a
writable PTE for a shadow stack access, and the process would fault in a
loop.

Prevent this from happening by blocking this kind of memory (VM_WRITE and
VM_SHADOW_STACK) from being created, instead of complicating the fault
handler logic to handle it.

Add an x86 arch_validate_flags() implementation to handle the check.
Rename the uapi/asm/mman.h header guard so that its name can be used for
arch/x86/include/asm/mman.h, where arch_validate_flags() will live.
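
From userspace, the expected effect is that trying to add PROT_WRITE to an
existing shadow stack mapping is refused outright. A hypothetical sketch
(ssp is assumed to point into a shadow stack mapping; EINVAL is the errno
arch_validate_flags() failures typically surface as):

	#include <errno.h>
	#include <stdio.h>
	#include <sys/mman.h>

	/* ssp: assumed to point at a page of an existing shadow stack */
	static void try_make_writable(void *ssp)
	{
		if (mprotect(ssp, 4096, PROT_READ | PROT_WRITE) < 0)
			printf("mprotect refused: errno=%d (expect EINVAL)\n",
			       errno);
	}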

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - New patch.

 arch/x86/include/asm/mman.h      | 21 +++++++++++++++++++++
 arch/x86/include/uapi/asm/mman.h |  6 +++---
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/include/asm/mman.h

diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
new file mode 100644
index 000000000000..b44fe31deb3a
--- /dev/null
+++ b/arch/x86/include/asm/mman.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MMAN_H
+#define _ASM_X86_MMAN_H
+
+#include <linux/mm.h>
+#include <uapi/asm/mman.h>
+
+#ifdef CONFIG_X86_SHADOW_STACK
+static inline bool arch_validate_flags(unsigned long vm_flags)
+{
+	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
+		return false;
+
+	return true;
+}
+
+#define arch_validate_flags(vm_flags) arch_validate_flags(vm_flags)
+
+#endif /* CONFIG_X86_SHADOW_STACK */
+
+#endif /* _ASM_X86_MMAN_H */
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index d4a8d0424bfb..9704e27c4d24 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _ASM_X86_MMAN_H
-#define _ASM_X86_MMAN_H
+#ifndef _UAPI_ASM_X86_MMAN_H
+#define _UAPI_ASM_X86_MMAN_H
 
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
@@ -28,4 +28,4 @@
 
 #include <asm-generic/mman.h>
 
-#endif /* _ASM_X86_MMAN_H */
+#endif /* _UAPI_ASM_X86_MMAN_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (21 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-08  8:51   ` Thomas Gleixner
  2022-02-12  0:27   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 24/35] mm: Re-introduce vm_flags to do_mmap() Rick Edgecombe
                   ` (13 subsequent siblings)
  36 siblings, 2 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

Add helpers that can be used to modify supervisor xstate safely for the
current task.

State for supervisor xstate based features can be live and accessed via
MSRs, or saved in memory in an xsave buffer. When the kernel needs to
modify this state it needs to be sure to operate on it in the right place,
so the modifications don't get clobbered.

In the past, supervisor xstate features have used get_xsave_addr()
directly, and performed open coded logic to handle operating on the saved
state correctly. This has posed two problems:
 1. The logic has been gotten wrong more than once.
 2. To reduce code, less common paths are not optimized. Determining
    which paths are less common is based on assumptions about far away
    code that could change.

In addition, now that get_xsave_addr() is not available outside of the
core fpu code, there isn't even a way for these supervisor features to
modify the in-memory state.

To resolve these problems, add some helpers that encapsulate the correct
logic to operate on the correct copy of the state. Map the MSRs to their
struct field locations in a case statement in __get_xsave_member().

Use the helpers like this, to write to either the MSR or saved state:

void *xstate;
u64 val;
int r;

xstate = start_update_xsave_msrs(XFEATURE_FOO);
r = xsave_rdmsrl(xstate, MSR_IA32_FOO_1, &val);
if (!r)
	xsave_wrmsrl(xstate, MSR_IA32_FOO_2, FOO_ENABLE);
end_update_xsave_msrs();
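
The set/clear helper added below can be used in the same way for
read-modify-write of individual bits (again with made-up names):

xstate = start_update_xsave_msrs(XFEATURE_FOO);
xsave_set_clear_bits_msrl(xstate, MSR_IA32_FOO_2, FOO_ENABLE, 0);
end_update_xsave_msrs();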

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - New patch.

 arch/x86/include/asm/fpu/api.h |   5 ++
 arch/x86/kernel/fpu/xstate.c   | 134 +++++++++++++++++++++++++++++++++
 2 files changed, 139 insertions(+)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index c83b3020350a..6aec27984b62 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -165,4 +165,9 @@ static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
 struct task_struct;
 extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
 
+void *start_update_xsave_msrs(int xfeature_nr);
+void end_update_xsave_msrs(void);
+int xsave_rdmsrl(void *state, unsigned int msr, unsigned long long *p);
+int xsave_wrmsrl(void *state, u32 msr, u64 val);
+int xsave_set_clear_bits_msrl(void *state, u32 msr, u64 set, u64 clear);
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 44397202762b..c5e20e0d0725 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1867,3 +1867,137 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
 	return 0;
 }
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
+
+static u64 *__get_xsave_member(void *xstate, u32 msr)
+{
+	switch (msr) {
+	/* Currently there are no MSR's supported */
+	default:
+		WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n", msr);
+		return NULL;
+	}
+}
+
+/*
+ * Return a pointer to the xstate for the feature if it should be used, or NULL
+ * if the MSRs should be written to directly. To do this safely, using the
+ * associated read/write helpers is required.
+ */
+void *start_update_xsave_msrs(int xfeature_nr)
+{
+	void *xstate;
+
+	/*
+	 * fpregs_lock() only disables preemption (mostly). So modifying state
+	 * in an interrupt could screw up some in-progress fpregs operation,
+	 * but appear to work. Warn about it.
+	 */
+	WARN_ON_ONCE(!in_task());
+	WARN_ON_ONCE(current->flags & PF_KTHREAD);
+
+	fpregs_lock();
+
+	fpregs_assert_state_consistent();
+
+	/*
+	 * If the registers don't need to be reloaded, go ahead and operate on
+	 * the registers.
+	 */
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		return NULL;
+
+	xstate = get_xsave_addr(&current->thread.fpu.fpstate->regs.xsave, xfeature_nr);
+
+	/*
+	 * If regs are in the init state, they can't be retrieved from
+	 * init_fpstate due to the init optimization, but are not necessarily
+	 * zero. The only option is to restore to make everything live and
+	 * operate on registers. This will clear TIF_NEED_FPU_LOAD.
+	 *
+	 * Otherwise, if not in the init state but TIF_NEED_FPU_LOAD is set,
+	 * operate on the buffer. The registers will be restored before going
+	 * to userspace in any case, but the task might get preempted before
+	 * then, so this possibly saves an xsave.
+	 */
+	if (!xstate)
+		fpregs_restore_userregs();
+	return xstate;
+}
+
+void end_update_xsave_msrs(void)
+{
+	fpregs_unlock();
+}
+
+/*
+ * When TIF_NEED_FPU_LOAD is set and fpregs_state_valid() is true, the saved
+ * state and fp state match. In this case, the kernel has some good options -
+ * it can skip the restore before returning to userspace or it could skip
+ * an xsave if preempted before then.
+ *
+ * But if this correspondence is broken by either a write to the in-memory
+ * buffer or the registers, the kernel needs to be notified so it doesn't miss
+ * an xsave or restore. __xsave_msrl_prepare_write() performs this check and
+ * notifies the kernel if needed. Use before writes only, to not take away
+ * the kernel's options when not required.
+ *
+ * If TIF_NEED_FPU_LOAD is set, then the logic in start_update_xsave_msrs()
+ * must have resulted in targeting the in-memory state, so invaliding the
+ * registers is the right thing to do.
+ */
+static void __xsave_msrl_prepare_write(void)
+{
+	if (test_thread_flag(TIF_NEED_FPU_LOAD) &&
+	    fpregs_state_valid(&current->thread.fpu, smp_processor_id()))
+		__fpu_invalidate_fpregs_state(&current->thread.fpu);
+}
+
+int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p)
+{
+	u64 *member_ptr;
+
+	if (!xstate)
+		return rdmsrl_safe(msr, p);
+
+	member_ptr = __get_xsave_member(xstate, msr);
+	if (!member_ptr)
+		return 1;
+
+	*p = *member_ptr;
+
+	return 0;
+}
+
+int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
+{
+	u64 *member_ptr;
+
+	__xsave_msrl_prepare_write();
+	if (!xstate)
+		return wrmsrl_safe(msr, val);
+
+	member_ptr = __get_xsave_member(xstate, msr);
+	if (!member_ptr)
+		return 1;
+
+	*member_ptr = val;
+
+	return 0;
+}
+
+int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear)
+{
+	u64 val, new_val;
+	int ret;
+
+	ret = xsave_rdmsrl(xstate, msr, &val);
+	if (ret)
+		return ret;
+
+	new_val = (val & ~clear) | set;
+
+	if (new_val != val)
+		return xsave_wrmsrl(xstate, msr, new_val);
+
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 24/35] mm: Re-introduce vm_flags to do_mmap()
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (22 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support Rick Edgecombe
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu, Andrew Morton

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

There were no more callers passing vm_flags to do_mmap(), and vm_flags was
removed from the function's input by:

    commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()").

There is a new user now.  Shadow stack allocation passes VM_SHADOW_STACK to
do_mmap().  Thus, re-introduce vm_flags to do_mmap().
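
The new call site, added later in the series when allocating a shadow
stack, ends up looking like:

	addr = do_mmap(NULL, 0, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE,
		       VM_SHADOW_STACK, 0, &unused, NULL);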

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-mm@kvack.org
---
 fs/aio.c           |  2 +-
 include/linux/mm.h |  3 ++-
 ipc/shm.c          |  2 +-
 mm/mmap.c          | 10 +++++-----
 mm/nommu.c         |  4 ++--
 mm/util.c          |  2 +-
 6 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 4ceba13a7db0..a24618e0e3fc 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -554,7 +554,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 
 	ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size,
 				 PROT_READ | PROT_WRITE,
-				 MAP_SHARED, 0, &unused, NULL);
+				 MAP_SHARED, 0, 0, &unused, NULL);
 	mmap_write_unlock(mm);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		ctx->mmap_size = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e125358d7f75..481e1271409f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2689,7 +2689,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	struct list_head *uf);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
-	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
+	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
+	struct list_head *uf);
 extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
 		       struct list_head *uf, bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
diff --git a/ipc/shm.c b/ipc/shm.c
index b3048ebd5c31..f236b3e14ec4 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1646,7 +1646,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 			goto invalid;
 	}
 
-	addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL);
+	addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL);
 	*raddr = addr;
 	err = 0;
 	if (IS_ERR_VALUE(addr))
diff --git a/mm/mmap.c b/mm/mmap.c
index 9bab326332af..9c82a1b02cfc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1410,11 +1410,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode,
  */
 unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff,
-			unsigned long *populate, struct list_head *uf)
+			unsigned long flags, vm_flags_t vm_flags,
+			unsigned long pgoff, unsigned long *populate,
+			struct list_head *uf)
 {
 	struct mm_struct *mm = current->mm;
-	vm_flags_t vm_flags;
 	int pkey = 0;
 
 	*populate = 0;
@@ -1474,7 +1474,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	 * to. we assume access permissions have been handled by the open
 	 * of the memory object, so we don't do any here.
 	 */
-	vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
+	vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
 	if (flags & MAP_LOCKED)
@@ -3011,7 +3011,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 
 	file = get_file(vma->vm_file);
 	ret = do_mmap(vma->vm_file, start, size,
-			prot, flags, pgoff, &populate, NULL);
+			prot, flags, 0, pgoff, &populate, NULL);
 	fput(file);
 out:
 	mmap_write_unlock(mm);
diff --git a/mm/nommu.c b/mm/nommu.c
index 55a9e48a7a02..a6e0243cd69b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1057,6 +1057,7 @@ unsigned long do_mmap(struct file *file,
 			unsigned long len,
 			unsigned long prot,
 			unsigned long flags,
+			vm_flags_t vm_flags,
 			unsigned long pgoff,
 			unsigned long *populate,
 			struct list_head *uf)
@@ -1064,7 +1065,6 @@ unsigned long do_mmap(struct file *file,
 	struct vm_area_struct *vma;
 	struct vm_region *region;
 	struct rb_node *rb;
-	vm_flags_t vm_flags;
 	unsigned long capabilities, result;
 	int ret;
 
@@ -1083,7 +1083,7 @@ unsigned long do_mmap(struct file *file,
 
 	/* we've determined that we can make the mapping, now translate what we
 	 * now know into VMA flags */
-	vm_flags = determine_vm_flags(file, prot, flags, capabilities);
+	vm_flags |= determine_vm_flags(file, prot, flags, capabilities);
 
 	/* we're going to need to record the mapping */
 	region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL);
diff --git a/mm/util.c b/mm/util.c
index 7e43369064c8..d419821364cc 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -516,7 +516,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	if (!ret) {
 		if (mmap_write_lock_killable(mm))
 			return -EINTR;
-		ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate,
+		ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate,
 			      &uf);
 		mmap_write_unlock(mm);
 		userfaultfd_unmap_complete(mm, &uf);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (23 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 24/35] mm: Re-introduce vm_flags to do_mmap() Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-11 23:37   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size' Rick Edgecombe
                   ` (11 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Introduce basic shadow stack enabling/disabling/allocation routines.
A task's shadow stack is allocated from memory with the VM_SHADOW_STACK flag
and has a fixed size of min(RLIMIT_STACK, 4 GB).

Add the user shadow stack MSRs to the xsave helpers, so they can be used
to implement the functionality.

Keep the task's shadow stack address and size in thread_struct. This will
be copied when cloning new threads, but needs to be cleared during exec,
so add a function to do this.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

v1:
 - Switch to xsave helpers.
 - Expand commit log.

Yu-cheng v30:
 - Remove superfluous comments for struct thread_shstk.
 - Replace 'populate' with 'unused'.

Yu-cheng v28:
 - Update shstk_setup() with wrmsrl_safe(), returns success when shadow
   stack feature is not present (since this is a setup function).

Yu-cheng v27:
 - Change 'struct cet_status' to 'struct thread_shstk', and change member
   types from unsigned long to u64.
 - Re-order local variables in reverse order of length.
 - WARN_ON_ONCE() when vm_munmap() fails.

 arch/x86/include/asm/cet.h       |  29 ++++++
 arch/x86/include/asm/processor.h |   5 ++
 arch/x86/kernel/Makefile         |   1 +
 arch/x86/kernel/fpu/xstate.c     |   5 +-
 arch/x86/kernel/process_64.c     |   2 +
 arch/x86/kernel/shstk.c          | 149 +++++++++++++++++++++++++++++++
 6 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/kernel/shstk.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
new file mode 100644
index 000000000000..de90e4ae083a
--- /dev/null
+++ b/arch/x86/include/asm/cet.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CET_H
+#define _ASM_X86_CET_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+struct task_struct;
+
+struct thread_shstk {
+	u64	base;
+	u64	size;
+};
+
+#ifdef CONFIG_X86_SHADOW_STACK
+int shstk_setup(void);
+void shstk_free(struct task_struct *p);
+int shstk_disable(void);
+void reset_thread_shstk(void);
+#else
+static inline void shstk_setup(void) {}
+static inline void shstk_free(struct task_struct *p) {}
+static inline void shstk_disable(void) {}
+static inline void reset_thread_shstk(void) {}
+#endif /* CONFIG_X86_SHADOW_STACK */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_X86_CET_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2c5f12ae7d04..a9f4e9c4ca81 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -27,6 +27,7 @@ struct vm86;
 #include <asm/unwind_hints.h>
 #include <asm/vmxfeatures.h>
 #include <asm/vdso/processor.h>
+#include <asm/cet.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
@@ -528,6 +529,10 @@ struct thread_struct {
 	 */
 	u32			pkru;
 
+#ifdef CONFIG_X86_SHADOW_STACK
+	struct thread_shstk	shstk;
+#endif
+
 	/* Floating point and extended processor state */
 	struct fpu		fpu;
 	/*
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 6aef9ee28a39..d60ae6c365c7 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -153,6 +153,7 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev.o
 
 obj-$(CONFIG_ARCH_HAS_CC_PLATFORM)	+= cc_platform.o
 
+obj-$(CONFIG_X86_SHADOW_STACK)		+= shstk.o
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5e20e0d0725..25b1b0c417fd 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1871,7 +1871,10 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
 static u64 *__get_xsave_member(void *xstate, u32 msr)
 {
 	switch (msr) {
-	/* Currently there are no MSR's supported */
+	case MSR_IA32_PL3_SSP:
+		return &((struct cet_user_state *)xstate)->user_ssp;
+	case MSR_IA32_U_CET:
+		return &((struct cet_user_state *)xstate)->user_cet;
 	default:
 		WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n", msr);
 		return NULL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3402edec236c..f05fe27d4967 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -514,6 +514,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
 		load_gs_index(__USER_DS);
 	}
 
+	reset_thread_shstk();
+
 	loadsegment(fs, 0);
 	loadsegment(es, _ds);
 	loadsegment(ds, _ds);
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
new file mode 100644
index 000000000000..4e8686ed885f
--- /dev/null
+++ b/arch/x86/kernel/shstk.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * shstk.c - Intel shadow stack support
+ *
+ * Copyright (c) 2021, Intel Corporation.
+ * Yu-cheng Yu <yu-cheng.yu@intel.com>
+ */
+
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/sched/signal.h>
+#include <linux/compat.h>
+#include <linux/sizes.h>
+#include <linux/user.h>
+#include <asm/msr.h>
+#include <asm/fpu/internal.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/cet.h>
+#include <asm/special_insns.h>
+#include <asm/fpu/api.h>
+
+static unsigned long alloc_shstk(unsigned long size)
+{
+	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, unused;
+
+	mmap_write_lock(mm);
+	addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHADOW_STACK, 0,
+		       &unused, NULL);
+	mmap_write_unlock(mm);
+
+	return addr;
+}
+
+static void unmap_shadow_stack(u64 base, u64 size)
+{
+	while (1) {
+		int r;
+
+		r = vm_munmap(base, size);
+
+		/*
+		 * vm_munmap() returns -EINTR when mmap_lock is held by
+		 * something else, and that lock should not be held for a
+		 * long time.  Retry it for the case.
+		 * long time.  Retry in that case.
+		if (r == -EINTR) {
+			cond_resched();
+			continue;
+		}
+
+		/*
+		 * For all other types of vm_munmap() failure, either the
+		 * system is out of memory or there is a bug.
+		 */
+		WARN_ON_ONCE(r);
+		break;
+	}
+}
+
+int shstk_setup(void)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	unsigned long addr, size;
+	void *xstate;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
+	    shstk->size ||
+	    shstk->base)
+		return 1;
+
+	size = PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
+	addr = alloc_shstk(size);
+	if (IS_ERR_VALUE(addr))
+		return 1;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+	err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, addr + size);
+	if (!err)
+		err = xsave_wrmsrl(xstate, MSR_IA32_U_CET, CET_SHSTK_EN);
+	end_update_xsave_msrs();
+
+	if (err) {
+		/*
+		 * Don't leak shadow stack if something went wrong with writing the
+		 * msrs. Warn about it because things may be in a weird state.
+		 */
+		WARN_ON_ONCE(1);
+		unmap_shadow_stack(addr, size);
+		return 1;
+	}
+
+	shstk->base = addr;
+	shstk->size = size;
+	return 0;
+}
+
+void reset_thread_shstk(void)
+{
+	memset(&current->thread.shstk, 0, sizeof(struct thread_shstk));
+}
+
+void shstk_free(struct task_struct *tsk)
+{
+	struct thread_shstk *shstk = &tsk->thread.shstk;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
+	    !shstk->size ||
+	    !shstk->base)
+		return;
+
+	if (!tsk->mm)
+		return;
+
+	unmap_shadow_stack(shstk->base, shstk->size);
+
+	shstk->base = 0;
+	shstk->size = 0;
+}
+
+int shstk_disable(void)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	void *xstate;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
+	    !shstk->size ||
+	    !shstk->base)
+		return 1;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+	err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0, CET_SHSTK_EN);
+	if (!err)
+		err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, 0);
+	end_update_xsave_msrs();
+
+	if (err)
+		return 1;
+
+	shstk_free(current);
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size'
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (24 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-08  8:38   ` Thomas Gleixner
  2022-02-14 12:33   ` Jann Horn
  2022-01-30 21:18 ` [PATCH 27/35] x86/fpu: Add unsafe xsave buffer helpers Rick Edgecombe
                   ` (10 subsequent siblings)
  36 siblings, 2 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The single call site of copy_thread() passes the stack size in 'arg'.  To
make this clear, and in preparation for using this argument for shadow
stack allocation, change 'arg' to 'stack_size'.  No functional changes.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/kernel/process.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 81d8ef036637..82a816178e7f 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -130,8 +130,9 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
 		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
-int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
-		struct task_struct *p, unsigned long tls)
+int copy_thread(unsigned long clone_flags, unsigned long sp,
+		unsigned long stack_size, struct task_struct *p,
+		unsigned long tls)
 {
 	struct inactive_task_frame *frame;
 	struct fork_frame *fork_frame;
@@ -175,7 +176,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		p->thread.pkru = pkru_get_init_value();
 		memset(childregs, 0, sizeof(struct pt_regs));
-		kthread_frame_init(frame, sp, arg);
+		kthread_frame_init(frame, sp, stack_size);
 		return 0;
 	}
 
@@ -208,7 +209,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 		 */
 		childregs->sp = 0;
 		childregs->ip = 0;
-		kthread_frame_init(frame, sp, arg);
+		kthread_frame_init(frame, sp, stack_size);
 		return 0;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 27/35] x86/fpu: Add unsafe xsave buffer helpers
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (25 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size' Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 28/35] x86/cet/shstk: Handle thread shadow stack Rick Edgecombe
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

CET will need to modify the xsave buffer of a new FPU that was just
created in the process of copying a thread. In this case the normal
helpers will not work, because they operate on the current thread's FPU.

So add unsafe helpers to allow for this kind of modification. Make the
unsafe helpers operate on the MSR like the safe helpers for symmetry and
to avoid exposing the underlying xsave structures. Don't add a read
helper because it is not needed at this time.
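
A sketch of the kind of use intended, on a task that is still being set up
and cannot run yet (p and new_ssp are placeholders, not the exact call
site):

	void *xstate;

	xstate = get_xsave_buffer_unsafe(&p->thread.fpu, XFEATURE_CET_USER);
	if (xstate)
		xsave_wrmsrl_unsafe(xstate, MSR_IA32_PL3_SSP, new_ssp);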

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/fpu/api.h |  9 ++++++---
 arch/x86/kernel/fpu/xstate.c   | 27 ++++++++++++++++++++++-----
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 6aec27984b62..5cb557b9d118 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -167,7 +167,10 @@ extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long
 
 void *start_update_xsave_msrs(int xfeature_nr);
 void end_update_xsave_msrs(void);
-int xsave_rdmsrl(void *state, unsigned int msr, unsigned long long *p);
-int xsave_wrmsrl(void *state, u32 msr, u64 val);
-int xsave_set_clear_bits_msrl(void *state, u32 msr, u64 set, u64 clear);
+int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p);
+int xsave_wrmsrl(void *xstate, u32 msr, u64 val);
+int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear);
+
+void *get_xsave_buffer_unsafe(struct fpu *fpu, int xfeature_nr);
+int xsave_wrmsrl_unsafe(void *xstate, u32 msr, u64 val);
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 25b1b0c417fd..71b08026474c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1881,6 +1881,17 @@ static u64 *__get_xsave_member(void *xstate, u32 msr)
 	}
 }
 
+/*
+ * Operate on the xsave buffer directly. It makes no guarantees that the
+ * buffer will stay valid now or in the future. This function is pretty
+ * much only useful when the caller knows the fpu's thread can't be
+ * scheduled or otherwise operated on concurrently.
+ */
+void *get_xsave_buffer_unsafe(struct fpu *fpu, int xfeature_nr)
+{
+	return get_xsave_addr(&fpu->fpstate->regs.xsave, xfeature_nr);
+}
+
 /*
  * Return a pointer to the xstate for the feature if it should be used, or NULL
  * if the MSRs should be written to directly. To do this safely, using the
@@ -1971,14 +1982,11 @@ int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p)
 	return 0;
 }
 
-int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
+
+int xsave_wrmsrl_unsafe(void *xstate, u32 msr, u64 val)
 {
 	u64 *member_ptr;
 
-	__xsave_msrl_prepare_write();
-	if (!xstate)
-		return wrmsrl_safe(msr, val);
-
 	member_ptr = __get_xsave_member(xstate, msr);
 	if (!member_ptr)
 		return 1;
@@ -1988,6 +1996,15 @@ int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
 	return 0;
 }
 
+int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
+{
+	__xsave_msrl_prepare_write();
+	if (!xstate)
+		return wrmsrl_safe(msr, val);
+
+	return xsave_wrmsrl_unsafe(xstate, msr, val);
+}
+
 int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear)
 {
 	u64 val, new_val;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 28/35] x86/cet/shstk: Handle thread shadow stack
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (26 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 27/35] x86/fpu: Add unsafe xsave buffer helpers Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 29/35] x86/cet/shstk: Introduce shadow stack token setup/verify routines Rick Edgecombe
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

When a process is duplicated, but the child shares the address space with
the parent, there is potential for the threads sharing a single stack to
cause conflicts for each other. In the normal non-CET case this is handled
in two ways.

With regular CLONE_VM a new stack is provided by userspace such that the
parent and child have different stacks.

For vfork, the parent is suspended until the child exits. So as long as
the child doesn't return from the vfork()/CLONE_VFORK calling function and
sticks to a limited set of operations, the parent and child can share the
same stack.

For shadow stack, these scenarios present similar sharing problems. For the
CLONE_VM case, the child and the parent must have separate shadow stacks.
Instead of changing clone to take a shadow stack, have the kernel just
allocate one and switch to it.

Use stack_size passed from clone3() syscall for thread shadow stack size. A
compat-mode thread shadow stack size is further reduced to 1/4. This
allows more threads to run in a 32-bit address space. The clone() syscall
does not pass stack_size, which was only added with clone3(). In that case,
use the RLIMIT_STACK size and cap it to 4 GB.
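
For context, a minimal userspace sketch of where stack_size comes from is
below. It is illustrative only and not part of this series: glibc has no
clone3() wrapper, so the raw syscall is used, and the flag choice and helper
name are made up.

  #define _GNU_SOURCE
  #include <linux/sched.h>    /* struct clone_args, CLONE_* flags */
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <stdint.h>

  static long start_thread(void *stack_base, uint64_t stack_size)
  {
      struct clone_args args = {
          .flags      = CLONE_VM | CLONE_FS | CLONE_FILES |
                        CLONE_SIGHAND | CLONE_THREAD,
          .stack      = (uint64_t)(uintptr_t)stack_base,
          /* the kernel also sizes the thread's shadow stack from this */
          .stack_size = stack_size,
      };

      /* A real implementation must immediately steer the child into its
       * thread function; this only illustrates the stack_size plumbing. */
      return syscall(__NR_clone3, &args, sizeof(args));
  }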

For shadow stack enabled vfork(), the parent and child can share the same
shadow stack, like they can share a normal stack. Since the parent is
suspended until the child terminates, the child will not interfere with
the parent while executing, as long as it doesn't return from the vfork()
and overwrite shadow stack entries above the shadow stack pointer it
inherited. The child can safely overwrite entries below that point, as the
parent will simply overwrite them later. So CET does not add any additional
limitations for vfork().

Userspace implementing POSIX vfork() can actually prevent the child from
returning from the vfork() calling function, using CET. Glibc does this
by adjusting the shadow stack pointer in the child, so that the child
receives a #CP fault if it tries to return from the vfork() calling function.

Free the shadow stack on thread exit by doing it in mm_release(). Skip
this when exiting a vfork() child, since the stack is shared with the
parent.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - Expand commit log.
 - Add more comments.
 - Switch to xsave helpers.

Yu-cheng v30:
 - Update comments about clone()/clone3(). (Borislav Petkov)

Yu-cheng v29:
 - WARN_ON_ONCE() when get_xsave_addr() returns NULL, and update comments.
   (Dave Hansen)

Yu-cheng v28:
 - Split out copy_thread() argument name changes to a new patch.
 - Add compatibility for earlier clone(), which does not pass stack_size.
 - Add comment for get_xsave_addr(), explain the handling of null return
   value.

 arch/x86/include/asm/cet.h         |  5 +++
 arch/x86/include/asm/mmu_context.h |  2 +
 arch/x86/kernel/process.c          |  6 +++
 arch/x86/kernel/shstk.c            | 68 +++++++++++++++++++++++++++++-
 4 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index de90e4ae083a..63ee8b45080d 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -14,11 +14,16 @@ struct thread_shstk {
 
 #ifdef CONFIG_X86_SHADOW_STACK
 int shstk_setup(void);
+int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
+			     unsigned long stack_size);
 void shstk_free(struct task_struct *p);
 int shstk_disable(void);
 void reset_thread_shstk(void);
 #else
 static inline void shstk_setup(void) {}
+static inline int shstk_alloc_thread_stack(struct task_struct *p,
+					   unsigned long clone_flags,
+					   unsigned long stack_size) { return 0; }
 static inline void shstk_free(struct task_struct *p) {}
 static inline void shstk_disable(void) {}
 static inline void reset_thread_shstk(void) {}
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 27516046117a..8e721d2c45d5 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -146,6 +146,8 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		shstk_free(tsk);		\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 82a816178e7f..0fbcf33255fa 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -46,6 +46,7 @@
 #include <asm/proto.h>
 #include <asm/frame.h>
 #include <asm/unwind.h>
+#include <asm/cet.h>
 
 #include "process.h"
 
@@ -117,6 +118,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	shstk_free(tsk);
 	fpu__drop(fpu);
 }
 
@@ -217,6 +219,10 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 	if (clone_flags & CLONE_SETTLS)
 		ret = set_new_tls(p, tls);
 
+	/* Allocate a new shadow stack for pthread */
+	if (!ret)
+		ret = shstk_alloc_thread_stack(p, clone_flags, stack_size);
+
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
 
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 4e8686ed885f..358f24e806cc 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -106,6 +106,66 @@ void reset_thread_shstk(void)
 	memset(&current->thread.shstk, 0, sizeof(struct thread_shstk));
 }
 
+int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
+			     unsigned long stack_size)
+{
+	struct thread_shstk *shstk = &tsk->thread.shstk;
+	unsigned long addr;
+	void *xstate;
+
+	/*
+	 * If shadow stack is not enabled on the new thread, skip any
+	 * switch to a new shadow stack.
+	 */
+	if (!shstk->size)
+		return 0;
+
+	/*
+	 * clone() does not pass stack_size, which was added to clone3().
+	 * Use RLIMIT_STACK and cap to 4 GB.
+	 */
+	if (!stack_size)
+		stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);
+
+	/*
+	 * For CLONE_VM, except vfork, the child needs a separate shadow
+	 * stack.
+	 */
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+		return 0;
+
+
+	/*
+	 * Compat-mode pthreads share a limited address space.
+	 * If each function call takes an average of four slots
+	 * stack space, allocate 1/4 of stack size for shadow stack.
+	 */
+	if (in_compat_syscall())
+		stack_size /= 4;
+
+	/*
+	 * 'tsk' is configured with a shadow stack and the fpu.state is
+	 * up to date since it was just copied from the parent.  There
+	 * must be a valid non-init CET state location in the buffer.
+	 */
+	xstate = get_xsave_buffer_unsafe(&tsk->thread.fpu, XFEATURE_CET_USER);
+	if (WARN_ON_ONCE(!xstate))
+		return -EINVAL;
+
+	stack_size = PAGE_ALIGN(stack_size);
+	addr = alloc_shstk(stack_size);
+	if (IS_ERR_VALUE(addr)) {
+		shstk->base = 0;
+		shstk->size = 0;
+		return PTR_ERR((void *)addr);
+	}
+
+	xsave_wrmsrl_unsafe(xstate, MSR_IA32_PL3_SSP, (u64)(addr + stack_size));
+	shstk->base = addr;
+	shstk->size = stack_size;
+	return 0;
+}
+
 void shstk_free(struct task_struct *tsk)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
@@ -115,7 +175,13 @@ void shstk_free(struct task_struct *tsk)
 	    !shstk->base)
 		return;
 
-	if (!tsk->mm)
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
+	 * shadow stack allocated, and exit_thread() calls this function to
+	 * free it.  In this case the parent (current) and the child share
+	 * the same mm struct.
+	 */
+	if (!tsk->mm || tsk->mm != current->mm)
 		return;
 
 	unmap_shadow_stack(shstk->base, shstk->size);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 29/35] x86/cet/shstk: Introduce shadow stack token setup/verify routines
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (27 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 28/35] x86/cet/shstk: Handle thread shadow stack Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 30/35] x86/cet/shstk: Handle signals for shadow stack Rick Edgecombe
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

A shadow stack restore token marks a restore point of the shadow stack, and
the address in a token must point directly above the token, within the same
shadow stack.  This is distinctly different from other pointers on the
shadow stack, since those pointers point into executable code areas.

Introduce token setup and verify routines.  Also introduce WRUSS, which is
a kernel-mode instruction that writes directly to the user shadow stack.
It is used in a later patch to construct the user signal stack.
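
For illustration, here is a rough userspace-C sketch of the token format,
condensed from create_rstor_token() added below; the helper names are made
up and this is not a drop-in replacement.

  #include <stdint.h>

  /* The token value is the shadow stack pointer to restore, with bit 0
   * set for a 64-bit token. */
  static uint64_t rstor_token_value(uint64_t ssp, int proc32)
  {
      return proc32 ? ssp : (ssp | 1);
  }

  /* The token occupies the 8-byte aligned slot just below ssp. */
  static uint64_t rstor_token_addr(uint64_t ssp)
  {
      return (ssp & ~7ULL) - 8;
  }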

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

v1:
 - Use xsave helpers.

Yu-cheng v30:
 - Update commit log, remove description about signals.
 - Update various comments.
 - Remove variable 'ssp' init and adjust return value accordingly.
 - Check get_user_shstk_addr() return value.
 - Replace 'ia32' with 'proc32'.

Yu-cheng v29:
 - Update comments for the use of get_xsave_addr().

Yu-cheng v28:
 - Add comments for get_xsave_addr().

Yu-cheng v27:
 - For shstk_check_rstor_token(), instead of an input param, use current
   shadow stack pointer.
 - In response to comments, fix/simplify a few syntax/format issues.

 arch/x86/include/asm/cet.h           |   7 ++
 arch/x86/include/asm/special_insns.h |  30 +++++++
 arch/x86/kernel/shstk.c              | 122 +++++++++++++++++++++++++++
 3 files changed, 159 insertions(+)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 63ee8b45080d..6e8a7a807dcc 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -19,6 +19,9 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
 void shstk_free(struct task_struct *p);
 int shstk_disable(void);
 void reset_thread_shstk(void);
+int shstk_setup_rstor_token(bool proc32, unsigned long restorer,
+			    unsigned long *new_ssp);
+int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp);
 #else
 static inline void shstk_setup(void) {}
 static inline int shstk_alloc_thread_stack(struct task_struct *p,
@@ -27,6 +30,10 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p,
 static inline void shstk_free(struct task_struct *p) {}
 static inline void shstk_disable(void) {}
 static inline void reset_thread_shstk(void) {}
+static inline int shstk_setup_rstor_token(bool proc32, unsigned long restorer,
+					  unsigned long *new_ssp) { return 0; }
+static inline int shstk_check_rstor_token(bool proc32,
+					  unsigned long *new_ssp) { return 0; }
 #endif /* CONFIG_X86_SHADOW_STACK */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 68c257a3de0d..f45f378ca1fc 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -222,6 +222,36 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+#ifdef CONFIG_X86_SHADOW_STACK
+static inline int write_user_shstk_32(u32 __user *addr, u32 val)
+{
+	if (WARN_ONCE(!IS_ENABLED(CONFIG_IA32_EMULATION) &&
+		      !IS_ENABLED(CONFIG_X86_X32),
+		      "%s used but not supported.\n", __func__)) {
+		return -EFAULT;
+	}
+
+	asm_volatile_goto("1: wrussd %[val], (%[addr])\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: [addr] "r" (addr), [val] "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EFAULT;
+}
+
+static inline int write_user_shstk_64(u64 __user *addr, u64 val)
+{
+	asm_volatile_goto("1: wrussq %[val], (%[addr])\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: [addr] "r" (addr), [val] "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EFAULT;
+}
+#endif /* CONFIG_X86_SHADOW_STACK */
+
 #define nop() asm volatile ("nop")
 
 static inline void serialize(void)
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 358f24e806cc..e0caab50ca77 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -23,6 +23,33 @@
 #include <asm/special_insns.h>
 #include <asm/fpu/api.h>
 
+/*
+ * Create a restore token on the shadow stack.  A token is always 8-byte
+ * and aligned to 8.
+ */
+static int create_rstor_token(bool proc32, unsigned long ssp,
+			      unsigned long *token_addr)
+{
+	unsigned long addr;
+
+	/* Aligned to 8 is aligned to 4, so test 8 first */
+	if ((!proc32 && !IS_ALIGNED(ssp, 8)) || !IS_ALIGNED(ssp, 4))
+		return -EINVAL;
+
+	addr = ALIGN_DOWN(ssp, 8) - 8;
+
+	/* Is the token for 64-bit? */
+	if (!proc32)
+		ssp |= BIT(0);
+
+	if (write_user_shstk_64((u64 __user *)addr, (u64)ssp))
+		return -EFAULT;
+
+	*token_addr = addr;
+
+	return 0;
+}
+
 static unsigned long alloc_shstk(unsigned long size)
 {
 	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
@@ -213,3 +240,98 @@ int shstk_disable(void)
 	shstk_free(current);
 	return 0;
 }
+
+static unsigned long get_user_shstk_addr(void)
+{
+	void *xstate;
+	unsigned long long ssp;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+
+	xsave_rdmsrl(xstate, MSR_IA32_PL3_SSP, &ssp);
+
+	end_update_xsave_msrs();
+
+	return ssp;
+}
+
+/*
+ * Create a restore token on shadow stack, and then push the user-mode
+ * function return address.
+ */
+int shstk_setup_rstor_token(bool proc32, unsigned long ret_addr,
+			    unsigned long *new_ssp)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	unsigned long ssp, token_addr;
+	int err;
+
+	if (!shstk->size)
+		return 0;
+
+	if (!ret_addr)
+		return -EINVAL;
+
+	ssp = get_user_shstk_addr();
+	if (!ssp)
+		return -EINVAL;
+
+	err = create_rstor_token(proc32, ssp, &token_addr);
+	if (err)
+		return err;
+
+	if (proc32) {
+		ssp = token_addr - sizeof(u32);
+		err = write_user_shstk_32((u32 __user *)ssp, (u32)ret_addr);
+	} else {
+		ssp = token_addr - sizeof(u64);
+		err = write_user_shstk_64((u64 __user *)ssp, (u64)ret_addr);
+	}
+
+	if (!err)
+		*new_ssp = ssp;
+
+	return err;
+}
+
+/*
+ * Verify the user shadow stack has a valid token on it, and then set
+ * *new_ssp according to the token.
+ */
+int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp)
+{
+	unsigned long token_addr;
+	unsigned long token;
+	bool shstk32;
+
+	token_addr = get_user_shstk_addr();
+	if (!token_addr)
+		return -EINVAL;
+
+	if (get_user(token, (unsigned long __user *)token_addr))
+		return -EFAULT;
+
+	/* Is mode flag correct? */
+	shstk32 = !(token & BIT(0));
+	if (proc32 ^ shstk32)
+		return -EINVAL;
+
+	/* Is busy flag set? */
+	if (token & BIT(1))
+		return -EINVAL;
+
+	/* Mask out flags */
+	token &= ~3UL;
+
+	/* Restore address aligned? */
+	if ((!proc32 && !IS_ALIGNED(token, 8)) || !IS_ALIGNED(token, 4))
+		return -EINVAL;
+
+	/* Token placed properly? */
+	if (((ALIGN_DOWN(token, 8) - 8) != token_addr) || token >= TASK_SIZE_MAX)
+		return -EINVAL;
+
+	*new_ssp = token;
+
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 30/35] x86/cet/shstk: Handle signals for shadow stack
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (28 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 29/35] x86/cet/shstk: Introduce shadow stack token setup/verify routines Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 31/35] x86/cet/shstk: Add arch_prctl elf feature functions Rick Edgecombe
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

When a signal is handled normally, the context is pushed to the stack
before handling it. For shadow stacks, since the shadow stack only tracks
return addresses, there isn't any state that needs to be pushed. However,
there are still a few things that need to be done. These things are
userspace visible and will become kernel ABI for shadow stacks.

One is to make sure the restorer address is written to the shadow stack,
since the signal handler (if not changing ucontext) returns to the restorer,
and the restorer calls sigreturn. So add the restorer to the shadow stack
before handling the signal, so there is no conflict when the signal
handler returns to the restorer.

The other thing to do is to place a restore token on the thread's shadow
stack before handling the signal and check it during sigreturn. This
is an extra layer of protection to hamper attackers calling sigreturn
manually as in SROP-like attacks.

So, when handling a signal push
 - a shadow stack restore token pointing to the current shadow stack
   address
 - the restorer address below the restore token.

In sigreturn, verify the restore token and pop the shadow stack.
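
As a rough illustration only (the struct and field names are made up; the
real work is done by the token routines from the previous patch), the
64-bit shadow stack frame at handler entry looks like:

  #include <stdint.h>

  /* Laid out from the new SSP upward; the kernel writes both slots with
   * WRUSS before delivering the signal. */
  struct shstk_sig_frame {
      uint64_t restorer;  /* new SSP points here on handler entry */
      uint64_t token;     /* old SSP | 1, verified again in sigreturn */
  };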

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
---

v1:
 - Use xsave helpers.
 - Expand commit log.

Yu-cheng v27:
 - Eliminate saving shadow stack pointer to signal context.

Yu-cheng v25:
 - Update commit log/comments for the sc_ext struct.
 - Use restorer address already calculated.
 - Change CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK.
 - Change X86_FEATURE_CET to X86_FEATURE_SHSTK.
 - Eliminate writing to MSR_IA32_U_CET for shadow stack.
 - Change wrmsrl() to wrmsrl_safe() and handle error.

 arch/x86/ia32/ia32_signal.c | 25 ++++++++++++++++-----
 arch/x86/include/asm/cet.h  |  4 ++++
 arch/x86/kernel/shstk.c     | 44 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/signal.c    | 13 +++++++++++
 4 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index c9c3859322fa..a8d038409d60 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -34,6 +34,7 @@
 #include <asm/sigframe.h>
 #include <asm/sighandling.h>
 #include <asm/smap.h>
+#include <asm/cet.h>
 
 static inline void reload_segments(struct sigcontext_32 *sc)
 {
@@ -112,6 +113,10 @@ COMPAT_SYSCALL_DEFINE0(sigreturn)
 
 	if (!ia32_restore_sigcontext(regs, &frame->sc))
 		goto badframe;
+
+	if (restore_signal_shadow_stack())
+		goto badframe;
+
 	return regs->ax;
 
 badframe:
@@ -137,6 +142,9 @@ COMPAT_SYSCALL_DEFINE0(rt_sigreturn)
 	if (!ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext))
 		goto badframe;
 
+	if (restore_signal_shadow_stack())
+		goto badframe;
+
 	if (compat_restore_altstack(&frame->uc.uc_stack))
 		goto badframe;
 
@@ -261,6 +269,9 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
 			restorer = &frame->retcode;
 	}
 
+	if (setup_signal_shadow_stack(1, restorer))
+		return -EFAULT;
+
 	if (!user_access_begin(frame, sizeof(*frame)))
 		return -EFAULT;
 
@@ -318,6 +329,15 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig,
 
 	frame = get_sigframe(ksig, regs, sizeof(*frame), &fp);
 
+	if (ksig->ka.sa.sa_flags & SA_RESTORER)
+		restorer = ksig->ka.sa.sa_restorer;
+	else
+		restorer = current->mm->context.vdso +
+			vdso_image_32.sym___kernel_rt_sigreturn;
+
+	if (setup_signal_shadow_stack(1, restorer))
+		return -EFAULT;
+
 	if (!user_access_begin(frame, sizeof(*frame)))
 		return -EFAULT;
 
@@ -333,11 +353,6 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig,
 	unsafe_put_user(0, &frame->uc.uc_link, Efault);
 	unsafe_compat_save_altstack(&frame->uc.uc_stack, regs->sp, Efault);
 
-	if (ksig->ka.sa.sa_flags & SA_RESTORER)
-		restorer = ksig->ka.sa.sa_restorer;
-	else
-		restorer = current->mm->context.vdso +
-			vdso_image_32.sym___kernel_rt_sigreturn;
 	unsafe_put_user(ptr_to_compat(restorer), &frame->pretcode, Efault);
 
 	/*
diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 6e8a7a807dcc..faff8dc86159 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -22,6 +22,8 @@ void reset_thread_shstk(void);
 int shstk_setup_rstor_token(bool proc32, unsigned long restorer,
 			    unsigned long *new_ssp);
 int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp);
+int setup_signal_shadow_stack(int proc32, void __user *restorer);
+int restore_signal_shadow_stack(void);
 #else
 static inline void shstk_setup(void) {}
 static inline int shstk_alloc_thread_stack(struct task_struct *p,
@@ -34,6 +36,8 @@ static inline int shstk_setup_rstor_token(bool proc32, unsigned long restorer,
 					  unsigned long *new_ssp) { return 0; }
 static inline int shstk_check_rstor_token(bool proc32,
 					  unsigned long *new_ssp) { return 0; }
+static inline int setup_signal_shadow_stack(int proc32, void __user *restorer) { return 0; }
+static inline int restore_signal_shadow_stack(void) { return 0; }
 #endif /* CONFIG_X86_SHADOW_STACK */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index e0caab50ca77..682d85a63a1d 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -335,3 +335,47 @@ int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp)
 
 	return 0;
 }
+
+int setup_signal_shadow_stack(int proc32, void __user *restorer)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	unsigned long new_ssp;
+	void *xstate;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || !shstk->size)
+		return 0;
+
+	err = shstk_setup_rstor_token(proc32, (unsigned long)restorer,
+				      &new_ssp);
+	if (err)
+		return err;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+	err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, new_ssp);
+	end_update_xsave_msrs();
+
+	return err;
+}
+
+int restore_signal_shadow_stack(void)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	void *xstate;
+	int proc32 = in_ia32_syscall();
+	unsigned long new_ssp;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || !shstk->size)
+		return 0;
+
+	err = shstk_check_rstor_token(proc32, &new_ssp);
+	if (err)
+		return err;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+	err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, new_ssp);
+	end_update_xsave_msrs();
+
+	return err;
+}
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index ec71e06ae364..e6202fc2a56c 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -48,6 +48,7 @@
 #include <asm/syscall.h>
 #include <asm/sigframe.h>
 #include <asm/signal.h>
+#include <asm/cet.h>
 
 #ifdef CONFIG_X86_64
 /*
@@ -471,6 +472,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
 	frame = get_sigframe(&ksig->ka, regs, sizeof(struct rt_sigframe), &fp);
 	uc_flags = frame_uc_flags(regs);
 
+	if (setup_signal_shadow_stack(0, ksig->ka.sa.sa_restorer))
+		return -EFAULT;
+
 	if (!user_access_begin(frame, sizeof(*frame)))
 		return -EFAULT;
 
@@ -576,6 +580,9 @@ static int x32_setup_rt_frame(struct ksignal *ksig,
 
 	uc_flags = frame_uc_flags(regs);
 
+	if (setup_signal_shadow_stack(0, ksig->ka.sa.sa_restorer))
+		return -EFAULT;
+
 	if (!user_access_begin(frame, sizeof(*frame)))
 		return -EFAULT;
 
@@ -674,6 +681,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
 	if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
 		goto badframe;
 
+	if (restore_signal_shadow_stack())
+		goto badframe;
+
 	if (restore_altstack(&frame->uc.uc_stack))
 		goto badframe;
 
@@ -991,6 +1001,9 @@ COMPAT_SYSCALL_DEFINE0(x32_rt_sigreturn)
 	if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
 		goto badframe;
 
+	if (restore_signal_shadow_stack())
+		goto badframe;
+
 	if (compat_restore_altstack(&frame->uc.uc_stack))
 		goto badframe;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 31/35] x86/cet/shstk: Add arch_prctl elf feature functions
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (29 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 30/35] x86/cet/shstk: Handle signals for shadow stack Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 32/35] x86/cet/shstk: Introduce map_shadow_stack syscall Rick Edgecombe
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Some CPU features that adjust the behavior of existing instructions
should be enabled only if the application supports these modifications.

Provide a per-thread arch_prctl interface for modifying, checking and
locking the enablement status of features like these. This interface
operates on the per-thread state which is copied for new threads. It is
intended to be mostly used early in an application (e.g. by a dynamic
loader), such that the behavior will be inherited by new threads created by
the application.

Today the only user is shadow stack, but keep the names generic because
other features like LAM can use it as well.

The interface is as below:
arch_prctl(ARCH_X86_FEATURE_STATUS, u64 *args)
    Get feature status.

    The parameter 'args' is a pointer to a user buffer. The kernel returns
    the following information:

    *args = shadow stack/IBT status
    *(args + 1) = shadow stack base address
    *(args + 2) = shadow stack size

    32-bit binaries use the same interface, but only the lower 32 bits of
    each item.

arch_prctl(ARCH_X86_FEATURE_DISABLE, unsigned int features)
    Disable features specified in 'features'. Return -EPERM if any of the
    passed features are locked. Return -ECANCELED if any of the features
    failed to disable. In this case call ARCH_X86_FEATURE_STATUS to find
    out which features are still enabled.

arch_prctl(ARCH_X86_FEATURE_ENABLE, unsigned int features)
    Enable features specified in 'features'. Return -EPERM if any of the
    passed features are locked. Return -ECANCELED if any of the features
    failed to enable. In this case call ARCH_X86_FEATURE_STATUS to find
    out which features were enabled.

arch_prctl(ARCH_X86_FEATURE_LOCK, unsigned int features)
    Lock in all features at their current enabled or disabled status.
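
A minimal userspace usage sketch follows. The constants mirror the
asm/prctl.h additions in this patch, and no libc wrapper is assumed, so the
raw syscall is used:

  #include <sys/syscall.h>
  #include <unistd.h>
  #include <stdint.h>
  #include <stdio.h>

  #define ARCH_X86_FEATURE_STATUS  0x3001
  #define ARCH_X86_FEATURE_LOCK    0x3003
  #define LINUX_X86_FEATURE_SHSTK  0x00000002

  int main(void)
  {
      uint64_t buf[3] = {};

      /* Query enabled features plus shadow stack base and size */
      if (syscall(SYS_arch_prctl, ARCH_X86_FEATURE_STATUS, buf))
          perror("ARCH_X86_FEATURE_STATUS");
      else
          printf("features=%#llx base=%#llx size=%#llx\n",
                 (unsigned long long)buf[0],
                 (unsigned long long)buf[1],
                 (unsigned long long)buf[2]);

      /* Lock in the current state so later code can't change it */
      syscall(SYS_arch_prctl, ARCH_X86_FEATURE_LOCK,
              (uint64_t)LINUX_X86_FEATURE_SHSTK);
      return 0;
  }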

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - Changed from ENOSYS and ENOTSUPP error codes per checkpatch.
 - Changed interface/filename to be more generic so it can be shared with
   LAM.
 - Add lock mask, such that some features can be locked, while leaving others
   to be enabled later.
 - Add ARCH_X86_FEATURE_ENABLE to use instead of parsing the elf header
 - Change ARCH_X86_FEATURE_DISABLE to actually return an error on
   failure.

 arch/x86/include/asm/cet.h          |  6 +++
 arch/x86/include/asm/processor.h    |  1 +
 arch/x86/include/uapi/asm/prctl.h   | 10 +++++
 arch/x86/kernel/Makefile            |  2 +-
 arch/x86/kernel/elf_feature_prctl.c | 66 +++++++++++++++++++++++++++++
 arch/x86/kernel/process.c           |  2 +-
 arch/x86/kernel/shstk.c             |  1 +
 7 files changed, 86 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/elf_feature_prctl.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index faff8dc86159..cbc7cfcba5dc 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -40,6 +40,12 @@ static inline int setup_signal_shadow_stack(int proc32, void __user *restorer) {
 static inline int restore_signal_shadow_stack(void) { return 0; }
 #endif /* CONFIG_X86_SHADOW_STACK */
 
+#ifdef CONFIG_X86_SHADOW_STACK
+int prctl_elf_feature(int option, u64 arg2);
+#else
+static inline int prctl_elf_feature(int option, u64 arg2) { return -EINVAL; }
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_CET_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a9f4e9c4ca81..100af0f570c9 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -531,6 +531,7 @@ struct thread_struct {
 
 #ifdef CONFIG_X86_SHADOW_STACK
 	struct thread_shstk	shstk;
+	u64			feat_prctl_locked;
 #endif
 
 	/* Floating point and extended processor state */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 500b96e71f18..aa294c7bcf41 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -20,4 +20,14 @@
 #define ARCH_MAP_VDSO_32		0x2002
 #define ARCH_MAP_VDSO_64		0x2003
 
+#define ARCH_X86_FEATURE_STATUS		0x3001
+#define ARCH_X86_FEATURE_DISABLE	0x3002
+#define ARCH_X86_FEATURE_LOCK		0x3003
+#define ARCH_X86_FEATURE_ENABLE		0x3004
+
+/* x86 feature bits to be used with ARCH_X86_FEATURE arch_prctl()s */
+#define LINUX_X86_FEATURE_IBT		0x00000001
+#define LINUX_X86_FEATURE_SHSTK		0x00000002
+
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index d60ae6c365c7..531dba96d4dc 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -153,7 +153,7 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev.o
 
 obj-$(CONFIG_ARCH_HAS_CC_PLATFORM)	+= cc_platform.o
 
-obj-$(CONFIG_X86_SHADOW_STACK)		+= shstk.o
+obj-$(CONFIG_X86_SHADOW_STACK)		+= shstk.o elf_feature_prctl.o
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/elf_feature_prctl.c b/arch/x86/kernel/elf_feature_prctl.c
new file mode 100644
index 000000000000..47de201db3f7
--- /dev/null
+++ b/arch/x86/kernel/elf_feature_prctl.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/errno.h>
+#include <linux/uaccess.h>
+#include <linux/prctl.h>
+#include <linux/compat.h>
+#include <linux/mman.h>
+#include <linux/elfcore.h>
+#include <linux/processor.h>
+#include <asm/prctl.h>
+#include <asm/cet.h>
+
+/* See Documentation/x86/intel_cet.rst. */
+
+static int elf_feat_copy_status_to_user(struct thread_shstk *shstk, u64 __user *ubuf)
+{
+	u64 buf[3] = {};
+
+	if (shstk->size) {
+		buf[0] = LINUX_X86_FEATURE_SHSTK;
+		buf[1] = shstk->base;
+		buf[2] = shstk->size;
+	}
+
+	return copy_to_user(ubuf, buf, sizeof(buf));
+}
+
+int prctl_elf_feature(int option, u64 arg2)
+{
+	struct thread_struct *thread = &current->thread;
+	u64 feat_succ = 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return -EOPNOTSUPP;
+
+	switch (option) {
+	case ARCH_X86_FEATURE_STATUS:
+		return elf_feat_copy_status_to_user(&thread->shstk, (u64 __user *)arg2);
+	case ARCH_X86_FEATURE_DISABLE:
+		if (arg2 & thread->feat_prctl_locked)
+			return -EPERM;
+
+		if (arg2 & LINUX_X86_FEATURE_SHSTK && !shstk_disable())
+			feat_succ |= LINUX_X86_FEATURE_SHSTK;
+
+		if (feat_succ != arg2)
+			return -ECANCELED;
+		return 0;
+	case ARCH_X86_FEATURE_ENABLE:
+		if (arg2 & thread->feat_prctl_locked)
+			return -EPERM;
+
+		if (arg2 & LINUX_X86_FEATURE_SHSTK && !shstk_setup())
+			feat_succ |= LINUX_X86_FEATURE_SHSTK;
+
+		if (feat_succ != arg2)
+			return -ECANCELED;
+		return 0;
+	case ARCH_X86_FEATURE_LOCK:
+		thread->feat_prctl_locked |= arg2;
+		return 0;
+
+	default:
+		return -EINVAL;
+	}
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0fbcf33255fa..11bf09b60f9d 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -1005,5 +1005,5 @@ long do_arch_prctl_common(struct task_struct *task, int option,
 		return fpu_xstate_prctl(task, option, arg2);
 	}
 
-	return -EINVAL;
+	return prctl_elf_feature(option, arg2);
 }
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 682d85a63a1d..f330be17e2d1 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -130,6 +130,7 @@ int shstk_setup(void)
 
 void reset_thread_shstk(void)
 {
+	current->thread.feat_prctl_locked = 0;
 	memset(&current->thread.shstk, 0, sizeof(struct thread_shstk));
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 32/35] x86/cet/shstk: Introduce map_shadow_stack syscall
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (30 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 31/35] x86/cet/shstk: Add arch_prctl elf feature functions Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-30 21:18 ` [PATCH 33/35] selftests/x86: Add map_shadow_stack syscall test Rick Edgecombe
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

When operating with shadow stacks enabled, the kernel will automatically
allocate shadow stacks for new threads, however in some cases userspace
will need additional shadow stacks. The main example of this is the
ucontext family of functions, which require userspace to allocate and
pivot to its own managed stacks.

Unlike most other user memory permissions, shadow stacks need to be
provisioned with special data in order to be useful. They need to be set up
with a restore token so that userspace can pivot to them via the RSTORSSP
instruction. But the security design of shadow stacks is that they should
not be written to except in limited circumstances. This presents a problem
for userspace: how can it provision this special data without allowing the
shadow stack to be generally writable?

Previously, a new PROT_SHADOW_STACK was attempted, which could be
mprotect()ed from RW permissions after the data was provisioned. This was
found to not be secure enough, as other threads could write to the
shadow stack during the writable window.

The kernel can use a special instruction, WRUSS, to write directly to
userspace shadow stacks. So the solution can be that memory is mapped
with shadow stack permissions from the beginning (never generally writable
in userspace), and the kernel itself can write the restore token.

First, a new madvise() flag was explored, which could operate on the
PROT_SHADOW_STACK memory. This had a couple downsides:
1. Extra checks were needed in mprotect() to prevent writable memory from
   ever becoming PROT_SHADOW_STACK.
2. Extra checks/vma state were needed in the new madvise() to prevent
   restore tokens being written into the middle of pre-used shadow stacks.
   It is ideal to prevent restore tokens being added at arbitrary
   locations, so the check was to make sure the shadow stack had never been
   written to.
3. It stood out from the rest of the madvise flags, as more of a direct
   action than a hint at future desired behavior.

So rather than repurpose two existing syscalls (mmap, madvise) that don't
quite fit, just implement a new map_shadow_stack syscall to allow
userspace to map and set up new shadow stacks in one step. While ucontext
is the primary motivator, userspace may have other unforeseen reasons to
set up its own shadow stacks using the WRSS instruction. Toward this end,
provide a flag so that stacks can optionally be set up securely for the
common ucontext case without enabling WRSS, or the kernel could potentially
set up the shadow stack in some new way.

The following example demonstrates how to create a new shadow stack with
map_shadow_stack:
void *shadow_stack = map_shadow_stack(stack_size, SHADOW_STACK_SET_TOKEN);
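
Pivoting onto the returned stack then works roughly as below (illustrative
only; the selftest added later in this series exercises the same flow):

  unsigned long ssp = (unsigned long)shadow_stack + stack_size - 8;

  /* RSTORSSP consumes the restore token; SAVEPREVSSP leaves a token for
   * switching back to the previous shadow stack. */
  asm volatile("rstorssp (%0)\n" :: "r" (ssp));
  asm volatile("saveprevssp");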

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - New patch (replaces PROT_SHADOW_STACK).

 arch/x86/entry/syscalls/syscall_32.tbl |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 arch/x86/include/uapi/asm/mman.h       |  2 ++
 arch/x86/kernel/shstk.c                | 39 +++++++++++++++++++++++---
 include/linux/syscalls.h               |  1 +
 include/uapi/asm-generic/unistd.h      |  2 +-
 kernel/sys_ni.c                        |  1 +
 7 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 320480a8db4f..68106c12937f 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -455,3 +455,4 @@
 448	i386	process_mrelease	sys_process_mrelease
 449	i386	futex_waitv		sys_futex_waitv
 450	i386	set_mempolicy_home_node		sys_set_mempolicy_home_node
+451	i386	map_shadow_stack	sys_map_shadow_stack
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index c84d12608cd2..d9639e3e0a33 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -372,6 +372,7 @@
 448	common	process_mrelease	sys_process_mrelease
 449	common	futex_waitv		sys_futex_waitv
 450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
+451	common	map_shadow_stack	sys_map_shadow_stack
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index 9704e27c4d24..dd4e8405e189 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -26,6 +26,8 @@
 		((key) & 0x8 ? VM_PKEY_BIT3 : 0))
 #endif
 
+#define SHADOW_STACK_SET_TOKEN	0x1	/* Set up a restore token in the shadow stack */
+
 #include <asm-generic/mman.h>
 
 #endif /* _UAPI_ASM_X86_MMAN_H */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index f330be17e2d1..53be5d5539d4 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -15,6 +15,7 @@
 #include <linux/compat.h>
 #include <linux/sizes.h>
 #include <linux/user.h>
+#include <linux/syscalls.h>
 #include <asm/msr.h>
 #include <asm/fpu/internal.h>
 #include <asm/fpu/xstate.h>
@@ -45,12 +46,14 @@ static int create_rstor_token(bool proc32, unsigned long ssp,
 	if (write_user_shstk_64((u64 __user *)addr, (u64)ssp))
 		return -EFAULT;
 
-	*token_addr = addr;
+	if (token_addr)
+		*token_addr = addr;
 
 	return 0;
 }
 
-static unsigned long alloc_shstk(unsigned long size)
+static unsigned long alloc_shstk(unsigned long size, unsigned long token_offset,
+				 bool set_res_tok)
 {
 	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
 	struct mm_struct *mm = current->mm;
@@ -61,6 +64,15 @@ static unsigned long alloc_shstk(unsigned long size)
 		       &unused, NULL);
 	mmap_write_unlock(mm);
 
+	if (!set_res_tok || IS_ERR_VALUE(addr))
+		goto out;
+
+	if (create_rstor_token(in_ia32_syscall(), addr + token_offset, NULL)) {
+		vm_munmap(addr, size);
+		return -EINVAL;
+	}
+
+out:
 	return addr;
 }
 
@@ -103,7 +115,7 @@ int shstk_setup(void)
 		return 1;
 
 	size = PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
-	addr = alloc_shstk(size);
+	addr = alloc_shstk(size, size, false);
 	if (IS_ERR_VALUE(addr))
 		return 1;
 
@@ -181,7 +193,7 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
 		return -EINVAL;
 
 	stack_size = PAGE_ALIGN(stack_size);
-	addr = alloc_shstk(stack_size);
+	addr = alloc_shstk(stack_size, stack_size, false);
 	if (IS_ERR_VALUE(addr)) {
 		shstk->base = 0;
 		shstk->size = 0;
@@ -380,3 +392,22 @@ int restore_signal_shadow_stack(void)
 
 	return err;
 }
+
+SYSCALL_DEFINE2(map_shadow_stack, unsigned long, size, unsigned int, flags)
+{
+	unsigned long aligned_size;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return -ENOSYS;
+
+	/*
+	 * An overflow would result in attempting to write the restore token
+	 * to the wrong location. Not catastrophic, but just return the right
+	 * error code and block it.
+	 */
+	aligned_size = PAGE_ALIGN(size);
+	if (aligned_size < size)
+		return -EOVERFLOW;
+
+	return alloc_shstk(aligned_size, size, flags & SHADOW_STACK_SET_TOKEN);
+}
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 819c0cb00b6d..11220c40b26a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1060,6 +1060,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags);
 asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len,
 					    unsigned long home_node,
 					    unsigned long flags);
+asmlinkage long sys_map_shadow_stack(unsigned long size, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 1c48b0ae3ba3..41112fdd3b66 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -887,7 +887,7 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
 __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 
 #undef __NR_syscalls
-#define __NR_syscalls 451
+#define __NR_syscalls 452
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index a492f159624f..16a6e1a57c2b 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -380,6 +380,7 @@ COND_SYSCALL(vm86old);
 COND_SYSCALL(modify_ldt);
 COND_SYSCALL(vm86);
 COND_SYSCALL(kexec_file_load);
+COND_SYSCALL(map_shadow_stack);
 
 /* s390 */
 COND_SYSCALL(s390_pci_mmio_read);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 33/35] selftests/x86: Add map_shadow_stack syscall test
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (31 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 32/35] x86/cet/shstk: Introduce map_shadow_stack syscall Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-03 22:42   ` Dave Hansen
  2022-01-30 21:18 ` [PATCH 34/35] x86/cet/shstk: Support wrss for userspace Rick Edgecombe
                   ` (3 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu, Yu-cheng

Add a simple selftest for exercising the new map_shadow_stack syscall.

Co-developed-by: Yu, Yu-cheng <yu-cheng.yu@intel.com>
Signed-off-by: Yu, Yu-cheng <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - New patch.

 tools/testing/selftests/x86/Makefile          |  9 ++-
 .../selftests/x86/test_map_shadow_stack.c     | 75 +++++++++++++++++++
 2 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 8a1f62ab3c8e..9114943336f9 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -9,11 +9,13 @@ UNAME_M := $(shell uname -m)
 CAN_BUILD_I386 := $(shell ./check_cc.sh $(CC) trivial_32bit_program.c -m32)
 CAN_BUILD_X86_64 := $(shell ./check_cc.sh $(CC) trivial_64bit_program.c)
 CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie)
+CAN_BUILD_WITH_SHSTK := $(shell ./check_cc.sh $(CC) trivial_program.c -mshstk -fcf-protection)
 
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
 			check_initial_reg_state sigreturn iopl ioperm \
 			test_vsyscall mov_ss_trap \
-			syscall_arg_fault fsgsbase_restore sigaltstack
+			syscall_arg_fault fsgsbase_restore sigaltstack \
+			test_map_shadow_stack
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
@@ -105,3 +107,8 @@ $(OUTPUT)/test_syscall_vdso_32: thunks_32.S
 # state.
 $(OUTPUT)/check_initial_reg_state_32: CFLAGS += -Wl,-ereal_start -static
 $(OUTPUT)/check_initial_reg_state_64: CFLAGS += -Wl,-ereal_start -static
+
+ifeq ($(CAN_BUILD_WITH_SHSTK),1)
+$(OUTPUT)/test_map_shadow_stack_64: CFLAGS += -mshstk -fcf-protection
+$(OUTPUT)/test_map_shadow_stack_32: CFLAGS += -mshstk -fcf-protection
+endif
\ No newline at end of file
diff --git a/tools/testing/selftests/x86/test_map_shadow_stack.c b/tools/testing/selftests/x86/test_map_shadow_stack.c
new file mode 100644
index 000000000000..dfd94ef0176d
--- /dev/null
+++ b/tools/testing/selftests/x86/test_map_shadow_stack.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <x86intrin.h>
+
+#define SS_SIZE 0x200000
+
+void *create_shstk(void)
+{
+	return (void *)syscall(__NR_map_shadow_stack, SS_SIZE, SHADOW_STACK_SET_TOKEN);
+}
+
+#if (__GNUC__ < 8) || (__GNUC__ == 8 && __GNUC_MINOR__ < 5)
+int main(int argc, char *argv[])
+{
+	printf("SKIP: compiler does not support CET.");
+	return 0;
+}
+#else
+void try_shstk(unsigned long new_ssp)
+{
+	unsigned long ssp0, ssp1;
+
+	printf("pid=%d\n", getpid());
+	printf("new_ssp = %lx, *new_ssp = %lx\n",
+		new_ssp, *((unsigned long *)new_ssp));
+
+	ssp0 = _get_ssp();
+	printf("changing ssp from %lx to %lx\n", ssp0, new_ssp);
+
+	/* Make sure it is aligned to 8 bytes */
+	if ((ssp0 & 0xf) != 0)
+		ssp0 &= -8;
+
+	asm volatile("rstorssp (%0)\n":: "r" (new_ssp));
+	asm volatile("saveprevssp");
+	ssp1 = _get_ssp();
+	printf("ssp is now %lx\n", ssp1);
+
+	ssp0 -= 8;
+	asm volatile("rstorssp (%0)\n":: "r" (ssp0));
+	asm volatile("saveprevssp");
+}
+
+int main(int argc, char *argv[])
+{
+	void *shstk;
+
+	if (!_get_ssp()) {
+		printf("SKIP: shadow stack disabled.");
+		return 0;
+	}
+
+	shstk = create_shstk();
+	if (shstk == MAP_FAILED) {
+		printf("FAIL: Error creating shadow stack: %d\n", errno);
+		return 1;
+	}
+	try_shstk((unsigned long)shstk + SS_SIZE - 8);
+
+	printf("PASS.\n");
+	return 0;
+}
+#endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 34/35] x86/cet/shstk: Support wrss for userspace
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (32 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 33/35] selftests/x86: Add map_shadow_stack syscall test Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-01-31  7:56   ` Florian Weimer
  2022-01-30 21:18 ` [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs Rick Edgecombe
                   ` (2 subsequent siblings)
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

For the current shadow stack implementation, shadow stack contents cannot
be arbitrarily provisioned with data. This property helps apps protect
themselves better, but also restricts any potential apps that may want to
do exotic things at the expense of a little security.

The x86 shadow stack feature introduces a new instruction, wrss, which
can be enabled to write directly to shadow stack permissioned memory from
userspace. Allow it to get enabled via the prctl interface.

Only enable the userspace wrss instruction, which allows writes to
userspace shadow stacks from userspace. Do not allow it to be enabled
independently of shadow stack, as HW does not support using WRSS when
shadow stack is disabled.
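
A rough userspace sketch of the intended flow is below; the constants
mirror the arch_prctl additions earlier in the series, and the wrapper
name is made up:

  #include <sys/syscall.h>
  #include <unistd.h>

  #define ARCH_X86_FEATURE_ENABLE  0x3004
  #define LINUX_X86_FEATURE_WRSS   0x00000010

  static int shstk_write_slot(unsigned long *slot, unsigned long val)
  {
      /* Only succeeds if shadow stack is already enabled for the thread */
      if (syscall(SYS_arch_prctl, ARCH_X86_FEATURE_ENABLE,
                  LINUX_X86_FEATURE_WRSS))
          return -1;

      asm volatile("wrssq %[val], (%[addr])"
                   :: [val] "r" (val), [addr] "r" (slot) : "memory");
      return 0;
  }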

Prevent shadow stacks from becoming executable to assist apps that want
W^X enforced. Add an arch_validate_flags() implementation to handle the
check. Rename the uapi/asm/mman.h header guard to be able to use it for
arch/x86/include/asm/mman.h where the arch_validate_flags() will be.

From a fault handler perspective, WRSS will behave very similarly to WRUSS,
which is treated like a user access from a PF err code perspective.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - New patch.

 arch/x86/include/asm/cet.h          |  3 +++
 arch/x86/include/asm/mman.h         |  5 ++++-
 arch/x86/include/uapi/asm/prctl.h   |  2 +-
 arch/x86/kernel/elf_feature_prctl.c |  6 +++++
 arch/x86/kernel/shstk.c             | 35 ++++++++++++++++++++++++++++-
 5 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index cbc7cfcba5dc..c8ff0bd5f5bc 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -10,6 +10,7 @@ struct task_struct;
 struct thread_shstk {
 	u64	base;
 	u64	size;
+	bool	wrss;
 };
 
 #ifdef CONFIG_X86_SHADOW_STACK
@@ -19,6 +20,7 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
 void shstk_free(struct task_struct *p);
 int shstk_disable(void);
 void reset_thread_shstk(void);
+int wrss_control(bool enable);
 int shstk_setup_rstor_token(bool proc32, unsigned long restorer,
 			    unsigned long *new_ssp);
 int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp);
@@ -32,6 +34,7 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p,
 static inline void shstk_free(struct task_struct *p) {}
 static inline void shstk_disable(void) {}
 static inline void reset_thread_shstk(void) {}
+static inline void wrss_control(bool enable) {}
 static inline int shstk_setup_rstor_token(bool proc32, unsigned long restorer,
 					  unsigned long *new_ssp) { return 0; }
 static inline int shstk_check_rstor_token(bool proc32,
diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
index b44fe31deb3a..c05951a36d93 100644
--- a/arch/x86/include/asm/mman.h
+++ b/arch/x86/include/asm/mman.h
@@ -8,7 +8,10 @@
 #ifdef CONFIG_X86_SHADOW_STACK
 static inline bool arch_validate_flags(unsigned long vm_flags)
 {
-	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
+	/*
+	 * Shadow stack must not be executable, to help with W^X due to wrss.
+	 */
+	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & (VM_WRITE | VM_EXEC)))
 		return false;
 
 	return true;
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index aa294c7bcf41..210976925325 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -28,6 +28,6 @@
 /* x86 feature bits to be used with ARCH_X86_FEATURE arch_prctl()s */
 #define LINUX_X86_FEATURE_IBT		0x00000001
 #define LINUX_X86_FEATURE_SHSTK		0x00000002
-
+#define LINUX_X86_FEATURE_WRSS		0x00000010
 
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/elf_feature_prctl.c b/arch/x86/kernel/elf_feature_prctl.c
index 47de201db3f7..ecad6ebeb4dd 100644
--- a/arch/x86/kernel/elf_feature_prctl.c
+++ b/arch/x86/kernel/elf_feature_prctl.c
@@ -21,6 +21,8 @@ static int elf_feat_copy_status_to_user(struct thread_shstk *shstk, u64 __user *
 		buf[1] = shstk->base;
 		buf[2] = shstk->size;
 	}
+	if (shstk->wrss)
+		buf[0] |= LINUX_X86_FEATURE_WRSS;
 
 	return copy_to_user(ubuf, buf, sizeof(buf));
 }
@@ -40,6 +42,8 @@ int prctl_elf_feature(int option, u64 arg2)
 		if (arg2 & thread->feat_prctl_locked)
 			return -EPERM;
 
+		if (arg2 & LINUX_X86_FEATURE_WRSS && !wrss_control(false))
+			feat_succ |= LINUX_X86_FEATURE_WRSS;
 		if (arg2 & LINUX_X86_FEATURE_SHSTK && !shstk_disable())
 			feat_succ |= LINUX_X86_FEATURE_SHSTK;
 
@@ -52,6 +56,8 @@ int prctl_elf_feature(int option, u64 arg2)
 
 		if (arg2 & LINUX_X86_FEATURE_SHSTK && !shstk_setup())
 			feat_succ |= LINUX_X86_FEATURE_SHSTK;
+		if (arg2 & LINUX_X86_FEATURE_WRSS && !wrss_control(true))
+			feat_succ |= LINUX_X86_FEATURE_WRSS;
 
 		if (feat_succ != arg2)
 			return -ECANCELED;
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 53be5d5539d4..92612236b4ef 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -230,6 +230,36 @@ void shstk_free(struct task_struct *tsk)
 	shstk->size = 0;
 }
 
+int wrss_control(bool enable)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	void *xstate;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return 1;
+	/*
+	 * Only enable wrss if shadow stack is enabled. If shadow stack is not
+	 * enabled, wrss will already be disabled, so don't bother clearing it
+	 * when disabling.
+	 */
+	if (!shstk->size || shstk->wrss == enable)
+		return 1;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+	if (enable)
+		err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, CET_WRSS_EN, 0);
+	else
+		err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0, CET_WRSS_EN);
+	end_update_xsave_msrs();
+
+	if (err)
+		return 1;
+
+	shstk->wrss = enable;
+	return 0;
+}
+
 int shstk_disable(void)
 {
 	struct thread_shstk *shstk = &current->thread.shstk;
@@ -242,7 +272,9 @@ int shstk_disable(void)
 		return 1;
 
 	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
-	err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0, CET_SHSTK_EN);
+	/* Disable WRSS too when disabling shadow stack */
+	err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0,
+					CET_SHSTK_EN | CET_WRSS_EN);
 	if (!err)
 		err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, 0);
 	end_update_xsave_msrs();
@@ -251,6 +283,7 @@ int shstk_disable(void)
 		return 1;
 
 	shstk_free(current);
+	shstk->wrss = 0;
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (33 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 34/35] x86/cet/shstk: Support wrss for userspace Rick Edgecombe
@ 2022-01-30 21:18 ` Rick Edgecombe
  2022-02-03 21:58   ` John Allen
  2022-02-03 21:07 ` [PATCH 00/35] Shadow stacks for userspace Thomas Gleixner
  2022-02-06 18:42 ` Mike Rapoport
  36 siblings, 1 reply; 154+ messages in thread
From: Rick Edgecombe @ 2022-01-30 21:18 UTC (permalink / raw)
  To: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

Shadow stack is supported on newer AMD processors, but the kernel
implementation has not been tested on them. Prevent basic issues from
showing up for normal users by disabling shadow stack on all CPUs except
Intel until it has been tested, at which point the limitation should be
removed.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - New patch.

 arch/x86/kernel/cpu/common.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9ee339f5b8ca..7fbfe707a1db 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -517,6 +517,14 @@ __setup("nopku", setup_disable_pku);
 
 static __always_inline void setup_cet(struct cpuinfo_x86 *c)
 {
+	/*
+	 * Shadow stack is supported on AMD processors, but has not been
+	 * tested. Only support it on Intel processors until this is done.
+	 * At which point, this vendor check should be removed.
+	 */
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		setup_clear_cpu_cap(X86_FEATURE_SHSTK);
+
 	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
 		return;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* Re: [PATCH 34/35] x86/cet/shstk: Support wrss for userspace
  2022-01-30 21:18 ` [PATCH 34/35] x86/cet/shstk: Support wrss for userspace Rick Edgecombe
@ 2022-01-31  7:56   ` Florian Weimer
  2022-01-31 18:26     ` H.J. Lu
  0 siblings, 1 reply; 154+ messages in thread
From: Florian Weimer @ 2022-01-31  7:56 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian

* Rick Edgecombe:

> For the current shadow stack implementation, shadow stacks contents cannot
> be arbitrarily provisioned with data. This property helps apps protect
> themselves better, but also restricts any potential apps that may want to
> do exotic things at the expense of a little security.
>
> The x86 shadow stack feature introduces a new instruction, wrss, which
> can be enabled to write directly to shadow stack permissioned memory from
> userspace. Allow it to get enabled via the prctl interface.

Why can't this be turned on unconditionally?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 34/35] x86/cet/shstk: Support wrss for userspace
  2022-01-31  7:56   ` Florian Weimer
@ 2022-01-31 18:26     ` H.J. Lu
  2022-01-31 18:45       ` Florian Weimer
  0 siblings, 1 reply; 154+ messages in thread
From: H.J. Lu @ 2022-01-31 18:26 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Rick Edgecombe, the arch/x86 maintainers, H . Peter Anvin,
	Thomas Gleixner, Ingo Molnar, LKML, open list:DOCUMENTATION,
	Linux-MM, linux-arch, Linux API, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V . Shankar, Dave Martin,
	Weijiang Yang, Kirill A . Shutemov, joao.moreira, John Allen,
	Kostya Serebryany, Stephane Eranian

On Sun, Jan 30, 2022 at 11:57 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Rick Edgecombe:
>
> > For the current shadow stack implementation, shadow stacks contents cannot
> > be arbitrarily provisioned with data. This property helps apps protect
> > themselves better, but also restricts any potential apps that may want to
> > do exotic things at the expense of a little security.
> >
> > The x86 shadow stack feature introduces a new instruction, wrss, which
> > can be enabled to write directly to shadow stack permissioned memory from
> > userspace. Allow it to get enabled via the prctl interface.
>
> Why can't this be turned on unconditionally?

WRSS can be a security risk since it defeats the whole purpose of
Shadow Stack.  If an application needs to write to shadow stack,
it can make a syscall to enable it.  After the CET patches are checked
into the Linux kernel, I will make a proposal to allow applications or
shared libraries to opt in to WRSS through a linker option, a compiler
option or a function attribute.
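
Just to sketch the shape of that opt-in (the constant names and values
below are placeholders, not the final ABI from this series):

#include <sys/syscall.h>
#include <unistd.h>

/* Placeholder names/values; the real ones come from the arch_prctl()
 * patch in this series. */
#define ARCH_X86_FEATURE_ENABLE		0x5001
#define LINUX_X86_FEATURE_WRSS		0x2

static int enable_wrss(void)
{
	/* arch_prctl() has no glibc wrapper for new codes, so use syscall() */
	return syscall(SYS_arch_prctl, ARCH_X86_FEATURE_ENABLE,
		       LINUX_X86_FEATURE_WRSS);
}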

-- 
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 34/35] x86/cet/shstk: Support wrss for userspace
  2022-01-31 18:26     ` H.J. Lu
@ 2022-01-31 18:45       ` Florian Weimer
  0 siblings, 0 replies; 154+ messages in thread
From: Florian Weimer @ 2022-01-31 18:45 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Rick Edgecombe, the arch/x86 maintainers, H . Peter Anvin,
	Thomas Gleixner, Ingo Molnar, LKML, open list:DOCUMENTATION,
	Linux-MM, linux-arch, Linux API, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V . Shankar, Dave Martin,
	Weijiang Yang, Kirill A . Shutemov, joao.moreira, John Allen,
	Kostya Serebryany, Stephane Eranian

* H. J. Lu:

> On Sun, Jan 30, 2022 at 11:57 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Rick Edgecombe:
>>
>> > For the current shadow stack implementation, shadow stacks contents cannot
>> > be arbitrarily provisioned with data. This property helps apps protect
>> > themselves better, but also restricts any potential apps that may want to
>> > do exotic things at the expense of a little security.
>> >
>> > The x86 shadow stack feature introduces a new instruction, wrss, which
>> > can be enabled to write directly to shadow stack permissioned memory from
>> > userspace. Allow it to get enabled via the prctl interface.
>>
>> Why can't this be turned on unconditionally?
>
> WRSS can be a security risk since it defeats the whole purpose of
> Shadow Stack.  If an application needs to write to shadow stack,
> it can make a syscall to enable it.  After the CET patches are checked
> in Linux kernel, I will make a proposal to allow applications or shared
> libraries to opt-in WRSS through a linker option, a compiler option or
> a function attribute.

Ahh, that makes sense.  I assumed that without WRSS, the default was to
allow plain writes. 8-)

Thanks,
Florian


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (34 preceding siblings ...)
  2022-01-30 21:18 ` [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs Rick Edgecombe
@ 2022-02-03 21:07 ` Thomas Gleixner
  2022-02-04  1:08   ` Edgecombe, Rick P
  2022-02-06 18:42 ` Mike Rapoport
  36 siblings, 1 reply; 154+ messages in thread
From: Thomas Gleixner @ 2022-02-03 21:07 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

Rick,

On Sun, Jan 30 2022 at 13:18, Rick Edgecombe wrote:
> This is a slight reboot of the userspace CET series. I will be taking over the 
> series from Yu-cheng. Per some internal recommendations, I’ve reset the version
> number and am calling it a new series. Hopefully, it doesn’t cause
> confusion.

That's fine as it seems to be a major change in course, so a reset to V1
is justified. Don't worry about confusion, we can easily confuse ourselves
with far more minor things than that version reset :)

> The new plan is to upstream only userspace Shadow Stack support at this point. 
> IBT can follow later, but for now I’ll focus solely on the most in-demand and
> widely available (with the feature on AMD CPUs now) part of CET.

We just have to keep IBT in mind so that we don't add roadblocks which
we regret some time later.

> I thought as part of this reset, it might be useful to more fully write-up the 
> design and summarize the history of the previous CET series. So this slightly
> long cover letter does that. The "Updates" section has the changes, if anyone
> doesn't want the history.

Thanks for that lengthy writeup. It's appreciated. There is too much
confusion already so a coherent summary is helpful.

> Why is Shadow Stack Wanted
> ==========================
> The main use case for userspace shadow stack is providing protection against 
> return oriented programming attacks. Fedora and Ubuntu already have many/most 
> packages enabled for shadow stack.

Which is unfortunately part of the overall problem ...

> History
> =======
> The branding “CET” really consists of two features: “Shadow Stack” and 
> “Indirect Branch Tracking”. They both restrict previously allowed, but rarely 
> valid behaviors and require userspace to change to avoid these behaviors before 
> enabling the protection. These raw HW features need to be assembled into a 
> software solution across userspace and kernel in order to add security value.
> The kernel part of this solution has evolved iteratively starting with a lengthy
> RFC period. 
>
> Until now, the enabling effort was trying to support both Shadow Stack and IBT. 
> This history will focus on a few areas of the shadow stack development history 
> that I thought stood out.
>
> 	Signals
> 	-------
> 	Originally signals placed the location of the shadow stack restore 
> 	token inside the saved state on the stack. This was problematic from a 
> 	past ABI promises perspective. So the restore location was instead just 
> 	assumed from the shadow stack pointer. This works because in normal 
> 	allowed cases of calling sigreturn, the shadow stack pointer should be 
> 	right at the restore token at that time. There is no alternate shadow 
> 	stack support. If an alt shadow stack is added later we would
> 	need to

So how is that going to work? altstack is not an esoteric corner case.

> 	Enabling Interface
> 	------------------
> 	For the entire history of the original CET series, the design was to 
> 	enable shadow stack automatically if the feature bit was detected in 
> 	the elf header. Then it was userspace’s responsibility to turn it off 
> 	via an arch_prctl() if it was not desired, and this was handled by the 
> 	glibc dynamic loader. Glibc’s standard behavior (when CET if configured 
> 	glibc dynamic loader. Glibc’s standard behavior (when CET is configured)
> 	is to leave shadow stack enabled if the executable and all linked 
>
> 	Many distros (Fedora and others) have binaries already marked with 
> 	shadow stack, waiting for kernel support. Unfortunately their glibc 
> 	binaries expect the original arch_prctl() interface for allocating 
> 	shadow stacks, as those changes were pushed ahead of kernel support. 
> 	The net result of it all is, when updating to a kernel with shadow 
> 	stack these binaries would suddenly get shadow stack enabled and expect 
> 	the arch_prctl() interface to be there. And so calls to makecontext() 
> 	will fail, resulting in visible breakages. This series deals with this 
> 	problem as described below in "Updates".

I'm really impressed by the well thought out coordination on the glibc and
distro side. Designed by committee never worked ...

> Updates
> =======
> These updates were mostly driven by public comments, but a lot of the design 
> elements are new. I would like some extra scrutiny on the updates.
>
> 	New syscall for Shadow Stack Allocation
> 	---------------------------------------
> 	A new syscall is added for allocating shadow stacks to replace 
> 	PROT_SHADOW_STACK. Several options were considered, as described in the 
> 	“x86/cet/shstk: Introduce map_shadow_stack syscall”.
>
> 	Xsave Managed Supervisor State Modifications
> 	--------------------------------------------
> 	The shadow stack feature requires the kernel to modify xsaves managed 
> 	state. On one of the last versions of Yu-cheng’s series Boris had 
> 	commented on the pattern it was using to do this not necessarily being 
> 	ideal. The pattern was to force a restore to the registers and always 
> 	do the modification there. Then Thomas did an overhaul of the fpu code, 
> 	part of which consisted of making raw access to the xsave buffer 
> 	private to the fpu code. So this series tries to expose access again, 
> 	and in a way that addresses Boris’ comments.
>
> 	The method is to provide functions like wmsrl/rdmsrl, but that can 
> 	direct the operation to the correct location (registers or buffer), 
> 	while giving the proper notice to the fpu subsystem so things don’t get 
> 	clobbered or corrupted.
>
> 	In the past a solution like this was discussed as part of the PASID 
> 	series, and Thomas was not in favor. In CET’s case there is a more 
> 	logic around the CET MSR’s than in PASID's, and wrapping this logic 
> 	minimizes near identical open coded logic needed to do this more 
> 	efficiently. In addition it resolves the above described problem of 
> 	having no access to the xsave buffer. So it is being put forward here 
> 	under the supposition that CET’s usage may lead to a different 
> 	conclusion, not to try to ignore past direction.
>
> 	The user interrupt series has similar needs as CET, and will also use
> 	this internal interface if it’s found acceptable.

I'll have a look.

> 	Switch Enabling Interface
> 	-------------------------
> 	But there are existing downsides to automatic elf header processing 
> 	based enabling. The elf header feature spec is not defined by the 
> 	kernel and there are proposals to expand it to describe additional 
> 	logic. A simpler interface where the kernel is simply told what to 
> 	enable, and leaves all the decision making to userspace, is more 
> 	flexible for userspace and simpler for the kernel. There also already 
> 	needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and 
> 	likely LAM will use it too), so it avoids there being two ways to turn 
> 	on these types of features. The only tricky part for shadow stack, is 
> 	that it has to be enabled very early. Wherever the shadow stack is 
> 	enabled, the app cannot return from that point, otherwise there will be 
> 	a shadow stack violation. It turns out glibc can enable shadow stack 
> 	this early, so it works nicely. So not automatically enabling any 
> 	features in the elf header will cleanly disable all old binaries, which 
> 	expect the kernel to enable CET features automatically. Then after the 
> 	kernel changes are upstream, glibc can be updated to use the new
> 	interface. This is the solution implemented in this series.

Makes sense.

> 	Expand Commit Logs
> 	------------------
> 	As part of spinning up on this series, I found some of the commit logs 
> 	did not describe the changes in enough detail for me understand their 
> 	purpose. I tried to expand the logs and comments, where I had to go 
> 	digging. Hopefully it’s useful.

Proper changelogs are always appreciated.
	
> 	Limit to only Intel Processors
> 	------------------------------
> 	Shadow stack is supported on some AMD processors, but this revision 
> 	(with expanded HW usage and xsaves changes) has only has been tested on 
> 	Intel ones. So this series has a patch to limit shadow stack support to 
> 	Intel processors. Ideally the patch would not even make it to mainline, 
> 	and should be dropped as soon as this testing is done. It's included 
> 	just in case.

Ha. I can give you access to an AMD machine with CET SS supported :)

> Future Work
> ===========
> Even though this is now exclusively a shadow stack series, there is still some 
> remaining shadow stack work to be done.
>
> 	Ptrace
> 	------
> 	Early in the series, there was a patch to allow IA32_U_CET and
> 	IA32_PL3_SSP to be set. This patch was dropped and planned as a follow
> 	up to basic support, and it remains the plan. It will be needed for
> 	in-progress gdb support.

It's pretty much a prerequisite for enabling it, right?

> 	CRIU Support
> 	------------
> 	In the past there was some speculation on the mailing list about 
> 	whether CRIU would need to be taught about CET. It turns out, it does. 
> 	The first issue hit is that CRIU calls sigreturn directly from its 
> 	“parasite code” that it injects into the dumper process. This violates
> 	this shadow stack implementation’s protection that intends to prevent
> 	attackers from doing this.
>
> 	With so many packages already enabled with shadow stack, there is 
> 	probably desire to make it work seamlessly. But in the meantime if 
> 	distros want to support shadow stack and CRIU, users could manually 
> 	disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for 
> 	a process they will wants to dump. It’s not ideal.
>
> 	I’d like to hear what people think about having shadow stack in the 
> 	kernel without this resolved. Nothing would change for any users until 
> 	they enable shadow stack in the kernel and update to a glibc configured
> 	with CET. Should CRIU userspace be solved before kernel support?

Definitely yes. Making CRIU users add a glibc tunable is not really an
option. We can't break CRIU systems with a kernel upgrade.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs
  2022-01-30 21:18 ` [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs Rick Edgecombe
@ 2022-02-03 21:58   ` John Allen
  2022-02-03 22:23     ` H.J. Lu
  0 siblings, 1 reply; 154+ messages in thread
From: John Allen @ 2022-02-03 21:58 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, kcc, eranian

On Sun, Jan 30, 2022 at 01:18:38PM -0800, Rick Edgecombe wrote:
> Shadow stack is supported on newer AMD processors, but the kernel
> implementation has not been tested on them. Prevent basic issues from
> showing up for normal users by disabling shadow stack on all CPUs except
> Intel until it has been tested. At which point the limitation should be
> removed.

Hi Rick,

I have been testing Yu-Cheng's patchsets on AMD hardware and I am
working on testing this version now. How are you testing this new
series? I can partially test by calling the prctl enable for shadow
stack directly from a program, but I'm not sure how useful that's going
to be without the glibc support. Do you have a public repo with the
necessary glibc changes to enable shadow stack early?

Thanks,
John

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs
  2022-02-03 21:58   ` John Allen
@ 2022-02-03 22:23     ` H.J. Lu
  2022-02-04 22:21       ` John Allen
  0 siblings, 1 reply; 154+ messages in thread
From: H.J. Lu @ 2022-02-03 22:23 UTC (permalink / raw)
  To: John Allen
  Cc: Rick Edgecombe, the arch/x86 maintainers, H . Peter Anvin,
	Thomas Gleixner, Ingo Molnar, LKML, open list:DOCUMENTATION,
	Linux-MM, linux-arch, Linux API, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V . Shankar, Dave Martin,
	Weijiang Yang, Kirill A . Shutemov, joao.moreira,
	Kostya Serebryany, Stephane Eranian

On Thu, Feb 3, 2022 at 1:58 PM John Allen <john.allen@amd.com> wrote:
>
> On Sun, Jan 30, 2022 at 01:18:38PM -0800, Rick Edgecombe wrote:
> > Shadow stack is supported on newer AMD processors, but the kernel
> > implementation has not been tested on them. Prevent basic issues from
> > showing up for normal users by disabling shadow stack on all CPUs except
> > Intel until it has been tested. At which point the limitation should be
> > removed.
>
> Hi Rick,
>
> I have been testing Yu-Cheng's patchsets on AMD hardware and I am
> working on testing this version now. How are you testing this new
> series? I can partially test by calling the prctl enable for shadow
> stack directly from a program, but I'm not sure how useful that's going
> to be without the glibc support. Do you have a public repo with the
> necessary glibc changes to enable shadow stack early?
>

The glibc CET branch is at

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/cet/master

-- 
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 33/35] selftests/x86: Add map_shadow_stack syscall test
  2022-01-30 21:18 ` [PATCH 33/35] selftests/x86: Add map_shadow_stack syscall test Rick Edgecombe
@ 2022-02-03 22:42   ` Dave Hansen
  2022-02-04  1:22     ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-03 22:42 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu, Yu-cheng

On 1/30/22 13:18, Rick Edgecombe wrote:
> Add a simple selftest for exercising the new map_shadow_stack syscall.

This is a good start for the selftest.  But, it would be really nice to
see a few additional smoke tests in here that are independent of the
library support.

For instance, it would be nice to have tests that:

1. Write to the shadow stack with normal instructions (and recover from
   the inevitable SEGV).  Make sure the siginfo looks like we expect.
2. Corrupt the regular stack, or maybe just use a retpoline
   to induce a shadow stack exception.  Ditto on checking the siginfo
3. Do enough CALLs that will likely trigger a fault and an on-demand
   shadow stack page allocation.

That will test the *basics* and should be pretty simple to write.
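
A rough sketch of what #1 could look like (the map_shadow_stack syscall
number and argument list here are placeholders for whatever the syscall
patch in this series actually defines):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack 451	/* placeholder number */
#endif

#define SS_SIZE 0x1000

static sigjmp_buf jmp;

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
	/* A real test would also sanity check si_code/si_addr here */
	siglongjmp(jmp, 1);
}

int main(void)
{
	struct sigaction sa = {};
	unsigned long *ss;

	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);

	/* assumed (addr, size, flags) signature */
	ss = (void *)syscall(__NR_map_shadow_stack, 0, SS_SIZE, 0);
	if (ss == (void *)-1)
		return 1;

	if (!sigsetjmp(jmp, 1)) {
		*ss = 1;	/* plain store to shadow stack memory */
		printf("FAIL: normal write did not fault\n");
		return 1;
	}
	printf("OK: normal write to shadow stack faulted\n");
	return 0;
}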


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-03 21:07 ` [PATCH 00/35] Shadow stacks for userspace Thomas Gleixner
@ 2022-02-04  1:08   ` Edgecombe, Rick P
  2022-02-04  5:20     ` Andy Lutomirski
                       ` (2 more replies)
  0 siblings, 3 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-04  1:08 UTC (permalink / raw)
  To: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, dave.hansen, kirill.shutemov, Eranian, Stephane,
	linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc, bp, oleg,
	hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, Dave.Martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov

Hi Thomas,

Thanks for feedback on the plan.

On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > Until now, the enabling effort was trying to support both Shadow
> > Stack and IBT. 
> > This history will focus on a few areas of the shadow stack
> > development history 
> > that I thought stood out.
> > 
> >        Signals
> >        -------
> >        Originally signals placed the location of the shadow stack
> > restore 
> >        token inside the saved state on the stack. This was
> > problematic from a 
> >        past ABI promises perspective. So the restore location was
> > instead just 
> >        assumed from the shadow stack pointer. This works because in
> > normal 
> >        allowed cases of calling sigreturn, the shadow stack pointer
> > should be 
> >        right at the restore token at that time. There is no
> > alternate shadow 
> >        stack support. If an alt shadow stack is added later we
> > would
> >        need to
> 
> So how is that going to work? altstack is not an esoteric corner
> case.

My understanding is that the main usages for the signal stack were
handling stack overflows and corruption. Since the shadow stack only
contains return addresses rather than large stack allocations, and is
not generally writable or pivotable, I thought there was a good
possibility an alt shadow stack would not end up being especially
useful. Does it seem like reasonable guesswork?

If it does seem likely, I'll give it some more thought than that
hand-wavy plan.

>         
> >        Limit to only Intel Processors
> >        ------------------------------
> >        Shadow stack is supported on some AMD processors, but this
> > revision 
> >        (with expanded HW usage and xsaves changes) has only has
> > been tested on 
> >        Intel ones. So this series has a patch to limit shadow stack
> > support to 
> >        Intel processors. Ideally the patch would not even make it
> > to mainline, 
> >        and should be dropped as soon as this testing is done. It's
> > included 
> >        just in case.
> 
> Ha. I can give you access to an AMD machine with CET SS supported :)

Thanks for the offer. It sounds like John Allen can do this testing.

> 
> > Future Work
> > ===========
> > Even though this is now exclusively a shadow stack series, there is
> > still some 
> > remaining shadow stack work to be done.
> > 
> >        Ptrace
> >        ------
> >        Early in the series, there was a patch to allow IA32_U_CET
> > and
> >        IA32_PL3_SSP to be set. This patch was dropped and planned
> > as a follow
> >        up to basic support, and it remains the plan. It will be
> > needed for
> >        in-progress gdb support.
> 
> It's pretty much a prerequisite for enabling it, right?

Yes.

> 
> >        CRIU Support
> >        ------------
> >        In the past there was some speculation on the mailing list
> > about 
> >        whether CRIU would need to be taught about CET. It turns
> > out, it does. 
> >        The first issue hit is that CRIU calls sigreturn directly
> > from its 
> >        “parasite code” that it injects into the dumper process.
> > This violates
> >        this shadow stack implementation’s protection that intends
> > to prevent
> >        attackers from doing this.
> > 
> >        With so many packages already enabled with shadow stack,
> > there is 
> >        probably desire to make it work seamlessly. But in the
> > meantime if 
> >        distros want to support shadow stack and CRIU, users could
> > manually 
> >        disabled shadow stack via
> > “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for 
> >        a process they will wants to dump. It’s not ideal.
> > 
> >        I’d like to hear what people think about having shadow stack
> > in the 
> >        kernel without this resolved. Nothing would change for any
> > users until 
> >        they enable shadow stack in the kernel and update to a glibc
> > configured
> >        with CET. Should CRIU userspace be solved before kernel
> > support?
> 
> Definitely yes. Making CRIU users add a glibc tunable is not really
> an
> option. We can't break CRIU systems with a kernel upgrade.

Ok got it, thanks. Just to be clear though, existing distros/binaries
out there will not have shadow stack enabled with just an updated
kernel (due to the enabling changes). So the CRIU tools would only
break after future glibc binaries enable CET, which users/distros would
have to do specifically (glibc doesn't even enable CET by default).

Since the purpose of this feature is to restrict previously allowed
behaviors, and it’s apparently getting enabled by default in some
distros' packages, I guess there is a decent chance that once a system
is updated with a future glibc some app somewhere will break. I was
under the impression that as long as there were no breakages under a
current set of binaries (including glibc), this was not considered a
kernel regression. Please correct me if this is wrong. I think there
are other options if we want to make this softer.

Of course none of that prevents known breakages from being fixed for
normal reasons and I’ll look into that for CRIU.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 33/35] selftests/x86: Add map_shadow_stack syscall test
  2022-02-03 22:42   ` Dave Hansen
@ 2022-02-04  1:22     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-04  1:22 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng

On Thu, 2022-02-03 at 14:42 -0800, Dave Hansen wrote:
> This is a good start for the selftest.  But, it would be really nice
> to
> see a few additional smoke tests in here that are independent of the
> library support.

Sure. I had actually included this just because the "adding a syscall"
docs said to make sure to include a test for the syscall. There are
some other tests that were being planned as a follow up.

> 
> For instance, it would be nice to have tests that:
> 
> 1. Write to the shadow stack with normal instructions (and recover
> from
>    the inevitable SEGV).  Make sure the siginfo looks like we expect.
> 2. Corrupt the regular stack, or maybe just use a retpoline
>    do induce a shadow stack exception.  Ditto on checking the siginfo
> 3. Do enough CALLs that will likely trigger a fault and an on-demand
>    shadow stack page allocation.
> 
> That will test the *basics* and should be pretty simple to write.

Most of this already exists in the private tests. I'll combine it into
a single selftest. Having wrss now nicely made it a bit easier because
those writes are treated as shadow stack accesses, so we can do these
operations directly without too much calling acrobatics.
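
For example, once wrss is enabled (and assuming an assembler that knows
the instruction), a test can poke the shadow stack with a helper roughly
like:

static inline void write_shstk(unsigned long *ssp, unsigned long val)
{
	/* wrssq stores 'val' to shadow stack memory at 'ssp' */
	asm volatile("wrssq %[val], (%[addr])"
		     : : [val] "r" (val), [addr] "r" (ssp)
		     : "memory");
}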

Thanks,

Rick

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-04  1:08   ` Edgecombe, Rick P
@ 2022-02-04  5:20     ` Andy Lutomirski
  2022-02-04 20:23       ` Edgecombe, Rick P
  2022-02-05 13:26     ` David Laight
  2022-02-06 13:06     ` Peter Zijlstra
  2 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-04  5:20 UTC (permalink / raw)
  To: Rick P Edgecombe, Balbir Singh, H. Peter Anvin,
	Eugene Syromiatnikov, Peter Zijlstra (Intel),
	Randy Dunlap, Kees Cook, Dave Hansen, Kirill A. Shutemov,
	Eranian, Stephane, linux-mm, Florian Weimer, Nadav Amit,
	Jann Horn, linux-arch, kcc, Borislav Petkov, Oleg Nesterov,
	H.J. Lu, Weijiang Yang, Pavel Machek, Arnd Bergmann, Moreira,
	Joao, Thomas Gleixner, Mike Kravetz, the arch/x86 maintainers,
	linux-doc, Dave Martin, john.allen, Ingo Molnar, Shankar, Ravi V,
	Jonathan Corbet, Linux Kernel Mailing List, Linux API,
	Cyrill Gorcunov



On Thu, Feb 3, 2022, at 5:08 PM, Edgecombe, Rick P wrote:
> Hi Thomas,

>> >        Signals
>> >        -------
>> >        Originally signals placed the location of the shadow stack
>> > restore 
>> >        token inside the saved state on the stack. This was
>> > problematic from a 
>> >        past ABI promises perspective.

What was the actual problem?

>> > So the restore location was
>> > instead just 
>> >        assumed from the shadow stack pointer. This works because in
>> > normal 
>> >        allowed cases of calling sigreturn, the shadow stack pointer
>> > should be 
>> >        right at the restore token at that time. There is no
>> > alternate shadow 
>> >        stack support. If an alt shadow stack is added later we
>> > would
>> >        need to
>> 
>> So how is that going to work? altstack is not an esoteric corner
>> case.
>
> My understanding is that the main usages for the signal stack were
> handling stack overflows and corruption. Since the shadow stack only
> contains return addresses rather than large stack allocations, and is
> not generally writable or pivotable, I thought there was a good
> possibility an alt shadow stack would not end up being especially
> useful. Does it seem like reasonable guesswork?

It's also used for things like DOSEMU that execute in a weird context and then trap back out to the outer program using a signal handler and an altstack.  Also, imagine someone writing a SIGSEGV handler specifically intended to handle shadow stack overflow.

The shadow stack can be pivoted using RSTORSSP.
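
Roughly, per my reading of the SDM (assuming 'token' points at a valid
shadow stack restore token on the target stack):

static inline void shstk_pivot(void *token)
{
	/*
	 * rstorssp switches SSP to the target stack by consuming the
	 * restore token; saveprevssp then leaves a restore token for the
	 * old shadow stack so it can be switched back to later.
	 */
	asm volatile("rstorssp (%0)\n\t"
		     "saveprevssp"
		     : : "r" (token)
		     : "memory");
}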

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-04  5:20     ` Andy Lutomirski
@ 2022-02-04 20:23       ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-04 20:23 UTC (permalink / raw)
  To: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Eranian, Stephane, kirill.shutemov, dave.hansen,
	linux-mm, fweimer, nadav.amit, jannh, kcc, linux-arch, bp, oleg,
	hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, Dave.Martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov

On Thu, 2022-02-03 at 21:20 -0800, Andy Lutomirski wrote:
> On Thu, Feb 3, 2022, at 5:08 PM, Edgecombe, Rick P wrote:
> > Hi Thomas,
> > > >         Signals
> > > >         -------
> > > >         Originally signals placed the location of the shadow
> > > > stack
> > > > restore 
> > > >         token inside the saved state on the stack. This was
> > > > problematic from a 
> > > >         past ABI promises perspective.
> 
> What was the actual problem?

The meat of the discussion that I saw was this thread:

https://lore.kernel.org/lkml/CALCETrVTeYfzO-XWh+VwTuKCyPyp-oOMGH=QR_msG9tPQ4xPmA@mail.gmail.com/

The problem was that the saved shadow stack pointer was placed in a
location unexpected by userspace (at the end of the floating point
state), which userspace code that is unaware of the appended shadow
stack state can relocate. I think an earlier version was not extendable
either.

It does not seem to be a complete dead end, but I have to admit I didn’t
fully internalize the limits imposed by userspace's expectations of the
sigframe because the current solution seemed good. I’ll have to dig
into it a little more because alt stack, CRIU and these expectations
all seem intertwined.

Here is a question for you guys though – is a general ucontext
extension solution a nice-to-have on its own?

I was thinking that it would be better to keep CET stuff out of the
sigframe-related structures if it can be avoided. One reason is that it
keeps security-related register values out of writable locations. I’m
not thinking of any specific problem, but just a general idea of not
opening that stuff up if it’s not needed. An example is in the IBT
series, where the wait-for-endbranch state was saved in a ucontext flag.
Some usages may want to prevent it from being cleared in a signal and
the endbranch check being skipped. So for shadow stack, there is just
the general notion that this is not ideal state to open up.

On where to keep the wait-for-endbranch state, PeterZ had suggested the
possibility of having a per-mm hashmap of (userspace stack addresses) ->
(kernel side saved state), capped at some sensible number of items.
This could be extended to other state besides CET stuff. I was
thinking of looking in that direction if it’s needed for the alt shadow
stack.
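
Purely as an illustration of that idea (nothing from a posted series;
the 'sig_state' xarray below is a hypothetical addition to mm_struct,
indexed by the userspace sigframe address):

#include <linux/mm_types.h>
#include <linux/slab.h>
#include <linux/xarray.h>

struct sig_saved_state {
	u64 ssp;	/* or whatever the kernel must keep unforgeable */
};

static int stash_sig_state(struct mm_struct *mm, unsigned long sigframe,
			   u64 ssp)
{
	struct sig_saved_state *st = kmalloc(sizeof(*st), GFP_KERNEL);
	int err;

	if (!st)
		return -ENOMEM;
	st->ssp = ssp;
	/* sigframes are at least 8-byte aligned, so shift for the index */
	err = xa_err(xa_store(&mm->sig_state, sigframe >> 3, st, GFP_KERNEL));
	if (err)
		kfree(st);
	return err;
}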

But is that swimming against where saved state is "supposed" to live?
There is some trade-off between compatibility (more apps able to opt
in) and security. I guess it's probably not good to have the kernel
bend over backwards trying to get both.

> > > > So the restore location was
> > > > instead just 
> > > >         assumed from the shadow stack pointer. This works
> > > > because in
> > > > normal 
> > > >         allowed cases of calling sigreturn, the shadow stack
> > > > pointer
> > > > should be 
> > > >         right at the restore token at that time. There is no
> > > > alternate shadow 
> > > >         stack support. If an alt shadow stack is added later we
> > > > would
> > > >         need to
> > > 
> > > So how is that going to work? altstack is not an esoteric corner
> > > case.
> > 
> > My understanding is that the main usages for the signal stack were
> > handling stack overflows and corruption. Since the shadow stack
> > only
> > contains return addresses rather than large stack allocations, and
> > is
> > not generally writable or pivotable, I thought there was a good
> > possibility an alt shadow stack would not end up being especially
> > useful. Does it seem like reasonable guesswork?
> 
> It's also used for things like DOSEMU that execute in a weird context
> and then trap back out to the outer program using a signal handler
> and an altstack.  Also, imagine someone writing a SIGSEGV handler
> specifically intended to handle shadow stack overflow.

Interesting, thanks. I had been thinking that an alt shadow stack would
require a new interface that would mostly just sit dormant taking up
space. But probably an (obvious) better way would be to just have the
sigaltstack() syscall automatically create a new shadow stack, like the
rest of the series does automatically for new threads. I think I’ll
see how that would look.

> 
> The shadow stack can be pivoted using RSTORSSP.

Yes, I just meant that the ability to pivot or modify is restricted (in
RSTORSSP's case by restore token checks) and so with less ability to
interact with it, it could be less likely for there to be corruptions.
This is of course just speculation.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs
  2022-02-03 22:23     ` H.J. Lu
@ 2022-02-04 22:21       ` John Allen
  0 siblings, 0 replies; 154+ messages in thread
From: John Allen @ 2022-02-04 22:21 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Rick Edgecombe, the arch/x86 maintainers, H . Peter Anvin,
	Thomas Gleixner, Ingo Molnar, LKML, open list:DOCUMENTATION,
	Linux-MM, linux-arch, Linux API, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V . Shankar, Dave Martin,
	Weijiang Yang, Kirill A . Shutemov, joao.moreira,
	Kostya Serebryany, Stephane Eranian

On Thu, Feb 03, 2022 at 02:23:43PM -0800, H.J. Lu wrote:
> On Thu, Feb 3, 2022 at 1:58 PM John Allen <john.allen@amd.com> wrote:
> >
> > On Sun, Jan 30, 2022 at 01:18:38PM -0800, Rick Edgecombe wrote:
> > > Shadow stack is supported on newer AMD processors, but the kernel
> > > implementation has not been tested on them. Prevent basic issues from
> > > showing up for normal users by disabling shadow stack on all CPUs except
> > > Intel until it has been tested. At which point the limitation should be
> > > removed.
> >
> > Hi Rick,
> >
> > I have been testing Yu-Cheng's patchsets on AMD hardware and I am
> > working on testing this version now. How are you testing this new
> > series? I can partially test by calling the prctl enable for shadow
> > stack directly from a program, but I'm not sure how useful that's going
> > to be without the glibc support. Do you have a public repo with the
> > necessary glibc changes to enable shadow stack early?
> >
> 
> The glibc CET branch is at
> 
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/cet/master

Thanks, I ran some smoke tests with the updated glibc and it's looking
good so far. Additionally, I ran the new kselftest and it passed.

Thanks,
John

> 
> -- 
> H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* RE: [PATCH 00/35] Shadow stacks for userspace
  2022-02-04  1:08   ` Edgecombe, Rick P
  2022-02-04  5:20     ` Andy Lutomirski
@ 2022-02-05 13:26     ` David Laight
  2022-02-05 13:29       ` H.J. Lu
  2022-02-06 13:06     ` Peter Zijlstra
  2 siblings, 1 reply; 154+ messages in thread
From: David Laight @ 2022-02-05 13:26 UTC (permalink / raw)
  To: 'Edgecombe, Rick P',
	bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, dave.hansen, kirill.shutemov, Eranian, Stephane,
	linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc, bp, oleg,
	hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, Dave.Martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov

From: Edgecombe, Rick P
> Sent: 04 February 2022 01:08
> Hi Thomas,
> 
> Thanks for feedback on the plan.
> 
> On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > Until now, the enabling effort was trying to support both Shadow
> > > Stack and IBT.
> > > This history will focus on a few areas of the shadow stack
> > > development history
> > > that I thought stood out.
> > >
> > >        Signals
> > >        -------
> > >        Originally signals placed the location of the shadow stack
> > > restore
> > >        token inside the saved state on the stack. This was
> > > problematic from a
> > >        past ABI promises perspective. So the restore location was
> > > instead just
> > >        assumed from the shadow stack pointer. This works because in
> > > normal
> > >        allowed cases of calling sigreturn, the shadow stack pointer
> > > should be
> > >        right at the restore token at that time. There is no
> > > alternate shadow
> > >        stack support. If an alt shadow stack is added later we
> > > would
> > >        need to
> >
> > So how is that going to work? altstack is not an esoteric corner
> > case.
> 
> My understanding is that the main usages for the signal stack were
> handling stack overflows and corruption. Since the shadow stack only
> contains return addresses rather than large stack allocations, and is
> not generally writable or pivotable, I thought there was a good
> possibility an alt shadow stack would not end up being especially
> useful. Does it seem like reasonable guesswork?

The other 'problem' is that it is valid to longjump out of a signal handler.
These days you have to use siglongjmp() not longjmp() but it is still used.

It is probably also valid to use siglongjmp() to jump from a nested
signal handler into the outer handler.
Given both signal handlers can have their own stack, there can be three
stacks involved.

I think the shadow stack pointer has to be in ucontext - which also
means the application can change it before returning from a signal.
In much the same way as all the segment registers can be changed
leading to all the nasty bugs when the final 'return to user' code
traps in kernel when loading invalid segment registers or executing iret.

Hmmm... do shadow stacks mean that longjmp() has to be a system call?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-05 13:26     ` David Laight
@ 2022-02-05 13:29       ` H.J. Lu
  2022-02-05 20:15         ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: H.J. Lu @ 2022-02-05 13:29 UTC (permalink / raw)
  To: David Laight
  Cc: Edgecombe, Rick P, bsingharora, hpa, Syromiatnikov, Eugene,
	peterz, rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, Yang, Weijiang, Lutomirski, Andy, pavel, arnd, Moreira,
	Joao, tglx, mike.kravetz, x86, linux-doc, Dave.Martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov

On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Edgecombe, Rick P
> > Sent: 04 February 2022 01:08
> > Hi Thomas,
> >
> > Thanks for feedback on the plan.
> >
> > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > Until now, the enabling effort was trying to support both Shadow
> > > > Stack and IBT.
> > > > This history will focus on a few areas of the shadow stack
> > > > development history
> > > > that I thought stood out.
> > > >
> > > >        Signals
> > > >        -------
> > > >        Originally signals placed the location of the shadow stack
> > > > restore
> > > >        token inside the saved state on the stack. This was
> > > > problematic from a
> > > >        past ABI promises perspective. So the restore location was
> > > > instead just
> > > >        assumed from the shadow stack pointer. This works because in
> > > > normal
> > > >        allowed cases of calling sigreturn, the shadow stack pointer
> > > > should be
> > > >        right at the restore token at that time. There is no
> > > > alternate shadow
> > > >        stack support. If an alt shadow stack is added later we
> > > > would
> > > >        need to
> > >
> > > So how is that going to work? altstack is not an esoteric corner
> > > case.
> >
> > My understanding is that the main usages for the signal stack were
> > handling stack overflows and corruption. Since the shadow stack only
> > contains return addresses rather than large stack allocations, and is
> > not generally writable or pivotable, I thought there was a good
> > possibility an alt shadow stack would not end up being especially
> > useful. Does it seem like reasonable guesswork?
>
> The other 'problem' is that it is valid to longjump out of a signal handler.
> These days you have to use siglongjmp() not longjmp() but it is still used.
>
> It is probably also valid to use siglongjmp() to jump from a nested
> signal handler into the outer handler.
> Given both signal handlers can have their own stack, there can be three
> stacks involved.
>
> I think the shadow stack pointer has to be in ucontext - which also
> means the application can change it before returning from a signal.
> In much the same way as all the segment registers can be changed
> leading to all the nasty bugs when the final 'return to user' code
> traps in kernel when loading invalid segment registers or executing iret.
>
> Hmmm... do shadow stacks mean that longjmp() has to be a system call?

No.  setjmp/longjmp save and restore shadow stack pointer.

--
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-05 13:29       ` H.J. Lu
@ 2022-02-05 20:15         ` Edgecombe, Rick P
  2022-02-05 20:21           ` H.J. Lu
                             ` (2 more replies)
  0 siblings, 3 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-05 20:15 UTC (permalink / raw)
  To: hjl.tools, David.Laight
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Eranian, Stephane, kirill.shutemov, dave.hansen,
	linux-mm, fweimer, nadav.amit, jannh, kcc, linux-arch, pavel,
	oleg, Yang, Weijiang, Lutomirski, Andy, bp, arnd, Moreira, Joao,
	tglx, mike.kravetz, x86, linux-doc, Dave.Martin, john.allen,
	mingo, Shankar, Ravi V, corbet, linux-kernel, linux-api,
	gorcunov

On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
> On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
> wrote:
> > 
> > From: Edgecombe, Rick P
> > > Sent: 04 February 2022 01:08
> > > Hi Thomas,
> > > 
> > > Thanks for feedback on the plan.
> > > 
> > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > > Until now, the enabling effort was trying to support both
> > > > > Shadow
> > > > > Stack and IBT.
> > > > > This history will focus on a few areas of the shadow stack
> > > > > development history
> > > > > that I thought stood out.
> > > > > 
> > > > >        Signals
> > > > >        -------
> > > > >        Originally signals placed the location of the shadow
> > > > > stack
> > > > > restore
> > > > >        token inside the saved state on the stack. This was
> > > > > problematic from a
> > > > >        past ABI promises perspective. So the restore location
> > > > > was
> > > > > instead just
> > > > >        assumed from the shadow stack pointer. This works
> > > > > because in
> > > > > normal
> > > > >        allowed cases of calling sigreturn, the shadow stack
> > > > > pointer
> > > > > should be
> > > > >        right at the restore token at that time. There is no
> > > > > alternate shadow
> > > > >        stack support. If an alt shadow stack is added later
> > > > > we
> > > > > would
> > > > >        need to
> > > > 
> > > > So how is that going to work? altstack is not an esoteric
> > > > corner
> > > > case.
> > > 
> > > My understanding is that the main usages for the signal stack
> > > were
> > > handling stack overflows and corruption. Since the shadow stack
> > > only
> > > contains return addresses rather than large stack allocations,
> > > and is
> > > not generally writable or pivotable, I thought there was a good
> > > possibility an alt shadow stack would not end up being especially
> > > useful. Does it seem like reasonable guesswork?
> > 
> > The other 'problem' is that it is valid to longjump out of a signal
> > handler.
> > These days you have to use siglongjmp() not longjmp() but it is
> > still used.
> > 
> > It is probably also valid to use siglongjmp() to jump from a nested
> > signal handler into the outer handler.
> > Given both signal handlers can have their own stack, there can be
> > three
> > stacks involved.

So the scenario is?

1. Handle signal 1
2. sigsetjmp()
3. sigaltstack()
4. Handle signal 2 on alt stack
5. siglongjmp()

I'll check that it is covered by the tests, but I think it should work
in this series, which has no alt shadow stack. I have only done a
high-level overview of how the shadow stack pieces that don't involve
the kernel work in glibc. Sounds like I'll need to do a deeper dive.
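
As a plain C sketch, the scenario is basically the following (nothing
shadow stack specific in the test itself; the interesting part is that
siglongjmp() has to unwind the shadow stack under the covers):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static sigjmp_buf env;
static stack_t altstack;

static void sig2_handler(int sig)
{
	/* 4. running on the alt stack; 5. jump back out */
	siglongjmp(env, 1);
}

static void sig1_handler(int sig)
{
	/* 2. sigsetjmp() */
	if (sigsetjmp(env, 1)) {
		printf("siglongjmp from the alt stack worked\n");
		return;
	}

	/* 3. register the alt stack, then take signal 2 on it */
	sigaltstack(&altstack, NULL);
	raise(SIGUSR2);
}

int main(void)
{
	struct sigaction sa1 = {}, sa2 = {};

	sa1.sa_handler = sig1_handler;
	sa2.sa_handler = sig2_handler;
	sa2.sa_flags = SA_ONSTACK;

	altstack.ss_sp = malloc(SIGSTKSZ);
	altstack.ss_size = SIGSTKSZ;

	sigaction(SIGUSR1, &sa1, NULL);
	sigaction(SIGUSR2, &sa2, NULL);
	raise(SIGUSR1);		/* 1. handle signal 1 */
	return 0;
}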

> > 
> > I think the shadow stack pointer has to be in ucontext - which also
> > means the application can change it before returning from a signal.

Yes we might need to change it to support alt shadow stacks. Can you
elaborate why you think it has to be in ucontext? I was thinking of
looking at three options for storing the ssp:
 - Stored in the shadow stack like a token using WRUSS from the kernel.
 - Stored on the kernel side using a hashmap that maps ucontext or
   sigframe userspace address to ssp (this is of course similar to 
   storing in ucontext, except that the user can’t change the ssp).
 - Stored writable in userspace in ucontext.

But in this version, without alt shadow stacks, the shadow stack
pointer is not stored in ucontext. This causes the limitation that
userspace can only call sigreturn when it has returned back to a point
where there is a restore token on the shadow stack (which was placed
there by the kernel). This doesn’t mean it can’t switch to a different
shadow stack or handle a nested signal, but it limits the possibility
for calling sigreturn with a totally different sigframe (like CRIU and
SROP attacks do). It should hopefully be a helpful, protective
limitation for most apps and I'm hoping CRIU can be fixed without
removing it.

I am not aware of other limitations to signals (besides normal shadow
stack enforcement), but I could be missing something. And people's skepticism
is making me want to go back over it with more scrutiny.

> > In much the same way as all the segment registers can be changed
> > leading to all the nasty bugs when the final 'return to user' code
> > traps in kernel when loading invalid segment registers or executing
> > iret.

I don't think this is as difficult to avoid because userspace ssp has
its own register that should not be accessed at that point, but I have
not given this aspect enough analysis. Thanks for bringing it up.

> > 
> > Hmmm... do shadow stacks mean that longjmp() has to be a system
> > call?
> 
> No.  setjmp/longjmp save and restore shadow stack pointer.
> 

It sounds like it would help to write up in a lot more detail exactly
how all the signal and more specialized stack manipulation scenarios work in
glibc.


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-05 20:15         ` Edgecombe, Rick P
@ 2022-02-05 20:21           ` H.J. Lu
  2022-02-06 13:19             ` Peter Zijlstra
  2022-02-06 13:42           ` David Laight
  2022-02-08  1:31           ` Andy Lutomirski
  2 siblings, 1 reply; 154+ messages in thread
From: H.J. Lu @ 2022-02-05 20:21 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: David.Laight, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, fweimer, nadav.amit, jannh, kcc,
	linux-arch, pavel, oleg, Yang, Weijiang, Lutomirski, Andy, bp,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Shankar, Ravi V, corbet,
	linux-kernel, linux-api, gorcunov

On Sat, Feb 5, 2022 at 12:15 PM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
> > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
> > wrote:
> > >
> > > From: Edgecombe, Rick P
> > > > Sent: 04 February 2022 01:08
> > > > Hi Thomas,
> > > >
> > > > Thanks for feedback on the plan.
> > > >
> > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > > > Until now, the enabling effort was trying to support both
> > > > > > Shadow
> > > > > > Stack and IBT.
> > > > > > This history will focus on a few areas of the shadow stack
> > > > > > development history
> > > > > > that I thought stood out.
> > > > > >
> > > > > >        Signals
> > > > > >        -------
> > > > > >        Originally signals placed the location of the shadow
> > > > > > stack
> > > > > > restore
> > > > > >        token inside the saved state on the stack. This was
> > > > > > problematic from a
> > > > > >        past ABI promises perspective. So the restore location
> > > > > > was
> > > > > > instead just
> > > > > >        assumed from the shadow stack pointer. This works
> > > > > > because in
> > > > > > normal
> > > > > >        allowed cases of calling sigreturn, the shadow stack
> > > > > > pointer
> > > > > > should be
> > > > > >        right at the restore token at that time. There is no
> > > > > > alternate shadow
> > > > > >        stack support. If an alt shadow stack is added later
> > > > > > we
> > > > > > would
> > > > > >        need to
> > > > >
> > > > > So how is that going to work? altstack is not an esoteric
> > > > > corner
> > > > > case.
> > > >
> > > > My understanding is that the main usages for the signal stack
> > > > were
> > > > handling stack overflows and corruption. Since the shadow stack
> > > > only
> > > > contains return addresses rather than large stack allocations,
> > > > and is
> > > > not generally writable or pivotable, I thought there was a good
> > > > possibility an alt shadow stack would not end up being especially
> > > > useful. Does it seem like reasonable guesswork?
> > >
> > > The other 'problem' is that it is valid to longjump out of a signal
> > > handler.
> > > These days you have to use siglongjmp() not longjmp() but it is
> > > still used.
> > >
> > > It is probably also valid to use siglongjmp() to jump from a nested
> > > signal handler into the outer handler.
> > > Given both signal handlers can have their own stack, there can be
> > > three
> > > stacks involved.
>
> So the scenario is?
>
> 1. Handle signal 1
> 2. sigsetjmp()
> 3. signalstack()
> 4. Handle signal 2 on alt stack
> 5. siglongjmp()
>
> I'll check that it is covered by the tests, but I think it should work
> in this series that has no alt shadow stack. I have only done a high
> level overview of how the shadow stack stuff, that doesn't involve the
> kernel, works in glibc. Sounds like I'll need to do a deeper dive.
>
> > >
> > > I think the shadow stack pointer has to be in ucontext - which also
> > > means the application can change it before returning from a signal.
>
> Yes we might need to change it to support alt shadow stacks. Can you
> elaborate why you think it has to be in ucontext? I was thinking of
> looking at three options for storing the ssp:
>  - Stored in the shadow stack like a token using WRUSS from the kernel.
>  - Stored on the kernel side using a hashmap that maps ucontext or
>    sigframe userspace address to ssp (this is of course similar to
>    storing in ucontext, except that the user can’t change the ssp).
>  - Stored writable in userspace in ucontext.
>
> But in this version, without alt shadow stacks, the shadow stack
> pointer is not stored in ucontext. This causes the limitation that
> userspace can only call sigreturn when it has returned back to a point
> where there is a restore token on the shadow stack (which was placed
> there by the kernel). This doesn’t mean it can’t switch to a different
> shadow stack or handle a nested signal, but it limits the possibility
> for calling sigreturn with a totally different sigframe (like CRIU and
> SROP attacks do). It should hopefully be a helpful, protective
> limitation for most apps and I'm hoping CRIU can be fixed without
> removing it.
>
> I am not aware of other limitations to signals (besides normal shadow
> stack enforcement), but I could be missing it. And people's skepticism
> is making me want to go back over it with more scrutiny.
>
> > > In much the same way as all the segment registers can be changed
> > > leading to all the nasty bugs when the final 'return to user' code
> > > traps in kernel when loading invalid segment registers or executing
> > > iret.
>
> I don't think this is as difficult to avoid because userspace ssp has
> its own register that should not be accessed at that point, but I have
> not given this aspect enough analysis. Thanks for bringing it up.
>
> > >
> > > Hmmm... do shadow stacks mean that longjmp() has to be a system
> > > call?
> >
> > No.  setjmp/longjmp save and restore shadow stack pointer.
> >
>
> It sounds like it would help to write up in a lot more detail exactly
> how all the signal and specialer stack manipulation scenarios work in
> glibc.
>

setjmp/longjmp work on the same sigjmp_buf.  Shadow stack pointer
is saved and restored, just like any other callee-saved registers.


-- 
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-04  1:08   ` Edgecombe, Rick P
  2022-02-04  5:20     ` Andy Lutomirski
  2022-02-05 13:26     ` David Laight
@ 2022-02-06 13:06     ` Peter Zijlstra
  2 siblings, 0 replies; 154+ messages in thread
From: Peter Zijlstra @ 2022-02-06 13:06 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, rdunlap, keescook,
	dave.hansen, kirill.shutemov, Eranian, Stephane, linux-mm,
	fweimer, nadav.amit, jannh, linux-arch, kcc, bp, oleg, hjl.tools,
	Yang, Weijiang, Lutomirski, Andy, pavel, arnd, Moreira, Joao,
	tglx, mike.kravetz, x86, linux-doc, Dave.Martin, john.allen,
	mingo, Shankar, Ravi V, corbet, linux-kernel, linux-api,
	gorcunov

On Fri, Feb 04, 2022 at 01:08:25AM +0000, Edgecombe, Rick P wrote:

> > So how is that going to work? altstack is not an esoteric corner
> > case.
> 
> My understanding is that the main usages for the signal stack were
> handling stack overflows and corruption. Since the shadow stack only
> contains return addresses rather than large stack allocations, and is
> not generally writable or pivotable, I thought there was a good
> possibility an alt shadow stack would not end up being especially
> useful. Does it seem like reasonable guesswork?

altstacks are also used in userspace threading implementations.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-05 20:21           ` H.J. Lu
@ 2022-02-06 13:19             ` Peter Zijlstra
  0 siblings, 0 replies; 154+ messages in thread
From: Peter Zijlstra @ 2022-02-06 13:19 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Edgecombe, Rick P, David.Laight, bsingharora, hpa, Syromiatnikov,
	Eugene, rdunlap, keescook, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, fweimer, nadav.amit, jannh, kcc,
	linux-arch, pavel, oleg, Yang, Weijiang, Lutomirski, Andy, bp,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Shankar, Ravi V, corbet,
	linux-kernel, linux-api, gorcunov

On Sat, Feb 05, 2022 at 12:21:12PM -0800, H.J. Lu wrote:

> setjmp/longjmp work on the same sigjmp_buf.  Shadow stack pointer
> is saved and restored, just like any other callee-saved registers.

How is having that shadow stack pointer in user-writable memory not a
problem? That seems like a prime target to subvert the whole shadow
stack machinery.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* RE: [PATCH 00/35] Shadow stacks for userspace
  2022-02-05 20:15         ` Edgecombe, Rick P
  2022-02-05 20:21           ` H.J. Lu
@ 2022-02-06 13:42           ` David Laight
  2022-02-06 13:55             ` H.J. Lu
                               ` (2 more replies)
  2022-02-08  1:31           ` Andy Lutomirski
  2 siblings, 3 replies; 154+ messages in thread
From: David Laight @ 2022-02-06 13:42 UTC (permalink / raw)
  To: 'Edgecombe, Rick P', hjl.tools
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Eranian, Stephane, kirill.shutemov, dave.hansen,
	linux-mm, fweimer, nadav.amit, jannh, kcc, linux-arch, pavel,
	oleg, Yang, Weijiang, Lutomirski, Andy, bp, arnd, Moreira, Joao,
	tglx, mike.kravetz, x86, linux-doc, Dave.Martin, john.allen,
	mingo, Shankar, Ravi V, corbet, linux-kernel, linux-api,
	gorcunov

From: Edgecombe, Rick P
> Sent: 05 February 2022 20:15
> 
> On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
> > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
> > wrote:
> > >
> > > From: Edgecombe, Rick P
> > > > Sent: 04 February 2022 01:08
> > > > Hi Thomas,
> > > >
> > > > Thanks for feedback on the plan.
> > > >
> > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > > > Until now, the enabling effort was trying to support both
> > > > > > Shadow
> > > > > > Stack and IBT.
> > > > > > This history will focus on a few areas of the shadow stack
> > > > > > development history
> > > > > > that I thought stood out.
> > > > > >
> > > > > >        Signals
> > > > > >        -------
> > > > > >        Originally signals placed the location of the shadow
> > > > > > stack
> > > > > > restore
> > > > > >        token inside the saved state on the stack. This was
> > > > > > problematic from a
> > > > > >        past ABI promises perspective. So the restore location
> > > > > > was
> > > > > > instead just
> > > > > >        assumed from the shadow stack pointer. This works
> > > > > > because in
> > > > > > normal
> > > > > >        allowed cases of calling sigreturn, the shadow stack
> > > > > > pointer
> > > > > > should be
> > > > > >        right at the restore token at that time. There is no
> > > > > > alternate shadow
> > > > > >        stack support. If an alt shadow stack is added later
> > > > > > we
> > > > > > would
> > > > > >        need to
> > > > >
> > > > > So how is that going to work? altstack is not an esoteric
> > > > > corner
> > > > > case.
> > > >
> > > > My understanding is that the main usages for the signal stack
> > > > were
> > > > handling stack overflows and corruption. Since the shadow stack
> > > > only
> > > > contains return addresses rather than large stack allocations,
> > > > and is
> > > > not generally writable or pivotable, I thought there was a good
> > > > possibility an alt shadow stack would not end up being especially
> > > > useful. Does it seem like reasonable guesswork?
> > >
> > > The other 'problem' is that it is valid to longjump out of a signal
> > > handler.
> > > These days you have to use siglongjmp() not longjmp() but it is
> > > still used.
> > >
> > > It is probably also valid to use siglongjmp() to jump from a nested
> > > signal handler into the outer handler.
> > > Given both signal handlers can have their own stack, there can be
> > > three
> > > stacks involved.
> 
> So the scenario is?
> 
> 1. Handle signal 1
> 2. sigsetjmp()
> 3. sigaltstack()
> 4. Handle signal 2 on alt stack
> 5. siglongjmp()
> 
> I'll check that it is covered by the tests, but I think it should work
> in this series that has no alt shadow stack. I have only done a high
> level overview of how the shadow stack stuff, that doesn't involve the
> kernel, works in glibc. Sounds like I'll need to do a deeper dive.

The posix/xopen definition for setjmp/longjmp doesn't require such
longjmp requests to work.

Although they still have to do something that doesn't break badly.
Aborting the process is probably fine!

> > > I think the shadow stack pointer has to be in ucontext - which also
> > > means the application can change it before returning from a signal.
> 
> Yes we might need to change it to support alt shadow stacks. Can you
> elaborate why you think it has to be in ucontext? I was thinking of
> looking at three options for storing the ssp:
>  - Stored in the shadow stack like a token using WRUSS from the kernel.
>  - Stored on the kernel side using a hashmap that maps ucontext or
>    sigframe userspace address to ssp (this is of course similar to
>    storing in ucontext, except that the user can’t change the ssp).
>  - Stored writable in userspace in ucontext.
> 
> But in this version, without alt shadow stacks, the shadow stack
> pointer is not stored in ucontext. This causes the limitation that
> userspace can only call sigreturn when it has returned back to a point
> where there is a restore token on the shadow stack (which was placed
> there by the kernel). This doesn’t mean it can’t switch to a different
> shadow stack or handle a nested signal, but it limits the possibility
> for calling sigreturn with a totally different sigframe (like CRIU and
> SROP attacks do). It should hopefully be a helpful, protective
> limitation for most apps and I'm hoping CRIU can be fixed without
> removing it.
> 
> I am not aware of other limitations to signals (besides normal shadow
> stack enforcement), but I could be missing it. And people's skepticism
> is making me want to go back over it with more scrutiny.
> 
> > > In much the same way as all the segment registers can be changed
> > > leading to all the nasty bugs when the final 'return to user' code
> > > traps in kernel when loading invalid segment registers or executing
> > > iret.
> 
> I don't think this is as difficult to avoid because userspace ssp has
> its own register that should not be accessed at that point, but I have
> not given this aspect enough analysis. Thanks for bringing it up.

So the user ssp isn't saved (or restored) by the trap entry/exit.
So it needs to be saved by the context switch code?
Much like the user segment registers?
So you are likely to get the same problems if restoring it can fault
in kernel (eg for a non-canonical address).

> > > Hmmm... do shadow stacks mean that longjmp() has to be a system
> > > call?
> >
> > No.  setjmp/longjmp save and restore shadow stack pointer.

Ok, I was thinking that direct access to the user ssp would be
a privileged operation.
If it can be written you don't really have to worry about what code
is trying to do - it can actually do what it likes.
It just catches unintentional operations (like buffer overflows).

Was there any 'spare' space in struct jmpbuf ?
Otherwise you can only enable shadow stacks if everything has been
recompiled - including any shared libraries that might be dlopen()ed.
(or does the compiler invent an alloca() call somehow for a
size that comes back from glibc?)

I've never really considered how setjmp/longjmp handle callee saved
register variables (apart from it being hard).
The original pdp11 implementation probably only needed to save r6 and r7.

What does happen to all the 'extended state' that XSAVE handles?
IIRC all the AVX registers are caller saved (so should probably
be zeroed), but some of the SSE ones are callee saved, and one or
two of the fpu flags are sticky and annoying enough to save/restore
at the best of times.

> It sounds like it would help to write up in a lot more detail exactly
> how all the signal and specialer stack manipulation scenarios work in
> glibc.

Some cross references might have made people notice that the ucontext
extensions for AVX512 (if not earlier ones) broke the minimal/default
signal stack size.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-06 13:42           ` David Laight
@ 2022-02-06 13:55             ` H.J. Lu
  2022-02-07 10:22             ` Florian Weimer
  2022-02-08  1:46             ` Edgecombe, Rick P
  2 siblings, 0 replies; 154+ messages in thread
From: H.J. Lu @ 2022-02-06 13:55 UTC (permalink / raw)
  To: David Laight
  Cc: Edgecombe, Rick P, bsingharora, hpa, Syromiatnikov, Eugene,
	peterz, rdunlap, keescook, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, fweimer, nadav.amit, jannh, kcc,
	linux-arch, pavel, oleg, Yang, Weijiang, Lutomirski, Andy, bp,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Shankar, Ravi V, corbet,
	linux-kernel, linux-api, gorcunov

On Sun, Feb 6, 2022 at 5:42 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Edgecombe, Rick P
> > Sent: 05 February 2022 20:15
> >
> > On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
> > > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
> > > wrote:
> > > >
> > > > From: Edgecombe, Rick P
> > > > > Sent: 04 February 2022 01:08
> > > > > Hi Thomas,
> > > > >
> > > > > Thanks for feedback on the plan.
> > > > >
> > > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > > > > Until now, the enabling effort was trying to support both
> > > > > > > Shadow
> > > > > > > Stack and IBT.
> > > > > > > This history will focus on a few areas of the shadow stack
> > > > > > > development history
> > > > > > > that I thought stood out.
> > > > > > >
> > > > > > >        Signals
> > > > > > >        -------
> > > > > > >        Originally signals placed the location of the shadow
> > > > > > > stack
> > > > > > > restore
> > > > > > >        token inside the saved state on the stack. This was
> > > > > > > problematic from a
> > > > > > >        past ABI promises perspective. So the restore location
> > > > > > > was
> > > > > > > instead just
> > > > > > >        assumed from the shadow stack pointer. This works
> > > > > > > because in
> > > > > > > normal
> > > > > > >        allowed cases of calling sigreturn, the shadow stack
> > > > > > > pointer
> > > > > > > should be
> > > > > > >        right at the restore token at that time. There is no
> > > > > > > alternate shadow
> > > > > > >        stack support. If an alt shadow stack is added later
> > > > > > > we
> > > > > > > would
> > > > > > >        need to
> > > > > >
> > > > > > So how is that going to work? altstack is not an esoteric
> > > > > > corner
> > > > > > case.
> > > > >
> > > > > My understanding is that the main usages for the signal stack
> > > > > were
> > > > > handling stack overflows and corruption. Since the shadow stack
> > > > > only
> > > > > contains return addresses rather than large stack allocations,
> > > > > and is
> > > > > not generally writable or pivotable, I thought there was a good
> > > > > possibility an alt shadow stack would not end up being especially
> > > > > useful. Does it seem like reasonable guesswork?
> > > >
> > > > The other 'problem' is that it is valid to longjump out of a signal
> > > > handler.
> > > > These days you have to use siglongjmp() not longjmp() but it is
> > > > still used.
> > > >
> > > > It is probably also valid to use siglongjmp() to jump from a nested
> > > > signal handler into the outer handler.
> > > > Given both signal handlers can have their own stack, there can be
> > > > three
> > > > stacks involved.
> >
> > So the scenario is?
> >
> > 1. Handle signal 1
> > 2. sigsetjmp()
> > 3. sigaltstack()
> > 4. Handle signal 2 on alt stack
> > 5. siglongjmp()
> >
> > I'll check that it is covered by the tests, but I think it should work
> > in this series that has no alt shadow stack. I have only done a high
> > level overview of how the shadow stack stuff, that doesn't involve the
> > kernel, works in glibc. Sounds like I'll need to do a deeper dive.
>
> The posix/xopen definition for setjmp/longjmp doesn't require such
> longjmp requests to work.
>
> Although they still have to do something that doesn't break badly.
> Aborting the process is probably fine!
>
> > > > I think the shadow stack pointer has to be in ucontext - which also
> > > > means the application can change it before returning from a signal.
> >
> > Yes we might need to change it to support alt shadow stacks. Can you
> > elaborate why you think it has to be in ucontext? I was thinking of
> > looking at three options for storing the ssp:
> >  - Stored in the shadow stack like a token using WRUSS from the kernel.
> >  - Stored on the kernel side using a hashmap that maps ucontext or
> >    sigframe userspace address to ssp (this is of course similar to
> >    storing in ucontext, except that the user can’t change the ssp).
> >  - Stored writable in userspace in ucontext.
> >
> > But in this version, without alt shadow stacks, the shadow stack
> > pointer is not stored in ucontext. This causes the limitation that
> > userspace can only call sigreturn when it has returned back to a point
> > where there is a restore token on the shadow stack (which was placed
> > there by the kernel). This doesn’t mean it can’t switch to a different
> > shadow stack or handle a nested signal, but it limits the possibility
> > for calling sigreturn with a totally different sigframe (like CRIU and
> > SROP attacks do). It should hopefully be a helpful, protective
> > limitation for most apps and I'm hoping CRIU can be fixed without
> > removing it.
> >
> > I am not aware of other limitations to signals (besides normal shadow
> > stack enforcement), but I could be missing it. And people's skepticism
> > is making me want to go back over it with more scrutiny.
> >
> > > > In much the same way as all the segment registers can be changed
> > > > leading to all the nasty bugs when the final 'return to user' code
> > > > traps in kernel when loading invalid segment registers or executing
> > > > iret.
> >
> > I don't think this is as difficult to avoid because userspace ssp has
> > its own register that should not be accessed at that point, but I have
> > not given this aspect enough analysis. Thanks for bringing it up.
>
> So the user ssp isn't saved (or restored) by the trap entry/exit.
> So it needs to be saved by the context switch code?
> Much like the user segment registers?
> So you are likely to get the same problems if restoring it can fault
> in kernel (eg for a non-canonical address).
>
> > > > Hmmm... do shadow stacks mean that longjmp() has to be a system
> > > > call?
> > >
> > > No.  setjmp/longjmp save and restore shadow stack pointer.
>
> Ok, I was thinking that direct access to the user ssp would be
> a privileged operation.

User space can only pop shadow stack.  longjmp does

#ifdef SHADOW_STACK_POINTER_OFFSET
# if IS_IN (libc) && defined SHARED && defined FEATURE_1_OFFSET
	/* Check if Shadow Stack is enabled.  */
	testl	$X86_FEATURE_1_SHSTK, %fs:FEATURE_1_OFFSET
	jz	L(skip_ssp)
# else
	xorl	%eax, %eax
# endif
	/* Check and adjust the Shadow-Stack-Pointer.  */
	/* Get the current ssp.  */
	rdsspq	%rax
	/* And compare it with the saved ssp value.  */
	subq	SHADOW_STACK_POINTER_OFFSET(%rdi), %rax
	je	L(skip_ssp)
	/* Count the number of frames to adjust and adjust it
	   with incssp instruction.  The instruction can adjust
	   the ssp by [0..255] value only thus use a loop if
	   the number of frames is bigger than 255.  */
	negq	%rax
	shrq	$3, %rax
	/* NB: We saved Shadow-Stack-Pointer of setjmp.  Since we are
	       restoring Shadow-Stack-Pointer of setjmp's caller, we
	       need to unwind shadow stack by one more frame.  */
	addq	$1, %rax

	movl	$255, %ebx
L(loop):
	cmpq	%rbx, %rax
	cmovb	%rax, %rbx
	incsspq	%rbx
	subq	%rbx, %rax
	ja	L(loop)

L(skip_ssp):
#endif
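
For readers less fluent in asm, roughly the same unwind logic sketched
in C, using inline asm for the shadow stack instructions.  Illustrative
only, not code from glibc or this series:

static void unwind_shadow_stack(unsigned long saved_ssp)
{
	unsigned long ssp = 0, frames;

	/* rdssp is a no-op when shadow stack is disabled, leaving ssp at 0.  */
	asm volatile ("rdsspq %0" : "+r" (ssp));
	if (!ssp || ssp == saved_ssp)
		return;

	/* One 8-byte shadow stack entry per frame, plus one extra frame
	   because the saved value is the SSP from inside setjmp itself.  */
	frames = (saved_ssp - ssp) / 8 + 1;

	/* incssp can only pop up to 255 entries at a time.  */
	while (frames) {
		unsigned long n = frames > 255 ? 255 : frames;

		asm volatile ("incsspq %0" : : "r" (n));
		frames -= n;
	}
}
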

> If it can be written you don't really have to worry about what code
> is trying to do - it can actually do what it likes.
> It just catches unintentional operations (like buffer overflows).
>
> Was there any 'spare' space in struct jmpbuf ?

By pure luck, we have ONE spare space in sigjmp_buf.

> Otherwise you can only enable shadow stacks if everything has been
> recompiled - including any shared libraries that might be dlopen()ed.
> (or does the compiler invent an alloca() call somehow for a
> size that comes back from glibc?)
>
> I've never really considered how setjmp/longjmp handle callee saved
> register variables (apart from it being hard).
> The original pdp11 implementation probably only needed to save r6 and r7.
>
> What does happen to all the 'extended state' that XSAVE handles?
> IIRC all the AVX registers are caller saved (so should probably
> be zeroed), but some of the SSE ones are callee saved, and one or
> two of the fpu flags are sticky and annoying enough to save/restore
> at the best of times.
>
> > It sounds like it would help to write up in a lot more detail exactly
> > how all the signal and specialer stack manipulation scenarios work in
> > glibc.
>
> Some cross references might have made people notice that the ucontext
> extensions for AVX512 (if not earlier ones) broke the minimal/default
> signal stack size.
>
>         David
>

-- 
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
                   ` (35 preceding siblings ...)
  2022-02-03 21:07 ` [PATCH 00/35] Shadow stacks for userspace Thomas Gleixner
@ 2022-02-06 18:42 ` Mike Rapoport
  2022-02-07  7:20   ` Adrian Reber
  36 siblings, 1 reply; 154+ messages in thread
From: Mike Rapoport @ 2022-02-06 18:42 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian,
	Andrei Vagin, Adrian Reber, Dmitry Safonov

(added more CRIU people)

On Sun, Jan 30, 2022 at 01:18:03PM -0800, Rick Edgecombe wrote:
> Hi,
> 
> This is a slight reboot of the userspace CET series. I will be taking over the 
> series from Yu-cheng. Per some internal recommendations, I’ve reset the version
> number and am calling it a new series. Hopefully, it doesn’t cause confusion.
> 
> The new plan is to upstream only userspace Shadow Stack support at this point. 
> IBT can follow later, but for now I’ll focus solely on the most in-demand and
> widely available (with the feature on AMD CPUs now) part of CET.
> 
> I thought as part of this reset, it might be useful to more fully write-up the 
> design and summarize the history of the previous CET series. So this slightly
> long cover letter does that. The "Updates" section has the changes, if anyone
> doesn't want the history.
> 
> 
> Why is Shadow Stack Wanted
> ==========================
> The main use case for userspace shadow stack is providing protection against 
> return oriented programming attacks. Fedora and Ubuntu already have many/most 
> packages enabled for shadow stack. The main missing piece is Linux kernel 
> support and there seems to be a high amount of interest in the ecosystem for
> getting this feature supported. Besides security, Google has also done some
> work on using shadow stack to improve performance and reliability of tracing.
> 
> 
> Userspace Shadow Stack Implementation
> =====================================
> Shadow stack works by maintaining a secondary (shadow) stack that cannot be 
> directly modified by applications. When executing a CALL instruction, the 
> processor pushes the return address to both the normal stack and to the special 
> permissioned shadow stack. Upon ret, the processor pops the shadow stack copy 
> and compares it to the normal stack copy. If the two differ, the processor 
> raises a control protection fault. This implementation supports shadow stack on 
> 64 bit kernels only, with support for 32 bit only via IA32 emulation.
> 
> 	Shadow Stack Memory
> 	-------------------
> 	The majority of this series deals with changes for handling the special 
> 	shadow stack memory permissions. This memory is specified by the 
> 	Dirty+RO PTE bits. A tricky aspect of this is that this combination was 
> 	previously used to specify COW memory. So Linux needs to handle COW 
> 	differently when shadow stack is in use. The solution is to use a 
> 	software PTE bit to denote COW memory, and take care to clear the dirty
> 	bit when setting the memory RO.
> 
> 	Setup and Upkeep of HW Registers
> 	--------------------------------
> 	Using userspace CET requires a CR4 bit set, and also the manipulation 
> 	of two xsave managed MSRs. The kernel needs to modify these registers 
> 	during various operations like clone and signal handling. These 
> 	operations may happen when the registers are restored to the CPU, or 
> 	saved in an xsave buffer. Since the recent AMX triggered FPU overhaul 
> 	removed direct access to the xsave buffer, this series adds an 
> 	interface to operate on the supervisor xstate.
> 
> 	New ABIs
> 	--------
> 	This series introduces some new ABIs. The primary one is the shadow 
> 	stack itself. Since it is readable and the shadow stack pointer is 
> 	exposed to user space, applications can easily read and process the 
> 	shadow stack. And in fact the tracing usages plan to do exactly that.
> 
> 	Most of the shadow stack contents are written by HW, but some of the 
> 	entries are added by the kernel. The main place for this is signals. As 
> 	part of handling the signal the kernel does some manual adjustment of 
> 	the shadow stack that userspace depends on.
> 
> 	In addition to the contents of the shadow stack there is also user 
> 	visible behavior around when new shadow stacks are created and set in 
> 	the shadow stack pointer (SSP) register. This is relatively 
> 	straightforward – shadow stacks are created when new stacks are created 
> 	(thread creation, fork, etc). It is more or less what is required to 
> 	keep apps working.
> 
> 	For situations when userspace creates a new stack (i.e. makecontext(), 
> 	fibers, etc), a new syscall is provided for creating shadow stack 
> 	memory. To make the shadow stack usable, it needs to have a restore 
> 	token written to the protected memory. So the syscall provides a way to 
> 	specify that this should be done by the kernel.
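
Illustration only: a userspace allocation helper could end up looking
roughly like the sketch below.  The prototype, flag name and syscall
number here are assumptions made for the example, not the ABI proposed
in the "x86/cet/shstk: Introduce map_shadow_stack syscall" patch.

#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack	451	/* placeholder number, assumed */
#endif
#define SHADOW_STACK_SET_TOKEN	0x1	/* assumed flag: kernel writes the token */

/* Ask the kernel for shadow stack memory with a restore token already
   placed on it, so userspace can later pivot to it (makecontext(),
   fibers, etc.) without the memory ever being writable.  */
static void *alloc_shadow_stack(size_t size)
{
	long ret = syscall(__NR_map_shadow_stack, 0 /* addr hint */, size,
			   SHADOW_STACK_SET_TOKEN);

	return ret == -1 ? NULL : (void *)ret;
}
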
> 
> 	When a shadow stack violation happens (when the return address on the stack
> 	does not match the return address on the shadow stack), a segfault is generated
> 	with a new si_code specific to CET violations.
> 
> 	Lastly, a new arch_prctl interface is created for controlling the 
> 	enablement of CET-like features. It is intended to also be used for 
> 	LAM. It operates on the feature status per-thread, so for process wide 
> 	enabling it is intended to be used early in things like dynamic 
> 	linker/loaders. However, it can be used later for per-thread enablement 
> 	of features like WRSS.
> 
> 	WRSS
> 	----
> 	WRSS is an instruction that can write to shadow stacks. The HW provides 
> 	a way to enable this instruction for userspace use. Since shadow 
> 	stacks are created initially protected, enabling WRSS allows any apps
> 	that want to do unusual things with their stacks to have a way to 
> 	weaken protection and make things more flexible. A new feature bit is 
> 	defined to control enabling/disabling of WRSS.
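
For illustration (and assuming WRSS has already been enabled for the
thread through the arch_prctl() interface described above), a shadow
stack write would then use the instruction itself rather than an
ordinary store.  A minimal sketch, not code from this series:

/* Write one 8-byte value into shadow stack memory with wrss.  A regular
   mov to the same address would fault, since shadow stack pages are not
   ordinarily writable.  */
static inline void shadow_stack_write(unsigned long *slot, unsigned long val)
{
	asm volatile ("wrssq %1, %0" : "=m" (*slot) : "r" (val));
}
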
> 
> 
> History
> =======
> The branding “CET” really consists of two features: “Shadow Stack” and 
> “Indirect Branch Tracking”. They both restrict previously allowed, but rarely 
> valid behaviors and require userspace to change to avoid these behaviors before 
> enabling the protection. These raw HW features need to be assembled into a 
> software solution across userspace and kernel in order to add security value.
> The kernel part of this solution has evolved iteratively starting with a lengthy
> RFC period. 
> 
> Until now, the enabling effort was trying to support both Shadow Stack and IBT. 
> This history will focus on a few areas of the shadow stack development history 
> that I thought stood out.
> 
> 	Signals
> 	-------
> 	Originally signals placed the location of the shadow stack restore 
> 	token inside the saved state on the stack. This was problematic from a 
> 	past ABI promises perspective. So the restore location was instead just 
> 	assumed from the shadow stack pointer. This works because in normal 
> 	allowed cases of calling sigreturn, the shadow stack pointer should be 
> 	right at the restore token at that time. There is no alternate shadow 
> 	stack support. If an alt shadow stack is added later we would need to 
> 	find a place to store the regular shadow stack token location. Options 
> 	could be to push something on the alt shadow stack, or to keep 
> 	something on the kernel side. So the current design keeps things simple 
> 	while slightly kicking the can down the road if alt shadow stacks 
> 	become a thing later. Siglongjmp is handled in glibc, using the incssp 
> 	instruction to unwind the shadow stack over the token.
> 
> 	Shadow Stack Allocation
> 	-----------------------
> 	makecontext() implementations need a way to create new shadow stacks 
> 	with restore tokens such that they can be pivoted to from userspace.
> 	The first interface to do this was an arch_prctl(). It created a shadow 
> 	stack with a restore token pre-setup, since the kernel has an 
> 	instruction that can write to user shadow stacks. However, this 
> 	interface was abandoned for being strange.
> 
> 	The next version created PROT_SHADOW_STACK. This interface had two 
> 	problems. One, it left userspace no option but to create writable
> 	memory, write a restore token, then mprotect() it PROT_SHADOW_STACK.
> 	The writable window left the shadow stack exposed, weakening the 
> 	security. Second, it caused problems with the guard pages. Since the 
> 	memory was initially created writable it did not have a guard page, but 
> 	then was mprotected later to a type of memory that should have one. 
> 	This resulted in missing guard pages and confused rb_subtree_gaps.
> 
> 	This version introduces a new syscall that behaves similarly to the 
> 	initial arch_prctl() interface in that it has the kernel write the 
> 	restore token.
> 
> 	Enabling Interface
> 	------------------
> 	For the entire history of the original CET series, the design was to 
> 	enable shadow stack automatically if the feature bit was detected in 
> 	the elf header. Then it was userspace’s responsibility to turn it off 
> 	via an arch_prctl() if it was not desired, and this was handled by the 
> 	glibc dynamic loader. Glibc’s standard behavior (when CET is configured)
> 	is to leave shadow stack enabled if the executable and all linked 
> 	libraries are marked with shadow stacks.
> 
> 	Many distros (Fedora and others) have binaries already marked with 
> 	shadow stack, waiting for kernel support. Unfortunately their glibc 
> 	binaries expect the original arch_prctl() interface for allocating 
> 	shadow stacks, as those changes were pushed ahead of kernel support. 
> 	The net result of it all is, when updating to a kernel with shadow 
> 	stack these binaries would suddenly get shadow stack enabled and expect 
> 	the arch_prctl() interface to be there. And so calls to makecontext() 
> 	will fail, resulting in visible breakages. This series deals with this 
> 	problem as described below in "Updates".
> 
> 
> Updates
> =======
> These updates were mostly driven by public comments, but a lot of the design 
> elements are new. I would like some extra scrutiny on the updates.
> 
> 	New syscall for Shadow Stack Allocation
> 	---------------------------------------
> 	A new syscall is added for allocating shadow stacks to replace 
> 	PROT_SHADOW_STACK. Several options were considered, as described in the 
> 	“x86/cet/shstk: Introduce map_shadow_stack syscall”.
> 
> 	Xsave Managed Supervisor State Modifications
> 	--------------------------------------------
> 	The shadow stack feature requires the kernel to modify xsaves managed 
> 	state. On one of the last versions of Yu-cheng’s series Boris had 
> 	commented on the pattern it was using to do this not necessarily being 
> 	ideal. The pattern was to force a restore to the registers and always 
> 	do the modification there. Then Thomas did an overhaul of the fpu code, 
> 	part of which consisted of making raw access to the xsave buffer 
> 	private to the fpu code. So this series tries to expose access again, 
> 	and in a way that addresses Boris’ comments.
> 
> 	The method is to provide functions like wmsrl/rdmsrl, but that can 
> 	direct the operation to the correct location (registers or buffer), 
> 	while giving the proper notice to the fpu subsystem so things don’t get 
> 	clobbered or corrupted.
> 
> 	In the past a solution like this was discussed as part of the PASID 
> 	series, and Thomas was not in favor. In CET’s case there is more
> 	logic around the CET MSRs than in PASID's, and wrapping this logic
> 	minimizes the near-identical open-coded logic needed to do this more
> 	efficiently. In addition it resolves the above described problem of 
> 	having no access to the xsave buffer. So it is being put forward here 
> 	under the supposition that CET’s usage may lead to a different 
> 	conclusion, not to try to ignore past direction.
> 
> 	The user interrupt series has similar needs as CET, and will also use
> 	this internal interface if it’s found acceptable.
> 
> 	Support for WRSS
> 	----------------
> 	Andy Lutomirski had asked if we change the shadow stack allocation API 
> 	such that userspace cannot create arbitrary shadow stacks, then we look 
> 	at exposing an interface to enable the WRSS instruction for userspace. 
> 	This way apps that want to do unexpected things with shadow stacks
> 	would still have the option to create shadow stacks with arbitrary 
> 	data.
> 
> 	Switch Enabling Interface
> 	-------------------------
> 	As described above there is a problem with userspace binaries waiting 
> 	to break as soon as the kernel supports CET. This needs to be prevented 
> 	by changing the interface such that the old binaries will not enable 
> 	shadow stack AND behave as if shadow stack is not enabled. They should 
> 	run normally without shadow stack protection. Creating a new feature 
> 	(SHSTK2) for shadow stack was explored. SHSTK would never be supported 
> 	by the kernel, and all the userspace build tools would be updated to 
> 	target SHSTK2 instead of SHSTK. So old SHSTK binaries would be cleanly
> 	disabled.
> 
> 	But there are existing downsides to automatic elf header processing 
> 	based enabling. The elf header feature spec is not defined by the 
> 	kernel and there are proposals to expand it to describe additional 
> 	logic. A simpler interface where the kernel is simply told what to 
> 	enable, and leaves all the decision making to userspace, is more 
> 	flexible for userspace and simpler for the kernel. There also already 
> 	needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and 
> 	likely LAM will use it too), so it avoids there being two ways to turn 
> 	on these types of features. The only tricky part for shadow stack, is 
> 	that it has to be enabled very early. Wherever the shadow stack is 
> 	enabled, the app cannot return from that point, otherwise there will be 
> 	a shadow stack violation. It turns out glibc can enable shadow stack 
> 	this early, so it works nicely. So not automatically enabling any 
> 	features in the elf header will cleanly disable all old binaries, which 
> 	expect the kernel to enable CET features automatically. Then after the 
> 	kernel changes are upstream, glibc can be updated to use the new
> 	interface. This is the solution implemented in this series.
> 
> 	Expand Commit Logs
> 	------------------
> 	As part of spinning up on this series, I found some of the commit logs 
> 	did not describe the changes in enough detail for me to understand their
> 	purpose. I tried to expand the logs and comments, where I had to go 
> 	digging. Hopefully it’s useful.
> 	
> 	Limit to only Intel Processors
> 	------------------------------
> 	Shadow stack is supported on some AMD processors, but this revision 
> 	(with expanded HW usage and xsaves changes) has only been tested on
> 	Intel ones. So this series has a patch to limit shadow stack support to 
> 	Intel processors. Ideally the patch would not even make it to mainline, 
> 	and should be dropped as soon as this testing is done. It's included 
> 	just in case.
> 
> 
> Future Work
> ===========
> Even though this is now exclusively a shadow stack series, there is still some 
> remaining shadow stack work to be done.
> 
> 	Ptrace
> 	------
> 	Early in the series, there was a patch to allow IA32_U_CET and
> 	IA32_PL3_SSP to be set. This patch was dropped and planned as a follow
> 	up to basic support, and it remains the plan. It will be needed for
> 	in-progress gdb support.
> 
> 	CRIU Support
> 	------------
> 	In the past there was some speculation on the mailing list about 
> 	whether CRIU would need to be taught about CET. It turns out, it does. 
> 	The first issue hit is that CRIU calls sigreturn directly from its 
> 	“parasite code” that it injects into the dumper process. This violates
> 	this shadow stack implementation’s protection that intends to prevent
> 	attackers from doing this.
> 
> 	With so many packages already enabled with shadow stack, there is 
> 	probably desire to make it work seamlessly. But in the meantime if 
> 	distros want to support shadow stack and CRIU, users could manually 
> 	disable shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for
> 	a process they want to dump. It’s not ideal.
> 
> 	I’d like to hear what people think about having shadow stack in the 
> 	kernel without this resolved. Nothing would change for any users until 
> 	they enable shadow stack in the kernel and update to a glibc configured
> 	with CET. Should CRIU userspace be solved before kernel support?
> 
> 	Selftests
> 	---------
> 	There are some CET selftests being worked on and they are not included
> 	here.
> 
> Thanks,
> 
> Rick
> 
> Rick Edgecombe (7):
>   x86/mm: Prevent VM_WRITE shadow stacks
>   x86/fpu: Add helpers for modifying supervisor xstate
>   x86/fpu: Add unsafe xsave buffer helpers
>   x86/cet/shstk: Introduce map_shadow_stack syscall
>   selftests/x86: Add map_shadow_stack syscall test
>   x86/cet/shstk: Support wrss for userspace
>   x86/cpufeatures: Limit shadow stack to Intel CPUs
> 
> Yu-cheng Yu (28):
>   Documentation/x86: Add CET description
>   x86/cet/shstk: Add Kconfig option for Shadow Stack
>   x86/cpufeatures: Add CET CPU feature flags for Control-flow
>     Enforcement Technology (CET)
>   x86/cpufeatures: Introduce CPU setup and option parsing for CET
>   x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
>   x86/cet: Add control-protection fault handler
>   x86/mm: Remove _PAGE_DIRTY from kernel RO pages
>   x86/mm: Move pmd_write(), pud_write() up in the file
>   x86/mm: Introduce _PAGE_COW
>   drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
>   x86/mm: Update pte_modify for _PAGE_COW
>   x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for
>     transition from _PAGE_DIRTY to _PAGE_COW
>   mm: Move VM_UFFD_MINOR_BIT from 37 to 38
>   mm: Introduce VM_SHADOW_STACK for shadow stack memory
>   x86/mm: Check Shadow Stack page fault errors
>   x86/mm: Update maybe_mkwrite() for shadow stack
>   mm: Fixup places that call pte_mkwrite() directly
>   mm: Add guard pages around a shadow stack.
>   mm/mmap: Add shadow stack pages to memory accounting
>   mm: Update can_follow_write_pte() for shadow stack
>   mm/mprotect: Exclude shadow stack from preserve_write
>   mm: Re-introduce vm_flags to do_mmap()
>   x86/cet/shstk: Add user-mode shadow stack support
>   x86/process: Change copy_thread() argument 'arg' to 'stack_size'
>   x86/cet/shstk: Handle thread shadow stack
>   x86/cet/shstk: Introduce shadow stack token setup/verify routines
>   x86/cet/shstk: Handle signals for shadow stack
>   x86/cet/shstk: Add arch_prctl elf feature functions
> 
>  .../admin-guide/kernel-parameters.txt         |   4 +
>  Documentation/filesystems/proc.rst            |   1 +
>  Documentation/x86/cet.rst                     | 145 ++++++
>  Documentation/x86/index.rst                   |   1 +
>  arch/arm/kernel/signal.c                      |   2 +-
>  arch/arm64/kernel/signal.c                    |   2 +-
>  arch/arm64/kernel/signal32.c                  |   2 +-
>  arch/sparc/kernel/signal32.c                  |   2 +-
>  arch/sparc/kernel/signal_64.c                 |   2 +-
>  arch/x86/Kconfig                              |  22 +
>  arch/x86/Kconfig.assembler                    |   5 +
>  arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
>  arch/x86/ia32/ia32_signal.c                   |  25 +-
>  arch/x86/include/asm/cet.h                    |  54 +++
>  arch/x86/include/asm/cpufeatures.h            |   1 +
>  arch/x86/include/asm/disabled-features.h      |   8 +-
>  arch/x86/include/asm/fpu/api.h                |   8 +
>  arch/x86/include/asm/fpu/types.h              |  23 +-
>  arch/x86/include/asm/fpu/xstate.h             |   6 +-
>  arch/x86/include/asm/idtentry.h               |   4 +
>  arch/x86/include/asm/mman.h                   |  24 +
>  arch/x86/include/asm/mmu_context.h            |   2 +
>  arch/x86/include/asm/msr-index.h              |  20 +
>  arch/x86/include/asm/page_types.h             |   7 +
>  arch/x86/include/asm/pgtable.h                | 302 ++++++++++--
>  arch/x86/include/asm/pgtable_types.h          |  48 +-
>  arch/x86/include/asm/processor.h              |   6 +
>  arch/x86/include/asm/special_insns.h          |  30 ++
>  arch/x86/include/asm/trap_pf.h                |   2 +
>  arch/x86/include/uapi/asm/mman.h              |   8 +-
>  arch/x86/include/uapi/asm/prctl.h             |  10 +
>  arch/x86/include/uapi/asm/processor-flags.h   |   2 +
>  arch/x86/kernel/Makefile                      |   1 +
>  arch/x86/kernel/cpu/common.c                  |  20 +
>  arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
>  arch/x86/kernel/elf_feature_prctl.c           |  72 +++
>  arch/x86/kernel/fpu/xstate.c                  | 167 ++++++-
>  arch/x86/kernel/idt.c                         |   4 +
>  arch/x86/kernel/process.c                     |  17 +-
>  arch/x86/kernel/process_64.c                  |   2 +
>  arch/x86/kernel/shstk.c                       | 446 ++++++++++++++++++
>  arch/x86/kernel/signal.c                      |  13 +
>  arch/x86/kernel/signal_compat.c               |   2 +-
>  arch/x86/kernel/traps.c                       |  62 +++
>  arch/x86/mm/fault.c                           |  19 +
>  arch/x86/mm/mmap.c                            |  48 ++
>  arch/x86/mm/pat/set_memory.c                  |   2 +-
>  arch/x86/mm/pgtable.c                         |  25 +
>  drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
>  fs/aio.c                                      |   2 +-
>  fs/proc/task_mmu.c                            |   3 +
>  include/linux/mm.h                            |  19 +-
>  include/linux/pgtable.h                       |   8 +
>  include/linux/syscalls.h                      |   1 +
>  include/uapi/asm-generic/siginfo.h            |   3 +-
>  include/uapi/asm-generic/unistd.h             |   2 +-
>  ipc/shm.c                                     |   2 +-
>  kernel/sys_ni.c                               |   1 +
>  mm/gup.c                                      |  16 +-
>  mm/huge_memory.c                              |  27 +-
>  mm/memory.c                                   |   5 +-
>  mm/migrate.c                                  |   3 +-
>  mm/mmap.c                                     |  15 +-
>  mm/mprotect.c                                 |   9 +-
>  mm/nommu.c                                    |   4 +-
>  mm/util.c                                     |   2 +-
>  tools/testing/selftests/x86/Makefile          |   9 +-
>  .../selftests/x86/test_map_shadow_stack.c     |  75 +++
>  69 files changed, 1797 insertions(+), 92 deletions(-)
>  create mode 100644 Documentation/x86/cet.rst
>  create mode 100644 arch/x86/include/asm/cet.h
>  create mode 100644 arch/x86/include/asm/mman.h
>  create mode 100644 arch/x86/kernel/elf_feature_prctl.c
>  create mode 100644 arch/x86/kernel/shstk.c
>  create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c
> 
> 
> base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
> -- 
> 2.17.1

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-06 18:42 ` Mike Rapoport
@ 2022-02-07  7:20   ` Adrian Reber
  2022-02-07 16:30     ` Dave Hansen
  0 siblings, 1 reply; 154+ messages in thread
From: Adrian Reber @ 2022-02-07  7:20 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian,
	Andrei Vagin, Dmitry Safonov

On Sun, Feb 06, 2022 at 08:42:03PM +0200, Mike Rapoport wrote:
> (added more CRIU people)

Thanks, Mike.

> On Sun, Jan 30, 2022 at 01:18:03PM -0800, Rick Edgecombe wrote:
> > This is a slight reboot of the userspace CET series. I will be taking over the 
> > series from Yu-cheng. Per some internal recommendations, I’ve reset the version
> > number and am calling it a new series. Hopefully, it doesn’t cause confusion.
> > 
> > The new plan is to upstream only userspace Shadow Stack support at this point. 
> > IBT can follow later, but for now I’ll focus solely on the most in-demand and
> > widely available (with the feature on AMD CPUs now) part of CET.
> > 
> > I thought as part of this reset, it might be useful to more fully write-up the 
> > design and summarize the history of the previous CET series. So this slightly
> > long cover letter does that. The "Updates" section has the changes, if anyone
> > doesn't want the history.

[...]

> > 	CRIU Support
> > 	------------
> > 	In the past there was some speculation on the mailing list about 
> > 	whether CRIU would need to be taught about CET. It turns out, it does. 
> > 	The first issue hit is that CRIU calls sigreturn directly from its 
> > 	“parasite code” that it injects into the dumper process. This violates
> > 	this shadow stack implementation’s protection that intends to prevent
> > 	attackers from doing this.
> > 
> > 	With so many packages already enabled with shadow stack, there is 
> > 	probably desire to make it work seamlessly. But in the meantime if 
> > 	distros want to support shadow stack and CRIU, users could manually 
> > 	disable shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for
> > 	a process they want to dump. It’s not ideal.
> > 
> > 	I’d like to hear what people think about having shadow stack in the 
> > 	kernel without this resolved. Nothing would change for any users until 
> > 	they enable shadow stack in the kernel and update to a glibc configured
> > 	with CET. Should CRIU userspace be solved before kernel support?

From the CRIU side I can say that I would definitely like to see this
resolved. CRIU just went through a similar exercise with rseq() being
enabled in glibc and CI broke all around for us and other projects
relying on CRIU. Although rseq() was around for a long time we were not
aware of it but luckily 5.13 introduced a way to handle it for CRIU with
ptrace. An environment variable existed but did not really help when
CRIU is called somewhere in the middle of the container software stack.

From my point of view a solution not involving an environment variable
would definitely be preferred.

		Adrian

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-06 13:42           ` David Laight
  2022-02-06 13:55             ` H.J. Lu
@ 2022-02-07 10:22             ` Florian Weimer
  2022-02-08  1:46             ` Edgecombe, Rick P
  2 siblings, 0 replies; 154+ messages in thread
From: Florian Weimer @ 2022-02-07 10:22 UTC (permalink / raw)
  To: David Laight
  Cc: 'Edgecombe, Rick P',
	hjl.tools, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, nadav.amit, jannh, kcc, linux-arch, pavel,
	oleg, Yang, Weijiang, Lutomirski, Andy, bp, arnd, Moreira, Joao,
	tglx, mike.kravetz, x86, linux-doc, Dave.Martin, john.allen,
	mingo, Shankar, Ravi V, corbet, linux-kernel, linux-api,
	gorcunov

* David Laight:

> Was there any 'spare' space in struct jmpbuf ?

jmp_buf in glibc looks like this:

(gdb) ptype/o jmp_buf
type = struct __jmp_buf_tag {
/*      0      |      64 */    __jmp_buf __jmpbuf;
/*     64      |       4 */    int __mask_was_saved;
/* XXX  4-byte hole      */
/*     72      |     128 */    __sigset_t __saved_mask;

                               /* total size (bytes):  200 */
                             } [1]
(gdb) ptype/o __jmp_buf
type = long [8]

The glibc ABI reserves space for 1024 signals, something that Linux is
never going to implement.  We can use that space to store a few extra
registers in __save_mask.  There is a complication because the
pthread_cancel unwinding allocates only space for the __jmpbuf member.
Fortunately, we do not need to unwind the shadow stack for thread
cancellation, so we don't need that extra space in that case.
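
A quick sanity check of those numbers (a throwaway sketch; the values
in the comments are what x86-64 glibc gives and may differ elsewhere):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

int main(void)
{
	/* 200 bytes: 64 for __jmpbuf, 4 for __mask_was_saved (plus
	   padding), 128 for __saved_mask.  */
	printf("sizeof(jmp_buf)  = %zu\n", sizeof(jmp_buf));
	/* 128 bytes: room for 1024 signals, while the kernel only
	   defines 64 (8 bytes actually used), leaving spare space
	   in __saved_mask.  */
	printf("sizeof(sigset_t) = %zu\n", sizeof(sigset_t));
	return 0;
}
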

Thanks,
Florian


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-07  7:20   ` Adrian Reber
@ 2022-02-07 16:30     ` Dave Hansen
  2022-02-08  9:16       ` Mike Rapoport
  0 siblings, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-07 16:30 UTC (permalink / raw)
  To: Adrian Reber, Mike Rapoport
  Cc: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian,
	Andrei Vagin, Dmitry Safonov

On 2/6/22 23:20, Adrian Reber wrote:
>>> 	CRIU Support
>>> 	------------
>>> 	In the past there was some speculation on the mailing list about 
>>> 	whether CRIU would need to be taught about CET. It turns out, it does. 
>>> 	The first issue hit is that CRIU calls sigreturn directly from its 
>>> 	“parasite code” that it injects into the dumper process. This violates
>>> 	this shadow stack implementation’s protection that intends to prevent
>>> 	attackers from doing this.
...
> From the CRIU side I can say that I would definitely like to see this
> resolved. CRIU just went through a similar exercise with rseq() being
> enabled in glibc and CI broke all around for us and other projects
> relying on CRIU. Although rseq() was around for a long time we were not
> aware of it but luckily 5.13 introduced a way to handle it for CRIU with
> ptrace. An environment variable existed but did not really help when
> CRIU is called somewhere in the middle of the container software stack.
> 
> From my point of view a solution not involving an environment variable
> would definitely be preferred.

Have there been things like this for CRIU in the past?  Something where
CRIU needs control but that's also security-sensitive?

Any thoughts on how you would _like_ to see this resolved?

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack
  2022-01-30 21:18 ` [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack Rick Edgecombe
@ 2022-02-07 22:39   ` Dave Hansen
  2022-02-08  8:41     ` Thomas Gleixner
  2022-02-08  8:39   ` Thomas Gleixner
  1 sibling, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-07 22:39 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> +config X86_SHADOW_STACK
> +	prompt "Intel Shadow Stack"
> +	def_bool n
> +	depends on AS_WRUSS
> +	depends on ARCH_HAS_SHADOW_STACK
> +	select ARCH_USES_HIGH_VMA_FLAGS
> +	help
> +	  Shadow Stack protection is a hardware feature that detects function
> +	  return address corruption.  This helps mitigate ROP attacks.
> +	  Applications must be enabled to use it, and old userspace does not
> +	  get protection "for free".
> +	  Support for this feature is present on Tiger Lake family of
> +	  processors released in 2020 or later.  Enabling this feature
> +	  increases kernel text size by 3.7 KB.

I guess the "2020" comment is still OK.  But, given that it's on AMD and
a could of other Intel models, maybe we should just leave this at:

	CPUs supporting shadow stacks were first released in 2020.

If we say anything.  We mostly want folks to just go read the
documentation if they need more details.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET)
  2022-01-30 21:18 ` [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Rick Edgecombe
@ 2022-02-07 22:45   ` Dave Hansen
  2022-02-08 20:23     ` Edgecombe, Rick P
  2022-02-09  1:10   ` Kees Cook
  1 sibling, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-07 22:45 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
>  	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
>  	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
> +	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
>  	{}
>  };

Please add a chunk to the changelog that explains the dependency.  This
would suffice:

	To protect shadow stack state from malicious modification, the
	registers are only accessible in supervisor mode.  This
	implementation context-switches the registers with XSAVES.  Make
	X86_FEATURE_SHSTK depend on XSAVES.

The XSAVES dependency is touched on in the documentation, but it's a bit
buried in there.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/35] x86/cpufeatures: Introduce CPU setup and option parsing for CET
  2022-01-30 21:18 ` [PATCH 04/35] x86/cpufeatures: Introduce CPU setup and option parsing for CET Rick Edgecombe
@ 2022-02-07 22:49   ` Dave Hansen
  2022-02-08 20:29     ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-07 22:49 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

>   * Some CPU features depend on higher CPUID levels, which may not always
>   * be available due to CPUID level capping or broken virtualization
> @@ -1261,6 +1269,9 @@ static void __init cpu_parse_early_param(void)
>  	if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
>  		setup_clear_cpu_cap(X86_FEATURE_XSAVES);
>  
> +	if (cmdline_find_option_bool(boot_command_line, "no_user_shstk"))
> +		setup_clear_cpu_cap(X86_FEATURE_SHSTK);

Given this:

	https://lore.kernel.org/all/20220127115626.14179-2-bp@alien8.de/

I'd probably yank the command-line option out of this series, or stick
it in a separate patch that you tack on to the end.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  2022-01-30 21:18 ` [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Rick Edgecombe
@ 2022-02-07 23:28   ` Dave Hansen
  2022-02-08 21:36     ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-07 23:28 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Control-flow Enforcement Technology (CET) introduces these MSRs:
> 
>     MSR_IA32_U_CET (user-mode CET settings),
>     MSR_IA32_PL3_SSP (user-mode shadow stack pointer),
> 
>     MSR_IA32_PL0_SSP (kernel-mode shadow stack pointer),
>     MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack pointer),
>     MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack pointer),
>     MSR_IA32_S_CET (kernel-mode CET settings),
>     MSR_IA32_INT_SSP_TAB (exception shadow stack table).

To be honest, I'm not sure this is very valuable.  It's *VERY* close to
the exact information in the structure definitions.  It's also not
obviously related to XSAVE.  It's more of the "what" this patch does
than the "why".  Good changelogs talk about "why".

> The two user-mode MSRs belong to XFEATURE_CET_USER.  The first three of
> kernel-mode MSRs belong to XFEATURE_CET_KERNEL.  Both XSAVES states are
> supervisor states.  This means that there is no direct, unprivileged access
> to these states, making it harder for an attacker to subvert CET.

Forgive me while I go into changelog lecture mode for a moment.

I was constantly looking up at the list of MSRs and trying to reconcile
them with this paragraph.  Imagine if you had started out this changelog
by saying:

	Shadow stack register state can be managed with XSAVE.  The
	registers can logically be separated into two groups:

		* Registers controlling user-mode operation
		* Registers controlling kernel-mode operation

	The architecture has two new XSAVE state components: one for
	each group of registers.  This _lets_ an OS manage them
	separately if it chooses.  Linux chooses to ... <explain the
	design choice here, or why we don't care yet>.

	Both XSAVE state components are supervisor states, even the
	state controlling user-mode operation.  This is a departure from
	earlier features like protection keys where the PKRU state is
	a normal user (non-supervisor) state.  Having the user state be	
	supervisor-managed ensures there is no direct, unprivileged
	access to it, making it harder for an attacker to subvert CET.

Also, IBT gunk is in here too, right?  Let's at least *mention* that in
the changelog.

...
>  /* All supervisor states including supported and unsupported states. */
>  #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 3faf0f97edb1..0ee77ce4c753 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -362,6 +362,26 @@
>  
>  
>  #define MSR_CORE_PERF_LIMIT_REASONS	0x00000690
> +
> +/* Control-flow Enforcement Technology MSRs */
> +#define MSR_IA32_U_CET			0x000006a0 /* user mode cet setting */
> +#define MSR_IA32_S_CET			0x000006a2 /* kernel mode cet setting */
> +#define CET_SHSTK_EN			BIT_ULL(0)
> +#define CET_WRSS_EN			BIT_ULL(1)
> +#define CET_ENDBR_EN			BIT_ULL(2)
> +#define CET_LEG_IW_EN			BIT_ULL(3)
> +#define CET_NO_TRACK_EN			BIT_ULL(4)
> +#define CET_SUPPRESS_DISABLE		BIT_ULL(5)
> +#define CET_RESERVED			(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))

Would GENMASK_ULL() look any nicer here?  I guess it's pretty clear
as-is that bits 6->9 are reserved.
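
For reference, the GENMASK_ULL() spelling of the same mask would be
(equivalent to the four-bit OR above, untested):

	#define CET_RESERVED			GENMASK_ULL(9, 6)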

> +#define CET_SUPPRESS			BIT_ULL(10)
> +#define CET_WAIT_ENDBR			BIT_ULL(11)

Are those bit fields common for both registers?  It might be worth a
comment to mention that.

> +#define MSR_IA32_PL0_SSP		0x000006a4 /* kernel shadow stack pointer */
> +#define MSR_IA32_PL1_SSP		0x000006a5 /* ring-1 shadow stack pointer */
> +#define MSR_IA32_PL2_SSP		0x000006a6 /* ring-2 shadow stack pointer */

Are PL1/2 ever used in this implementation?  If not, let's axe these
definitions.

> +#define MSR_IA32_PL3_SSP		0x000006a7 /* user shadow stack pointer */
> +#define MSR_IA32_INT_SSP_TAB		0x000006a8 /* exception shadow stack table */
> +
>  #define MSR_GFX_PERF_LIMIT_REASONS	0x000006B0
>  #define MSR_RING_PERF_LIMIT_REASONS	0x000006B1
>  
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 02b3ddaf4f75..44397202762b 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -50,6 +50,8 @@ static const char *xfeature_names[] =
>  	"Processor Trace (unused)"	,
>  	"Protection Keys User registers",
>  	"PASID state",
> +	"Control-flow User registers"	,
> +	"Control-flow Kernel registers"	,
>  	"unknown xstate feature"	,
>  	"unknown xstate feature"	,
>  	"unknown xstate feature"	,
> @@ -73,6 +75,8 @@ static unsigned short xsave_cpuid_features[] __initdata = {
>  	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
>  	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
>  	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
> +	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
> +	[XFEATURE_CET_KERNEL]			= X86_FEATURE_SHSTK,
>  	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
>  	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
>  };
> @@ -250,6 +254,8 @@ static void __init print_xstate_features(void)
>  	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
>  	print_xstate_feature(XFEATURE_MASK_PKRU);
>  	print_xstate_feature(XFEATURE_MASK_PASID);
> +	print_xstate_feature(XFEATURE_MASK_CET_USER);
> +	print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
>  	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
>  	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
>  }
> @@ -405,6 +411,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
>  	 XFEATURE_MASK_BNDREGS |		\
>  	 XFEATURE_MASK_BNDCSR |			\
>  	 XFEATURE_MASK_PASID |			\
> +	 XFEATURE_MASK_CET_USER |		\
>  	 XFEATURE_MASK_XTILE)
>  
>  /*
> @@ -621,6 +628,8 @@ static bool __init check_xstate_against_struct(int nr)
>  	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
>  	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
>  	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
> +	XCHECK_SZ(sz, nr, XFEATURE_CET_USER,   struct cet_user_state);
> +	XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct cet_kernel_state);
>  
>  	/* The tile data size varies between implementations. */
>  	if (nr == XFEATURE_XTILE_DATA)
> @@ -634,7 +643,9 @@ static bool __init check_xstate_against_struct(int nr)
>  	if ((nr < XFEATURE_YMM) ||
>  	    (nr >= XFEATURE_MAX) ||
>  	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
> -	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
> +	    (nr == XFEATURE_RSRVD_COMP_13) ||
> +	    (nr == XFEATURE_RSRVD_COMP_14) ||
> +	    (nr == XFEATURE_RSRVD_COMP_16)) {
>  		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
>  		XSTATE_WARN_ON(1);
>  		return false;

That if() is getting unwieldy.  While I generally despise macros
implicitly modifying variables, this might be worth it.  We could have a
local function variable:

	bool feature_checked = false;

and then muck with it in the macro:

#define XCHECK_SZ(sz, nr, nr_macro, __struct) do {		\
	if (nr == nr_macro) {					\
		feature_checked = true;				\
		if (WARN_ONCE(sz != sizeof(__struct), ...)) {	\
			__xstate_dump_leaves();			\
		}						\
	}							\
} while (0)

Then the if() just makes sure the feature was checked instead of
checking for reserved features explicitly.  We could also do:

	bool c = false;

	...

        c |= XCHECK_SZ(sz, nr, XFEATURE_YMM,       struct ymmh_struct);
        c |= XCHECK_SZ(sz, nr, XFEATURE_BNDREGS,   struct ...
        c |= XCHECK_SZ(sz, nr, XFEATURE_BNDCSR,    struct ...
	...

but that starts to run into 80 columns.  Those are both nice because
they mean you don't have to maintain a list of reserved features in the
code.  Another option would be to define a:

bool xfeature_is_reserved(int nr)
{
	switch (nr) {
		case XFEATURE_RSRVD_COMP_13:
		...

so the if() looks nicer and won't grow; the function will grow instead.

Either way, I think this needs some refactoring.
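
For completeness, a rough (and untested) sketch of that last variant,
using the reserved-component names from the hunk above:

static bool xfeature_is_reserved(int nr)
{
	switch (nr) {
	case XFEATURE_PT_UNIMPLEMENTED_SO_FAR:
	case XFEATURE_RSRVD_COMP_13:
	case XFEATURE_RSRVD_COMP_14:
	case XFEATURE_RSRVD_COMP_16:
		return true;
	default:
		return false;
	}
}

	/* ... and the check itself stops growing: */
	if (nr < XFEATURE_YMM || nr >= XFEATURE_MAX ||
	    xfeature_is_reserved(nr)) {
		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
		XSTATE_WARN_ON(1);
		return false;
	}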

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 06/35] x86/cet: Add control-protection fault handler
  2022-01-30 21:18 ` [PATCH 06/35] x86/cet: Add control-protection fault handler Rick Edgecombe
@ 2022-02-07 23:56   ` Dave Hansen
  2022-02-08 22:23     ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-07 23:56 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu, Michael Kerrisk

On 1/30/22 13:18, Rick Edgecombe wrote:
> A control-protection fault is triggered when a control-flow transfer
> attempt violates Shadow Stack or Indirect Branch Tracking constraints.
> For example, the return address for a RET instruction differs from the copy
> on the shadow stack; or an indirect JMP instruction, without the NOTRACK
> prefix, arrives at a non-ENDBR opcode.
> 
> The control-protection fault handler works in a similar way as the general
> protection fault handler.  It provides the si_code SEGV_CPERR to the signal
> handler.

It's not a big deal, but we should probably just remove IBT from the
changelogs for now.

>  arch/arm/kernel/signal.c           |  2 +-
>  arch/arm64/kernel/signal.c         |  2 +-
>  arch/arm64/kernel/signal32.c       |  2 +-
>  arch/sparc/kernel/signal32.c       |  2 +-
>  arch/sparc/kernel/signal_64.c      |  2 +-
>  arch/x86/include/asm/idtentry.h    |  4 ++
>  arch/x86/kernel/idt.c              |  4 ++
>  arch/x86/kernel/signal_compat.c    |  2 +-
>  arch/x86/kernel/traps.c            | 62 ++++++++++++++++++++++++++++++
>  include/uapi/asm-generic/siginfo.h |  3 +-
>  10 files changed, 78 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
> index c532a6041066..59aaadce9d52 100644
> --- a/arch/arm/kernel/signal.c
> +++ b/arch/arm/kernel/signal.c
> @@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs)
>   */
>  static_assert(NSIGILL	== 11);
>  static_assert(NSIGFPE	== 15);
> -static_assert(NSIGSEGV	== 9);
> +static_assert(NSIGSEGV	== 10);
>  static_assert(NSIGBUS	== 5);
>  static_assert(NSIGTRAP	== 6);
>  static_assert(NSIGCHLD	== 6);
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index d8aaf4b6f432..d2da57c415b8 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -983,7 +983,7 @@ void __init minsigstksz_setup(void)
>   */
>  static_assert(NSIGILL	== 11);
>  static_assert(NSIGFPE	== 15);
> -static_assert(NSIGSEGV	== 9);
> +static_assert(NSIGSEGV	== 10);
>  static_assert(NSIGBUS	== 5);
>  static_assert(NSIGTRAP	== 6);
>  static_assert(NSIGCHLD	== 6);
> diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
> index d984282b979f..8776a34c6444 100644
> --- a/arch/arm64/kernel/signal32.c
> +++ b/arch/arm64/kernel/signal32.c
> @@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs)
>   */
>  static_assert(NSIGILL	== 11);
>  static_assert(NSIGFPE	== 15);
> -static_assert(NSIGSEGV	== 9);
> +static_assert(NSIGSEGV	== 10);
>  static_assert(NSIGBUS	== 5);
>  static_assert(NSIGTRAP	== 6);
>  static_assert(NSIGCHLD	== 6);
> diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
> index 6cc124a3bb98..dc50b2a78692 100644
> --- a/arch/sparc/kernel/signal32.c
> +++ b/arch/sparc/kernel/signal32.c
> @@ -752,7 +752,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp)
>   */
>  static_assert(NSIGILL	== 11);
>  static_assert(NSIGFPE	== 15);
> -static_assert(NSIGSEGV	== 9);
> +static_assert(NSIGSEGV	== 10);
>  static_assert(NSIGBUS	== 5);
>  static_assert(NSIGTRAP	== 6);
>  static_assert(NSIGCHLD	== 6);
> diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
> index 2a78d2af1265..7fe2bd37bd1a 100644
> --- a/arch/sparc/kernel/signal_64.c
> +++ b/arch/sparc/kernel/signal_64.c
> @@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long
>   */
>  static_assert(NSIGILL	== 11);
>  static_assert(NSIGFPE	== 15);
> -static_assert(NSIGSEGV	== 9);
> +static_assert(NSIGSEGV	== 10);
>  static_assert(NSIGBUS	== 5);
>  static_assert(NSIGTRAP	== 6);
>  static_assert(NSIGCHLD	== 6);
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 1345088e9902..a90791433152 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -562,6 +562,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS,	exc_stack_segment);
>  DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP,	exc_general_protection);
>  DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC,	exc_alignment_check);
>  
> +#ifdef CONFIG_X86_SHADOW_STACK
> +DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
> +#endif
> +
>  /* Raw exception entries which need extra work */
>  DECLARE_IDTENTRY_RAW(X86_TRAP_UD,		exc_invalid_op);
>  DECLARE_IDTENTRY_RAW(X86_TRAP_BP,		exc_int3);
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index df0fa695bb09..9f1bdaabc246 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -113,6 +113,10 @@ static const __initconst struct idt_data def_idts[] = {
>  #elif defined(CONFIG_X86_32)
>  	SYSG(IA32_SYSCALL_VECTOR,	entry_INT80_32),
>  #endif
> +
> +#ifdef CONFIG_X86_SHADOW_STACK
> +	INTG(X86_TRAP_CP,		asm_exc_control_protection),
> +#endif
>  };
>  
>  /*
> diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
> index b52407c56000..ff50cd978ea5 100644
> --- a/arch/x86/kernel/signal_compat.c
> +++ b/arch/x86/kernel/signal_compat.c
> @@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
>  	 */
>  	BUILD_BUG_ON(NSIGILL  != 11);
>  	BUILD_BUG_ON(NSIGFPE  != 15);
> -	BUILD_BUG_ON(NSIGSEGV != 9);
> +	BUILD_BUG_ON(NSIGSEGV != 10);
>  	BUILD_BUG_ON(NSIGBUS  != 5);
>  	BUILD_BUG_ON(NSIGTRAP != 6);
>  	BUILD_BUG_ON(NSIGCHLD != 6);
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index c9d566dcf89a..54b7a146fd5e 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -39,6 +39,7 @@
>  #include <linux/io.h>
>  #include <linux/hardirq.h>
>  #include <linux/atomic.h>
> +#include <linux/nospec.h>
>  
>  #include <asm/stacktrace.h>
>  #include <asm/processor.h>
> @@ -641,6 +642,67 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>  	cond_local_irq_disable(regs);
>  }
>  
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static const char * const control_protection_err[] = {
> +	"unknown",
> +	"near-ret",
> +	"far-ret/iret",
> +	"endbranch",
> +	"rstorssp",
> +	"setssbsy",
> +	"unknown",
> +};
> +
> +static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL,
> +			      DEFAULT_RATELIMIT_BURST);
> +
> +/*
> + * When a control protection exception occurs, send a signal to the responsible
> + * application.  Currently, control protection is only enabled for user mode.
> + * This exception should not come from kernel mode.
> + */

Please move that last sentence to the code which enforces that expectation.

> +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> +{
> +	struct task_struct *tsk;
> +
> +	if (!user_mode(regs)) {
> +		die("kernel control protection fault", regs, error_code);
> +		panic("Unexpected kernel control protection fault.  Machine halted.");
> +	}

s/  Machine halted.//

I think they'll get the point when they see "kernel panic".

> +
> +	cond_local_irq_enable(regs);
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
> +
> +	tsk = current;
> +	tsk->thread.error_code = error_code;
> +	tsk->thread.trap_nr = X86_TRAP_CP;
> +
> +	/*
> +	 * Ratelimit to prevent log spamming.
> +	 */
> +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
> +	    __ratelimit(&cpf_rate)) {
> +		unsigned long ssp;
> +		int cpf_type;
> +
> +		cpf_type = array_index_nospec(error_code, ARRAY_SIZE(control_protection_err));

Isn't 'error_code' generated by the hardware?  Is this defending against
userspace which can somehow trigger this with an arbitrary 'error_code'?

I'm also not sure I like using array_index_nospec() as the *only* bounds
checking on the array.  Is that the way folks are using it these days?
Even the comment above it has a pattern like this:

>  *     if (index < size) {
>  *         index = array_index_nospec(index, size);
>  *         val = array[index];
>  *     }
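
Applied to this hunk, that pattern would look roughly like (untested
sketch, reusing the names from the patch):

		int cpf_type = 0;	/* index of "unknown" */

		if (error_code < ARRAY_SIZE(control_protection_err))
			cpf_type = array_index_nospec(error_code,
						      ARRAY_SIZE(control_protection_err));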


> +		rdmsrl(MSR_IA32_PL3_SSP, ssp);
> +		pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)",
> +			 tsk->comm, task_pid_nr(tsk),
> +			 regs->ip, regs->sp, ssp, error_code,
> +			 control_protection_err[cpf_type]);
> +		print_vma_addr(KERN_CONT " in ", regs->ip);
> +		pr_cont("\n");
> +	}
> +
> +	force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0);
> +	cond_local_irq_disable(regs);
> +}
> +#endif
> +
>  static bool do_int3(struct pt_regs *regs)
>  {
>  	int res;
> diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
> index 3ba180f550d7..081f4b37d22c 100644
> --- a/include/uapi/asm-generic/siginfo.h
> +++ b/include/uapi/asm-generic/siginfo.h
> @@ -240,7 +240,8 @@ typedef struct siginfo {
>  #define SEGV_ADIPERR	7	/* Precise MCD exception */
>  #define SEGV_MTEAERR	8	/* Asynchronous ARM MTE error */
>  #define SEGV_MTESERR	9	/* Synchronous ARM MTE exception */
> -#define NSIGSEGV	9
> +#define SEGV_CPERR	10	/* Control protection fault */
> +#define NSIGSEGV	10
>  
>  /*
>   * SIGBUS si_codes


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 07/35] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  2022-01-30 21:18 ` [PATCH 07/35] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Rick Edgecombe
@ 2022-02-08  0:13   ` Dave Hansen
  2022-02-08 22:52     ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-08  0:13 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu, Christoph Hellwig

On 1/30/22 13:18, Rick Edgecombe wrote:
> The x86 family of processors do not directly create read-only and Dirty
> PTEs.  These PTEs are created by software.

That's not strictly correct.

There's nothing in the architecture today to prevent the CPU from
creating Write=0,Dirty=1 PTEs.  In fact, some CPUs do this in weird
situations.  It wouldn't be wrong to say:

	Processors sometimes directly create read-only and Dirty PTEs.

which is the opposite of what is written above.  This is why the CET
spec has the blurb about shadow-stack-supporting CPUs promise not to do
this any more.

> One such case is that kernel
> read-only pages are historically setup as Dirty.

				   ^ set up

> New processors that support Shadow Stack regard read-only and Dirty PTEs as
> shadow stack pages.

This also isn't *quite* correct.  It's not just having a new processor,
it includes enabling shadow stacks.

> This results in ambiguity between shadow stack and kernel read-only
> pages.  To resolve this, removed Dirty from kernel read- only pages.

One thing that's not clear from the spec: does this cause an *actual*
problem?  For instance, does setting:

	IA32_U_CET.SH_STK_EN=1
but
	IA32_S_CET.SH_STK_EN=0

mean that shadow stacks are enforced in user *MODE* or on
user-paging-permission (U=0) PTEs?

I think it's modes, but it would be nice to be clear.  *BUT*, if this is
accurate, doesn't it also mean that this patch is not strictly necessary?

Don't get me wrong, the patch is probably still a good idea, but let's
make sure we get the exact reasoning clear.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 09/35] x86/mm: Introduce _PAGE_COW
  2022-01-30 21:18 ` [PATCH 09/35] x86/mm: Introduce _PAGE_COW Rick Edgecombe
@ 2022-02-08  1:05   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-08  1:05 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> There is essentially no room left in the x86 hardware PTEs on some OSes
> (not Linux).  That left the hardware architects looking for a way to
> represent a new memory type (shadow stack) within the existing bits.
> They chose to repurpose a lightly-used state: Write=0, Dirty=1.
> 
> The reason it's lightly used is that Dirty=1 is normally set by hardware
> and cannot normally be set by hardware on a Write=0 PTE.  Software must
> normally be involved to create one of these PTEs, so software can simply
> opt to not create them.

This is kinda skipping over something important:

	The reason it's lightly used is that Dirty=1 is normally set
	_before_ a write.  A write with a Write=0 PTE would typically
	only generate a fault, not set Dirty=1.  Hardware can (rarely)
	both set Write=1 *and* generate the fault, resulting in a
	Dirty=0,Write=1 PTE.  Hardware which supports shadow stacks
	will no longer exhibit this oddity.

> In places where Linux normally creates Write=0, Dirty=1, it can use the
> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY.  In other
> words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1.  This
> clearly separates shadow stack from other data, and results in the
> following:

Following _what_...  What are these?  I think they're PTE states.  Best
to say that.

> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)

Could you give an example of this?  Would this be a typical anonymous
page which was Write=1,Dirty=1, then historically made Write=0,Dirty=1
at fork()?

> (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
>     The user page is in a R/O VMA, and get_user_pages() needs a writable
>     copy.  The page fault handler creates a copy of the page and sets
>     the new copy's PTE as Write=0 and Cow=1.
> (c) A shadow stack PTE: (Write=0, Dirty=1)
> (d) A shared shadow stack PTE: (Write=0, Cow=1)
>     When a shadow stack page is being shared among processes (this happens
>     at fork()), its PTE is made Dirty=0, so the next shadow stack access
>     causes a fault, and the page is duplicated and Dirty=1 is set again.
>     This is the COW equivalent for shadow stack pages, even though it's
>     copy-on-access rather than copy-on-write.

Just like code, it's also nice to format these in a way which allows
them to be visually compared, trivially.  So, let's expand all the bits
and vertically align everything.  To break this down a bit, we have two
old states:

[a] (Write=0, Dirty=0, Cow=1)
[b] (Write=0, Dirty=0, Cow=1)

And two new ones:

[c] (Write=0, Dirty=1, Cow=0)
[d] (Write=0, Dirty=0, Cow=1)

That makes me wonder what the difference is between [a] and [b] and why
they are separate.  Is their handling different?  How are those two
states differentiated?

> (e) A page where the processor observed a Write=1 PTE, started a write, set
>     Dirty=1, but then observed a Write=0 PTE.  That's possible today, but
>     will not happen on processors that support shadow stack.

This left me wondering how you are going to detangle the mess where PTEs
look like shadow-stack PTEs on non-shadow-stack hardware.  Could you
cover that here?

You can shorten that above bullet to this to help make the space:

	(e) (Write=0, Dirty=1, Cow=0) PTE created when a processor
	    without shadow stack support set Dirty=1.


> Define _PAGE_COW and update pte_*() helpers and apply the same changes to
> pmd and pud.
> 
> After this, there are six free bits left in the 64-bit PTE, and no more
> free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not
> implemented for the 32-bit kernel.

Just say:

	There are six bits left available to software in the 64-bit PTE
	after consuming a bit for _PAGE_COW.  No space is consumed in
	32-bit kernels because shadow stacks are not enabled there.

There's no need to rub it in that 32-bit is out of space.

> -static inline int pte_dirty(pte_t pte)
> +static inline bool pte_dirty(pte_t pte)
>  {
> -	return pte_flags(pte) & _PAGE_DIRTY;
> +	/*
> +	 * A dirty PTE has Dirty=1 or Cow=1.
> +	 */

I don't really like that comment because "Cow" isn't anywhere to be found.

> +	return pte_flags(pte) & _PAGE_DIRTY_BITS;
> +}
> +
> +static inline bool pte_shstk(pte_t pte)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return false;
> +
> +	return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
>  }
>  
>  static inline int pte_young(pte_t pte)
> @@ -133,9 +144,20 @@ static inline int pte_young(pte_t pte)
>  	return pte_flags(pte) & _PAGE_ACCESSED;
>  }
>  
> -static inline int pmd_dirty(pmd_t pmd)
> +static inline bool pmd_dirty(pmd_t pmd)
>  {
> -	return pmd_flags(pmd) & _PAGE_DIRTY;
> +	/*
> +	 * A dirty PMD has Dirty=1 or Cow=1.
> +	 */
> +	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
> +}
> +
> +static inline bool pmd_shstk(pmd_t pmd)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return false;
> +
> +	return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
>  }
>  
>  static inline int pmd_young(pmd_t pmd)
> @@ -143,9 +165,12 @@ static inline int pmd_young(pmd_t pmd)
>  	return pmd_flags(pmd) & _PAGE_ACCESSED;
>  }
>  
> -static inline int pud_dirty(pud_t pud)
> +static inline bool pud_dirty(pud_t pud)
>  {
> -	return pud_flags(pud) & _PAGE_DIRTY;
> +	/*
> +	 * A dirty PUD has Dirty=1 or Cow=1.
> +	 */
> +	return pud_flags(pud) & _PAGE_DIRTY_BITS;
>  }
>  
>  static inline int pud_young(pud_t pud)
> @@ -155,13 +180,23 @@ static inline int pud_young(pud_t pud)
>  
>  static inline int pte_write(pte_t pte)
>  {
> -	return pte_flags(pte) & _PAGE_RW;
> +	/*
> +	 * Shadow stack pages are always writable - but not by normal
> +	 * instructions, and only by shadow stack operations.  Therefore,
> +	 * the W=0,D=1 test with pte_shstk().
> +	 */

I think that comment is off a bit.  It's not really connected to the
code.  We don't, for instance, need to know what the bit combination is
inside pte_shstk().  Further, it's a bit mean to talk about "W" in the
comment and _PAGE_RW in the code.  How about:

	/*
	 * Shadow stack pages are logically writable, but do not have
	 * _PAGE_RW.  Check for them separately from _PAGE_RW itself.
	 */

> +	return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte);
>  }
>  
>  #define pmd_write pmd_write
>  static inline int pmd_write(pmd_t pmd)
>  {
> -	return pmd_flags(pmd) & _PAGE_RW;
> +	/*
> +	 * Shadow stack pages are always writable - but not by normal
> +	 * instructions, and only by shadow stack operations.  Therefore,
> +	 * the W=0,D=1 test with pmd_shstk().
> +	 */
> +	return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd);
>  }

Ditto on the comment.  Please copy the pte_write() one here too.

>  
>  #define pud_write pud_write
> @@ -299,6 +334,24 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
>  	return native_make_pte(v & ~clear);
>  }
>  
> +static inline pte_t pte_mkcow(pte_t pte)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return pte;
> +
> +	pte = pte_clear_flags(pte, _PAGE_DIRTY);
> +	return pte_set_flags(pte, _PAGE_COW);
> +}
> +
> +static inline pte_t pte_clear_cow(pte_t pte)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return pte;
> +
> +	pte = pte_set_flags(pte, _PAGE_DIRTY);
> +	return pte_clear_flags(pte, _PAGE_COW);
> +}

I think we need to say *SOMETHING* about the X86_FEATURE_SHSTK and
_PAGE_COW connection here.  Otherwise they look like two random features
that are interacting in an unknown way.

Maybe even something this simple:

/*
 * _PAGE_COW is unnecessary on !X86_FEATURE_SHSTK kernels.
 * See the _PAGE_COW definition for more details.
 */

Also, the manipulation of _PAGE_DIRTY is not clear here.  It's obvious
why we have to:

	pte_clear_flags(pte, _PAGE_COW);

in a function called pte_clear_cow() but, again, how does _PAGE_DIRTY fit?

>  #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
>  static inline int pte_uffd_wp(pte_t pte)
>  {
> @@ -318,7 +371,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
>  
>  static inline pte_t pte_mkclean(pte_t pte)
>  {
> -	return pte_clear_flags(pte, _PAGE_DIRTY);
> +	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
>  }
>  
>  static inline pte_t pte_mkold(pte_t pte)
> @@ -328,7 +381,16 @@ static inline pte_t pte_mkold(pte_t pte)
>  
>  static inline pte_t pte_wrprotect(pte_t pte)
>  {
> -	return pte_clear_flags(pte, _PAGE_RW);
> +	pte = pte_clear_flags(pte, _PAGE_RW);
> +
> +	/*
> +	 * Blindly clearing _PAGE_RW might accidentally create
> +	 * a shadow stack PTE (RW=0, Dirty=1).  Move the hardware

Could you grep this series and try to be consistent about the formatting
here?  (Not that I've been perfect in this regard either).  I think we
have at least:

	Write=X,Dirty=Y
	W=X,D=Y
	RW=X,Dirty=Y

> +	 * dirty value to the software bit.
> +	 */
> +	if (pte_dirty(pte))
> +		pte = pte_mkcow(pte);
> +	return pte;
>  }

One of my logical checks for this is "does it all go away when this is
compiled out".  Because of this:

+static inline pte_t pte_mkcow(pte_t pte)
+{
+       if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+               return pte;
...

the answer is yes!  So, this looks good to me.  Just thought I'd share a
bit of my thought process.

>  static inline pte_t pte_mkexec(pte_t pte)
> @@ -338,7 +400,18 @@ static inline pte_t pte_mkexec(pte_t pte)
>  
>  static inline pte_t pte_mkdirty(pte_t pte)
>  {
> -	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
> +	pteval_t dirty = _PAGE_DIRTY;
> +
> +	/* Avoid creating (HW)Dirty=1, Write=0 PTEs */

The "(HW)" thing doesn't make a lot of sense any longer.  I think we had
a set of HWDirty and SWDirty bits, but SWDirty ended up being morphed
over to _PAGE_COW.

> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte))
> +		dirty = _PAGE_COW;
> +
> +	return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
> +}
> +
> +static inline pte_t pte_mkwrite_shstk(pte_t pte)
> +{
> +	return pte_clear_cow(pte);
>  }

This one is a bit of black magic.  This is taking a PTE from
(presumably) states [d]->[c] from earlier in the changelog.

	Write=0,Dirty=0,Cow=1
to
	Write=0,Dirty=1,Cow=0

It's hard to wrap my head around how clearing a software bit (from the
naming) will make this PTE writable.

There's either something wrong with the naming, or something wrong with
my mental model of what "COW clearing" is.
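
If the name stays, maybe a comment spelling out the transition would
help (sketch only):

static inline pte_t pte_mkwrite_shstk(pte_t pte)
{
	/*
	 * (Write=0, Dirty=0, Cow=1) -> (Write=0, Dirty=1, Cow=0):
	 * a COW'ed/shared shadow stack PTE becomes an active shadow
	 * stack PTE, which pte_write() reports as writable.
	 */
	return pte_clear_cow(pte);
}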

>  static inline pte_t pte_mkyoung(pte_t pte)
> @@ -348,7 +421,12 @@ static inline pte_t pte_mkyoung(pte_t pte)
>  
>  static inline pte_t pte_mkwrite(pte_t pte)
>  {
> -	return pte_set_flags(pte, _PAGE_RW);
> +	pte = pte_set_flags(pte, _PAGE_RW);
> +
> +	if (pte_dirty(pte))
> +		pte = pte_clear_cow(pte);
> +
> +	return pte;
>  }

Along the same lines as the last few comments, this leaves me wondering
why a pte_dirty() can't also be a "COW PTE".

... <snipping the pmd/pud copies> ...
>  #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 3781a79b6388..1bfab70ff9ac 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -21,7 +21,8 @@
>  #define _PAGE_BIT_SOFTW2	10	/* " */
>  #define _PAGE_BIT_SOFTW3	11	/* " */
>  #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
> -#define _PAGE_BIT_SOFTW4	58	/* available for programmer */
> +#define _PAGE_BIT_SOFTW4	57	/* available for programmer */
> +#define _PAGE_BIT_SOFTW5	58	/* available for programmer */
>  #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
>  #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
>  #define _PAGE_BIT_PKEY_BIT2	61	/* Protection Keys, bit 3/4 */
> @@ -34,6 +35,15 @@
>  #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
>  #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4
>  
> +/*
> + * Indicates a copy-on-write page.
> + */
> +#ifdef CONFIG_X86_SHADOW_STACK
> +#define _PAGE_BIT_COW		_PAGE_BIT_SOFTW5 /* copy-on-write */
> +#else
> +#define _PAGE_BIT_COW		0
> +#endif
> +
>  /* If _PAGE_BIT_PRESENT is clear, we use these: */
>  /* - if the user mapped it with PROT_NONE; pte_present gives true */
>  #define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
> @@ -115,6 +125,36 @@
>  #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
>  #endif
>  
> +/*
> + * The hardware requires shadow stack to be read-only and Dirty.
> + * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs
> + * from shadow stack PTEs:
> + * (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
> + * (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
> + *     The user page is in a R/O VMA, and get_user_pages() needs a
> + *     writable copy.  The page fault handler creates a copy of the page
> + *     and sets the new copy's PTE as Write=0, Cow=1.
> + * (c) A shadow stack PTE: (Write=0, Dirty=1)
> + * (d) A shared (copy-on-access) shadow stack PTE: (Write=0, Cow=1)
> + *     When a shadow stack page is being shared among processes (this
> + *     happens at fork()), its PTE is cleared of _PAGE_DIRTY, so the next
> + *     shadow stack access causes a fault, and the page is duplicated and
> + *     _PAGE_DIRTY is set again.  This is the COW equivalent for shadow
> + *     stack pages, even though it's copy-on-access rather than
> + *     copy-on-write.
> + * (e) A page where the processor observed a Write=1 PTE, started a write,
> + *     set Dirty=1, but then observed a Write=0 PTE (changed by another
> + *     thread).  That's possible today, but will not happen on processors
> + *     that support shadow stack.

This info, again, is great.  Let's keep it, but please do reformat it
like the changelog version to make the bit states easier to grok.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-05 20:15         ` Edgecombe, Rick P
  2022-02-05 20:21           ` H.J. Lu
  2022-02-06 13:42           ` David Laight
@ 2022-02-08  1:31           ` Andy Lutomirski
  2022-02-08  9:31             ` Thomas Gleixner
  2 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-08  1:31 UTC (permalink / raw)
  To: Edgecombe, Rick P, hjl.tools, David.Laight, Adrian Reber,
	Cyrill Gorcunov, Eugene Syromiatnikov, Dmitry Safonov
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Eranian, Stephane, kirill.shutemov, dave.hansen,
	linux-mm, fweimer, nadav.amit, jannh, kcc, linux-arch, pavel,
	oleg, Yang, Weijiang, bp, arnd, Moreira, Joao, tglx,
	mike.kravetz, x86, linux-doc, Dave.Martin, john.allen, mingo,
	Shankar, Ravi V, corbet, linux-kernel, linux-api, gorcunov

On 2/5/22 12:15, Edgecombe, Rick P wrote:
> On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
>> On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
>> wrote:
>>>
>>> From: Edgecombe, Rick P
>>>> Sent: 04 February 2022 01:08
>>>> Hi Thomas,
>>>>
>>>> Thanks for feedback on the plan.
>>>>
>>>> On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
>>>>>> Until now, the enabling effort was trying to support both
>>>>>> Shadow
>>>>>> Stack and IBT.
>>>>>> This history will focus on a few areas of the shadow stack
>>>>>> development history
>>>>>> that I thought stood out.
>>>>>>
>>>>>>         Signals
>>>>>>         -------
>>>>>>         Originally signals placed the location of the shadow
>>>>>> stack
>>>>>> restore
>>>>>>         token inside the saved state on the stack. This was
>>>>>> problematic from a
>>>>>>         past ABI promises perspective. So the restore location
>>>>>> was
>>>>>> instead just
>>>>>>         assumed from the shadow stack pointer. This works
>>>>>> because in
>>>>>> normal
>>>>>>         allowed cases of calling sigreturn, the shadow stack
>>>>>> pointer
>>>>>> should be
>>>>>>         right at the restore token at that time. There is no
>>>>>> alternate shadow
>>>>>>         stack support. If an alt shadow stack is added later
>>>>>> we
>>>>>> would
>>>>>>         need to
>>>>>
>>>>> So how is that going to work? altstack is not an esoteric
>>>>> corner
>>>>> case.
>>>>
>>>> My understanding is that the main usages for the signal stack
>>>> were
>>>> handling stack overflows and corruption. Since the shadow stack
>>>> only
>>>> contains return addresses rather than large stack allocations,
>>>> and is
>>>> not generally writable or pivotable, I thought there was a good
>>>> possibility an alt shadow stack would not end up being especially
>>>> useful. Does it seem like reasonable guesswork?
>>>
>>> The other 'problem' is that it is valid to longjump out of a signal
>>> handler.
>>> These days you have to use siglongjmp() not longjmp() but it is
>>> still used.
>>>
>>> It is probably also valid to use siglongjmp() to jump from a nested
>>> signal handler into the outer handler.
>>> Given both signal handlers can have their own stack, there can be
>>> three
>>> stacks involved.
> 
> So the scenario is?
> 
> 1. Handle signal 1
> 2. sigsetjmp()
> 3. sigaltstack()
> 4. Handle signal 2 on alt stack
> 5. siglongjmp()
> 
> I'll check that it is covered by the tests, but I think it should work
> in this series that has no alt shadow stack. I have only done a high
> level overview of how the shadow stack stuff, that doesn't involve the
> kernel, works in glibc. Sounds like I'll need to do a deeper dive.
> 
>>>
>>> I think the shadow stack pointer has to be in ucontext - which also
>>> means the application can change it before returning from a signal.
> 
> Yes we might need to change it to support alt shadow stacks. Can you
> elaborate why you think it has to be in ucontext? I was thinking of
> looking at three options for storing the ssp:
>   - Stored in the shadow stack like a token using WRUSS from the kernel.
>   - Stored on the kernel side using a hashmap that maps ucontext or
>     sigframe userspace address to ssp (this is of course similar to
>     storing in ucontext, except that the user can’t change the ssp).
>   - Stored writable in userspace in ucontext.
> 
> But in this version, without alt shadow stacks, the shadow stack
> pointer is not stored in ucontext. This causes the limitation that
> userspace can only call sigreturn when it has returned back to a point
> where there is a restore token on the shadow stack (which was placed
> there by the kernel).



I'll reply here and maybe cover multiple things.


User code already needs to rewind the regular stack to call sigreturn -- 
sigreturn find the signal frame based on ESP/RSP.  So if you call it 
from the wrong place, you go boom.  I think that the Linux SHSTK ABI 
should have the property that no amount of tampering with just the 
ucontext and associated structures can cause sigreturn to redirect to 
the wrong IP -- there should be something on the shadow stack that also 
gets verified in sigreturn.  IIRC the series does this, but it's been a 
while.  The post-sigreturn SSP should be entirely implied by 
pre-sigreturn SSP (or perhaps something on the shadow stack), so, in the 
absence of an altshadowstack feature, no ucontext changes should be needed.

We can also return from a signal or from more than one signal at once, 
as above, using siglongjmp.  It seems like this should Just Work (tm), 
at least in the absence of altshadowstack.
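
For reference, a minimal userspace sketch of the scenario quoted above
(untested; handler names and the 64k alt stack size are arbitrary):

#include <signal.h>
#include <setjmp.h>
#include <stdio.h>

static sigjmp_buf env;
static char altstack[64 * 1024];

static void handler2(int sig)
{
	/* Running on the alt stack; jump back into handler1's frame. */
	siglongjmp(env, 1);
}

static void handler1(int sig)
{
	stack_t ss = { .ss_sp = altstack, .ss_size = sizeof(altstack) };
	struct sigaction sa = { .sa_handler = handler2, .sa_flags = SA_ONSTACK };

	sigemptyset(&sa.sa_mask);

	if (sigsetjmp(env, 1) == 0) {
		sigaltstack(&ss, NULL);
		sigaction(SIGUSR2, &sa, NULL);
		raise(SIGUSR2);		/* nested signal, handled on the alt stack */
	}
	/* Falls through here after siglongjmp(); a normal sigreturn follows. */
}

int main(void)
{
	signal(SIGUSR1, handler1);
	raise(SIGUSR1);
	puts("returned via siglongjmp out of a nested handler");
	return 0;
}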

So this leaves altshadowstack.  If we want to allow userspace to handle 
a shstk overflow, I think we need altshadowstack.  And I can easily 
imagine signal handling in a coroutine or user-threading environment (Go? 
UMCG or whatever it's called?) wanting this.  As noted, this obnoxious 
Andy person didn't like putting any shstk-related extensions in the FPU 
state.

For better or for worse, altshadowstack is (I think) fundamentally a new 
API.  No amount of ucontext magic is going to materialize an entire 
shadow stack out of nowhere when someone calls sigaltstack().  So the 
questions are: should we support altshadowstack from day one and, if so, 
what should it look like?

If we want to be clever, we could attempt to make altshadowstack 
compatible with RSTORSSP.  Signal delivery pushes a restore token to the 
old stack (hah!  what if the old stack is full?) and pushes the RSTORSSP 
busy magic to the new stack, and sigreturn inverts it.  Code that wants 
to return without sigreturn does it manually with RSTORSSP.  (Assuming 
that I've understood the arcane RSTORSSP sequence right.  Intel wins 
major points for documentation quality here.)  Or we could invent our 
own scheme.  In either case, I don't immediately see any reason that the 
ucontext needs to contain a shadow stack pointer.

There's a delightful wart to consider, though.  siglongjmp, at least as 
currently envisioned, can't return off an altshadowstack: the whole 
point of the INCSSP distance restrictions is to avoid incrementing right 
off the top of the current stack, but siglongjmp off an altshadowstack 
fundamentally switches stacks.  So either siglongjmp off an 
altshadowstack needs to be illegal or it needs to work differently.  (By 
incssp-ing to the top of the altshadowstack, then switching, then 
incssp-ing some more?  How does it even find the top of the current 
altshadowstack?)  And the plot thickens if one tries to siglongjmp off 
two nested altshadowstack-using signals in a single call.   Fortunately, 
since altshadowstack is a new API, it's not entirely crazy to have 
different rules.

So I don't have a complete or even almost complete design in mind, but I 
think we do need to make a conscious decision either to design this 
right or to skip it for v1.

As for CRIU, I don't think anyone really expects a new kernel, running 
new userspace that takes advantage of features in the new kernel, to 
work with old CRIU.  Upgrading to a SHSTK kernel should still allow 
using CRIU with non-SHSTK userspace, but I don't see how it's possible 
for CRIU to handle SHSTK without updates.  We should certainly do our 
best to make CRIU's life easy, though.

>   This doesn’t mean it can’t switch to a different
> shadow stack or handle a nested signal, but it limits the possibility
> for calling sigreturn with a totally different sigframe (like CRIU and
> SROP attacks do). It should hopefully be a helpful, protective
> limitation for most apps and I'm hoping CRIU can be fixed without
> removing it.
> 
> I am not aware of other limitations to signals (besides normal shadow
> stack enforcement), but I could be missing it. And people's skepticism
> is making me want to go back over it with more scrutiny.
> 
>>> In much the same way as all the segment registers can be changed
>>> leading to all the nasty bugs when the final 'return to user' code
>>> traps in kernel when loading invalid segment registers or executing
>>> iret.
> 
> I don't think this is as difficult to avoid because userspace ssp has
> its own register that should not be accessed at that point, but I have
> not given this aspect enough analysis. Thanks for bringing it up.
> 
>>>
>>> Hmmm... do shadow stacks mean that longjmp() has to be a system
>>> call?
>>
>> No.  setjmp/longjmp save and restore shadow stack pointer.
>>
> 
> It sounds like it would help to write up in a lot more detail exactly
> how all the signal and specialer stack manipulation scenarios work in
> glibc.
> 


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-06 13:42           ` David Laight
  2022-02-06 13:55             ` H.J. Lu
  2022-02-07 10:22             ` Florian Weimer
@ 2022-02-08  1:46             ` Edgecombe, Rick P
  2 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-08  1:46 UTC (permalink / raw)
  To: hjl.tools, David.Laight
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Eranian, Stephane, kirill.shutemov, dave.hansen,
	linux-mm, fweimer, nadav.amit, jannh, kcc, linux-arch, pavel,
	oleg, Yang, Weijiang, Lutomirski, Andy, bp, arnd, Moreira, Joao,
	tglx, x86, mike.kravetz, linux-doc, john.allen, Dave.Martin,
	mingo, Shankar, Ravi V, corbet, linux-kernel, linux-api,
	gorcunov

On Sun, 2022-02-06 at 13:42 +0000, David Laight wrote:
> > I don't think this is as difficult to avoid because userspace ssp
> > has
> > its own register that should not be accessed at that point, but I
> > have
> > not given this aspect enough analysis. Thanks for bringing it up.
> 
> So the user ssp isn't saved (or restored) by the trap entry/exit.
> So it needs to be saved by the context switch code?
> Much like the user segment registers?
> So you are likely to get the same problems if restoring it can fault
> in kernel (eg for a non-canonical address).

PL3_SSP is lazily saved and restored by the FPU supervisor xsave code,
which has its buffer in kernel memory. For the most part it is
userspace instructions that use this register and they can only modify
it in limited ways.

It does look like IRET can cause a #CP if the PL3 SSP is not aligned,
but only after RIP and CPL are set back to userspace. I'm not confident
enough interpreting the specs to assert the specific behavior and will
follow up internally to clarify.


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size'
  2022-01-30 21:18 ` [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size' Rick Edgecombe
@ 2022-02-08  8:38   ` Thomas Gleixner
  2022-02-11  2:09     ` Edgecombe, Rick P
  2022-02-14 12:33   ` Jann Horn
  1 sibling, 1 reply; 154+ messages in thread
From: Thomas Gleixner @ 2022-02-08  8:38 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

On Sun, Jan 30 2022 at 13:18, Rick Edgecombe wrote:
> -int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
> -		struct task_struct *p, unsigned long tls)
> +int copy_thread(unsigned long clone_flags, unsigned long sp,
> +		unsigned long stack_size, struct task_struct *p,
> +		unsigned long tls)
>  {
>  	struct inactive_task_frame *frame;
>  	struct fork_frame *fork_frame;
> @@ -175,7 +176,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  	if (unlikely(p->flags & PF_KTHREAD)) {
>  		p->thread.pkru = pkru_get_init_value();
>  		memset(childregs, 0, sizeof(struct pt_regs));
> -		kthread_frame_init(frame, sp, arg);
> +		kthread_frame_init(frame, sp, stack_size);
>  		return 0;
>  	}
>  
> @@ -208,7 +209,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  		 */
>  		childregs->sp = 0;
>  		childregs->ip = 0;
> -		kthread_frame_init(frame, sp, arg);
> +		kthread_frame_init(frame, sp, stack_size);
>  		return 0;
>  	}

Can you please change the prototypes too for completeness sake?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack
  2022-01-30 21:18 ` [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack Rick Edgecombe
  2022-02-07 22:39   ` Dave Hansen
@ 2022-02-08  8:39   ` Thomas Gleixner
  1 sibling, 0 replies; 154+ messages in thread
From: Thomas Gleixner @ 2022-02-08  8:39 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe, Yu-cheng Yu

On Sun, Jan 30 2022 at 13:18, Rick Edgecombe wrote:
> +config ARCH_HAS_SHADOW_STACK
> +	def_bool n
> +
> +config X86_SHADOW_STACK
> +	prompt "Intel Shadow Stack"

It's also available on AMD, right?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack
  2022-02-07 22:39   ` Dave Hansen
@ 2022-02-08  8:41     ` Thomas Gleixner
  2022-02-08 20:20       ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Thomas Gleixner @ 2022-02-08  8:41 UTC (permalink / raw)
  To: Dave Hansen, Rick Edgecombe, x86, H . Peter Anvin, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H . J . Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V . Shankar, Dave Martin,
	Weijiang Yang, Kirill A . Shutemov, joao.moreira, John Allen,
	kcc, eranian
  Cc: Yu-cheng Yu

On Mon, Feb 07 2022 at 14:39, Dave Hansen wrote:

> On 1/30/22 13:18, Rick Edgecombe wrote:
>> +config X86_SHADOW_STACK
>> +	prompt "Intel Shadow Stack"
>> +	def_bool n
>> +	depends on AS_WRUSS
>> +	depends on ARCH_HAS_SHADOW_STACK
>> +	select ARCH_USES_HIGH_VMA_FLAGS
>> +	help
>> +	  Shadow Stack protection is a hardware feature that detects function
>> +	  return address corruption.  This helps mitigate ROP attacks.
>> +	  Applications must be enabled to use it, and old userspace does not
>> +	  get protection "for free".
>> +	  Support for this feature is present on Tiger Lake family of
>> +	  processors released in 2020 or later.  Enabling this feature
>> +	  increases kernel text size by 3.7 KB.
>
> I guess the "2020" comment is still OK.  But, given that it's on AMD and
> a couple of other Intel models, maybe we should just leave this at:
>
> 	CPUs supporting shadow stacks were first released in 2020.

Yes.

> If we say anything.  We mostly want folks to just go read the
> documentation if they need more details.

Also the kernel text size increase blurb is pretty useless as that's a
number which is wrong from day one.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate
  2022-01-30 21:18 ` [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate Rick Edgecombe
@ 2022-02-08  8:51   ` Thomas Gleixner
  2022-02-09 19:55     ` Edgecombe, Rick P
  2022-02-12  0:27   ` Dave Hansen
  1 sibling, 1 reply; 154+ messages in thread
From: Thomas Gleixner @ 2022-02-08  8:51 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: rick.p.edgecombe

On Sun, Jan 30 2022 at 13:18, Rick Edgecombe wrote:
> In addition, now that get_xsave_addr() is not available outside of the
> core fpu code, there isn't even a way for these supervisor features to
> modify the in memory state.
>
> To resolve these problems, add some helpers that encapsulate the correct
> logic to operate on the correct copy of the state. Map the MSR's to the
> struct field location in a case statements in __get_xsave_member().

I like the approach in principle, but you still expose the xstate
internals via the void pointer. It's just a question of time until this
gets type-cast and abused in interesting ways.

Something like the untested patch below (on top of the whole series)
preserves the encapsulation and reduces the code at the call sites.
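
E.g. a call site could then batch its updates roughly like this (sketch
only; 'ssp' and 'ret' are assumed locals, field names as in the patch
below):

	struct xstate_msr xmsr[] = {
		{ .msr = MSR_IA32_PL3_SSP, .val = ssp },
		{ .msr = MSR_IA32_U_CET,   .val = CET_SHSTK_EN },
	};

	ret = xsave_wrmsrs(XFEATURE_CET_USER, xmsr, ARRAY_SIZE(xmsr));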

Thanks,

        tglx
---
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -165,12 +165,7 @@ static inline bool fpstate_is_confidenti
 struct task_struct;
 extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
 
-void *start_update_xsave_msrs(int xfeature_nr);
-void end_update_xsave_msrs(void);
-int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p);
-int xsave_wrmsrl(void *xstate, u32 msr, u64 val);
-int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear);
-
-void *get_xsave_buffer_unsafe(struct fpu *fpu, int xfeature_nr);
-int xsave_wrmsrl_unsafe(void *xstate, u32 msr, u64 val);
+int xsave_rdmsrs(int xfeature_nr, struct xstate_msr *xmsr, int num_msrs);
+int xsave_wrmsrs(int xfeature_nr, struct xstate_msr *xmsr, int num_msrs);
+int xsave_wrmsrs_on_task(struct task_struct *tsk, int xfeature_nr, struct xstate_msr *xmsr, int num_msrs);
 #endif /* _ASM_X86_FPU_API_H */
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -601,4 +601,12 @@ struct fpu_state_config {
 /* FPU state configuration information */
 extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;
 
+struct xstate_msr {
+	unsigned int	msr;
+	unsigned int	bitop;
+	u64		val;
+	u64		set;
+	u64		clear;
+};
+
 #endif /* _ASM_X86_FPU_H */
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1868,7 +1868,7 @@ int proc_pid_arch_status(struct seq_file
 }
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
 
-static u64 *__get_xsave_member(void *xstate, u32 msr)
+static u64 *xstate_get_member(void *xstate, u32 msr)
 {
 	switch (msr) {
 	case MSR_IA32_PL3_SSP:
@@ -1882,22 +1882,11 @@ static u64 *__get_xsave_member(void *xst
 }
 
 /*
- * Operate on the xsave buffer directly. It makes no gaurantees that the
- * buffer will stay valid now or in the futre. This function is pretty
- * much only useful when the caller knows the fpu's thread can't be
- * scheduled or otherwise operated on concurrently.
- */
-void *get_xsave_buffer_unsafe(struct fpu *fpu, int xfeature_nr)
-{
-	return get_xsave_addr(&fpu->fpstate->regs.xsave, xfeature_nr);
-}
-
-/*
  * Return a pointer to the xstate for the feature if it should be used, or NULL
  * if the MSRs should be written to directly. To do this safely, using the
  * associated read/write helpers is required.
  */
-void *start_update_xsave_msrs(int xfeature_nr)
+static void *xsave_msrs_op_start(int xfeature_nr)
 {
 	void *xstate;
 
@@ -1938,7 +1927,7 @@ void *start_update_xsave_msrs(int xfeatu
 	return xstate;
 }
 
-void end_update_xsave_msrs(void)
+static void xsave_msrs_op_end(void)
 {
 	fpregs_unlock();
 }
@@ -1951,7 +1940,7 @@ void end_update_xsave_msrs(void)
  *
  * But if this correspondence is broken by either a write to the in-memory
  * buffer or the registers, the kernel needs to be notified so it doesn't miss
- * an xsave or restore. __xsave_msrl_prepare_write() peforms this check and
+ * an xsave or restore. xsave_msrs_prepare_write() performs this check and
  * notifies the kernel if needed. Use before writes only, to not take away
  * the kernel's options when not required.
  *
@@ -1959,65 +1948,107 @@ void end_update_xsave_msrs(void)
  * must have resulted in targeting the in-memory state, so invaliding the
  * registers is the right thing to do.
  */
-static void __xsave_msrl_prepare_write(void)
+static void xsave_msrs_prepare_write(void)
 {
 	if (test_thread_flag(TIF_NEED_FPU_LOAD) &&
 	    fpregs_state_valid(&current->thread.fpu, smp_processor_id()))
 		__fpu_invalidate_fpregs_state(&current->thread.fpu);
 }
 
-int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p)
+static int read_xstate_or_msr(struct xstate_msr *xmsr, void *xstate)
 {
 	u64 *member_ptr;
 
 	if (!xstate)
-		return rdmsrl_safe(msr, p);
+		return rdmsrl_safe(xmsr->msr, &xmsr->val);
 
-	member_ptr = __get_xsave_member(xstate, msr);
+	member_ptr = xstate_get_member(xstate, xmsr->msr);
 	if (!member_ptr)
 		return 1;
 
-	*p = *member_ptr;
-
+	xmsr->val = *member_ptr;
 	return 0;
 }
 
+int xsave_rdmsrs(int xfeature_nr, struct xstate_msr *xmsr, int num_msrs)
+{
+	void *xstate = xsave_msrs_op_start(xfeature_nr);
+	int i, ret;
+
+	for (i = 0, ret = 0; !ret && i < num_msrs; i++, xmsr++)
+		ret = read_xstate_or_msr(xmsr, xstate);
+
+	xsave_msrs_op_end();
+	return ret;
+}
 
-int xsave_wrmsrl_unsafe(void *xstate, u32 msr, u64 val)
+static int write_xstate(struct xstate_msr *xmsr, void *xstate)
 {
-	u64 *member_ptr;
+	u64 *member_ptr = xstate_get_member(xstate, xmsr->msr);
 
-	member_ptr = __get_xsave_member(xstate, msr);
 	if (!member_ptr)
 		return 1;
 
-	*member_ptr = val;
-
+	*member_ptr = xmsr->val;
 	return 0;
 }
 
-int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
+static int write_xstate_or_msr(struct xstate_msr *xmsr, void *xstate)
 {
-	__xsave_msrl_prepare_write();
 	if (!xstate)
-		return wrmsrl_safe(msr, val);
-
-	return xsave_wrmsrl_unsafe(xstate, msr, val);
+		return wrmsrl_safe(xmsr->msr, xmsr->val);
+	return write_xstate(xmsr, xstate);
 }
 
-int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear)
+static int mod_xstate_or_msr_bits(struct xstate_msr *xmsr, void *xstate)
 {
-	u64 val, new_val;
+	u64 val;
 	int ret;
 
-	ret = xsave_rdmsrl(xstate, msr, &val);
+	ret = read_xstate_or_msr(xmsr, xstate);
 	if (ret)
 		return ret;
 
-	new_val = (val & ~clear) | set;
+	val = xmsr->val;
+	xmsr->val = (val & ~xmsr->clear) | xmsr->set;
 
-	if (new_val != val)
-		return xsave_wrmsrl(xstate, msr, new_val);
+	if (val != xmsr->val)
+		return write_xstate_or_msr(xmsr, xstate);
 
 	return 0;
 }
+
+static int __xsave_wrmsrs(void *xstate, struct xstate_msr *xmsr, int num_msrs)
+{
+	int i, ret;
+
+	for (i = 0, ret = 0; !ret && i < num_msrs; i++, xmsr++) {
+		if (!xmsr->bitop)
+			ret = write_xstate_or_msr(xmsr, xstate);
+		else
+			ret = mod_xstate_or_msr_bits(xmsr, xstate);
+	}
+
+	return ret;
+}
+
+int xsave_wrmsrs(int xfeature_nr, struct xstate_msr *xmsr, int num_msrs)
+{
+	void *xstate = xsave_msrs_op_start(xfeature_nr);
+	int ret;
+
+	xsave_msrs_prepare_write();
+	ret = __xsave_wrmsrs(xstate, xmsr, num_msrs);
+	xsave_msrs_op_end();
+	return ret;
+}
+
+int xsave_wrmsrs_on_task(struct task_struct *tsk, int xfeature_nr, struct xstate_msr *xmsr,
+			 int num_msrs)
+{
+	void *xstate = get_xsave_addr(&tsk->thread.fpu.fpstate->regs.xsave, xfeature_nr);
+
+	if (WARN_ON_ONCE(!xstate))
+		return -EINVAL;
+	return __xsave_wrmsrs(xstate, xmsr, num_msrs);
+}
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -106,8 +106,7 @@ int shstk_setup(void)
 {
 	struct thread_shstk *shstk = &current->thread.shstk;
 	unsigned long addr, size;
-	void *xstate;
-	int err;
+	struct xstate_msr xmsr[2];
 
 	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
 	    shstk->size ||
@@ -119,13 +118,10 @@ int shstk_setup(void)
 	if (IS_ERR_VALUE(addr))
 		return 1;
 
-	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
-	err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, addr + size);
-	if (!err)
-		err = xsave_wrmsrl(xstate, MSR_IA32_U_CET, CET_SHSTK_EN);
-	end_update_xsave_msrs();
+	xmsr[0] = (struct xstate_msr) { .msr = MSR_IA32_PL3_SSP, .val = addr + size };
+	xmsr[1] = (struct xstate_msr) { .msr = MSR_IA32_U_CET, .val = CET_SHSTK_EN };
 
-	if (err) {
+	if (xsave_wrmsrs(XFEATURE_CET_USER, xmsr, ARRAY_SIZE(xmsr))) {
 		/*
 		 * Don't leak shadow stack if something went wrong with writing the
 		 * msrs. Warn about it because things may be in a weird state.
@@ -150,8 +146,8 @@ int shstk_alloc_thread_stack(struct task
 			     unsigned long stack_size)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
+	struct xstate_msr xmsr[1];
 	unsigned long addr;
-	void *xstate;
 
 	/*
 	 * If shadow stack is not enabled on the new thread, skip any
@@ -183,15 +179,6 @@ int shstk_alloc_thread_stack(struct task
 	if (in_compat_syscall())
 		stack_size /= 4;
 
-	/*
-	 * 'tsk' is configured with a shadow stack and the fpu.state is
-	 * up to date since it was just copied from the parent.  There
-	 * must be a valid non-init CET state location in the buffer.
-	 */
-	xstate = get_xsave_buffer_unsafe(&tsk->thread.fpu, XFEATURE_CET_USER);
-	if (WARN_ON_ONCE(!xstate))
-		return -EINVAL;
-
 	stack_size = PAGE_ALIGN(stack_size);
 	addr = alloc_shstk(stack_size, stack_size, false);
 	if (IS_ERR_VALUE(addr)) {
@@ -200,7 +187,11 @@ int shstk_alloc_thread_stack(struct task
 		return PTR_ERR((void *)addr);
 	}
 
-	xsave_wrmsrl_unsafe(xstate, MSR_IA32_PL3_SSP, (u64)(addr + stack_size));
+	xmsr[0] = (struct xstate_msr) { .msr = MSR_IA32_PL3_SSP, .val = addr + stack_size };
+	if (xsave_wrmsrs_on_task(tsk, XFEATURE_CET_USER, xmsr, ARRAY_SIZE(xmsr))) {
+		unmap_shadow_stack(addr, stack_size);
+		return 1;
+	}
 	shstk->base = addr;
 	shstk->size = stack_size;
 	return 0;
@@ -232,8 +223,8 @@ void shstk_free(struct task_struct *tsk)
 
 int wrss_control(bool enable)
 {
+	struct xstate_msr xmsr[1] = {[0] = { .msr = MSR_IA32_U_CET, .bitop = 1,}, };
 	struct thread_shstk *shstk = &current->thread.shstk;
-	void *xstate;
 	int err;
 
 	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
@@ -246,13 +237,11 @@ int wrss_control(bool enable)
 	if (!shstk->size || shstk->wrss == enable)
 		return 1;
 
-	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
 	if (enable)
-		err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, CET_WRSS_EN, 0);
+		xmsr[0].set = CET_WRSS_EN;
 	else
-		err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0, CET_WRSS_EN);
-	end_update_xsave_msrs();
-
+		xmsr[0].clear = CET_WRSS_EN;
+	err = xsave_wrmsrs(XFEATURE_CET_USER, xmsr, ARRAY_SIZE(xmsr));
 	if (err)
 		return 1;
 
@@ -263,7 +252,7 @@ int wrss_control(bool enable)
 int shstk_disable(void)
 {
 	struct thread_shstk *shstk = &current->thread.shstk;
-	void *xstate;
+	struct xstate_msr xmsr[2];
 	int err;
 
 	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
@@ -271,14 +260,11 @@ int shstk_disable(void)
 	    !shstk->base)
 		return 1;
 
-	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
-	/* Disable WRSS too when disabling shadow stack */
-	err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0,
-					CET_SHSTK_EN | CET_WRSS_EN);
-	if (!err)
-		err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, 0);
-	end_update_xsave_msrs();
+	xmsr[0] = (struct xstate_msr) { .msr = MSR_IA32_U_CET, .bitop = 1,
+		.set = 0, .clear = CET_SHSTK_EN | CET_WRSS_EN };
+	xmsr[1] = (struct xstate_msr) { .msr = MSR_IA32_PL3_SSP, .val = 0 };
 
+	err = xsave_wrmsrs(XFEATURE_CET_USER, xmsr, ARRAY_SIZE(xmsr));
 	if (err)
 		return 1;
 
@@ -289,16 +275,10 @@ int shstk_disable(void)
 
 static unsigned long get_user_shstk_addr(void)
 {
-	void *xstate;
-	unsigned long long ssp;
-
-	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
-
-	xsave_rdmsrl(xstate, MSR_IA32_PL3_SSP, &ssp);
-
-	end_update_xsave_msrs();
+	struct xstate_msr xmsr[1] = { [0] = {.msr = MSR_IA32_PL3_SSP, }, };
 
-	return ssp;
+	xsave_rdmsrs(XFEATURE_CET_USER, xmsr, ARRAY_SIZE(xmsr));
+	return xmsr[0].val;
 }
 
 /*
@@ -385,8 +365,8 @@ int shstk_check_rstor_token(bool proc32,
 int setup_signal_shadow_stack(int proc32, void __user *restorer)
 {
 	struct thread_shstk *shstk = &current->thread.shstk;
+	struct xstate_msr xmsr[1];
 	unsigned long new_ssp;
-	void *xstate;
 	int err;
 
 	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || !shstk->size)
@@ -397,18 +377,15 @@ int setup_signal_shadow_stack(int proc32
 	if (err)
 		return err;
 
-	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
-	err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, new_ssp);
-	end_update_xsave_msrs();
-
-	return err;
+	xmsr[0] = (struct xstate_msr) { .msr = MSR_IA32_PL3_SSP, .val = new_ssp };
+	return xsave_wrmsrs(XFEATURE_CET_USER, xmsr, ARRAY_SIZE(xmsr));
 }
 
 int restore_signal_shadow_stack(void)
 {
 	struct thread_shstk *shstk = &current->thread.shstk;
-	void *xstate;
 	int proc32 = in_ia32_syscall();
+	struct xstate_msr xmsr[1];
 	unsigned long new_ssp;
 	int err;
 
@@ -419,11 +396,8 @@ int restore_signal_shadow_stack(void)
 	if (err)
 		return err;
 
-	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
-	err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, new_ssp);
-	end_update_xsave_msrs();
-
-	return err;
+	xmsr[0] = (struct xstate_msr) { .msr = MSR_IA32_PL3_SSP, .val = new_ssp };
+	return xsave_wrmsrs(XFEATURE_CET_USER, xmsr, ARRAY_SIZE(xmsr));
 }
 
 SYSCALL_DEFINE2(map_shadow_stack, unsigned long, size, unsigned int, flags)

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-07 16:30     ` Dave Hansen
@ 2022-02-08  9:16       ` Mike Rapoport
  2022-02-08  9:29         ` Cyrill Gorcunov
  0 siblings, 1 reply; 154+ messages in thread
From: Mike Rapoport @ 2022-02-08  9:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Adrian Reber, Rick Edgecombe, x86, H . Peter Anvin,
	Thomas Gleixner, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian,
	Andrei Vagin, Dmitry Safonov

On Mon, Feb 07, 2022 at 08:30:50AM -0800, Dave Hansen wrote:
> On 2/6/22 23:20, Adrian Reber wrote:
> >>> 	CRIU Support
> >>> 	------------
> >>> 	In the past there was some speculation on the mailing list about 
> >>> 	whether CRIU would need to be taught about CET. It turns out, it does. 
> >>> 	The first issue hit is that CRIU calls sigreturn directly from its 
> >>> 	“parasite code” that it injects into the dumper process. This violates
> >>> 	this shadow stack implementation’s protection that intends to prevent
> >>> 	attackers from doing this.
> ...
> > From the CRIU side I can say that I would definitely like to see this
> > resolved. CRIU just went through a similar exercise with rseq() being
> > enabled in glibc and CI broke all around for us and other projects
> > relying on CRIU. Although rseq() was around for a long time we were not
> > aware of it but luckily 5.13 introduced a way to handle it for CRIU with
> > ptrace. An environment variable existed but did not really help when
> > CRIU is called somewhere in the middle of the container software stack.
> > 
> > From my point of view a solution not involving an environment variable
> > would definitely be preferred.
> 
> Have there been things like this for CRIU in the past?  Something where
> CRIU needs control but that's also security-sensitive?

Generally CRIU requires (almost) root privileges to work, but I don't think
it handles something as security sensitive and restrictive as shadow stacks. 
 
> Any thoughts on how you would _like_ to see this resolved?

Ideally, CRIU will need a knob that will tell the kernel/CET machinery
where the next RET will jump, along the lines of
restore_signal_shadow_stack() AFAIU.

But such a knob will immediately reduce the security value of the entire
thing, and I don't have good ideas how to deal with it :(

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-08  9:16       ` Mike Rapoport
@ 2022-02-08  9:29         ` Cyrill Gorcunov
  2022-02-08 16:21           ` Andy Lutomirski
  0 siblings, 1 reply; 154+ messages in thread
From: Cyrill Gorcunov @ 2022-02-08  9:29 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Dave Hansen, Adrian Reber, Rick Edgecombe, x86, H . Peter Anvin,
	Thomas Gleixner, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H . J . Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V . Shankar, Dave Martin,
	Weijiang Yang, Kirill A . Shutemov, joao.moreira, John Allen,
	kcc, eranian, Andrei Vagin, Dmitry Safonov

On Tue, Feb 08, 2022 at 11:16:51AM +0200, Mike Rapoport wrote:
>  
> > Any thoughts on how you would _like_ to see this resolved?
> 
> Ideally, CRIU will need a knob that will tell the kernel/CET machinery
> where the next RET will jump, along the lines of
> restore_signal_shadow_stack() AFAIU.
> 
> But such a knob will immediately reduce the security value of the entire
> thing, and I don't have good ideas how to deal with it :(

Probably a kind of latch in the task_struct which would trigger off once a
return to a different address happened, thus we would be able to jump inside
the parasite code. Of course such a trigger should be available under the
proper capability only.
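
Purely as a hypothetical sketch (all of the names below are invented for
illustration and are not part of this series), such a one-shot latch might
look like:

/*
 * Hypothetical sketch only: every name here is invented.  A one-shot
 * permission that lets a single mismatching RET through for a suitably
 * privileged tracer (e.g. CRIU's parasite teardown), then disarms itself.
 */
struct shstk_ret_latch {
	unsigned int armed:1;	/* set via a CAP_SYS_ADMIN/ptrace gated request */
};

/* Consulted from the #CP handler: consume the latch instead of sending SIGSEGV. */
static bool shstk_consume_ret_latch(struct shstk_ret_latch *latch)
{
	if (!latch->armed)
		return false;
	latch->armed = 0;	/* one shot: must be explicitly re-armed */
	return true;
}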

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-08  1:31           ` Andy Lutomirski
@ 2022-02-08  9:31             ` Thomas Gleixner
  2022-02-08 16:15               ` Andy Lutomirski
  0 siblings, 1 reply; 154+ messages in thread
From: Thomas Gleixner @ 2022-02-08  9:31 UTC (permalink / raw)
  To: Andy Lutomirski, Edgecombe, Rick P, hjl.tools, David.Laight,
	Adrian Reber, Cyrill Gorcunov, Eugene Syromiatnikov,
	Dmitry Safonov
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Eranian, Stephane, kirill.shutemov, dave.hansen,
	linux-mm, fweimer, nadav.amit, jannh, kcc, linux-arch, pavel,
	oleg, Yang, Weijiang, bp, arnd, Moreira, Joao, mike.kravetz, x86,
	linux-doc, Dave.Martin, john.allen, mingo, Shankar, Ravi V,
	corbet, linux-kernel, linux-api, gorcunov

On Mon, Feb 07 2022 at 17:31, Andy Lutomirski wrote:
> So this leaves altshadowstack.  If we want to allow userspace to handle 
> a shstk overflow, I think we need altshadowstack.  And I can easily 
> imagine signal handling in a coroutine or user-threading environment (Go? 
> UMCG or whatever it's called?) wanting this.  As noted, this obnoxious 
> Andy person didn't like putting any shstk-related extensions in the FPU 
> state.
>
> For better or for worse, altshadowstack is (I think) fundamentally a new 
> API.  No amount of ucontext magic is going to materialize an entire 
> shadow stack out of nowhere when someone calls sigaltstack().  So the 
> questions are: should we support altshadowstack from day one and, if so, 
> what should it look like?

I think we should support them from day one.

> So I don't have a complete or even almost complete design in mind, but I 
> think we do need to make a conscious decision either to design this 
> right or to skip it for v1.

Skipping it might create a fundamental design fail situation as it might
require changes to the shadow stack signal handling in general which
becomes a nightmare once a non-altstack API is exposed.

> As for CRIU, I don't think anyone really expects a new kernel, running 
> new userspace that takes advantage of features in the new kernel, to 
> work with old CRIU.

Yes, CRIU needs updates, but what ensures that CRIU managed user space
does not use SHSTK if CRIU is not updated yet?

> Upgrading to a SHSTK kernel should still allow using CRIU with
> non-SHSTK userspace, but I don't see how it's possible for CRIU to
> handle SHSTK without updates.  We should certainly do our best to make
> CRIU's life easy, though.

Handling CRIU with SHSTK enabled has to be part of the overall design
otherwise we'll either end up with horrible hacks or with a requirement
to change the V1 UAPI....

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-08  9:31             ` Thomas Gleixner
@ 2022-02-08 16:15               ` Andy Lutomirski
  0 siblings, 0 replies; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-08 16:15 UTC (permalink / raw)
  To: Thomas Gleixner, Rick P Edgecombe, H.J. Lu, David Laight,
	Adrian Reber, Cyrill Gorcunov, Eugene Syromiatnikov,
	Dmitry Safonov
  Cc: Balbir Singh, H. Peter Anvin, Peter Zijlstra (Intel),
	Randy Dunlap, Kees Cook, Eranian, Stephane, Kirill A. Shutemov,
	Dave Hansen, linux-mm, Florian Weimer, Nadav Amit, Jann Horn,
	kcc, linux-arch, Pavel Machek, Oleg Nesterov, Weijiang Yang,
	Borislav Petkov, Arnd Bergmann, Moreira, Joao, Mike Kravetz,
	the arch/x86 maintainers, linux-doc, Dave Martin, john.allen,
	Ingo Molnar, Shankar, Ravi V, Jonathan Corbet,
	Linux Kernel Mailing List, Linux API, Cyrill Gorcunov

On Tue, Feb 8, 2022, at 1:31 AM, Thomas Gleixner wrote:
> On Mon, Feb 07 2022 at 17:31, Andy Lutomirski wrote:
>> So this leaves altshadowstack.  If we want to allow userspace to handle 
>> a shstk overflow, I think we need altshadowstack.  And I can easily 
>> imagine signal handling in a coroutine or user-threading environment (Go? 
>> UMCG or whatever it's called?) wanting this.  As noted, this obnoxious 
>> Andy person didn't like putting any shstk-related extensions in the FPU 
>> state.
>>
>> For better or for worse, altshadowstack is (I think) fundamentally a new 
>> API.  No amount of ucontext magic is going to materialize an entire 
>> shadow stack out of nowhere when someone calls sigaltstack().  So the 
>> questions are: should we support altshadowstack from day one and, if so, 
>> what should it look like?
>
> I think we should support them from day one.
>
>> So I don't have a complete or even almost complete design in mind, but I 
>> think we do need to make a conscious decision either to design this 
>> right or to skip it for v1.
>
> Skipping it might create a fundamental design fail situation as it might
> require changes to the shadow stack signal handling in general which
> becomes a nightmare once a non-altstack API is exposed.

It would also expose a range of kernels in which shstk is on but programs that want altshadowstack don't have it.  That would be annoying.

>
>> As for CRIU, I don't think anyone really expects a new kernel, running 
>> new userspace that takes advantage of features in the new kernel, to 
>> work with old CRIU.
>
> Yes, CRIU needs updates, but what ensures that CRIU managed user space
> does not use SHSTK if CRIU is not updated yet?

In some sense this is like any other feature.  If a program uses timerfd but CRIU doesn't support timerfd, then it won't work.  SHSTK is a bit unique because it's likely that all programs on a system will start using it all at once.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-08  9:29         ` Cyrill Gorcunov
@ 2022-02-08 16:21           ` Andy Lutomirski
  2022-02-08 17:02             ` Cyrill Gorcunov
  0 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-08 16:21 UTC (permalink / raw)
  To: Cyrill Gorcunov, Mike Rapoport
  Cc: Dave Hansen, Adrian Reber, Rick P Edgecombe,
	the arch/x86 maintainers, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Linux Kernel Mailing List, linux-doc, linux-mm,
	linux-arch, Linux API, Arnd Bergmann, Balbir Singh,
	Borislav Petkov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra (Intel),
	Randy Dunlap, Shankar, Ravi V, Dave Martin, Weijiang Yang,
	Kirill A. Shutemov, Moreira, Joao, john.allen, kcc, Eranian,
	Stephane, Andrei Vagin, Dmitry Safonov



On Tue, Feb 8, 2022, at 1:29 AM, Cyrill Gorcunov wrote:
> On Tue, Feb 08, 2022 at 11:16:51AM +0200, Mike Rapoport wrote:
>>  
>> > Any thoughts on how you would _like_ to see this resolved?
>> 
>> Ideally, CRIU will need a knob that will tell the kernel/CET machinery
>> where the next RET will jump, along the lines of
>> restore_signal_shadow_stack() AFAIU.
>> 
>> But such a knob will immediately reduce the security value of the entire
>> thing, and I don't have good ideas how to deal with it :(
>
> Probably a kind of latch in the task_struct which would trigger off once a
> return to a different address happened, thus we would be able to jump inside
> the parasite code. Of course such a trigger should be available under the
> proper capability only.

I'm not fully in touch with how parasite, etc works.  Are we talking about save or restore?  If it's restore, what exactly does CRIU need to do?  Is it just that CRIU needs to return out from its resume code into the to-be-resumed program without tripping CET?  Would it be acceptable for CRIU to require that at least one shstk slot be free at save time?  Or do we need a mechanism to atomically switch to a completely full shadow stack at resume?

Off the top of my head, a sigreturn (or sigreturn-like mechanism) that is intended for use for altshadowstack could safely verify a token on the altshadowstack, possibly compare to something in ucontext (or not -- this isn't clearly necessary) and switch back to the previous stack.  CRIU could use that too.  Obviously CRIU will need a way to populate the relevant stacks, but WRUSS can be used for that, and I think this is a fundamental requirement for CRIU -- CRIU restore absolutely needs a way to write the saved shadow stack data into the shadow stack.

So I think the only special capability that CRIU really needs is WRUSS, and we need to wire that up anyway.
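
For reference, WRUSS is a supervisor-only instruction for writing into user
shadow stack memory, so CRIU would drive it through a kernel helper roughly
like the sketch below (exception-table details reduced to the essentials;
treat it as illustrative rather than the series' exact helper):

/*
 * Sketch of a WRUSS-based write into a user shadow stack page, e.g. for a
 * restore token or saved return addresses.  Assumes the usual x86 extable
 * plumbing.
 */
static inline int write_user_shstk_64(u64 __user *addr, u64 val)
{
	asm_volatile_goto("1: wrussq %[val], (%[addr])\n"
			  _ASM_EXTABLE(1b, %l[fail])
			  :: [addr] "r" (addr), [val] "r" (val)
			  :: fail);
	return 0;
fail:
	return -EFAULT;
}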

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-08 16:21           ` Andy Lutomirski
@ 2022-02-08 17:02             ` Cyrill Gorcunov
  2022-02-08 21:54               ` Dmitry Safonov
  2022-02-09  2:18               ` Edgecombe, Rick P
  0 siblings, 2 replies; 154+ messages in thread
From: Cyrill Gorcunov @ 2022-02-08 17:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mike Rapoport, Dave Hansen, Adrian Reber, Rick P Edgecombe,
	the arch/x86 maintainers, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Linux Kernel Mailing List, linux-doc, linux-mm,
	linux-arch, Linux API, Arnd Bergmann, Balbir Singh,
	Borislav Petkov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra (Intel),
	Randy Dunlap, Shankar, Ravi V, Dave Martin, Weijiang Yang,
	Kirill A. Shutemov, Moreira, Joao, john.allen, kcc, Eranian,
	Stephane, Andrei Vagin, Dmitry Safonov

On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> >> But such a knob will immediately reduce the security value of the entire
> >> thing, and I don't have good ideas how to deal with it :(
> >
> > Probably a kind of latch in the task_struct which would trigger off once a
> > return to a different address happened, thus we would be able to jump inside
> > the parasite code. Of course such a trigger should be available under the
> > proper capability only.
> 
> I'm not fully in touch with how parasite, etc works.  Are we talking about save or restore?

We use parasite code in question during checkpoint phase as far as I remember.
push addr/lret trick is used to run "injected" code (code injection itself is
done via ptrace) in compat mode at least. Dima, Andrei, I didn't look into this code
for years already, do we still need to support compat mode at all?

> If it's restore, what exactly does CRIU need to do?  Is it just that CRIU needs to return
> out from its resume code into the to-be-resumed program without tripping CET?  Would it
> be acceptable for CRIU to require that at least one shstk slot be free at save time?
> Or do we need a mechanism to atomically switch to a completely full shadow stack at resume?
> 
> Off the top of my head, a sigreturn (or sigreturn-like mechanism) that is intended for
> use for altshadowstack could safely verify a token on the altshadowstack, possibly
> compare to something in ucontext (or not -- this isn't clearly necessary) and switch
> back to the previous stack.  CRIU could use that too.  Obviously CRIU will need a way
> to populate the relevant stacks, but WRUSS can be used for that, and I think this
> is a fundamental requirement for CRIU -- CRIU restore absolutely needs a way to write
> the saved shadow stack data into the shadow stack.
> 
> So I think the only special capability that CRIU really needs is WRUSS, and
> we need to wire that up anyway.

Thanks for these notes, Andy! I can't provide any sane answer here since I
haven't read the tech spec for this feature yet :-)

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack
  2022-02-08  8:41     ` Thomas Gleixner
@ 2022-02-08 20:20       ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-08 20:20 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng

On Tue, 2022-02-08 at 09:41 +0100, Thomas Gleixner wrote:
> On Mon, Feb 07 2022 at 14:39, Dave Hansen wrote:
> 
> > On 1/30/22 13:18, Rick Edgecombe wrote:
> > > +config X86_SHADOW_STACK
> > > +    prompt "Intel Shadow Stack"
> > > +    def_bool n
> > > +    depends on AS_WRUSS
> > > +    depends on ARCH_HAS_SHADOW_STACK
> > > +    select ARCH_USES_HIGH_VMA_FLAGS
> > > +    help
> > > +      Shadow Stack protection is a hardware feature that detects
> > > function
> > > +      return address corruption.  This helps mitigate ROP
> > > attacks.
> > > +      Applications must be enabled to use it, and old userspace
> > > does not
> > > +      get protection "for free".
> > > +      Support for this feature is present on Tiger Lake family
> > > of
> > > +      processors released in 2020 or later.  Enabling this
> > > feature
> > > +      increases kernel text size by 3.7 KB.
> > 
> > I guess the "2020" comment is still OK.  But, given that it's on
> > AMD and
> > a could of other Intel models, maybe we should just leave this at:
> > 
> >        CPUs supporting shadow stacks were first released in 2020.
> 
> Yes.
> 
> > If we say anything.  We mostly want folks to just go read the
> > documentation if they need more details.
> 
> Also the kernel text size increase blurb is pretty useless as that's
> a
> number which is wrong from day one.

Makes sense. Thanks.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET)
  2022-02-07 22:45   ` Dave Hansen
@ 2022-02-08 20:23     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-08 20:23 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng

On Mon, 2022-02-07 at 14:45 -0800, Dave Hansen wrote:
> Please add a chunk to the changelog that explains the dependency. 
> This
> would suffice:
> 
>         To protect shadow stack state from malicious modification,
> the
>         registers are only accessible in supervisor mode.  This
>         implementation context-switches the registers with XSAVES. 
> Make
>         X86_FEATURE_SHSTK depend on XSAVES.

Thanks. Yea, I don't think that part of the design is really elaborated
on anywhere. It can be some foreshadowing for the signal stuff later
too.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/35] x86/cpufeatures: Introduce CPU setup and option parsing for CET
  2022-02-07 22:49   ` Dave Hansen
@ 2022-02-08 20:29     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-08 20:29 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng

On Mon, 2022-02-07 at 14:49 -0800, Dave Hansen wrote:
> Given this:
> 
>         
> https://lore.kernel.org/all/20220127115626.14179-2-bp@alien8.de/
> 
> I'd probably yank the command-line option out of this series, or
> stick
> it in a separate patch that you tack on to the end.

Makes sense. I'll change the docs to point out exactly how to use this
new parameter for shadow stack. It could come in handy if some
important service mis-marks itself as shadow stack capable and
complicates boot.

Thanks.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  2022-02-07 23:28   ` Dave Hansen
@ 2022-02-08 21:36     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-08 21:36 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng

On Mon, 2022-02-07 at 15:28 -0800, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > 
> > Control-flow Enforcement Technology (CET) introduces these MSRs:
> > 
> >     MSR_IA32_U_CET (user-mode CET settings),
> >     MSR_IA32_PL3_SSP (user-mode shadow stack pointer),
> > 
> >     MSR_IA32_PL0_SSP (kernel-mode shadow stack pointer),
> >     MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack pointer),
> >     MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack pointer),
> >     MSR_IA32_S_CET (kernel-mode CET settings),
> >     MSR_IA32_INT_SSP_TAB (exception shadow stack table).
> 
> To be honest, I'm not sure this is very valuable.  It's *VERY* close
> to
> the exact information in the structure definitions.  It's also not
> obviously related to XSAVE.  It's more of the "what" this patch does
> than the "why".  Good changelogs talk about "why".

Ok I'll look at re-wording this.

> 
> > The two user-mode MSRs belong to XFEATURE_CET_USER.  The first
> > three of
> > kernel-mode MSRs belong to XFEATURE_CET_KERNEL.  Both XSAVES states
> > are
> > supervisor states.  This means that there is no direct,
> > unprivileged access
> > to these states, making it harder for an attacker to subvert CET.

Oh, well I guess this *is* mentioned elsewhere than in patch 3.

> 
> Forgive me while I go into changelog lecture mode for a moment.
> 
> I was constantly looking up at the list of MSRs and trying to
> reconcile
> them with this paragraph.  Imagine if you had started out this
> changelog
> by saying:
> 
> 	Shadow stack register state can be managed with XSAVE.  The
> 	registers can logically be separated into two groups:
> 
> 		* Registers controlling user-mode operation
> 		* Registers controlling kernel-mode operation
> 
> 	The architecture has two new XSAVE state components: one for
> 	each group of registers.  This _lets_ an OS manage them
> 	separately if it chooses.  Linux chooses to ... <explain the
> 	design choice here, or why we don't care yet>.
> 
> 	Both XSAVE state components are supervisor states, even the
> 	state controlling user-mode operation.  This is a departure
> from
> 	earlier features like protection keys where the PKRU state is
> 	a normal user (non-supervisor) state.  Having the user state be
> 	
> 	supervisor-managed ensures there is no direct, unprivileged
> 	access to it, making it harder for an attacker to subvert CET.
> 
> Also, IBT gunk is in here too, right?  Let's at least *mention* that
> in
> the changelog.

We can remove the IBT stuff if it's better. I always appreciate finding
the unused features in headers when hacking around. But it all adds to
build time slightly I guess.
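
For readers following along, the two xstate components under discussion
group the registers roughly as below (sketched from this patch; field names
may still change):

/* State component for XFEATURE_CET_USER (user-mode operation). */
struct cet_user_state {
	u64 user_cet;		/* MSR_IA32_U_CET: user control-flow settings */
	u64 user_ssp;		/* MSR_IA32_PL3_SSP: user shadow stack pointer */
};

/* State component for XFEATURE_CET_KERNEL (defined, but not yet used by Linux). */
struct cet_kernel_state {
	u64 kernel_ssp;		/* MSR_IA32_PL0_SSP */
	u64 pl1_ssp;		/* MSR_IA32_PL1_SSP */
	u64 pl2_ssp;		/* MSR_IA32_PL2_SSP */
};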

> 
> ...
> >  /* All supervisor states including supported and unsupported
> > states. */
> >  #define XFEATURE_MASK_SUPERVISOR_ALL
> > (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
> > diff --git a/arch/x86/include/asm/msr-index.h
> > b/arch/x86/include/asm/msr-index.h
> > index 3faf0f97edb1..0ee77ce4c753 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -362,6 +362,26 @@
> >  
> >  
> >  #define MSR_CORE_PERF_LIMIT_REASONS	0x00000690
> > +
> > +/* Control-flow Enforcement Technology MSRs */
> > +#define MSR_IA32_U_CET			0x000006a0 /* user mode
> > cet setting */
> > +#define MSR_IA32_S_CET			0x000006a2 /* kernel
> > mode cet setting */
> > +#define CET_SHSTK_EN			BIT_ULL(0)
> > +#define CET_WRSS_EN			BIT_ULL(1)
> > +#define CET_ENDBR_EN			BIT_ULL(2)
> > +#define CET_LEG_IW_EN			BIT_ULL(3)
> > +#define CET_NO_TRACK_EN			BIT_ULL(4)
> > +#define CET_SUPPRESS_DISABLE		BIT_ULL(5)
> > +#define CET_RESERVED			(BIT_ULL(6) |
> > BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
> 
> Would GENMASK_ULL() look any nicer here?  I guess it's pretty clear
> as-is that bits 6->9 are reserved.

Hmm, visually I think the explicit list makes it easier to catch that a
bit needs to be dropped from the reserved mask if it becomes unreserved
and gets added some day.
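
For comparison, the two spellings being weighed here (shown with distinct
names purely for the side-by-side) both describe bits 6-9:

/* Explicit form, as in the patch: every reserved bit is spelled out. */
#define CET_RESERVED_EXPLICIT	(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))

/* GENMASK form Dave suggests: the same mask expressed as a single range. */
#define CET_RESERVED_GENMASK	GENMASK_ULL(9, 6)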

> 
> > +#define CET_SUPPRESS			BIT_ULL(10)
> > +#define CET_WAIT_ENDBR			BIT_ULL(11)
> 
> Are those bit fields common for both registers?  It might be worth a
> comment to mention that.

Yes, I'll mention that.

> 
> > +#define MSR_IA32_PL0_SSP		0x000006a4 /* kernel shadow
> > stack pointer */
> > +#define MSR_IA32_PL1_SSP		0x000006a5 /* ring-1 shadow
> > stack pointer */
> > +#define MSR_IA32_PL2_SSP		0x000006a6 /* ring-2 shadow
> > stack pointer */
> 
> Are PL1/2 ever used in this implementation?  If not, let's axe these
> definitions.

They are not used. Ok.

> 
> > +#define MSR_IA32_PL3_SSP		0x000006a7 /* user shadow stack
> > pointer */
> > +#define MSR_IA32_INT_SSP_TAB		0x000006a8 /* exception
> > shadow stack table */
> > +
> >  #define MSR_GFX_PERF_LIMIT_REASONS	0x000006B0
> >  #define MSR_RING_PERF_LIMIT_REASONS	0x000006B1
> >  
> > diff --git a/arch/x86/kernel/fpu/xstate.c
> > b/arch/x86/kernel/fpu/xstate.c
> > index 02b3ddaf4f75..44397202762b 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -50,6 +50,8 @@ static const char *xfeature_names[] =
> >  	"Processor Trace (unused)"	,
> >  	"Protection Keys User registers",
> >  	"PASID state",
> > +	"Control-flow User registers"	,
> > +	"Control-flow Kernel registers"	,
> >  	"unknown xstate feature"	,
> >  	"unknown xstate feature"	,
> >  	"unknown xstate feature"	,
> > @@ -73,6 +75,8 @@ static unsigned short xsave_cpuid_features[]
> > __initdata = {
> >  	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
> >  	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
> >  	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
> > +	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
> > +	[XFEATURE_CET_KERNEL]			=
> > X86_FEATURE_SHSTK,
> >  	[XFEATURE_XTILE_CFG]			=
> > X86_FEATURE_AMX_TILE,
> >  	[XFEATURE_XTILE_DATA]			=
> > X86_FEATURE_AMX_TILE,
> >  };
> > @@ -250,6 +254,8 @@ static void __init print_xstate_features(void)
> >  	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
> >  	print_xstate_feature(XFEATURE_MASK_PKRU);
> >  	print_xstate_feature(XFEATURE_MASK_PASID);
> > +	print_xstate_feature(XFEATURE_MASK_CET_USER);
> > +	print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
> >  	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
> >  	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
> >  }
> > @@ -405,6 +411,7 @@ static __init void os_xrstor_booting(struct
> > xregs_state *xstate)
> >  	 XFEATURE_MASK_BNDREGS |		\
> >  	 XFEATURE_MASK_BNDCSR |			\
> >  	 XFEATURE_MASK_PASID |			\
> > +	 XFEATURE_MASK_CET_USER |		\
> >  	 XFEATURE_MASK_XTILE)
> >  
> >  /*
> > @@ -621,6 +628,8 @@ static bool __init
> > check_xstate_against_struct(int nr)
> >  	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
> >  	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
> >  	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
> > +	XCHECK_SZ(sz, nr, XFEATURE_CET_USER,   struct cet_user_state);
> > +	XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct
> > cet_kernel_state);
> >  
> >  	/* The tile data size varies between implementations. */
> >  	if (nr == XFEATURE_XTILE_DATA)
> > @@ -634,7 +643,9 @@ static bool __init
> > check_xstate_against_struct(int nr)
> >  	if ((nr < XFEATURE_YMM) ||
> >  	    (nr >= XFEATURE_MAX) ||
> >  	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
> > -	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <=
> > XFEATURE_RSRVD_COMP_16))) {
> > +	    (nr == XFEATURE_RSRVD_COMP_13) ||
> > +	    (nr == XFEATURE_RSRVD_COMP_14) ||
> > +	    (nr == XFEATURE_RSRVD_COMP_16)) {
> >  		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
> >  		XSTATE_WARN_ON(1);
> >  		return false;
> 
> That if() is getting unweildy.  While I generally despise macros
> implicitly modifying variables, this might be worth it.  We could
> have a
> local function variable:
> 
> 	bool feature_checked = false;
> 
> and then muck with it in the macro:
> 
> #define XCHECK_SZ(sz, nr, nr_macro, __struct) do {
> 	if (nr == nr_macro)) {
> 		feature_checked = true;
> 		if (WARN_ONCE(sz != sizeof(__struct), ... ) {
> 			__xstate_dump_leaves();
> 		}
>         }
> } while (0)
> 
> Then the if() just makes sure the feature was checked instead of
> checking for reserved features explicitly.  We could also do:
> 
> 	bool c = false;
> 
> 	...
> 
>         c |= XCHECK_SZ(sz, nr, XFEATURE_YMM,       struct
> ymmh_struct);
>         c |= XCHECK_SZ(sz, nr, XFEATURE_BNDREGS,   struct ...
>         c |= XCHECK_SZ(sz, nr, XFEATURE_BNDCSR,    struct ...
> 	...
> 
> but that starts to run into 80 columns.  Those are both nice because
> they mean you don't have to maintain a list of reserved features in
> the
> code.  Another option would be to define a:
> 
> bool xfeature_is_reserved(int nr)
> {
> 	switch (nr) {
> 		case XFEATURE_RSRVD_COMP_13:
> 		...
> 
> so the if() looks nicer and won't grow; the function will grow
> instead.
> 
> Either way, I think this needs some refactoring.

Yes, this makes sense. I'll play around with it.
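
A rough sketch of the first variant, assuming the existing XCHECK_SZ() call
sites and __xstate_dump_leaves() (so not the final patch), might be:

/* The macro sets a caller-provided 'feature_checked' flag when it matches. */
#define XCHECK_SZ(sz, nr, nr_macro, __struct) do {			\
	if ((nr) == (nr_macro)) {					\
		feature_checked = true;					\
		if (WARN_ONCE((sz) != sizeof(__struct),			\
			      "%s: struct is %zu bytes, cpu state %d bytes\n", \
			      __stringify(nr_macro), sizeof(__struct), (sz))) \
			__xstate_dump_leaves();				\
	}								\
} while (0)

check_xstate_against_struct() would then declare a local
'bool feature_checked = false;' before the list of XCHECK_SZ() calls and
warn only if it is still false afterwards, instead of maintaining the list
of reserved components by hand.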

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-08 17:02             ` Cyrill Gorcunov
@ 2022-02-08 21:54               ` Dmitry Safonov
  2022-02-09  6:37                 ` Cyrill Gorcunov
  2022-02-09  2:18               ` Edgecombe, Rick P
  1 sibling, 1 reply; 154+ messages in thread
From: Dmitry Safonov @ 2022-02-08 21:54 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Mike Rapoport, Dave Hansen, Adrian Reber,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Kirill A. Shutemov,
	Andrei Vagin, Andy Lutomirski

[un-Cc'ed a lot of people, as the question is highly off-topic, so I
don't feel like the answer is of big interest to them, keeping x86
maintainer in]

On 2/8/22 17:02, Cyrill Gorcunov wrote:
> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
>>>> But such a knob will immediately reduce the security value of the entire
>>>> thing, and I don't have good ideas how to deal with it :(
>>>
>>> Probably a kind of latch in the task_struct which would trigger off once a
>>> return to a different address happened, thus we would be able to jump inside
>>> the parasite code. Of course such a trigger should be available under the
>>> proper capability only.
>>
>> I'm not fully in touch with how parasite, etc works.  Are we talking about save or restore?
> 
> We use parasite code in question during checkpoint phase as far as I remember.
> push addr/lret trick is used to run "injected" code (code injection itself is
> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look into this code
> for years already, do we still need to support compat mode at all?

Cyrill, I haven't been working on/with Virtuozzo people for the last 5
years, so I don't know. As you're more connected to Vz, your question seems to
imply that ia32 C/R is no longer needed by Vz customers. If it's not
needed anymore - I'm all for stopping testing of it in CRIU.

The only thing I ask before you go and remove that is to ping the person
who paid a substantial bug bounty to get ia32 support in CRIU. Although
in the end I didn't get a cent out of it (VZ managers insisted on
receiving all of the money), I still feel responsible to that person, as
the amount he paid was the biggest bounty at that moment and I was the
person who presented ia32 C/R to him as working and being tested.
If you need his contacts - ping me, I'll search and find it.

Other than that - if no one needs ia32 C/R, let's go ahead and drop
testing of it (and maybe some complicated code).

Thanks,
          Dmitry

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 06/35] x86/cet: Add control-protection fault handler
  2022-02-07 23:56   ` Dave Hansen
@ 2022-02-08 22:23     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-08 22:23 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng, mtk.manpages

On Mon, 2022-02-07 at 15:56 -0800, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > A control-protection fault is triggered when a control-flow
> > transfer
> > attempt violates Shadow Stack or Indirect Branch Tracking
> > constraints.
> > For example, the return address for a RET instruction differs from
> > the copy
> > on the shadow stack; or an indirect JMP instruction, without the
> > NOTRACK
> > prefix, arrives at a non-ENDBR opcode.
> > 
> > The control-protection fault handler works in a similar way as the
> > general
> > protection fault handler.  It provides the si_code SEGV_CPERR to
> > the signal
> > handler.
> 
> It's not a big deal, but we should probably just remove IBT from the
> changelogs for now.

Makes sense. I should have scrubbed these better for IBT.

> 
> >  arch/arm/kernel/signal.c           |  2 +-
> >  arch/arm64/kernel/signal.c         |  2 +-
> >  arch/arm64/kernel/signal32.c       |  2 +-
> >  arch/sparc/kernel/signal32.c       |  2 +-
> >  arch/sparc/kernel/signal_64.c      |  2 +-
> >  arch/x86/include/asm/idtentry.h    |  4 ++
> >  arch/x86/kernel/idt.c              |  4 ++
> >  arch/x86/kernel/signal_compat.c    |  2 +-
> >  arch/x86/kernel/traps.c            | 62
> > ++++++++++++++++++++++++++++++
> >  include/uapi/asm-generic/siginfo.h |  3 +-
> >  10 files changed, 78 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
> > index c532a6041066..59aaadce9d52 100644
> > --- a/arch/arm/kernel/signal.c
> > +++ b/arch/arm/kernel/signal.c
> > @@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs
> > *regs)
> >   */
> >  static_assert(NSIGILL	== 11);
> >  static_assert(NSIGFPE	== 15);
> > -static_assert(NSIGSEGV	== 9);
> > +static_assert(NSIGSEGV	== 10);
> >  static_assert(NSIGBUS	== 5);
> >  static_assert(NSIGTRAP	== 6);
> >  static_assert(NSIGCHLD	== 6);
> > diff --git a/arch/arm64/kernel/signal.c
> > b/arch/arm64/kernel/signal.c
> > index d8aaf4b6f432..d2da57c415b8 100644
> > --- a/arch/arm64/kernel/signal.c
> > +++ b/arch/arm64/kernel/signal.c
> > @@ -983,7 +983,7 @@ void __init minsigstksz_setup(void)
> >   */
> >  static_assert(NSIGILL	== 11);
> >  static_assert(NSIGFPE	== 15);
> > -static_assert(NSIGSEGV	== 9);
> > +static_assert(NSIGSEGV	== 10);
> >  static_assert(NSIGBUS	== 5);
> >  static_assert(NSIGTRAP	== 6);
> >  static_assert(NSIGCHLD	== 6);
> > diff --git a/arch/arm64/kernel/signal32.c
> > b/arch/arm64/kernel/signal32.c
> > index d984282b979f..8776a34c6444 100644
> > --- a/arch/arm64/kernel/signal32.c
> > +++ b/arch/arm64/kernel/signal32.c
> > @@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct
> > pt_regs *regs)
> >   */
> >  static_assert(NSIGILL	== 11);
> >  static_assert(NSIGFPE	== 15);
> > -static_assert(NSIGSEGV	== 9);
> > +static_assert(NSIGSEGV	== 10);
> >  static_assert(NSIGBUS	== 5);
> >  static_assert(NSIGTRAP	== 6);
> >  static_assert(NSIGCHLD	== 6);
> > diff --git a/arch/sparc/kernel/signal32.c
> > b/arch/sparc/kernel/signal32.c
> > index 6cc124a3bb98..dc50b2a78692 100644
> > --- a/arch/sparc/kernel/signal32.c
> > +++ b/arch/sparc/kernel/signal32.c
> > @@ -752,7 +752,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr,
> > u32 u_ossptr, unsigned long sp)
> >   */
> >  static_assert(NSIGILL	== 11);
> >  static_assert(NSIGFPE	== 15);
> > -static_assert(NSIGSEGV	== 9);
> > +static_assert(NSIGSEGV	== 10);
> >  static_assert(NSIGBUS	== 5);
> >  static_assert(NSIGTRAP	== 6);
> >  static_assert(NSIGCHLD	== 6);
> > diff --git a/arch/sparc/kernel/signal_64.c
> > b/arch/sparc/kernel/signal_64.c
> > index 2a78d2af1265..7fe2bd37bd1a 100644
> > --- a/arch/sparc/kernel/signal_64.c
> > +++ b/arch/sparc/kernel/signal_64.c
> > @@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs,
> > unsigned long orig_i0, unsigned long
> >   */
> >  static_assert(NSIGILL	== 11);
> >  static_assert(NSIGFPE	== 15);
> > -static_assert(NSIGSEGV	== 9);
> > +static_assert(NSIGSEGV	== 10);
> >  static_assert(NSIGBUS	== 5);
> >  static_assert(NSIGTRAP	== 6);
> >  static_assert(NSIGCHLD	== 6);
> > diff --git a/arch/x86/include/asm/idtentry.h
> > b/arch/x86/include/asm/idtentry.h
> > index 1345088e9902..a90791433152 100644
> > --- a/arch/x86/include/asm/idtentry.h
> > +++ b/arch/x86/include/asm/idtentry.h
> > @@ -562,6 +562,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS,	
> > exc_stack_segment);
> >  DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP,	exc_general_protection)
> > ;
> >  DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC,	exc_alignment_check);
> >  
> > +#ifdef CONFIG_X86_SHADOW_STACK
> > +DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
> > +#endif
> > +
> >  /* Raw exception entries which need extra work */
> >  DECLARE_IDTENTRY_RAW(X86_TRAP_UD,		exc_invalid_op);
> >  DECLARE_IDTENTRY_RAW(X86_TRAP_BP,		exc_int3);
> > diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> > index df0fa695bb09..9f1bdaabc246 100644
> > --- a/arch/x86/kernel/idt.c
> > +++ b/arch/x86/kernel/idt.c
> > @@ -113,6 +113,10 @@ static const __initconst struct idt_data
> > def_idts[] = {
> >  #elif defined(CONFIG_X86_32)
> >  	SYSG(IA32_SYSCALL_VECTOR,	entry_INT80_32),
> >  #endif
> > +
> > +#ifdef CONFIG_X86_SHADOW_STACK
> > +	INTG(X86_TRAP_CP,		asm_exc_control_protection),
> > +#endif
> >  };
> >  
> >  /*
> > diff --git a/arch/x86/kernel/signal_compat.c
> > b/arch/x86/kernel/signal_compat.c
> > index b52407c56000..ff50cd978ea5 100644
> > --- a/arch/x86/kernel/signal_compat.c
> > +++ b/arch/x86/kernel/signal_compat.c
> > @@ -27,7 +27,7 @@ static inline void
> > signal_compat_build_tests(void)
> >  	 */
> >  	BUILD_BUG_ON(NSIGILL  != 11);
> >  	BUILD_BUG_ON(NSIGFPE  != 15);
> > -	BUILD_BUG_ON(NSIGSEGV != 9);
> > +	BUILD_BUG_ON(NSIGSEGV != 10);
> >  	BUILD_BUG_ON(NSIGBUS  != 5);
> >  	BUILD_BUG_ON(NSIGTRAP != 6);
> >  	BUILD_BUG_ON(NSIGCHLD != 6);
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index c9d566dcf89a..54b7a146fd5e 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -39,6 +39,7 @@
> >  #include <linux/io.h>
> >  #include <linux/hardirq.h>
> >  #include <linux/atomic.h>
> > +#include <linux/nospec.h>
> >  
> >  #include <asm/stacktrace.h>
> >  #include <asm/processor.h>
> > @@ -641,6 +642,67 @@
> > DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
> >  	cond_local_irq_disable(regs);
> >  }
> >  
> > +#ifdef CONFIG_X86_SHADOW_STACK
> > +static const char * const control_protection_err[] = {
> > +	"unknown",
> > +	"near-ret",
> > +	"far-ret/iret",
> > +	"endbranch",
> > +	"rstorssp",
> > +	"setssbsy",
> > +	"unknown",
> > +};
> > +
> > +static DEFINE_RATELIMIT_STATE(cpf_rate,
> > DEFAULT_RATELIMIT_INTERVAL,
> > +			      DEFAULT_RATELIMIT_BURST);
> > +
> > +/*
> > + * When a control protection exception occurs, send a signal to
> > the responsible
> > + * application.  Currently, control protection is only enabled for
> > user mode.
> > + * This exception should not come from kernel mode.
> > + */
> 
> Please move that last sentence to the code which enforces that
> expectation.

Ok.

> 
> > +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> > +{
> > +	struct task_struct *tsk;
> > +
> > +	if (!user_mode(regs)) {
> > +		die("kernel control protection fault", regs,
> > error_code);
> > +		panic("Unexpected kernel control protection
> > fault.  Machine halted.");
> > +	}
> 
> s/  Machine halted.//
> 
> I think they'll get the point when they see "kernel panic".

Ok.

> 
> > +
> > +	cond_local_irq_enable(regs);
> > +
> > +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> > +		WARN_ONCE(1, "Control protection fault with CET support
> > disabled\n");
> > +
> > +	tsk = current;
> > +	tsk->thread.error_code = error_code;
> > +	tsk->thread.trap_nr = X86_TRAP_CP;
> > +
> > +	/*
> > +	 * Ratelimit to prevent log spamming.
> > +	 */
> > +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
> > +	    __ratelimit(&cpf_rate)) {
> > +		unsigned long ssp;
> > +		int cpf_type;
> > +
> > +		cpf_type = array_index_nospec(error_code,
> > ARRAY_SIZE(control_protection_err));
> 
> Isn't 'error_code' generated by the hardware?  Is this defending
> against
> userspace which can somehow get trigger this with an arbitrary
> 'error_code'?
> 
> I'm also not sure I like using array_index_nospec() as the *only*
> bounds
> checking on the array.  Is that the way folks are using it these
> days?

Yea, I was wondering about that too. It looks like it came from this
comment:
https://lore.kernel.org/lkml/202102041201.C2B93F8D8A@keescook/
...which didn't raise any speculation concerns. What it does do, though,
is massage the index to 0 if it is out of bounds, leading to the
"unknown" message being selected for out-of-bounds error codes. If
that's the purpose, I'm not sure why "unknown" is also the last
element of the array.

I think maybe this will not typically be a fast path, and so
conditional logic would be easier to read and better on balance.
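
A rough sketch of that plainer form, reusing the existing
control_protection_err[] array and feeding the result to the pr_emerg()
call, could be:

	const char *cpf_name = "unknown";

	/* Plain bounds check; out-of-range error codes just stay "unknown". */
	if (error_code < ARRAY_SIZE(control_protection_err))
		cpf_name = control_protection_err[error_code];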

I'm now realizing that this is missing the "ENCL" error code bit (15)
which denotes #CP during enclave execution.

> Even the comment above it has a pattern like this:
> 
> >  *     if (index < size) {
> >  *         index = array_index_nospec(index, size);
> >  *         val = array[index];
> >  *     }
> 
> 
> > +		rdmsrl(MSR_IA32_PL3_SSP, ssp);
> > +		pr_emerg("%s[%d] control protection ip:%lx sp:%lx
> > ssp:%lx error:%lx(%s)",
> > +			 tsk->comm, task_pid_nr(tsk),
> > +			 regs->ip, regs->sp, ssp, error_code,
> > +			 control_protection_err[cpf_type]);
> > +		print_vma_addr(KERN_CONT " in ", regs->ip);
> > +		pr_cont("\n");
> > +	}
> > +
> > +	force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0);
> > +	cond_local_irq_disable(regs);
> > +}
> > +#endif
> > +
> >  static bool do_int3(struct pt_regs *regs)
> >  {
> >  	int res;
> > diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-
> > generic/siginfo.h
> > index 3ba180f550d7..081f4b37d22c 100644
> > --- a/include/uapi/asm-generic/siginfo.h
> > +++ b/include/uapi/asm-generic/siginfo.h
> > @@ -240,7 +240,8 @@ typedef struct siginfo {
> >  #define SEGV_ADIPERR	7	/* Precise MCD exception */
> >  #define SEGV_MTEAERR	8	/* Asynchronous ARM MTE error */
> >  #define SEGV_MTESERR	9	/* Synchronous ARM MTE exception */
> > -#define NSIGSEGV	9
> > +#define SEGV_CPERR	10	/* Control protection fault */
> > +#define NSIGSEGV	10
> >  
> >  /*
> >   * SIGBUS si_codes
> 
> 

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 07/35] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  2022-02-08  0:13   ` Dave Hansen
@ 2022-02-08 22:52     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-08 22:52 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: hch, Yu, Yu-cheng

On Mon, 2022-02-07 at 16:13 -0800, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > The x86 family of processors do not directly create read-only and
> > Dirty
> > PTEs.  These PTEs are created by software.
> 
> That's not strictly correct.
> 
> There's nothing in the architecture today to prevent the CPU from
> creating Write=0,Dirty=1 PTEs.  In fact, some CPUs do this in weird
> situations.  It wouldn't be wrong to say:
> 
> 	Processors sometimes directly create read-only and Dirty PTEs.
> 
> which is the opposite of what is written above.  This is why the CET
> spec has the blurb about shadow-stack-supporting CPUs promise not to
> do
> this any more.

Yea, it's wrong. The whole point of the new assurance is that the CPU
could do this before, as you say.

> 
> > One such case is that kernel
> > read-only pages are historically setup as Dirty.
> 
> 				   ^ set up
> 
> > New processors that support Shadow Stack regard read-only and Dirty
> > PTEs as
> > shadow stack pages.
> 
> This also isn't *quite* correct.  It's not just having a new
> processor,
> it includes enabling shadow stacks.

Right.

> 
> > This results in ambiguity between shadow stack and kernel read-only
> > pages.  To resolve this, removed Dirty from kernel read- only
> > pages.
> 
> One thing that's not clear from the spec: does this cause an *actual*
> problem?  For instance, does setting:
> 
> 	IA32_U_CET.SH_STK_EN=1
> but
> 	IA32_S_CET.SH_STK_EN=0
> 
> means that shadow stacks are enforced in user *MODE* or on
> user-paging-permission (U=0) PTEs?
> 
> I think it's modes, but it would be nice to be clear.  *BUT*, if this
> is
> accurate, doesn't it also mean that this patch is not strictly
> necessary?
> 
> Don't get me wrong, the patch is probably still a good idea, but
> let's
> make sure we get the exact reasoning clear.

Yea, I think this is just a tying-up-loose-ends thing. It is not
functionally needed until there is a shadow stack mode for the kernel.
I'll update the patch to make this clear.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET)
  2022-01-30 21:18 ` [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Rick Edgecombe
  2022-02-07 22:45   ` Dave Hansen
@ 2022-02-09  1:10   ` Kees Cook
  1 sibling, 0 replies; 154+ messages in thread
From: Kees Cook @ 2022-02-09  1:10 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian,
	Yu-cheng Yu

On Sun, Jan 30, 2022 at 01:18:06PM -0800, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Add CPU feature flags for Control-flow Enforcement Technology (CET).
> 
> CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
> CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect Branch Tracking

It looks like this only adds the SHSTK bit, maybe drop mention of IBT
here.

I wonder if we could land this (and the IBT part) without waiting for
everything else in the respective series?

-Kees

> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> ---
> 
> v1:
>  - Remove IBT, can be added in a follow on IBT series.
> 
> Yu-cheng v25:
>  - Make X86_FEATURE_IBT depend on X86_FEATURE_SHSTK.
> 
> Yu-cheng v24:
>  - Update for splitting CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK and
>    CONFIG_X86_IBT.
>  - Move DISABLE_IBT definition to the IBT series.
> 
>  arch/x86/include/asm/cpufeatures.h       | 1 +
>  arch/x86/include/asm/disabled-features.h | 8 +++++++-
>  arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 6db4e2932b3d..c3eb94b13fef 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -355,6 +355,7 @@
>  #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
>  #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
>  #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
> +#define X86_FEATURE_SHSTK		(16*32+ 7) /* Shadow Stack */
>  #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
>  #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
>  #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 8f28fafa98b3..b7728f7afb2b 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -65,6 +65,12 @@
>  # define DISABLE_SGX	(1 << (X86_FEATURE_SGX & 31))
>  #endif
>  
> +#ifdef CONFIG_X86_SHADOW_STACK
> +#define DISABLE_SHSTK	0
> +#else
> +#define DISABLE_SHSTK	(1 << (X86_FEATURE_SHSTK & 31))
> +#endif
> +
>  /*
>   * Make sure to add features to the correct mask
>   */
> @@ -85,7 +91,7 @@
>  #define DISABLED_MASK14	0
>  #define DISABLED_MASK15	0
>  #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
> -			 DISABLE_ENQCMD)
> +			 DISABLE_ENQCMD|DISABLE_SHSTK)
>  #define DISABLED_MASK17	0
>  #define DISABLED_MASK18	0
>  #define DISABLED_MASK19	0
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index c881bcafba7d..bf1b55a1ba21 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
>  	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
>  	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
> +	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
>  	{}
>  };
>  
> -- 
> 2.17.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-08 17:02             ` Cyrill Gorcunov
  2022-02-08 21:54               ` Dmitry Safonov
@ 2022-02-09  2:18               ` Edgecombe, Rick P
  2022-02-09  6:43                 ` Cyrill Gorcunov
                                   ` (2 more replies)
  1 sibling, 3 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-09  2:18 UTC (permalink / raw)
  To: Lutomirski, Andy, gorcunov
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, adrian, fweimer, nadav.amit, jannh, avagin,
	linux-arch, kcc, bp, oleg, hjl.tools, pavel, linux-doc, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, Yang, Weijiang, rppt,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, Shankar, Ravi V

On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> > > > But such a knob will immediately reduce the security value of
> > > > the entire
> > > > thing, and I don't have good ideas how to deal with it :(
> > > 
> > > Probably a kind of latch in the task_struct which would trigger
> > > off once
> > > return to a different address happened, thus we would be able to
> > > jump inside
> > > parasite code. Of course such trigger should be available under
> > > proper
> > > capability only.
> > 
> > I'm not fully in touch with how parasite, etc works.  Are we
> > talking about save or restore?
> 
> We use parasite code in question during checkpoint phase as far as I
> remember.
> push addr/lret trick is used to run "injected" code (code injection
> itself is
> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> into this code
> for years already, do we still need to support compat mode at all?
> 
> > If it's restore, what exactly does CRIU need to do?  Is it just
> > that CRIU needs to return
> > out from its resume code into the to-be-resumed program without
> > tripping CET?  Would it
> > be acceptable for CRIU to require that at least one shstk slot be
> > free at save time?
> > Or do we need a mechanism to atomically switch to a completely full
> > shadow stack at resume?
> > 
> > Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> > that is intended for
> > use for altshadowstack could safely verify a token on the
> > altshdowstack, possibly
> > compare to something in ucontext (or not -- this isn't clearly
> > necessary) and switch
> > back to the previous stack.  CRIU could use that too.  Obviously
> > CRIU will need a way
> > to populate the relevant stacks, but WRUSS can be used for that,
> > and I think this
> > is a fundamental requirement for CRIU -- CRIU restore absolutely
> > needs a way to write
> > the saved shadow stack data into the shadow stack.

Still wrapping my head around the CRIU save and restore steps, but
another general approach might be to give ptrace the ability to
temporarily pause/resume/set CET enablement and SSP for a stopped
thread. Then injected code doesn't need to jump through any hoops or
possibly run into road blocks. I'm not sure how much this opens things
up if the thread has to be stopped...

Cyrill, could it fit into the CRIU pause and resume flow? What action
causes the final resuming of execution of the restored process for
checkpointing and for restore? Wondering if we could somehow make CET
re-enable exactly then.

And I guess this also needs a way to create shadow stack allocations at
a specific address to match where they were in the dumped process. That
is missing in this series.


> > 
> > So I think the only special capability that CRIU really needs is
> > WRUSS, and
> > we need to wire that up anyway.
> 
> Thanks for these notes, Andy! I can't provide any sane answer here
> since I didn't
> read the tech spec for this feature yet :-)





^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-08 21:54               ` Dmitry Safonov
@ 2022-02-09  6:37                 ` Cyrill Gorcunov
  0 siblings, 0 replies; 154+ messages in thread
From: Cyrill Gorcunov @ 2022-02-09  6:37 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Mike Rapoport, Dave Hansen, Adrian Reber,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Kirill A. Shutemov,
	Andrei Vagin, Andy Lutomirski

On Tue, Feb 08, 2022 at 09:54:14PM +0000, Dmitry Safonov wrote:
> [un-Cc'ed a lot of people, as the question is highly off-topic, so I
> don't feel like the answer is of big interest to them, keeping x86
> maintainer in]
> 
> On 2/8/22 17:02, Cyrill Gorcunov wrote:
> >>> Probably a kind of latch in the task_struct which would trigger off once
> >>> return to a different address happened, thus we would be able to jump inside
> >>> parasite code. Of course such trigger should be available under proper
> >>> capability only.
> >>
> >> I'm not fully in touch with how parasite, etc works.  Are we talking about save or restore?
> > 
> > We use parasite code in question during checkpoint phase as far as I remember.
> > push addr/lret trick is used to run "injected" code (code injection itself is
> > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look into this code
> > for years already, do we still need to support compat mode at all?
> 
> Cyrill, I haven't been working on/with Virtuozzo people for the last 5 years, so
> I don't know. As you're more connected to Vz, your question seems to
> imply that ia32 C/R is no longer needed by Vz customers. If it's not
> needed anymore - I'm all for stopping testing of it in CRIU.

Nope, I didn't see any sign that Vz intends to drop ia32 support. But
Vz's criu instance follows vanilla's, which is why I asked you and
Andrew about ia32 support. This ia32 code snippet with stack
manipulation simply popped into my mind when Andy asked how we deal
with the stack.

Also, we adjust the stack in the restorer code, but I need some time to
recall all these details since, as I said, I haven't worked with the
criu code for years.

	Cyrill

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-09  2:18               ` Edgecombe, Rick P
@ 2022-02-09  6:43                 ` Cyrill Gorcunov
  2022-02-09 10:53                 ` Mike Rapoport
  2022-02-10  2:37                 ` Andy Lutomirski
  2 siblings, 0 replies; 154+ messages in thread
From: Cyrill Gorcunov @ 2022-02-09  6:43 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Lutomirski, Andy, bsingharora, hpa, Syromiatnikov, Eugene,
	peterz, rdunlap, keescook, 0x7f454c46, dave.hansen,
	kirill.shutemov, Eranian, Stephane, linux-mm, adrian, fweimer,
	nadav.amit, jannh, avagin, linux-arch, kcc, bp, oleg, hjl.tools,
	pavel, linux-doc, arnd, Moreira, Joao, tglx, mike.kravetz, x86,
	Yang, Weijiang, rppt, Dave.Martin, john.allen, mingo, Hansen,
	Dave, corbet, linux-kernel, linux-api, Shankar, Ravi V,
	Pavel Tikhomirov, alexander.mikhalitsyn

On Wed, Feb 09, 2022 at 02:18:42AM +0000, Edgecombe, Rick P wrote:
...
> 
> Still wrapping my head around the CRIU save and restore steps, but
> another general approach might be to give ptrace the ability to
> temporarily pause/resume/set CET enablement and SSP for a stopped
> thread. Then injected code doesn't need to jump through any hoops or
> possibly run into road blocks. I'm not sure how much this opens things
> up if the thread has to be stopped...
> 
> Cyrill, could it fit into the CRIU pause and resume flow? What action
> causes the final resuming of execution of the restored process for
> checkpointing and for restore? Wondering if we could somehow make CET
> re-enable exactly then.
> 
> And I guess this also needs a way to create shadow stack allocations at
> a specific address to match where they were in the dumped process. That
> is missing in this series.

Thanks Rick! This sounds like an option. I need a couple of days to refresh
my memory about criu internals. Let me CC a few current active criu developers
(the CC list is already big enough though :), maybe this will speed up the procedure.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-09  2:18               ` Edgecombe, Rick P
  2022-02-09  6:43                 ` Cyrill Gorcunov
@ 2022-02-09 10:53                 ` Mike Rapoport
  2022-02-10  2:37                 ` Andy Lutomirski
  2 siblings, 0 replies; 154+ messages in thread
From: Mike Rapoport @ 2022-02-09 10:53 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Lutomirski, Andy, gorcunov, bsingharora, hpa, Syromiatnikov,
	Eugene, peterz, rdunlap, keescook, 0x7f454c46, dave.hansen,
	kirill.shutemov, Eranian, Stephane, linux-mm, adrian, fweimer,
	nadav.amit, jannh, avagin, linux-arch, kcc, bp, oleg, hjl.tools,
	pavel, linux-doc, arnd, Moreira, Joao, tglx, mike.kravetz, x86,
	Yang, Weijiang, Dave.Martin, john.allen, mingo, Hansen, Dave,
	corbet, linux-kernel, linux-api, Shankar, Ravi V

Hi Rick,

On Wed, Feb 09, 2022 at 02:18:42AM +0000, Edgecombe, Rick P wrote:
> On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> > > > > But such a knob will immediately reduce the security value of
> > > > > the entire
> > > > > thing, and I don't have good ideas how to deal with it :(
> > > > 
> > > > Probably a kind of latch in the task_struct which would trigger
> > > > off once
> > > > return to a different address happened, thus we would be able to
> > > > jump inside
> > > > parasite code. Of course such trigger should be available under
> > > > proper
> > > > capability only.
> > > 
> > > I'm not fully in touch with how parasite, etc works.  Are we
> > > talking about save or restore?
> > 
> > We use parasite code in question during checkpoint phase as far as I
> > remember.
> > push addr/lret trick is used to run "injected" code (code injection
> > itself is
> > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> > into this code
> > for years already, do we still need to support compat mode at all?
> > 
> > > If it's restore, what exactly does CRIU need to do?  Is it just
> > > that CRIU needs to return
> > > out from its resume code into the to-be-resumed program without
> > > tripping CET?  Would it
> > > be acceptable for CRIU to require that at least one shstk slot be
> > > free at save time?
> > > Or do we need a mechanism to atomically switch to a completely full
> > > shadow stack at resume?
> > > 
> > > Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> > > that is intended for
> > > use for altshadowstack could safely verify a token on the
> > > altshdowstack, possibly
> > > compare to something in ucontext (or not -- this isn't clearly
> > > necessary) and switch
> > > back to the previous stack.  CRIU could use that too.  Obviously
> > > CRIU will need a way
> > > to populate the relevant stacks, but WRUSS can be used for that,
> > > and I think this
> > > is a fundamental requirement for CRIU -- CRIU restore absolutely
> > > needs a way to write
> > > the saved shadow stack data into the shadow stack.
> 
> Still wrapping my head around the CRIU save and restore steps, but
> another general approach might be to give ptrace the ability to
> temporarily pause/resume/set CET enablement and SSP for a stopped
> thread. Then injected code doesn't need to jump through any hoops or
> possibly run into road blocks. I'm not sure how much this opens things
> up if the thread has to be stopped...
 
IIRC, criu dump does something like this:
* Stop the process being dumped (victim) with ptrace
* Inject parasite code and data into the victim, again with ptrace.
  Among other things the parasite data contains a sigreturn frame with
  saved victim state.
* Resume the victim process, which will run parasite code now.
* When parasite finishes it uses that frame to sigreturn to normal victim
  execution

So, my feeling is that for dump side WRUSS should be enough.

> Cyrill, could it fit into the CRIU pause and resume flow? What action
> causes the final resuming of execution of the restored process for
> checkpointing and for restore? Wondering if we could somehow make CET
> re-enable exactly then.
> 
> And I guess this also needs a way to create shadow stack allocations at
> a specific address to match where they were in the dumped process. That
> is missing in this series.

Yes, criu restore will need to recreate shadow stack mappings. Currently,
we recreate the restored process (target) address space based on
/proc/pid/maps and /proc/pid/smaps. CRIU preserves the virtual addresses
and VMA flags. The relevant steps of the restore process can be summarised as:
* Clone() the target process tree
* Recreate VMAs with the needed size and flags, but not necessarily at the
  correct place yet
* Partially populate memory data from the saved images
* Move VMAs to their exact addresses
* Complete restoring the data
* Create a frame for sigreturn and jump to the target.

Here, the stack used after sigreturn contains the data that was captured
during dump and is entirely different from what the shadow stack will contain.

There are several points when the target threads are stopped, so
pausing/resuming CET may help.
 
> > > So I think the only special capability that CRIU really needs is
> > > WRUSS, and
> > > we need to wire that up anyway.
> > 
> > Thanks for these notes, Andy! I can't provide any sane answer here
> > since I didn't
> > read the tech spec for this feature yet :-)
> 
> 
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  2022-01-30 21:18 ` [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Rick Edgecombe
@ 2022-02-09 16:58   ` Dave Hansen
  2022-02-11  1:39     ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 16:58 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu, David Airlie, Joonas Lahtinen, Jani Nikula,
	Daniel Vetter, Rodrigo Vivi, Zhenyu Wang, Zhi Wang

On 1/30/22 13:18, Rick Edgecombe wrote:
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 99d1781fa5f0..75ce4e823902 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1210,7 +1210,7 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
>  	}
>  
>  	/* Clear dirty field. */
> -	se->val64 &= ~_PAGE_DIRTY;
> +	se->val64 &= ~_PAGE_DIRTY_BITS;
>  
>  	ops->clear_pse(se);
>  	ops->clear_ips(se);

Are these x86 CPU page table values?  I see ->val64 being used like this:

        e->val64 &= ~GEN8_PAGE_PRESENT;
and
	se.val64 |= GEN8_PAGE_PRESENT | GEN8_PAGE_RW;

where we also have:

#define GEN8_PAGE_PRESENT               BIT_ULL(0)
#define GEN8_PAGE_RW                    BIT_ULL(1)

Which tells me that these are probably *close* to the CPU's page tables.
 But, I honestly don't know which format they are.  I don't know if
_PAGE_COW is still a software bit in that format or not.

Either way, I don't think we should be messing with i915 device page tables.

Or, are these somehow magically shared with the CPU in some way I don't
know about?

[ If these are device-only page tables, it would probably be nice to
  stop using _PAGE_FOO for them.  It would avoid confusion like this. ]

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 11/35] x86/mm: Update pte_modify for _PAGE_COW
  2022-01-30 21:18 ` [PATCH 11/35] x86/mm: Update pte_modify for _PAGE_COW Rick Edgecombe
@ 2022-02-09 18:00   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 18:00 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> The read-only and Dirty PTE has been used to indicate copy-on-write pages.

Nit: This is another opportunity to use consistent terminology
     for these Write=0,Dirty=1 PTEs.

> However, newer x86 processors also regard a read-only and Dirty PTE as a
> shadow stack page.  In order to separate the two, the software-defined
> _PAGE_COW is created to replace _PAGE_DIRTY for the copy-on-write case, and
> pte_*() are updated.

The tense here is weird.  "_PAGE_COW is created" is present tense, but
it refers to something that happened earlier in the series.

> Pte_modify() changes a PTE to 'newprot', but it doesn't use the pte_*().

I'm not seeing a clear problem statement in there.  It looks something
like this to me:

	pte_modify() takes a "raw" pgprot_t which was not necessarily
	created with any of the existing PTE bit helpers.  That means
	that it can return a pte_t with Write=0,Dirty=1: a shadow stack
	PTE when it did not intend to create one.

But, this kinda looks like a hack to me.

It all boils down to _PAGE_CHG_MASK.  If pte_modify() can change the
bit's value, it is not included in _PAGE_CHG_MASK.  But, pte_modify()
*CAN* change the _PAGE_DIRTY value now.

Another way of saying it is that _PAGE_DIRTY is now a permission bit
(part-time, at least).


> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index a4a75e78a934..5c3886f6ccda 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -773,6 +773,23 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>  
>  static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
>  
> +static inline pteval_t fixup_dirty_pte(pteval_t pteval)
> +{
> +	pte_t pte = __pte(pteval);
> +
> +	/*
> +	 * Fix up potential shadow stack page flags because the RO, Dirty
> +	 * PTE is special.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		if (pte_dirty(pte)) {
> +			pte = pte_mkclean(pte);
> +			pte = pte_mkdirty(pte);
> +		}
> +	}
> +	return pte_val(pte);
> +}
> +
>  static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  {
>  	pteval_t val = pte_val(pte), oldval = val;
> @@ -783,16 +800,36 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  	 */
>  	val &= _PAGE_CHG_MASK;
>  	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
> +	val = fixup_dirty_pte(val);
>  	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
>  	return __pte(val);
>  }

Maybe something like this?  We can take _PAGE_DIRTY out of
_PAGE_CHG_MASK, then the p*_modify() functions look like this:

static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
        pteval_t val = pte_val(pte), oldval = val;
+	pte_t pte_result;

        /* Chop off any bits that might change with 'newprot':  */
        val &= _PAGE_CHG_MASK;
        val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
        val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);

+       pte_result = __pte(val);
+
+	if (pte_dirty(__pte(oldval)))
+		pte_result = pte_mkdirty(pte_result);
+	
+	return pte_result;
}


This:

1. Makes logical sense: the dirty bit *IS* special in that it has to be
   logically preserved across permission changes.
2. Would work with or without shadow stacks.  That exact code would even
   work on a non-shadow-stack kernel
3. Doesn't introduce *any* new shadow-stack conditional code; the one
   already hidden in pte_mkdirty() is sufficient.
4. Avoids silly things like setting a bit and then immediately clearing
   it in a "fixup".
5. Removes the opaque "fixup" abstraction function.

That's way better if I do say so myself.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 12/35] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW
  2022-01-30 21:18 ` [PATCH 12/35] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Rick Edgecombe
@ 2022-02-09 18:30   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 18:30 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> When Shadow Stack is introduced, [R/O + _PAGE_DIRTY] PTE is reserved for
> shadow stack.  Copy-on-write PTEs have [R/O + _PAGE_COW].

<sigh>  Another way to refer to these PTEs.  In the last patch, it was:

	"read-only and Dirty PTE"
and now:
	"[R/O + _PAGE_DIRTY]"

> When a PTE goes from [R/W + _PAGE_DIRTY] to [R/O + _PAGE_COW], it could
> become a transient shadow stack PTE in two cases:
> 
> The first case is that some processors can start a write but end up seeing
> a read-only PTE by the time they get to the Dirty bit, creating a transient
> shadow stack PTE.  However, this will not occur on processors supporting
> Shadow Stack, and a TLB flush is not necessary.
> 
> The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non-
> atomically, a transient shadow stack PTE can be created as a result.
> Thus, prevent that with cmpxchg.

== Background ==

Shadow stack PTEs always have [Write=0,Dirty=1].

As currently implemented, ptep_set_wrprotect() simply clears _PAGE_RW:
(Write=1 -> Write=0).

== Problem ==

This could cause a problem if ptep_set_wrprotect() caused a PTE to
transition from:

	[Write=1,Dirty=1]
to
	[Write=0,Dirty=1]

Which would inadvertently create a shadow stack PTE instead of
write-protecting it.  ptep_set_wrprotect() can not simply check for the
Dirty=1 bit because the hardware can set it at any time.

== Solution ==

Perform a compare-and-exchange operation on the PTE to avoid racing with
the hardware.  The cmpxchg is expected to be more expensive than the
existing clear_bit().  Continue using the cheaper clear_bit() when
shadow stacks are not in play.

> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 5c3886f6ccda..e1061b9cba6a 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1295,6 +1295,24 @@ static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
>  static inline void ptep_set_wrprotect(struct mm_struct *mm,
>  				      unsigned long addr, pte_t *ptep)
>  {
> +	/*
> +	 * If Shadow Stack is enabled, pte_wrprotect() moves _PAGE_DIRTY
> +	 * to _PAGE_COW (see comments at pte_wrprotect()).
> +	 * When a thread reads a RW=1, Dirty=0 PTE and before changing it
> +	 * to RW=0, Dirty=0, another thread could have written to the page
> +	 * and the PTE is RW=1, Dirty=1 now.  Use try_cmpxchg() to detect
> +	 * PTE changes and update old_pte, then try again.
> +	 */

I think we can trim that down.  We don't need to explain what cmpxchg
does or why it loops.  That's way too much detail that we don't need.
Maybe:

	/*
	 * Avoid accidentally creating shadow stack PTEs
	 * (Write=0,Dirty=1).  Use cmpxchg() to prevent races with
	 * the hardware setting Dirty=1.
	 */

BTW, is it *really* a problem with other threads setting Dirty=1?  This
is happening under the page table lock on this side at least.

> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		pte_t old_pte, new_pte;
> +
> +		old_pte = READ_ONCE(*ptep);
> +		do {
> +			new_pte = pte_wrprotect(old_pte);
> +		} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
> +
> +		return;
> +	}
>  	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>  }




^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 15/35] x86/mm: Check Shadow Stack page fault errors
  2022-01-30 21:18 ` [PATCH 15/35] x86/mm: Check Shadow Stack page fault errors Rick Edgecombe
@ 2022-02-09 19:06   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 19:06 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Shadow stack accesses are those that are performed by the CPU where it
> expects to encounter a shadow stack mapping.  These accesses are performed
> implicitly by CALL/RET at the site of the shadow stack pointer.  These
> accesses are made explicitly by shadow stack management instructions like
> WRUSSQ.

The passive voice is killing me.  Here's a rewrite:

	The CPU performs "shadow stack accesses" when it expects to
	encounter shadow stack mappings.  These accesses can be
	implicit (via CALL/RET instructions) or explicit (instructions
	like WRUSSQ).

Since we defined what a shadow stack access *is*, shouldn't we also
connect it to X86_PF_SHSTK?

> Shadow stacks accesses to shadow-stack mapping can see faults in normal,

					   ^ mappings

> valid operation just like regular accesses to regular mappings.  Shadow
> stacks need some of the same features like delayed allocation, swap and
> copy-on-write.

... and use faults to implement those features.

> Shadow stack accesses can also result in errors, such as when a shadow
> stack overflows, or if a shadow stack access occurs to a non-shadow-stack
> mapping.

Those two paragraphs tell a pretty good story.  Nice.

> In handling a shadow stack page fault, verify it occurs within a shadow
> stack mapping.  It is always an error otherwise.  For valid shadow stack
> accesses, set FAULT_FLAG_WRITE to effect copy-on-write.  Because clearing
> _PAGE_DIRTY (vs. _PAGE_RW) is used to trigger the fault, shadow stack read
> fault and shadow stack write fault are not differentiated and both are
> handled as a write access.

This paragraph is a rehash of what the code does.  It can go.

*But*, with or without this paragraph, the reader is left with all
background and no discussion of why this patch exists.

Even just this would be fine:

	Handle valid and invalid shadow-stack accesses in the page fault
	handler.


> diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
> index 10b1de500ab1..afa524325e55 100644
> --- a/arch/x86/include/asm/trap_pf.h
> +++ b/arch/x86/include/asm/trap_pf.h
> @@ -11,6 +11,7 @@
>   *   bit 3 ==				1: use of reserved bit detected
>   *   bit 4 ==				1: fault was an instruction fetch
>   *   bit 5 ==				1: protection keys block access
> + *   bit 6 ==				1: shadow stack access fault
>   *   bit 15 ==				1: SGX MMU page-fault
>   */
>  enum x86_pf_error_code {
> @@ -20,6 +21,7 @@ enum x86_pf_error_code {
>  	X86_PF_RSVD	=		1 << 3,
>  	X86_PF_INSTR	=		1 << 4,
>  	X86_PF_PK	=		1 << 5,
> +	X86_PF_SHSTK	=		1 << 6,
>  	X86_PF_SGX	=		1 << 15,
>  };
>  
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index d0074c6ed31a..6769134986ec 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1107,6 +1107,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
>  				       (error_code & X86_PF_INSTR), foreign))
>  		return 1;
>  
> +	/*
> +	 * Verify a shadow stack access is within a shadow stack VMA.
> +	 * It is always an error otherwise.  Normal data access to a
> +	 * shadow stack area is checked in the case followed.
> +	 */

That comment needs some help.  Maybe:

	Shadow stack accesses (PF_SHSTK=1) are only permitted to
	shadow stack VMAs.  All other accesses result in an error.

I don't think we need to talk about the other cases being handled below.

> +	if (error_code & X86_PF_SHSTK) {
> +		if (!(vma->vm_flags & VM_SHADOW_STACK))
> +			return 1;
> +		return 0;
> +	}
> +
>  	if (error_code & X86_PF_WRITE) {
>  		/* write, present and write, not present: */
>  		if (unlikely(!(vma->vm_flags & VM_WRITE)))
> @@ -1300,6 +1311,14 @@ void do_user_addr_fault(struct pt_regs *regs,
>  
>  	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
>  
> +	/*
> +	 * Clearing _PAGE_DIRTY is used to detect shadow stack access.
> +	 * This method cannot distinguish shadow stack read vs. write.
> +	 * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect
> +	 * copy-on-write.
> +	 */

Too much detail.  This is also rather unconnected to the code I can see:

> +	if (error_code & X86_PF_SHSTK)
> +		flags |= FAULT_FLAG_WRITE;

Also, the use of "effect" here is arguably wrong.  It's odd at best.
I'd use some alternative wording.

Let's stick to the facts:
 1. Shadow stack pages architecturally can't be read-only
 2. Don't bother with read faults, consider everything a write

BTW, what happens if we don't do this?  What breaks?

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate
  2022-02-08  8:51   ` Thomas Gleixner
@ 2022-02-09 19:55     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-09 19:55 UTC (permalink / raw)
  To: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, dave.hansen, kirill.shutemov, Eranian, Stephane,
	linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc, bp, oleg,
	hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, Dave.Martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov

On Tue, 2022-02-08 at 09:51 +0100, Thomas Gleixner wrote:
> I like the approach in principle, but you still expose the xstate
> internals via the void pointer. It's just a question of time that
> this
> is type casted and abused in interesting ways.

Thanks for taking a look. I have to say though, these changes are
making me scratch my head a bit. Should we really design around callers
digging into mysterious pointers with magic offsets instead of using
easy helpers full of warnings about pitfalls? It should look odd in a
code review too I would think.

> 
> Something like the below untested (on top of the whole series)
> preserves
> the encapsulation and reduces the code at the call sites.
> 
> 
It loses the ability to know which MSR write actually failed. It also
loses the ability to perform read/write logic under a single
transaction. The latter is not needed for this series, but this snippet
from the IBT series does it:

int ibt_get_clear_wait_endbr(void)
{
	void *xstate;
	u64 msr_val = 0;

	if (!current->thread.shstk.ibt)
		return 0;

	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
	if (!xsave_rdmsrl(xstate, MSR_IA32_U_CET, &msr_val))
		xsave_wrmsrl(xstate, MSR_IA32_U_CET,
			     msr_val & ~CET_WAIT_ENDBR);
	end_update_xsave_msrs();

	return msr_val & CET_WAIT_ENDBR;
}

I suppose we could just add a new function to do that logic in a single
transaction when the time comes. But inventing data structures to
describe work to be passed off to some executor always seems to break
at the first new requirement. What I usually wanted was a programming
language, and I already had it.

Not to bikeshed though, it will still get the job done.
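
FWIW, an untested sketch of what that new function could look like,
built only on the helpers visible in the snippet above (the return
conventions are my assumption):

/*
 * Untested sketch: clear @bits in one xsave-managed MSR under a
 * single transaction and hand back the old value.
 */
static int xsave_clear_msr_bits(int xfeature, u32 msr, u64 bits, u64 *old)
{
	void *xstate;
	u64 val = 0;
	int ret;

	xstate = start_update_xsave_msrs(xfeature);
	ret = xsave_rdmsrl(xstate, msr, &val);
	if (!ret)
		xsave_wrmsrl(xstate, msr, val & ~bits);
	end_update_xsave_msrs();

	if (old)
		*old = val;
	return ret;
}

ibt_get_clear_wait_endbr() would then be one call plus masking the
returned old value.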

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 16/35] x86/mm: Update maybe_mkwrite() for shadow stack
  2022-01-30 21:18 ` [PATCH 16/35] x86/mm: Update maybe_mkwrite() for shadow stack Rick Edgecombe
@ 2022-02-09 21:16   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 21:16 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

First of all, that changelog doesn't really explain the problem.  It's
all background and no "why".

*Why* does maybe_mkwrite() take a VMA?  What's the point?


>  #endif /* _ASM_X86_PGTABLE_H */
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 3481b35cb4ec..c22c8e9c37e8 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -610,6 +610,26 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
>  }
>  #endif
>  
> +pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
> +{
> +	if (vma->vm_flags & VM_WRITE)
> +		pte = pte_mkwrite(pte);
> +	else if (vma->vm_flags & VM_SHADOW_STACK)
> +		pte = pte_mkwrite_shstk(pte);
> +	return pte;
> +}

First, this makes me wonder why we need pte_mkwrite() *AND*
pte_mkwrite_shstk().  Is there a difference in their behavior that matters?

Second, I don't like the copy-and-paste to make an arch-specific "hook"
for a function.  This is a very good way to ensure that arch code and
generic code fork and accumulate separate bugs.

I'd much rather have this do (in generic code):

 pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 {
	if (vma->vm_flags & VM_WRITE)
		pte = pte_mkwrite(pte);

	pte = arch_maybe_mkwrite(pte, vma);

	return pte;
 }

Actually, is there a reason the generic code could not even just add:

	if (vma->vm_flags & VM_ARCH_MAYBE_MKWRITE_MASK)
		pte = arch_maybe_mkwrite(pte, vma);

or heck even just the x86-specific code itself:

	if (vma->vm_flags & VM_SHADOW_STACK)
		pte = pte_mkwrite_shstk(pte);

with a stub defined for pte_mkwrite_shstk()?

In the end, it's just a question of whether the generic code wants
something to say "arch" or "shstk".  But, I don't think we need a forked
x86 copy of these functions.
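
Untested sketch of that last flavor, just to show the shape.  The stub
costs nothing on !CONFIG_X86_SHADOW_STACK builds, since VM_SHADOW_STACK
is VM_NONE there and the extra branch compiles away:

/* generic fallback stub, e.g. in include/linux/pgtable.h: */
#ifndef pte_mkwrite_shstk
static inline pte_t pte_mkwrite_shstk(pte_t pte)
{
	return pte;
}
#endif

/* single generic copy in include/linux/mm.h: */
static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	if (likely(vma->vm_flags & VM_WRITE))
		pte = pte_mkwrite(pte);
	else if (vma->vm_flags & VM_SHADOW_STACK)
		pte = pte_mkwrite_shstk(pte);
	return pte;
}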

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 311c6018d503..b3cb3a17037b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -955,12 +955,14 @@ void free_compound_page(struct page *page);
>   * pte_mkwrite.  But get_user_pages can cause write faults for mappings
>   * that do not have writing enabled, when used by access_process_vm.
>   */
> +#ifndef maybe_mkwrite

maybe_mkwrite is defined in asm/pgtable.h.  Where is the #include?

>  static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>  {
>  	if (likely(vma->vm_flags & VM_WRITE))
>  		pte = pte_mkwrite(pte);
>  	return pte;
>  }
> +#endif
>  
>  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
>  void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 406a3c28c026..2adedcfca00b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -491,12 +491,14 @@ static int __init setup_transparent_hugepage(char *str)
>  }
>  __setup("transparent_hugepage=", setup_transparent_hugepage);
>  
> +#ifndef maybe_pmd_mkwrite
>  pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>  {
>  	if (likely(vma->vm_flags & VM_WRITE))
>  		pmd = pmd_mkwrite(pmd);
>  	return pmd;
>  }
> +#endif
>  
>  #ifdef CONFIG_MEMCG
>  static inline struct deferred_split *get_deferred_split_queue(struct page *page)


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 17/35] mm: Fixup places that call pte_mkwrite() directly
  2022-01-30 21:18 ` [PATCH 17/35] mm: Fixup places that call pte_mkwrite() directly Rick Edgecombe
@ 2022-02-09 21:51   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 21:51 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly
>   and call pte_mkwrite(), which is the same as maybe_mkwrite().  Change
>   them to maybe_mkwrite().

Those look OK.

> - In do_numa_page(), if the numa entry was writable, then pte_mkwrite()
>   is called directly.  Fix it by doing maybe_mkwrite().  Make the same
>   changes to do_huge_pmd_numa_page().

This is another "what", not "why" changelog.  This change puzzles me.

*Why* is this needed?  It sounds like pte_mkwrite() doesn't work for
shadow stack PTEs.  Let's say that explicitly.

I also think this is ab/misuse of maybe_mkwrite().

The shadow stack VMA *REQUIRES* PTEs with Dirty=1.  There's no *maybe*
about it.  The rest of this is essentially a hack to get
VM_SHADOW_STACK-required bits into the PTE.  We have a place where we
store those VMA-required bits: vma->vm_page_prot.  Look at how we store
the pkey bits in there for instance.

Let's say we set _PAGE_DIRTY in vma->vm_page_prot.  We'd come into
do_anonymous_page() for instance and do this:

>         entry = mk_pte(page, vma->vm_page_prot); <--- PTE is Write=0,Dirty=1 Yay!
>         entry = pte_sw_mkyoung(entry);
>         if (vma->vm_flags & VM_WRITE) <--- False, skip the pte_mkwrite()
>                 entry = pte_mkwrite(pte_mkdirty(entry));

In other words, it "just works" because shadow stack VMAs don't have
VM_WRITE set.
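
For reference, the pkeys precedent looks roughly like this (paraphrased
from arch/x86/include/asm/mman.h); the last line is a hypothetical
shadow stack addition, not something in this series:

#define arch_vm_get_page_prot(vm_flags)	__pgprot(			\
		((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) |	\
		((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) |	\
		((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) |	\
		((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0) |	\
		((vm_flags) & VM_SHADOW_STACK ? _PAGE_DIRTY : 0))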

I think the other VM_WRITE checks would be fine too, although I'm unsure
about the change_page_attr() one.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 14/35] mm: Introduce VM_SHADOW_STACK for shadow stack memory
  2022-01-30 21:18 ` [PATCH 14/35] mm: Introduce VM_SHADOW_STACK for shadow stack memory Rick Edgecombe
@ 2022-02-09 21:55   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 21:55 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> A shadow stack PTE must be read-only and have _PAGE_DIRTY set.  However,
> read-only and Dirty PTEs also exist for copy-on-write (COW) pages.  These
> two cases are handled differently for page faults.  Introduce
> VM_SHADOW_STACK to track shadow stack VMAs.

This is also a very appropriate place to remind folks that VM_WRITE is
mutually exclusive with this flag.  That's pretty important.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-01-30 21:18 ` [PATCH 18/35] mm: Add guard pages around a shadow stack Rick Edgecombe
@ 2022-02-09 22:23   ` Dave Hansen
  2022-02-10 22:38     ` David Laight
  2022-02-10 22:43   ` Dave Hansen
  1 sibling, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 22:23 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the
> first and the last elements in the range, effectively touches those memory
> areas.

This is a pretty close copy of the instruction reference text for
INCSSP.  I'm feeling rather dense today, but that's just not making any
sense.

The pseudocode is more sensible in the SDM.  I think this needs a better
explanation:

	The INCSSP instruction increments the shadow stack pointer.  It
	is the shadow stack analog of an instruction like:

		addq	$0x80, %rsp

	However, there is one important difference between an ADD on
	%rsp and INCSSP.  In addition to modifying SSP, INCSSP also
	reads from the memory of the first and last elements that were
	"popped".  You can think of it as acting like this:

	READ_ONCE(*(u64 *)ssp);       // read+discard top element on stack
	ssp += nr_to_pop * 8;         // move the shadow stack pointer
	READ_ONCE(*(u64 *)(ssp - 8)); // read+discard last popped stack element
	

> The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and
> 255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from PAGE_SIZE.

... That maximum distance, combined with a guard page at the end of
a shadow stack, ensures that INCSSP will fault before it is able to move
across an entire guard page.

> Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> CALL, and RET from going beyond.

> 
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index a506a411474d..e1533fdc08b4 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -73,6 +73,13 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
>  
>  extern void initmem_init(void);
>  
> +#define vm_start_gap vm_start_gap
> +struct vm_area_struct;
> +extern unsigned long vm_start_gap(struct vm_area_struct *vma);
> +
> +#define vm_end_gap vm_end_gap
> +extern unsigned long vm_end_gap(struct vm_area_struct *vma);
> +
>  #endif	/* !__ASSEMBLY__ */
>  
>  #endif	/* _ASM_X86_PAGE_DEFS_H */
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index f3f52c5e2fd6..81f9325084d3 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -250,3 +250,49 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
>  		return false;
>  	return true;
>  }
> +
> +/*
> + * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D).  INCSSPQ
> + * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and
> + * touches the first and the last element in the range, which triggers a
> + * page fault if the range is not in a shadow stack.  Because of this,
> + * creating 4-KB guard pages around a shadow stack prevents these
> + * instructions from going beyond.
> + */
> +#define SHADOW_STACK_GUARD_GAP PAGE_SIZE
> +
> +unsigned long vm_start_gap(struct vm_area_struct *vma)
> +{
> +	unsigned long vm_start = vma->vm_start;
> +	unsigned long gap = 0;
> +
> +	if (vma->vm_flags & VM_GROWSDOWN)
> +		gap = stack_guard_gap;
> +	else if (vma->vm_flags & VM_SHADOW_STACK)
> +		gap = SHADOW_STACK_GUARD_GAP;
> +
> +	if (gap != 0) {
> +		vm_start -= gap;
> +		if (vm_start > vma->vm_start)
> +			vm_start = 0;
> +	}
> +	return vm_start;
> +}
> +
> +unsigned long vm_end_gap(struct vm_area_struct *vma)
> +{
> +	unsigned long vm_end = vma->vm_end;
> +	unsigned long gap = 0;
> +
> +	if (vma->vm_flags & VM_GROWSUP)
> +		gap = stack_guard_gap;
> +	else if (vma->vm_flags & VM_SHADOW_STACK)
> +		gap = SHADOW_STACK_GUARD_GAP;
> +
> +	if (gap != 0) {
> +		vm_end += gap;
> +		if (vm_end < vma->vm_end)
> +			vm_end = -PAGE_SIZE;
> +	}
> +	return vm_end;
> +}

First of all, __weak would be a lot better than these #ifdefs.

Second, I have the same basic objection to this as the maybe_mkwrite()
mess.  This is a forked copy of the code.  Instead of refactoring, it's
just copied-pasted-and-#ifdef'd.  Not so nice.

Isn't this just a matter of overriding 'stack_guard_gap' for
VM_SHADOW_STACK?  Why don't we just do this:

unsigned long stack_guard_gap(struct vm_area_struct *vma)
{
	if (vma->vm_flags & VM_SHADOW_STACK)
		return SHADOW_STACK_GUARD_GAP;

	return __stack_guard_gap;
}

Or, worst-case if people don't want 2 easily compiled-out lines added to
generic code, define:

unsigned long __weak stack_guard_gap(struct vm_area_struct *vma)
{
	return __stack_guard_gap;
}

in generic code, and put the top definition in arch/x86.
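
Untested sketch of how the generic side could then consume it; note the
existing VM_GROWSDOWN check would also need to learn about
VM_SHADOW_STACK so the gap applies to shadow stacks at all:

static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
{
	unsigned long vm_start = vma->vm_start;

	if (vma->vm_flags & (VM_GROWSDOWN | VM_SHADOW_STACK)) {
		vm_start -= stack_guard_gap(vma);
		if (vm_start > vma->vm_start)
			vm_start = 0;
	}
	return vm_start;
}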

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 19/35] mm/mmap: Add shadow stack pages to memory accounting
  2022-01-30 21:18 ` [PATCH 19/35] mm/mmap: Add shadow stack pages to memory accounting Rick Edgecombe
@ 2022-02-09 22:27   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 22:27 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> +bool is_shadow_stack_mapping(vm_flags_t vm_flags)
> +{
> +	return vm_flags & VM_SHADOW_STACK;
> +}
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index bc8713a76e03..21fdb1273571 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -911,6 +911,14 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
>  	__ptep_modify_prot_commit(vma, addr, ptep, pte);
>  }
>  #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
> +
> +#ifndef is_shadow_stack_mapping
> +static inline bool is_shadow_stack_mapping(vm_flags_t vm_flags)
> +{
> +	return false;
> +}
> +#endif

Hold your horses there.  Remember:

+#ifdef CONFIG_X86_SHADOW_STACK
+# define VM_SHADOW_STACK       VM_HIGH_ARCH_5
+#else
+# define VM_SHADOW_STACK       VM_NONE
+#endif

Plus:

#define VM_NONE         0x00000000

That means the arch-generic version, when CONFIG_X86_SHADOW_STACK is off
compiles down to:

bool is_shadow_stack_mapping(vm_flags_t vm_flags)
{
	return vm_flags & 0x00000000;
}

I _suspect_ the compiler *might* compile that down to the same thing as:

	return false;

So, why not just have one version, no additional #ifdefs, and be done
with it?  Heck, why have the helper in the first place?  Just check
VM_SHADOW_STACK directly.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack
  2022-01-30 21:18 ` [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack Rick Edgecombe
@ 2022-02-09 22:50   ` Dave Hansen
  2022-02-09 22:52   ` Dave Hansen
  1 sibling, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 22:50 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Can_follow_write_pte() ensures a read-only page is COWed by checking the
> FOLL_COW flag, and uses pte_dirty() to validate the flag is still valid.
> 
> Like a writable data page, a shadow stack page is writable, and becomes
> read-only during copy-on-write,

I thought we could not have read-only shadow stack pages.  What does a
read-only shadow stack PTE look like? ;)

> but it is always dirty.  Thus, in the
> can_follow_write_pte() check, it belongs to the writable page case and
> should be excluded from the read-only page pte_dirty() check.  Apply
> the same changes to can_follow_write_pmd().
> 
> While at it, also split the long line into smaller ones.

FWIW, I probably would have had a preparatory patch for this part.  The
advantage is that if you break existing code, it's a lot easier to
figure it out if you have a separate refactoring patch.  Also, for a
patch like this, the refactoring might result in the same exact binary.
 It's a pretty good sign that your patch won't cause regressions if it
results in the same binary.

> diff --git a/mm/gup.c b/mm/gup.c
> index f0af462ac1e2..95b7d1084c44 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -464,10 +464,18 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
>   * FOLL_FORCE can write to even unwritable pte's, but only
>   * after we've gone through a COW cycle and they are dirty.
>   */
> -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags,
> +					struct vm_area_struct *vma)
>  {
> -	return pte_write(pte) ||
> -		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> +	if (pte_write(pte))
> +		return true;
> +	if ((flags & (FOLL_FORCE | FOLL_COW)) != (FOLL_FORCE | FOLL_COW))
> +		return false;
> +	if (!pte_dirty(pte))
> +		return false;
> +	if (is_shadow_stack_mapping(vma->vm_flags))
> +		return false;

You had me up until this is_shadow_stack_mapping().  It wasn't mentioned
at all in the changelog.  Logically, I think it's trying to say that a
shadow stack VMA never allows a FOLL_FORCE override.

That makes some sense, but it's a pretty big point not to mention in the
changelog.

> +	return true;
>  }
>  
>  static struct page *follow_page_pte(struct vm_area_struct *vma,
> @@ -510,7 +518,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  	}
>  	if ((flags & FOLL_NUMA) && pte_protnone(pte))
>  		goto no_page;
> -	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
> +	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, vma)) {
>  		pte_unmap_unlock(ptep, ptl);
>  		return NULL;
>  	}



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack
  2022-01-30 21:18 ` [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack Rick Edgecombe
  2022-02-09 22:50   ` Dave Hansen
@ 2022-02-09 22:52   ` Dave Hansen
  2022-02-10 22:45     ` David Laight
  1 sibling, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-09 22:52 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> Like a writable data page, a shadow stack page is writable, and becomes
> read-only during copy-on-write, but it is always dirty.

One other thing...

The language in these changelogs is a bit sloppy.  For instance, what
does "always dirty" mean here?  pte_dirty()?  Or strictly _PAGE_DIRTY?

In other words, logically dirty, or literally "has *the* dirty bit set"?


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-09  2:18               ` Edgecombe, Rick P
  2022-02-09  6:43                 ` Cyrill Gorcunov
  2022-02-09 10:53                 ` Mike Rapoport
@ 2022-02-10  2:37                 ` Andy Lutomirski
  2022-02-10  2:53                   ` H.J. Lu
                                     ` (2 more replies)
  2 siblings, 3 replies; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-10  2:37 UTC (permalink / raw)
  To: Edgecombe, Rick P, gorcunov
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, adrian, fweimer, nadav.amit, jannh, avagin,
	linux-arch, kcc, bp, oleg, hjl.tools, pavel, linux-doc, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, Yang, Weijiang, rppt,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, Shankar, Ravi V

On 2/8/22 18:18, Edgecombe, Rick P wrote:
> On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
>> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
>>>>> But such a knob will immediately reduce the security value of
>>>>> the entire
>>>>> thing, and I don't have good ideas how to deal with it :(
>>>>
>>>> Probably a kind of latch in the task_struct which would trigger
>>>> off once
>>>> returt to a different address happened, thus we would be able to
>>>> jump inside
>>>> paratite code. Of course such trigger should be available under
>>>> proper
>>>> capability only.
>>>
>>> I'm not fully in touch with how parasite, etc works.  Are we
>>> talking about save or restore?
>>
>> We use parasite code in question during checkpoint phase as far as I
>> remember.
>> push addr/lret trick is used to run "injected" code (code injection
>> itself is
>> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
>> into this code
>> for years already, do we still need to support compat mode at all?
>>
>>> If it's restore, what exactly does CRIU need to do?  Is it just
>>> that CRIU needs to return
>>> out from its resume code into the to-be-resumed program without
>>> tripping CET?  Would it
>>> be acceptable for CRIU to require that at least one shstk slot be
>>> free at save time?
>>> Or do we need a mechanism to atomically switch to a completely full
>>> shadow stack at resume?
>>>
>>> Off the top of my head, a sigreturn (or sigreturn-like mechanism)
>>> that is intended for
>>> use for altshadowstack could safely verify a token on the
>>> altshdowstack, possibly
>>> compare to something in ucontext (or not -- this isn't clearly
>>> necessary) and switch
>>> back to the previous stack.  CRIU could use that too.  Obviously
>>> CRIU will need a way
>>> to populate the relevant stacks, but WRUSS can be used for that,
>>> and I think this
>>> is a fundamental requirement for CRIU -- CRIU restore absolutely
>>> needs a way to write
>>> the saved shadow stack data into the shadow stack.
> 
> Still wrapping my head around the CRIU save and restore steps, but
> another general approach might be to give ptrace the ability to
> temporarily pause/resume/set CET enablement and SSP for a stopped
> thread. Then injected code doesn't need to jump through any hoops or
> possibly run into road blocks. I'm not sure how much this opens things
> up if the thread has to be stopped...

Hmm, that's maybe not insane.

An alternative would be to add a bona fide ptrace call-a-function 
mechanism.  I can think of two potentially usable variants:

1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, 
shadow stack push and all.

2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal 
frame just like a real signal is being delivered with the specified 
handler.  There could be a variant to opt-in to also using a specified 
altstack and altshadowstack.

2 would be more expensive but would avoid the need for much in the way 
of asm magic.  The injected code could be plain C (or Rust or Zig or 
whatever).
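
For a sense of what variant 1 would look like from the tracer's side, here
is a rough sketch. Neither request exists today; the request number, the
separate resume step, and the assumption that the tracer parks a breakpoint
at the return address are all made up for illustration.

	#include <sys/ptrace.h>
	#include <sys/types.h>
	#include <sys/wait.h>

	#define PTRACE_CALL_FUNCTION	0x4300	/* hypothetical request number */

	/*
	 * Emulate "CALL func" in the stopped tracee: the kernel would push the
	 * return address on both the regular and the shadow stack and point
	 * the tracee's RIP at func.  This sketch assumes the request only sets
	 * up the call frame, so the tracee is resumed separately and runs
	 * until it stops again (e.g. at a breakpoint the tracer placed at the
	 * return address).
	 */
	static int call_in_tracee(pid_t pid, unsigned long func)
	{
		if (ptrace(PTRACE_CALL_FUNCTION, pid, NULL, (void *)func) == -1)
			return -1;
		if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
			return -1;
		return waitpid(pid, NULL, 0) == pid ? 0 : -1;
	}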

All of this only really handles save, not restore.  I don't understand 
restore enough to fully understand the issue.

--Andy

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-10  2:37                 ` Andy Lutomirski
@ 2022-02-10  2:53                   ` H.J. Lu
  2022-02-10 13:52                     ` Willgerodt, Felix
  2022-02-11  7:41                   ` avagin
  2022-02-28 20:27                   ` Mike Rapoport
  2 siblings, 1 reply; 154+ messages in thread
From: H.J. Lu @ 2022-02-10  2:53 UTC (permalink / raw)
  To: Andy Lutomirski, Felix Willgerodt
  Cc: Edgecombe, Rick P, gorcunov, bsingharora, hpa, Syromiatnikov,
	Eugene, peterz, rdunlap, keescook, 0x7f454c46, dave.hansen,
	kirill.shutemov, Eranian, Stephane, linux-mm, adrian, fweimer,
	nadav.amit, jannh, avagin, linux-arch, kcc, bp, oleg, pavel,
	linux-doc, arnd, Moreira, Joao, tglx, mike.kravetz, x86, Yang,
	Weijiang, rppt, Dave.Martin, john.allen, mingo, Hansen, Dave,
	corbet, linux-kernel, linux-api, Shankar, Ravi V

On Wed, Feb 9, 2022 at 6:37 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> >> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> >>>>> But such a knob will immediately reduce the security value of
> >>>>> the entire
> >>>>> thing, and I don't have good ideas how to deal with it :(
> >>>>
> >>>> Probably a kind of latch in the task_struct which would trigger
> >>>> off once
> >>>> returt to a different address happened, thus we would be able to
> >>>> jump inside
> >>>> paratite code. Of course such trigger should be available under
> >>>> proper
> >>>> capability only.
> >>>
> >>> I'm not fully in touch with how parasite, etc works.  Are we
> >>> talking about save or restore?
> >>
> >> We use parasite code in question during checkpoint phase as far as I
> >> remember.
> >> push addr/lret trick is used to run "injected" code (code injection
> >> itself is
> >> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> >> into this code
> >> for years already, do we still need to support compat mode at all?
> >>
> >>> If it's restore, what exactly does CRIU need to do?  Is it just
> >>> that CRIU needs to return
> >>> out from its resume code into the to-be-resumed program without
> >>> tripping CET?  Would it
> >>> be acceptable for CRIU to require that at least one shstk slot be
> >>> free at save time?
> >>> Or do we need a mechanism to atomically switch to a completely full
> >>> shadow stack at resume?
> >>>
> >>> Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> >>> that is intended for
> >>> use for altshadowstack could safely verify a token on the
> >>> altshdowstack, possibly
> >>> compare to something in ucontext (or not -- this isn't clearly
> >>> necessary) and switch
> >>> back to the previous stack.  CRIU could use that too.  Obviously
> >>> CRIU will need a way
> >>> to populate the relevant stacks, but WRUSS can be used for that,
> >>> and I think this
> >>> is a fundamental requirement for CRIU -- CRIU restore absolutely
> >>> needs a way to write
> >>> the saved shadow stack data into the shadow stack.
> >
> > Still wrapping my head around the CRIU save and restore steps, but
> > another general approach might be to give ptrace the ability to
> > temporarily pause/resume/set CET enablement and SSP for a stopped
> > thread. Then injected code doesn't need to jump through any hoops or
> > possibly run into road blocks. I'm not sure how much this opens things
> > up if the thread has to be stopped...
>
> Hmm, that's maybe not insane.
>
> An alternative would be to add a bona fide ptrace call-a-function
> mechanism.  I can think of two potentially usable variants:
>
> 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> shadow stack push and all.
>
> 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> frame just like a real signal is being delivered with the specified
> handler.  There could be a variant to opt-in to also using a specified
> altstack and altshadowstack.
>
> 2 would be more expensive but would avoid the need for much in the way
> of asm magic.  The injected code could be plain C (or Rust or Zig or
> whatever).
>
> All of this only really handles save, not restore.  I don't understand
> restore enough to fully understand the issue.

FWIW, CET enabled GDB can call a function in a CET enabled process.
Adding Felix who may know more about it.


-- 
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* RE: [PATCH 00/35] Shadow stacks for userspace
  2022-02-10  2:53                   ` H.J. Lu
@ 2022-02-10 13:52                     ` Willgerodt, Felix
  0 siblings, 0 replies; 154+ messages in thread
From: Willgerodt, Felix @ 2022-02-10 13:52 UTC (permalink / raw)
  To: H.J. Lu, Lutomirski, Andy
  Cc: Edgecombe, Rick P, gorcunov, bsingharora, hpa, Syromiatnikov,
	Eugene, peterz, rdunlap, keescook, 0x7f454c46, dave.hansen,
	kirill.shutemov, Eranian, Stephane, linux-mm, adrian, fweimer,
	nadav.amit, jannh, avagin, linux-arch, kcc, bp, oleg, pavel,
	linux-doc, arnd, Moreira, Joao, tglx, mike.kravetz, x86, Yang,
	Weijiang, rppt, Dave.Martin, john.allen, mingo, Hansen, Dave,
	corbet, linux-kernel, linux-api, Shankar, Ravi V, Metzger,
	Markus T

> -----Original Message-----
> From: H.J. Lu <hjl.tools@gmail.com>
> Sent: Donnerstag, 10. Februar 2022 03:54
> To: Lutomirski, Andy <luto@kernel.org>; Willgerodt, Felix
> <felix.willgerodt@intel.com>
> Cc: Edgecombe, Rick P <rick.p.edgecombe@intel.com>; gorcunov@gmail.com;
> bsingharora@gmail.com; hpa@zytor.com; Syromiatnikov, Eugene
> <esyr@redhat.com>; peterz@infradead.org; rdunlap@infradead.org;
> keescook@chromium.org; 0x7f454c46@gmail.com;
> dave.hansen@linux.intel.com; kirill.shutemov@linux.intel.com; Eranian,
> Stephane <eranian@google.com>; linux-mm@kvack.org; adrian@lisas.de;
> fweimer@redhat.com; nadav.amit@gmail.com; jannh@google.com;
> avagin@gmail.com; linux-arch@vger.kernel.org; kcc@google.com;
> bp@alien8.de; oleg@redhat.com; pavel@ucw.cz; linux-doc@vger.kernel.org;
> arnd@arndb.de; Moreira, Joao <joao.moreira@intel.com>; tglx@linutronix.de;
> mike.kravetz@oracle.com; x86@kernel.org; Yang, Weijiang
> <weijiang.yang@intel.com>; rppt@kernel.org; Dave.Martin@arm.com;
> john.allen@amd.com; mingo@redhat.com; Hansen, Dave
> <dave.hansen@intel.com>; corbet@lwn.net; linux-kernel@vger.kernel.org;
> linux-api@vger.kernel.org; Shankar, Ravi V <ravi.v.shankar@intel.com>
> Subject: Re: [PATCH 00/35] Shadow stacks for userspace
> 
> On Wed, Feb 9, 2022 at 6:37 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > >> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> > >>>>> But such a knob will immediately reduce the security value of
> > >>>>> the entire
> > >>>>> thing, and I don't have good ideas how to deal with it :(
> > >>>>
> > >>>> Probably a kind of latch in the task_struct which would trigger
> > >>>> off once
> > >>>> returt to a different address happened, thus we would be able to
> > >>>> jump inside
> > >>>> paratite code. Of course such trigger should be available under
> > >>>> proper
> > >>>> capability only.
> > >>>
> > >>> I'm not fully in touch with how parasite, etc works.  Are we
> > >>> talking about save or restore?
> > >>
> > >> We use parasite code in question during checkpoint phase as far as I
> > >> remember.
> > >> push addr/lret trick is used to run "injected" code (code injection
> > >> itself is
> > >> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> > >> into this code
> > >> for years already, do we still need to support compat mode at all?
> > >>
> > >>> If it's restore, what exactly does CRIU need to do?  Is it just
> > >>> that CRIU needs to return
> > >>> out from its resume code into the to-be-resumed program without
> > >>> tripping CET?  Would it
> > >>> be acceptable for CRIU to require that at least one shstk slot be
> > >>> free at save time?
> > >>> Or do we need a mechanism to atomically switch to a completely full
> > >>> shadow stack at resume?
> > >>>
> > >>> Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> > >>> that is intended for
> > >>> use for altshadowstack could safely verify a token on the
> > >>> altshdowstack, possibly
> > >>> compare to something in ucontext (or not -- this isn't clearly
> > >>> necessary) and switch
> > >>> back to the previous stack.  CRIU could use that too.  Obviously
> > >>> CRIU will need a way
> > >>> to populate the relevant stacks, but WRUSS can be used for that,
> > >>> and I think this
> > >>> is a fundamental requirement for CRIU -- CRIU restore absolutely
> > >>> needs a way to write
> > >>> the saved shadow stack data into the shadow stack.
> > >
> > > Still wrapping my head around the CRIU save and restore steps, but
> > > another general approach might be to give ptrace the ability to
> > > temporarily pause/resume/set CET enablement and SSP for a stopped
> > > thread. Then injected code doesn't need to jump through any hoops or
> > > possibly run into road blocks. I'm not sure how much this opens things
> > > up if the thread has to be stopped...
> >
> > Hmm, that's maybe not insane.
> >
> > An alternative would be to add a bona fide ptrace call-a-function
> > mechanism.  I can think of two potentially usable variants:
> >
> > 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> > shadow stack push and all.
> >
> > 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> > frame just like a real signal is being delivered with the specified
> > handler.  There could be a variant to opt-in to also using a specified
> > altstack and altshadowstack.
> >
> > 2 would be more expensive but would avoid the need for much in the way
> > of asm magic.  The injected code could be plain C (or Rust or Zig or
> > whatever).
> >
> > All of this only really handles save, not restore.  I don't understand
> > restore enough to fully understand the issue.
> 
> FWIW, CET enabled GDB can call a function in a CET enabled process.
> Adding Felix who may know more about it.
> 
> 
> --
> H.J.

I don't know much about CRIU or kernel code, so I will stick to explaining
what our GDB patches for CET (not upstream yet) currently do.

GDB does inferior calls by setting the PC to the function it wants to call
and by manipulating the return address. It basically creates a dummy
frame and runs that on top of where it currently is.

To enable this for CET, our GDB CET patches push onto the shstk of the
inferior by writing to the inferior's memory, if it isn't out of range,
and by incrementing the SSP (using NT_X86_CET), both via ptrace.

(GDB also has a command called 'return', which basically returns early from
a function. Here GDB just decrements the SSP via ptrace.)
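
As a very rough sketch of the push half of that: the NT_X86_CET note value
and the regset layout below come from earlier, unmerged patch revisions and
are illustrative only, and this also assumes ptrace is still allowed to
write shadow stack memory at all, which the can_follow_write_pte() change
earlier in this series appears to restrict.

	#include <stdint.h>
	#include <sys/ptrace.h>
	#include <sys/types.h>
	#include <sys/uio.h>

	#define NT_X86_CET	0x203		/* value from older patches; not ABI */

	struct cet_user_regs {			/* illustrative layout, not ABI */
		uint64_t cet;			/* MSR_IA32_U_CET   */
		uint64_t ssp;			/* MSR_IA32_PL3_SSP */
	};

	/* Push one fake return address onto the inferior's shadow stack and
	 * move SSP down to cover it, the way a CALL would have. */
	static int shstk_push(pid_t pid, uint64_t retaddr)
	{
		struct cet_user_regs regs;
		struct iovec iov = { .iov_base = &regs, .iov_len = sizeof(regs) };

		if (ptrace(PTRACE_GETREGSET, pid, (void *)(long)NT_X86_CET, &iov) == -1)
			return -1;
		regs.ssp -= 8;
		if (ptrace(PTRACE_POKEDATA, pid, (void *)(uintptr_t)regs.ssp,
			   (void *)(uintptr_t)retaddr) == -1)
			return -1;
		return ptrace(PTRACE_SETREGSET, pid, (void *)(long)NT_X86_CET, &iov);
	}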

This was based on earlier versions of the kernel patches.
If the interface needs to change or if new interfaces would be available to
do this better, that is fine from our pov. We didn't upstream this yet.

Also, if you have any concerns with this approach please let me know.

Regards,
Felix

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de <http://www.intel.de>
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva  
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 21/35] mm/mprotect: Exclude shadow stack from preserve_write
  2022-01-30 21:18 ` [PATCH 21/35] mm/mprotect: Exclude shadow stack from preserve_write Rick Edgecombe
@ 2022-02-10 19:27   ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-10 19:27 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> In change_pte_range(), when a PTE is changed for prot_numa, _PAGE_RW is
> preserved to avoid the additional write fault after the NUMA hinting fault.
> However, pte_write() now includes both normal writable and shadow stack
> (RW=0, Dirty=1) PTEs, but the latter does not have _PAGE_RW and has no need
> to preserve it.

This series creates an interesting situation: it causes a logical
disconnection between things that were tightly coupled before.  For
instance, before this series, _PAGE_RW=1 and "writable" really were
synonyms.  They meant the same thing.

One of the complexities in this series is differentiating the two.  For
instance, a shadow stack page can be written to, even though it has
_PAGE_RW=0.

This particular patch seems to be hacking around the problem that a
p*_mkwrite() doesn't work on shadow stack PTE/PMDs.  First, that makes
me wonder what *actually* happens if we do a plain pte_mkwrite() on a
shadow stack PTE.  I *think* it will take the [Write=0,Dirty=1] PTE and

       pte = pte_set_flags(pte, _PAGE_RW);

so we'll end up with [Write=1,Dirty=1], which is bad.

Let's say pte_mkwrite() can't be fixed.  We should probably make it
VM_BUG_ON() if it's ever asked to muck with a shadow stack PTE.
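
A minimal sketch of that idea, assuming a hypothetical VMA-aware helper
(the name pte_mkwrite_vma() and passing in the VMA are made up here; the
x86 pte_set_flags() call mirrors what pte_mkwrite() does today):

	#include <linux/mm.h>

	/*
	 * Sketch: refuse to set _PAGE_RW on a shadow stack PTE instead of
	 * silently producing [Write=1,Dirty=1].
	 */
	static inline pte_t pte_mkwrite_vma(pte_t pte, struct vm_area_struct *vma)
	{
		if (vma->vm_flags & VM_SHADOW_STACK) {
			VM_BUG_ON(1);	/* never legal on a shadow stack PTE */
			return pte;
		}
		return pte_set_flags(pte, _PAGE_RW);
	}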

It's also weird because we have this pte_write()==1 PTE in a !VM_WRITE
VMA.  Then, we're trying to pte_mkwrite() under this !VM_WRITE VMA.

	pte_write() <-- returns true on a shadow stack PTE!
	pte_mkwrite() <-- illegal on shadow stack PTE

I need to think about this a little more.  I don't have a solution.
But, as-is, it seems untenable.  The rules are just too counter
intuitive to live.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* RE: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-02-09 22:23   ` Dave Hansen
@ 2022-02-10 22:38     ` David Laight
  2022-02-10 23:42       ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: David Laight @ 2022-02-10 22:38 UTC (permalink / raw)
  To: 'Dave Hansen',
	Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

From: Dave Hansen
> Sent: 09 February 2022 22:24
> 
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the
> > first and the last elements in the range, effectively touches those memory
> > areas.
> 
> This is a pretty close copy of the instruction reference text for
> INCSSP.  I'm feeling rather dense today, but that's just not making any
> sense.
> 
> The pseudocode is more sensible in the SDM.  I think this needs a better
> explanation:
> 
> 	The INCSSP instruction increments the shadow stack pointer.  It
> 	is the shadow stack analog of an instruction like:
> 
> 		addq	$0x80, %rsp
> 
> 	However, there is one important difference between an ADD on
> 	%rsp and INCSSP.  In addition to modifying SSP, INCSSP also
> 	reads from the memory of the first and last elements that were
> 	"popped".  You can think of it as acting like this:
> 
> 	READ_ONCE(ssp);       // read+discard top element on stack
> 	ssp += nr_to_pop * 8; // move the shadow stack
> 	READ_ONCE(ssp-8);     // read+discard last popped stack element
> 
> 
> > The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and
> > 255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from PAGE_SIZE.
> 
> ... That maximum distance, combined with an a guard pages at the end of
> a shadow stack ensures that INCSSP will fault before it is able to move
> across an entire guard page.
> 
> > Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> > CALL, and RET from going beyond.

Do you need a real guard page?
Or is it just enough to ensure that the adjacent page isn't another
shadow stack page?

Any other page will cause a fault because the PTE isn't readonly+dirty.

I'm not sure how common single page allocates are in Linux.
But adjacent shadow stacks may be rare anyway.
So a check against both adjacent PTE entries would suffice.
Or maybe always allocate an even (or odd) numbered page.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-01-30 21:18 ` [PATCH 18/35] mm: Add guard pages around a shadow stack Rick Edgecombe
  2022-02-09 22:23   ` Dave Hansen
@ 2022-02-10 22:43   ` Dave Hansen
  2022-02-10 23:07     ` Andy Lutomirski
  1 sibling, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-10 22:43 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the
> first and the last elements in the range, effectively touches those memory
> areas.
> 
> The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and
> 255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from PAGE_SIZE.
> Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> CALL, and RET from going beyond.

What is the downside of not applying this patch?  The shadow stack gap
is 1MB instead of 4k?

That, frankly, doesn't seem too bad.  How badly do we *need* this patch?

^ permalink raw reply	[flat|nested] 154+ messages in thread

* RE: [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack
  2022-02-09 22:52   ` Dave Hansen
@ 2022-02-10 22:45     ` David Laight
  0 siblings, 0 replies; 154+ messages in thread
From: David Laight @ 2022-02-10 22:45 UTC (permalink / raw)
  To: 'Dave Hansen',
	Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

From: Dave Hansen
> Sent: 09 February 2022 22:52
> 
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > Like a writable data page, a shadow stack page is writable, and becomes
> > read-only during copy-on-write, but it is always dirty.
> 
> One other thing...
> 
> The language in these changelogs is a bit sloppy.  For instance, what
> does "always dirty" mean here?  pte_dirty()?  Or strictly _PAGE_DIRTY?
> 
> In other words, logically dirty, or literally "has *the* dirty bit set"?

Doesn't COW have to set it readonly - so that the access faults?
And then the fault code sets it readonly+dirty (without write)
to allow the shadow stack accesses to not fault.

Or am I mis-guessing what the docs actually say?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-02-10 22:43   ` Dave Hansen
@ 2022-02-10 23:07     ` Andy Lutomirski
  2022-02-10 23:40       ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-10 23:07 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian,
	Yu-cheng Yu

On Thu, Feb 10, 2022 at 2:44 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the
> > first and the last elements in the range, effectively touches those memory
> > areas.
> >
> > The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and
> > 255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from PAGE_SIZE.
> > Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> > CALL, and RET from going beyond.
>
> What is the downside of not applying this patch?  The shadow stack gap
> is 1MB instead of 4k?
>
> That, frankly, doesn't seem too bad.  How badly do we *need* this patch?

1MB of per-thread guard address space in a 32-bit program may be a
show stopper.  Do we intend to support any of this for 32-bit?

--Andy

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-02-10 23:07     ` Andy Lutomirski
@ 2022-02-10 23:40       ` Edgecombe, Rick P
  2022-02-11 17:54         ` Andy Lutomirski
  0 siblings, 1 reply; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-10 23:40 UTC (permalink / raw)
  To: Lutomirski, Andy, Hansen, Dave
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Yu, Yu-cheng, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, linux-doc, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, dave.martin, john.allen,
	mingo, Shankar, Ravi V, corbet, linux-kernel, linux-api,
	gorcunov

On Thu, 2022-02-10 at 15:07 -0800, Andy Lutomirski wrote:
> On Thu, Feb 10, 2022 at 2:44 PM Dave Hansen <dave.hansen@intel.com>
> wrote:
> > 
> > On 1/30/22 13:18, Rick Edgecombe wrote:
> > > INCSSP(Q/D) increments shadow stack pointer and 'pops and
> > > discards' the
> > > first and the last elements in the range, effectively touches
> > > those memory
> > > areas.
> > > 
> > > The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes
> > > and
> > > 255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from
> > > PAGE_SIZE.
> > > Thus, putting a gap page on both ends of a shadow stack prevents
> > > INCSSP,
> > > CALL, and RET from going beyond.
> > 
> > What is the downside of not applying this patch?  The shadow stack
> > gap
> > is 1MB instead of 4k?
> > 
> > That, frankly, doesn't seem too bad.  How badly do we *need* this
> > patch?

Like just using VM_SHADOW_STACK | VM_GROWSDOWN to get a regular stack
sized gap? I think it could work. It also simplifies the mm->stack_vm
accounting.

It would no longer get a gap at the end though. I don't think it's
needed.
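
On the stack_vm point: roughly, the existing accounting helper would
already cover a VM_SHADOW_STACK | VM_GROWSDOWN vma, so the shadow stack
accounting tweak elsewhere in the series presumably becomes unnecessary.
A sketch of what is already there today (VM_STACK is VM_GROWSDOWN on x86):

	/* mm/internal.h today; a VM_SHADOW_STACK | VM_GROWSDOWN vma would be
	 * counted in mm->stack_vm with no shadow-stack-specific check. */
	static inline bool is_stack_mapping(vm_flags_t flags)
	{
		return (flags & VM_STACK) == VM_STACK;
	}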

> 
> 1MB of per-thread guard address space in a 32-bit program may be a
> show stopper.  Do we intend to support any of this for 32-bit?

It is supported in 32-bit compatibility mode, although IBT had dropped
it. I guess this was probably the reason.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-02-10 22:38     ` David Laight
@ 2022-02-10 23:42       ` Edgecombe, Rick P
  2022-02-11  9:08         ` David Laight
  0 siblings, 1 reply; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-10 23:42 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	pavel, oleg, hjl.tools, bp, Lutomirski, Andy, David.Laight, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, Dave.Martin,
	Yang, Weijiang, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng

On Thu, 2022-02-10 at 22:38 +0000, David Laight wrote:
> Do you need a real guard page?
> Or is it just enough to ensure that the adjacent page isn't another
> shadow stack page?
> 
> Any other page will cause a fault because the PTE isn't
> readonly+dirty.
> 
> I'm not sure how common single page allocates are in Linux.

I think it came from this discussion:

https://lore.kernel.org/lkml/CAG48ez1ytOfQyNZMNPFp7XqKcpd7_aRai9G5s7rx0V=8ZG+r2A@mail.gmail.com/#t

> But adjacent shadow stacks may be rare anyway.
> So a check against both adjacent PTE entries would suffice.
> Or maybe always allocate an even (or odd) numbered page.

It just needs to not be adjacent to shadow stack memory to do the job.
Would that be small to implement? It might be a tradeoff of code
complexity.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  2022-02-09 16:58   ` Dave Hansen
@ 2022-02-11  1:39     ` Edgecombe, Rick P
  2022-02-11  7:13       ` Wang, Zhi A
  0 siblings, 1 reply; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-11  1:39 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Wang, Zhi A, Yu, Yu-cheng, daniel, zhenyuw, airlied,
	joonas.lahtinen, jani.nikula, Vivi, Rodrigo

CC intel-gfx@lists.freedesktop.org

Thread: 
https://lore.kernel.org/lkml/a5bb32b8-8bd7-ac98-5c4c-5af604ac8256@intel.com/

On Wed, 2022-02-09 at 08:58 -0800, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/gtt.c
> > b/drivers/gpu/drm/i915/gvt/gtt.c
> > index 99d1781fa5f0..75ce4e823902 100644
> > --- a/drivers/gpu/drm/i915/gvt/gtt.c
> > +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> > @@ -1210,7 +1210,7 @@ static int split_2MB_gtt_entry(struct
> > intel_vgpu *vgpu,
> >  	}
> >  
> >  	/* Clear dirty field. */
> > -	se->val64 &= ~_PAGE_DIRTY;
> > +	se->val64 &= ~_PAGE_DIRTY_BITS;
> >  
> >  	ops->clear_pse(se);
> >  	ops->clear_ips(se);
> 
> Are these x86 CPU page table values?  I see ->val64 being used like
> this:
> 
>         e->val64 &= ~GEN8_PAGE_PRESENT;
> and
> 	se.val64 |= GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> 
> where we also have:
> 
> #define GEN8_PAGE_PRESENT               BIT_ULL(0)
> #define GEN8_PAGE_RW                    BIT_ULL(1)
> 
> Which tells me that these are probably *close* to the CPU's page
> tables.
>  But, I honestly don't know which format they are.  I don't know if
> _PAGE_COW is still a software bit in that format or not.
> 
> Either way, I don't think we should be messing with i915 device page
> tables.
> 
> Or, are these somehow magically shared with the CPU in some way I
> don't
> know about?
> 
> [ If these are device-only page tables, it would probably be nice to
>   stop using _PAGE_FOO for them.  It would avoid confusion like this.
> ]

The two Reviewed-by tags are giving me pause, but as far as I can tell
this should not be setting _PAGE_DIRTY_BITS. This code seems to be
shadowing guest page tables, and the change would clear the COW
software bit in the guest page tables. So, yes, I think this should be
dropped.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size'
  2022-02-08  8:38   ` Thomas Gleixner
@ 2022-02-11  2:09     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-11  2:09 UTC (permalink / raw)
  To: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, dave.hansen, kirill.shutemov, Eranian, Stephane,
	linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc, bp, oleg,
	hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, Dave.Martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov
  Cc: Yu, Yu-cheng

On Tue, 2022-02-08 at 09:38 +0100, Thomas Gleixner wrote:
> On Sun, Jan 30 2022 at 13:18, Rick Edgecombe wrote:
> > -int copy_thread(unsigned long clone_flags, unsigned long sp,
> > unsigned long arg,
> > -             struct task_struct *p, unsigned long tls)
> > +int copy_thread(unsigned long clone_flags, unsigned long sp,
> > +             unsigned long stack_size, struct task_struct *p,
> > +             unsigned long tls)
> >   {
> >        struct inactive_task_frame *frame;
> >        struct fork_frame *fork_frame;
> > @@ -175,7 +176,7 @@ int copy_thread(unsigned long clone_flags,
> > unsigned long sp, unsigned long arg,
> >        if (unlikely(p->flags & PF_KTHREAD)) {
> >                p->thread.pkru = pkru_get_init_value();
> >                memset(childregs, 0, sizeof(struct pt_regs));
> > -             kthread_frame_init(frame, sp, arg);
> > +             kthread_frame_init(frame, sp, stack_size);
> >                return 0;
> >        }
> >   
> > @@ -208,7 +209,7 @@ int copy_thread(unsigned long clone_flags,
> > unsigned long sp, unsigned long arg,
> >                 */
> >                childregs->sp = 0;
> >                childregs->ip = 0;
> > -             kthread_frame_init(frame, sp, arg);
> > +             kthread_frame_init(frame, sp, stack_size);
> >                return 0;
> >        }
> 
> Can you please change the prototypes too for completeness sake?

In the header it's:
extern int copy_thread(unsigned long, unsigned long, unsigned long,
		       struct task_struct *, unsigned long);

And the various arch implementations call the stack size: arg,
kthread_arg, stk_sz, etc.

Adding names to the prototype would conflict with some arches' names
unless they were all unified. Is it a worthwhile refactor?
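
For reference, a unified, named prototype would look something like this
(assuming every arch agreed on 'stack_size'; sketch only):

	extern int copy_thread(unsigned long clone_flags, unsigned long sp,
			       unsigned long stack_size, struct task_struct *p,
			       unsigned long tls);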



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  2022-02-11  1:39     ` Edgecombe, Rick P
@ 2022-02-11  7:13       ` Wang, Zhi A
  2022-02-12  1:45         ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Wang, Zhi A @ 2022-02-11  7:13 UTC (permalink / raw)
  To: Edgecombe, Rick P, Shankar, Ravi V, bsingharora, hpa,
	Syromiatnikov, Eugene, peterz, rdunlap, keescook, dave.hansen,
	kirill.shutemov, Eranian, Stephane, linux-mm, fweimer,
	nadav.amit, jannh, linux-arch, kcc, bp, oleg, hjl.tools, Yang,
	Weijiang, Lutomirski, Andy, pavel, arnd, Moreira, Joao, tglx,
	mike.kravetz, x86, linux-doc, Dave.Martin, john.allen, mingo,
	Hansen, Dave, corbet, linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng, daniel, zhenyuw, airlied, joonas.lahtinen,
	jani.nikula, Vivi, Rodrigo

On 2/11/22 1:39 AM, Edgecombe, Rick P wrote:
> CC intel-gfx@lists.freedesktop.org
> 
> Thread: 
> https://lore.kernel.org/lkml/a5bb32b8-8bd7-ac98-5c4c-5af604ac8256@intel.com/
> 
> On Wed, 2022-02-09 at 08:58 -0800, Dave Hansen wrote:
>> On 1/30/22 13:18, Rick Edgecombe wrote:
>>>
>>> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c
>>> b/drivers/gpu/drm/i915/gvt/gtt.c
>>> index 99d1781fa5f0..75ce4e823902 100644
>>> --- a/drivers/gpu/drm/i915/gvt/gtt.c
>>> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
>>> @@ -1210,7 +1210,7 @@ static int split_2MB_gtt_entry(struct
>>> intel_vgpu *vgpu,
>>>  	}
>>>  
>>>  	/* Clear dirty field. */
>>> -	se->val64 &= ~_PAGE_DIRTY;
>>> +	se->val64 &= ~_PAGE_DIRTY_BITS;
>>>  
>>>  	ops->clear_pse(se);
>>>  	ops->clear_ips(se);
>>
>> Are these x86 CPU page table values?  I see ->val64 being used like
>> this:
>>
>>         e->val64 &= ~GEN8_PAGE_PRESENT;
>> and
>> 	se.val64 |= GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>
>> where we also have:
>>
>> #define GEN8_PAGE_PRESENT               BIT_ULL(0)
>> #define GEN8_PAGE_RW                    BIT_ULL(1)
>>
>> Which tells me that these are probably *close* to the CPU's page
>> tables.
>>  But, I honestly don't know which format they are.  I don't know if
>> _PAGE_COW is still a software bit in that format or not.
>>
>> Either way, I don't think we should be messing with i915 device page
>> tables.
>>
>> Or, are these somehow magically shared with the CPU in some way I
>> don't
>> know about?
>>
>> [ If these are device-only page tables, it would probably be nice to
>>   stop using _PAGE_FOO for them.  It would avoid confusion like this.
>> ]
> 
> The two Reviewed-by tags are giving me pause, but as far as I can tell
> this should not be setting _PAGE_DIRTY_BITS. This code seems to be
> shadowing guest page tables, and the change would clear the COW
> software bit in the guest page tables. So, yes, I think this should be
> dropped.
> 

Hi:

According to the PRM https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-lkf-vol06-memory_views.pdf p.28,
the GPU page table is IA-like and there are scenarios where the IA and
GPU sides share a page table. That's why they share part of the
definitions. But the dirty bits are ignored by the HW which GVT-g
supports. The code should copy the bits from the guest PDPE 2M entry
and then clear some unused bits. So _PAGE_DIRTY is misused here.

I would suggest removing that line in your patch, and I will clean up
this function after your patches are merged.

Thanks,
Zhi.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-10  2:37                 ` Andy Lutomirski
  2022-02-10  2:53                   ` H.J. Lu
@ 2022-02-11  7:41                   ` avagin
  2022-02-11  8:04                     ` Mike Rapoport
  2022-02-28 20:27                   ` Mike Rapoport
  2 siblings, 1 reply; 154+ messages in thread
From: avagin @ 2022-02-11  7:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Edgecombe, Rick P, gorcunov, bsingharora, hpa, Syromiatnikov,
	Eugene, peterz, rdunlap, keescook, 0x7f454c46, dave.hansen,
	kirill.shutemov, Eranian, Stephane, linux-mm, adrian, fweimer,
	nadav.amit, jannh, linux-arch, kcc, bp, oleg, hjl.tools, pavel,
	linux-doc, arnd, Moreira, Joao, tglx, mike.kravetz, x86, Yang,
	Weijiang, rppt, Dave.Martin, john.allen, mingo, Hansen, Dave,
	corbet, linux-kernel, linux-api, Shankar, Ravi V

On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > > On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> > > > > > But such a knob will immediately reduce the security value of
> > > > > > the entire
> > > > > > thing, and I don't have good ideas how to deal with it :(
> > > > > 
> > > > > Probably a kind of latch in the task_struct which would trigger
> > > > > off once
> > > > > returt to a different address happened, thus we would be able to
> > > > > jump inside
> > > > > paratite code. Of course such trigger should be available under
> > > > > proper
> > > > > capability only.
> > > > 
> > > > I'm not fully in touch with how parasite, etc works.  Are we
> > > > talking about save or restore?
> > > 
> > > We use parasite code in question during checkpoint phase as far as I
> > > remember.
> > > push addr/lret trick is used to run "injected" code (code injection
> > > itself is
> > > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> > > into this code
> > > for years already, do we still need to support compat mode at all?
> > > 
> > > > If it's restore, what exactly does CRIU need to do?  Is it just
> > > > that CRIU needs to return
> > > > out from its resume code into the to-be-resumed program without
> > > > tripping CET?  Would it
> > > > be acceptable for CRIU to require that at least one shstk slot be
> > > > free at save time?
> > > > Or do we need a mechanism to atomically switch to a completely full
> > > > shadow stack at resume?
> > > > 
> > > > Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> > > > that is intended for
> > > > use for altshadowstack could safely verify a token on the
> > > > altshdowstack, possibly
> > > > compare to something in ucontext (or not -- this isn't clearly
> > > > necessary) and switch
> > > > back to the previous stack.  CRIU could use that too.  Obviously
> > > > CRIU will need a way
> > > > to populate the relevant stacks, but WRUSS can be used for that,
> > > > and I think this
> > > > is a fundamental requirement for CRIU -- CRIU restore absolutely
> > > > needs a way to write
> > > > the saved shadow stack data into the shadow stack.
> > 
> > Still wrapping my head around the CRIU save and restore steps, but
> > another general approach might be to give ptrace the ability to
> > temporarily pause/resume/set CET enablement and SSP for a stopped
> > thread. Then injected code doesn't need to jump through any hoops or
> > possibly run into road blocks. I'm not sure how much this opens things
> > up if the thread has to be stopped...
> 
> Hmm, that's maybe not insane.
> 
> An alternative would be to add a bona fide ptrace call-a-function mechanism.
> I can think of two potentially usable variants:
> 
> 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> shadow stack push and all.
> 
> 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> frame just like a real signal is being delivered with the specified handler.
> There could be a variant to opt-in to also using a specified altstack and
> altshadowstack.

I think this would be ideal. In CRIU, the parasite code is executed in
the "daemon" mode and returns back via sigreturn.  Right now, CRIU needs
to generate a signal frame. If I understand your idea right, the signal
frame will be generated by the kernel.

> 
> 2 would be more expensive but would avoid the need for much in the way of
> asm magic.  The injected code could be plain C (or Rust or Zig or whatever).
> 
> All of this only really handles save, not restore.  I don't understand
> restore enough to fully understand the issue.

In a few words, it works like this: CRIU restores all required resources
and prepares a signal frame with a target process state, then it
switches to a small PIE blob, where it restores vma-s and calls
rt_sigreturn.

> 
> --Andy

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-11  7:41                   ` avagin
@ 2022-02-11  8:04                     ` Mike Rapoport
  0 siblings, 0 replies; 154+ messages in thread
From: Mike Rapoport @ 2022-02-11  8:04 UTC (permalink / raw)
  To: avagin
  Cc: Andy Lutomirski, Edgecombe, Rick P, gorcunov, bsingharora, hpa,
	Syromiatnikov, Eugene, peterz, rdunlap, keescook, 0x7f454c46,
	dave.hansen, kirill.shutemov, Eranian, Stephane, linux-mm,
	adrian, fweimer, nadav.amit, jannh, linux-arch, kcc, bp, oleg,
	hjl.tools, pavel, linux-doc, arnd, Moreira, Joao, tglx,
	mike.kravetz, x86, Yang, Weijiang, Dave.Martin, john.allen,
	mingo, Hansen, Dave, corbet, linux-kernel, linux-api, Shankar,
	Ravi V

On Thu, Feb 10, 2022 at 11:41:16PM -0800, avagin@gmail.com wrote:
> On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> >
> > An alternative would be to add a bona fide ptrace call-a-function mechanism.
> > I can think of two potentially usable variants:
> > 
> > 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> > shadow stack push and all.
> > 
> > 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> > frame just like a real signal is being delivered with the specified handler.
> > There could be a variant to opt-in to also using a specified altstack and
> > altshadowstack.
> 
> I think this would be ideal. In CRIU, the parasite code is executed in
> the "daemon" mode and returns back via sigreturn.  Right now, CRIU needs
> to generate a signal frame. If I understand your idea right, the signal
> frame will be generated by the kernel.
> 
> > 
> > 2 would be more expensive but would avoid the need for much in the way of
> > asm magic.  The injected code could be plain C (or Rust or Zig or whatever).
> > 
> > All of this only really handles save, not restore.  I don't understand
> > restore enough to fully understand the issue.
> 
> In a few words, it works like this: CRIU restores all required resources
> and prepares a signal frame with a target process state, then it
> switches to a small PIE blob, where it restores vma-s and calls
> rt_sigreturn.

I think it's also important to note that the stack is restored as a part of
the process memory, i.e. its contents are read from the images.
 
> > 
> > --Andy

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* RE: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-02-10 23:42       ` Edgecombe, Rick P
@ 2022-02-11  9:08         ` David Laight
  0 siblings, 0 replies; 154+ messages in thread
From: David Laight @ 2022-02-11  9:08 UTC (permalink / raw)
  To: 'Edgecombe, Rick P',
	Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	pavel, oleg, hjl.tools, bp, Lutomirski, Andy, arnd, Moreira,
	Joao, tglx, mike.kravetz, x86, linux-doc, Dave.Martin, Yang,
	Weijiang, john.allen, mingo, Hansen, Dave, corbet, linux-kernel,
	linux-api, gorcunov
  Cc: Yu, Yu-cheng

From: Edgecombe, Rick P
> Sent: 10 February 2022 23:43
> 
> On Thu, 2022-02-10 at 22:38 +0000, David Laight wrote:
> > Do you need a real guard page?
> > Or is it just enough to ensure that the adjacent page isn't another
> > shadow stack page?
> >
> > Any other page will cause a fault because the PTE isn't
> > readonly+dirty.
> >
> > I'm not sure how common single page allocates are in Linux.
> 
> I think it came from this discussion:
> 
> https://lore.kernel.org/lkml/CAG48ez1ytOfQyNZMNPFp7XqKcpd7_aRai9G5s7rx0V=8ZG+r2A@mail.gmail.com/#t
> 
> > But adjacent shadow stacks may be rare anyway.
> > So a check against both adjacent PTE entries would suffice.
> > Or maybe always allocate an even (or odd) numbered page.
> 
> It just needs to not be adjacent to shadow stack memory to do the job.
> Would that be small to implement? It might be a tradeoff of code
> complexity.

That's what I thought.
Although the VA use for guard pages might be a problem in itself.

I'm not sure why I thought shadow stacks would be a single page.
For user space that 'only' allows 512 calls.
For kernel it is a massive waste of memory.
It is probably worth putting multiple kernel shadow stacks into the same page.
(Code that can overrun can do other stuff more easily.)

The hardware engineers failed to think about the implementation (again).
The shadow stack should (probably) run in the opposite direction to
the normal stack.
Then the shadow stack can be placed at the other end of the VA allocated
to a user space stack.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-02-10 23:40       ` Edgecombe, Rick P
@ 2022-02-11 17:54         ` Andy Lutomirski
  2022-02-12  0:10           ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-11 17:54 UTC (permalink / raw)
  To: Edgecombe, Rick P, Hansen, Dave
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Yu, Yu-cheng, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, linux-doc, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, dave.martin, john.allen,
	mingo, Shankar, Ravi V, corbet, linux-kernel, linux-api,
	gorcunov

On 2/10/22 15:40, Edgecombe, Rick P wrote:
> On Thu, 2022-02-10 at 15:07 -0800, Andy Lutomirski wrote:
>> On Thu, Feb 10, 2022 at 2:44 PM Dave Hansen <dave.hansen@intel.com>
>> wrote:
>>>
>>> On 1/30/22 13:18, Rick Edgecombe wrote:
>>>> INCSSP(Q/D) increments shadow stack pointer and 'pops and
>>>> discards' the
>>>> first and the last elements in the range, effectively touches
>>>> those memory
>>>> areas.
>>>>
>>>> The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes
>>>> and
>>>> 255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from
>>>> PAGE_SIZE.
>>>> Thus, putting a gap page on both ends of a shadow stack prevents
>>>> INCSSP,
>>>> CALL, and RET from going beyond.
>>>
>>> What is the downside of not applying this patch?  The shadow stack
>>> gap
>>> is 1MB instead of 4k?
>>>
>>> That, frankly, doesn't seem too bad.  How badly do we *need* this
>>> patch?
> 
> Like just using VM_SHADOW_STACK | VM_GROWSDOWN to get a regular stack
> sized gap? I think it could work. It also simplifies the mm->stack_vm
> accounting.

Seems not crazy.  Do we want automatically growing shadow stacks?  I 
don't really like the historical unix behavior where the main thread has 
a sort-of-infinite stack and every other thread has a fixed stack.

> 
> It would no longer get a gap at the end though. I don't think it's
> needed.
> 

I may have missed something about the oddball way the mm code works, but 
it seems if you have a gap at one end of every shadow stack, you 
automatically have a gap at the other end.



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks
  2022-01-30 21:18 ` [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks Rick Edgecombe
@ 2022-02-11 22:19   ` Dave Hansen
  2022-02-12  1:44     ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-11 22:19 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian

On 1/30/22 13:18, Rick Edgecombe wrote:
> Shadow stack accesses are writes from handle_mm_fault() perspective. So to
> generate the correct PTE, maybe_mkwrite() will rely on the presence of
> VM_SHADOW_STACK or VM_WRITE in the vma.
> 
> In future patches, when VM_SHADOW_STACK is actually creatable by
> userspace, a problem could happen if a user calls
> mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The code
> would then be confused in the event of shadow stack accesses, and create a
> writable PTE for a shadow stack access. Then the process would fault in a
> loop.
> 
> Prevent this from happening by blocking this kind of memory (VM_WRITE and
> VM_SHADOW_STACK) from being created, instead of complicating the fault
> handler logic to handle it.
> 
> Add an x86 arch_validate_flags() implementation to handle the check.
> Rename the uapi/asm/mman.h header guard to be able to use it for
> arch/x86/include/asm/mman.h where the arch_validate_flags() will be.

It would be great if this also said:

	There is an existing arch_validate_flags() hook for mmap() and
	mprotect() which allows architectures to reject unwanted
	->vm_flags combinations.  Add an implementation for x86.

That's somewhat implied from what is there already, but making it more
clear would be nice.  There's a much higher bar to add a new arch hook
than to just implement an existing one.


> diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
> new file mode 100644
> index 000000000000..b44fe31deb3a
> --- /dev/null
> +++ b/arch/x86/include/asm/mman.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_MMAN_H
> +#define _ASM_X86_MMAN_H
> +
> +#include <linux/mm.h>
> +#include <uapi/asm/mman.h>
> +
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static inline bool arch_validate_flags(unsigned long vm_flags)
> +{
> +	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
> +		return false;
> +
> +	return true;
> +}

The design decision here seems to be that VM_SHADOW_STACK is itself a
pseudo-VM_WRITE flag.  Like you said: "Shadow stack accesses are writes
from handle_mm_fault()".

Very early on, this series seems to have made the decision that shadow
stacks are writable and need lots of write handling behavior, *BUT*
shouldn't have VM_WRITE set.  As a whole, that seems odd.

The alternative would be *requiring* VM_WRITE and VM_SHADOW_STACK be set
together.  I guess the downside is that pte_mkwrite() would need to be
made to work on shadow stack PTEs.

That particular design decision was never discussed.  I think it has a
really big impact on the rest of the series.  What do you think?  Was it
a good idea?  Or would the alternative be more complicated than what you
have now?
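
For comparison, the alternative being asked about would roughly invert the
check above (a sketch of the idea, not a proposal):

	#ifdef CONFIG_X86_SHADOW_STACK
	/* Hypothetical: shadow stack VMAs must also carry VM_WRITE. */
	static inline bool arch_validate_flags(unsigned long vm_flags)
	{
		if ((vm_flags & VM_SHADOW_STACK) && !(vm_flags & VM_WRITE))
			return false;

		return true;
	}
	#endif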

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support
  2022-01-30 21:18 ` [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support Rick Edgecombe
@ 2022-02-11 23:37   ` Dave Hansen
  2022-02-12  0:07     ` Andy Lutomirski
  2022-02-12  0:12     ` Edgecombe, Rick P
  0 siblings, 2 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-11 23:37 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian
  Cc: Yu-cheng Yu

On 1/30/22 13:18, Rick Edgecombe wrote:
> Add the user shadow stack MSRs to the xsave helpers, so they can be used
> to implement the functionality.

Do these MSRs ever affect kernel-mode operation?

If so, we might need to switch them more aggressively at context-switch
time like PKRU.

If not, they can continue to be context-switched with the PASID state
which does not affect kernel-mode operation.

Either way, it would be nice to have some changelog material to that effect.


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support
  2022-02-11 23:37   ` Dave Hansen
@ 2022-02-12  0:07     ` Andy Lutomirski
  2022-02-12  0:11       ` Dave Hansen
  2022-02-12  0:12     ` Edgecombe, Rick P
  1 sibling, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-12  0:07 UTC (permalink / raw)
  To: Dave Hansen, Rick P Edgecombe, the arch/x86 maintainers,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Linux Kernel Mailing List, linux-doc, linux-mm, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra (Intel),
	Randy Dunlap, Shankar, Ravi V, Dave Martin, Weijiang Yang,
	Kirill A. Shutemov, Moreira, Joao, john.allen, kcc, Eranian,
	Stephane
  Cc: Yu-cheng Yu



On Fri, Feb 11, 2022, at 3:37 PM, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
>> Add the user shadow stack MSRs to the xsave helpers, so they can be used
>> to implement the functionality.
>
> Do these MSRs ever affect kernel-mode operation?
>
> If so, we might need to switch them more aggressively at context-switch
> time like PKRU.
>
> If not, they can continue to be context-switched with the PASID state
> which does not affect kernel-mode operation.

PASID?  PASID is all kinds of weird.  I assume you mean switching it with all the normal state.

>
> Either way, it would be nice to have some changelog material to that effect.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 18/35] mm: Add guard pages around a shadow stack.
  2022-02-11 17:54         ` Andy Lutomirski
@ 2022-02-12  0:10           ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-12  0:10 UTC (permalink / raw)
  To: Lutomirski, Andy, Hansen, Dave
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Yu, Yu-cheng, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, fweimer, nadav.amit, jannh, kcc,
	linux-arch, bp, oleg, hjl.tools, Yang, Weijiang, linux-doc,
	pavel, arnd, Moreira, Joao, tglx, mike.kravetz, x86, dave.martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov

On Fri, 2022-02-11 at 09:54 -0800, Andy Lutomirski wrote:
> > Like just using VM_SHADOW_STACK | VM_GROWSDOWN to get a regular
> > stack
> > sized gap? I think it could work. It also simplifies the mm-
> > >stack_vm
> > accounting.
> 
> Seems not crazy.  Do we want automatically growing shadow stacks?  I 
> don't really like the historical unix behavior where the main thread
> has 
> a sort-of-infinite stack and every other thread has a fixed stack.

Ah! I was scratching my head about why glibc did that - it's historical.
Yeah, probably it's not needed and just adds strange behavior.

> 
> > 
> > It would no longer get a gap at the end though. I don't think it's
> > needed.
> > 
> 
> I may have missed something about the oddball way the mm code works,
> but 
> it seems if you have a gap at one end of every shadow stack, you 
> automatically have a gap at the other end.

Right, we only need one, and this patch added a gap on both ends.

Per Andy's comment about the "oddball way" the mm code does the gaps -
the previous version of this (PROT_SHADOW_STACK) had an issue where, if
you started with writable memory and then mprotect()ed it with
PROT_SHADOW_STACK, the internal rb tree would get confused over the
sudden appearance of a gap. This new version follows closer to how
MAP_STACK avoids the problem I saw. But the way these guard gaps work
seems to barely avoid problems when you do things like split the vma by
mprotect()ing the middle of one. I wasn't sure if it's worth a
refactor. I guess the solution is quite old and there haven't been
problems. I'm not even sure what the change would be, but it does feel
like adding to something fragile. Maybe shadow stack should just place
a guard page manually and not add any special shadow stack logic to the
mm code...

Other than that I'm inclined to remove the end gap and justify this
patch better in the commit log. Something like this (borrowing some
from Dave's comments):

	The architecture of shadow stack constrains the ability of
	userspace to move the shadow stack pointer (ssp) in order to
	prevent corrupting or switching to other shadow stacks. The
	RSTORSSP instruction can move the ssp to different shadow
	stacks, but it requires a specially placed token in order to
	switch shadow stacks. However, the architecture does not
	prevent incrementing or decrementing the shadow stack pointer
	to wander onto an adjacent shadow stack. To prevent this in
	software, enforce guard pages at the beginning of shadow stack
	vmas, such that there will always be a gap between adjacent
	shadow stacks.

	Make the gap big enough so that no userspace ssp changing
	operations (besides RSTORSSP) can move the ssp from one stack
	to the next. The ssp can increment or decrement by CALL, RET
	and INCSSP. CALL and RET can move the ssp by a maximum of 8
	bytes, at which point the shadow stack would be accessed.

        The INCSSP instruction can also increment the shadow stack 
	pointer. It is the shadow stack analog of an instruction like:

                addq    $0x80, %rsp

        However, there is one important difference between an ADD on
        %rsp and INCSSP. In addition to modifying SSP, INCSSP also
        reads from the memory of the first and last elements that were
        "popped". It can be thought of as acting like this:

        READ_ONCE(ssp);       // read+discard top element on stack
        ssp += nr_to_pop * 8; // move the shadow stack
        READ_ONCE(ssp-8);     // read+discard last popped stack element

	The maximum distance INCSSP can move the ssp is 2040 bytes,
	before it would read the memory. Therefore a single page gap
	will be enough to prevent any operation from shifting the ssp
	to an adjacent stack, before the shadow stack would be read and
	cause a fault.

	This could be accomplished by using VM_GROWSDOWN, but this has
	two downsides:
	   1. VM_GROWSDOWN will have a 1MB gap, which is on the large
	      side for 32 bit address spaces.
	   2. The behavior would allow shadow stacks to grow, which is
	      unneeded and adds a strange difference to how most
	      regular stacks work.
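
For reference, the 2040 byte figure above works out like this. A rough
sketch (the names below are made up for illustration; INCSSPQ only
consumes the low 8 bits of its operand):

	#define INCSSP_MAX_MOVE		(255 * 8)	/* 2040 bytes */
	#define SHSTK_GUARD_GAP		PAGE_SIZE	/* 4096 on x86 */

	static inline void shstk_check_guard_gap(void)
	{
		/* A single guard page cannot be stepped over by INCSSP */
		BUILD_BUG_ON(INCSSP_MAX_MOVE >= SHSTK_GUARD_GAP);
	}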

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support
  2022-02-12  0:07     ` Andy Lutomirski
@ 2022-02-12  0:11       ` Dave Hansen
  0 siblings, 0 replies; 154+ messages in thread
From: Dave Hansen @ 2022-02-12  0:11 UTC (permalink / raw)
  To: Andy Lutomirski, Rick P Edgecombe, the arch/x86 maintainers,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Linux Kernel Mailing List, linux-doc, linux-mm, linux-arch,
	Linux API, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra (Intel),
	Randy Dunlap, Shankar, Ravi V, Dave Martin, Weijiang Yang,
	Kirill A. Shutemov, Moreira, Joao, john.allen, kcc, Eranian,
	Stephane
  Cc: Yu-cheng Yu

On 2/11/22 16:07, Andy Lutomirski wrote:
> On Fri, Feb 11, 2022, at 3:37 PM, Dave Hansen wrote:
>> On 1/30/22 13:18, Rick Edgecombe wrote:
>>> Add the user shadow stack MSRs to the xsave helpers, so they can be used
>>> to implement the functionality.
>> Do these MSRs ever affect kernel-mode operation?
>>
>> If so, we might need to switch them more aggressively at context-switch
>> time like PKRU.
>>
>> If not, they can continue to be context-switched with the PASID state
>> which does not affect kernel-mode operation.
> PASID?  PASID is all kinds of weird.  I assume you mean switching it
> with all the normal state.

I was grouping PASID along with the CET MSRs because they're the only
supervisor state.  But, yeah, it's all XRSTOR'd at the same spot right
now, user or kernel.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support
  2022-02-11 23:37   ` Dave Hansen
  2022-02-12  0:07     ` Andy Lutomirski
@ 2022-02-12  0:12     ` Edgecombe, Rick P
  1 sibling, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-12  0:12 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: Yu, Yu-cheng

On Fri, 2022-02-11 at 15:37 -0800, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > Add the user shadow stack MSRs to the xsave helpers, so they can be
> > used
> > to implement the functionality.
> 
> Do these MSRs ever affect kernel-mode operation?
> 
> If so, we might need to switch them more aggressively at context-
> switch
> time like PKRU.
> 
> If not, they can continue to be context-switched with the PASID state
> which does not affect kernel-mode operation.
> 
> Either way, it would be nice to have some changelog material to that
> effect.

The only special shadow stack thing the kernel does is WRUSS, which per
the SDM only needs the CR4 bit set to work (unlike WRSS). So I think
the lazy restore is ok.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate
  2022-01-30 21:18 ` [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate Rick Edgecombe
  2022-02-08  8:51   ` Thomas Gleixner
@ 2022-02-12  0:27   ` Dave Hansen
  2022-02-12  2:31     ` Edgecombe, Rick P
  1 sibling, 1 reply; 154+ messages in thread
From: Dave Hansen @ 2022-02-12  0:27 UTC (permalink / raw)
  To: Rick Edgecombe, x86, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H . J . Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian

On 1/30/22 13:18, Rick Edgecombe wrote:
> Add helpers that can be used to modify supervisor xstate safely for the
> current task.

This should be at the end of the changelog.

> State for supervisors xstate based features can be live and
> accesses via MSR's, or saved in memory in an xsave buffer. When the
> kernel needs to modify this state it needs to be sure to operate on it
> in the right place, so the modifications don't get clobbered.

We tend to call these "supervisor xfeatures".  The "state is in the
registers" we call "active".  Maybe:

	Just like user xfeatures, supervisor xfeatures can be either
	active in the registers or inactive and present in the task FPU
	buffer.  If the registers are active, the registers can be
	modified directly.  If the registers are not active, the
	modification must be performed on the task FPU buffer.


> In the past supervisor xstate features have used get_xsave_addr()
> directly, and performed open coded logic handle operating on the saved
> state correctly. This has posed two problems:
>  1. It has logic that has been gotten wrong more than once.
>  2. To reduce code, less common path's are not optimized. Determination

			   "paths" ^


> xstate = start_update_xsave_msrs(XFEATURE_FOO);
> r = xsave_rdmsrl(state, MSR_IA32_FOO_1, &val)
> if (r)
> 	xsave_wrmsrl(state, MSR_IA32_FOO_2, FOO_ENABLE);
> end_update_xsave_msrs();

This looks OK.  I'm not thrilled about it.  The
start_update_xsave_msrs() can probably drop the "_msrs".  Maybe:

	start_xfeature_update(...);

Also, if you have to do the address lookup in xsave_rdmsrl() anyway, I
wonder if the 'xstate' should just be a full fledged 'struct xregs_state'.

The other option would be to make a little on-stack structure like:

	struct xsave_update {
		int feature;
		struct xregs_state *xregs;
	};

Then you do:

	struct xsave_update xsu;
	...
	start_update_xsave_msrs(&xsu, XFEATURE_FOO);

and then pass it along to each of the other operations:

	r = xsave_rdmsrl(xsu, MSR_IA32_FOO_1, &val)

It's slightly less likely to get type confused as a 'void *';
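
Pulled together, the changelog example would then read something like
this (signatures illustrative, passing the struct by pointer):

	struct xsave_update {
		int feature;
		struct xregs_state *xregs;
	};

	struct xsave_update xsu;
	u64 val;
	int r;

	start_update_xsave_msrs(&xsu, XFEATURE_FOO);
	r = xsave_rdmsrl(&xsu, MSR_IA32_FOO_1, &val);
	if (r)
		xsave_wrmsrl(&xsu, MSR_IA32_FOO_2, FOO_ENABLE);
	end_update_xsave_msrs(&xsu);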

> +static u64 *__get_xsave_member(void *xstate, u32 msr)
> +{
> +	switch (msr) {
> +	/* Currently there are no MSR's supported */
> +	default:
> +		WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n", msr);
> +		return NULL;
> +	}
> +}

Just to get an idea what this is doing, it's OK to include the shadow
stack MSRs in here.

Are you sure this should return a u64*?  We have lots of <=64-bit XSAVE
fields.

> +/*
> + * Return a pointer to the xstate for the feature if it should be used, or NULL
> + * if the MSRs should be written to directly. To do this safely, using the
> + * associated read/write helpers is required.
> + */
> +void *start_update_xsave_msrs(int xfeature_nr)
> +{
> +	void *xstate;
> +
> +	/*
> +	 * fpregs_lock() only disables preemption (mostly). So modifing state

							 modifying ^
	
> +	 * in an interrupt could screw up some in progress fpregs operation,

						^ in-progress

> +	 * but appear to work. Warn about it.
> +	 */
> +	WARN_ON_ONCE(!in_task());
> +	WARN_ON_ONCE(current->flags & PF_KTHREAD);

This might also be a good spot to check that xfeature_nr is in
fpstate.xfeatures.

> +	fpregs_lock();
> +
> +	fpregs_assert_state_consistent();
> +
> +	/*
> +	 * If the registers don't need to be reloaded. Go ahead and operate on the
> +	 * registers.
> +	 */
> +	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
> +		return NULL;
> +
> +	xstate = get_xsave_addr(&current->thread.fpu.fpstate->regs.xsave, xfeature_nr);
> +
> +	/*
> +	 * If regs are in the init state, they can't be retrieved from
> +	 * init_fpstate due to the init optimization, but are not nessarily

							necessarily ^

Spell checker time.  ":set spell" in vim works for me nicely.

> +	 * zero. The only option is to restore to make everything live and
> +	 * operate on registers. This will clear TIF_NEED_FPU_LOAD.
> +	 *
> +	 * Otherwise, if not in the init state but TIF_NEED_FPU_LOAD is set,
> +	 * operate on the buffer. The registers will be restored before going
> +	 * to userspace in any case, but the task might get preempted before
> +	 * then, so this possibly saves an xsave.
> +	 */
> +	if (!xstate)
> +		fpregs_restore_userregs();

Won't fpregs_restore_userregs() end up setting TIF_NEED_FPU_LOAD=0?
Isn't that a case where a "return NULL" is needed?

In any case, this makes me think this code should start out stupid and
slow.  Keep the API as-is, but make the first patch unconditionally do
the WRMSR.  Leave the "fast" buffer modifications for a follow-on patch.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks
  2022-02-11 22:19   ` Dave Hansen
@ 2022-02-12  1:44     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-12  1:44 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov

On Fri, 2022-02-11 at 14:19 -0800, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > Shadow stack accesses are writes from handle_mm_fault()
> > perspective. So to
> > generate the correct PTE, maybe_mkwrite() will rely on the presence
> > of
> > VM_SHADOW_STACK or VM_WRITE in the vma.
> > 
> > In future patches, when VM_SHADOW_STACK is actually creatable by
> > userspace, a problem could happen if a user calls
> > mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory.
> > The code
> > would then be confused in the event of shadow stack accesses, and
> > create a
> > writable PTE for a shadow stack access. Then the process would
> > fault in a
> > loop.
> > 
> > Prevent this from happening by blocking this kind of memory
> > (VM_WRITE and
> > VM_SHADOW_STACK) from being created, instead of complicating the
> > fault
> > handler logic to handle it.
> > 
> > Add an x86 arch_validate_flags() implementation to handle the
> > check.
> > Rename the uapi/asm/mman.h header guard to be able to use it for
> > arch/x86/include/asm/mman.h where the arch_validate_flags() will
> > be.
> 
> It would be great if this also said:
> 
> 	There is an existing arch_validate_flags() hook for mmap() and
> 	mprotect() which allows architectures to reject unwanted
> 	->vm_flags combinations.  Add an implementation for x86.
> 
> That's somewhat implied from what is there already, but making it
> more
> clear would be nice.  There's a much higher bar to add a new arch
> hook
> than to just implement an existing one.

Ok, makes sense.

> 
> 
> > diff --git a/arch/x86/include/asm/mman.h
> > b/arch/x86/include/asm/mman.h
> > new file mode 100644
> > index 000000000000..b44fe31deb3a
> > --- /dev/null
> > +++ b/arch/x86/include/asm/mman.h
> > @@ -0,0 +1,21 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_X86_MMAN_H
> > +#define _ASM_X86_MMAN_H
> > +
> > +#include <linux/mm.h>
> > +#include <uapi/asm/mman.h>
> > +
> > +#ifdef CONFIG_X86_SHADOW_STACK
> > +static inline bool arch_validate_flags(unsigned long vm_flags)
> > +{
> > +	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
> > +		return false;
> > +
> > +	return true;
> > +}
> 
> The design decision here seems to be that VM_SHADOW_STACK is itself a
> pseudo-VM_WRITE flag.  Like you said: "Shadow stack accesses are
> writes
> from handle_mm_fault()".
> 
> Very early on, this series seems to have made the decision that
> shadow
> stacks are writable and need lots of write handling behavior, *BUT*
> shouldn't have VM_WRITE set.  As a whole, that seems odd.
> 
> The alternative would be *requiring* VM_WRITE and VM_SHADOW_STACK be
> set
> together.  I guess the downside is that pte_mkwrite() would need to
> be
> made to work on shadow stack PTEs.
> 
> That particular design decision was never discussed.  I think it has
> a
> really big impact on the rest of the series.  What do you think?  Was
> it
> a good idea?  Or would the alternative be more complicated than what
> you
> have now?

First of all, thanks again for the deep review of the MM piece. I'm
still pondering the overall problem, which is why I haven't responded
to those yet.

I had originally thought that the MM changes were a bit hard to follow.
I was also somewhat amazed at how naturally normal COW worked. I was
wondering where the big COW stuff would be happening. Given the way
COW was sort of tucked away, overloading writability seemed sort of
aligned. But the names are very confusing, and this patch probably
should have been a hint that there are problems design-wise.

For writability, especially with WRSS, I do think it's a bit unnatural
to think of shadow stack memory as anything but writable, especially
when it comes to COW. But shadow stack accesses are not always writes,
INCSSP for example. The code will create shadow stack memory for shadow
stack access loads, which of course isn't writing anything, but is
required to make the instruction work. So it calls mkwrite(), which is
weird. But... it does need to leave the memory in a state that is kind
of writable, so it makes a little sense I guess.

I was wondering if maybe the mm code can't be fully sensible for shadow
stacks without creating maybe_mkshstk() and adding it everywhere in a
whole new fault path. Then you have reads, writes and shadow stack
accesses that each have their own logic. It might require so many
additions that better names and comments are preferable. I don't know
though, still trying to come up with a good opinion.



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  2022-02-11  7:13       ` Wang, Zhi A
@ 2022-02-12  1:45         ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-12  1:45 UTC (permalink / raw)
  To: bsingharora, hpa, Syromiatnikov, Eugene, peterz, Wang, Zhi A,
	keescook, rdunlap, kirill.shutemov, Eranian, Stephane, linux-mm,
	dave.hansen, fweimer, nadav.amit, jannh, kcc, linux-arch, bp,
	oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, john.allen,
	Dave.Martin, Shankar, Ravi V, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov
  Cc: daniel, zhenyuw, Yu, Yu-cheng, joonas.lahtinen, jani.nikula,
	Vivi, Rodrigo, airlied

On Fri, 2022-02-11 at 07:13 +0000, Wang, Zhi A wrote:
> I would suggest you can remove that line in your patch and I will
> clean
> this function after your patches got merged.

Thanks!

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate
  2022-02-12  0:27   ` Dave Hansen
@ 2022-02-12  2:31     ` Edgecombe, Rick P
  0 siblings, 0 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-12  2:31 UTC (permalink / raw)
  To: Shankar, Ravi V, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, jannh, linux-arch, kcc,
	bp, oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, linux-doc,
	Dave.Martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, linux-api, gorcunov

On Fri, 2022-02-11 at 16:27 -0800, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > Add helpers that can be used to modify supervisor xstate safely for
> > the
> > current task.
> 
> This should be at the end of the changelog.

Hmm, ok.

> 
> > State for supervisors xstate based features can be live and
> > accesses via MSR's, or saved in memory in an xsave buffer. When the
> > kernel needs to modify this state it needs to be sure to operate on
> > it
> > in the right place, so the modifications don't get clobbered.
> 
> We tend to call these "supervisor xfeatures".  The "state is in the
> registers" we call "active".  Maybe:
> 
> 	Just like user xfeatures, supervisor xfeatures can be either
> 	active in the registers or inactive and present in the task FPU
> 	buffer.  If the registers are active, the registers can be
> 	modified directly.  If the registers are not active, the
> 	modification must be performed on the task FPU buffer.

Ok, thanks.

> 
> 
> > In the past supervisor xstate features have used get_xsave_addr()
> > directly, and performed open coded logic handle operating on the
> > saved
> > state correctly. This has posed two problems:
> >  1. It has logic that has been gotten wrong more than once.
> >  2. To reduce code, less common path's are not optimized.
> > Determination
> 
> 			   "paths" ^
> 

Arg, thanks.

> 
> > xstate = start_update_xsave_msrs(XFEATURE_FOO);
> > r = xsave_rdmsrl(state, MSR_IA32_FOO_1, &val)
> > if (r)
> > 	xsave_wrmsrl(state, MSR_IA32_FOO_2, FOO_ENABLE);
> > end_update_xsave_msrs();
> 
> This looks OK.  I'm not thrilled about it.  The
> start_update_xsave_msrs() can probably drop the "_msrs".  Maybe:
> 
> 	start_xfeature_update(...);

Hmm, this whole thing pretends to be updating MSRs, which is often not
true. Maybe the xsave_rdmsrl/xsave_wrmsrl should be renamed too.
xsave_readl()/xsave_writel() or something.

> 
> Also, if you have to do the address lookup in xsave_rdmsrl() anyway,
> I
> wonder if the 'xstate' should just be a full fledged 'struct
> xregs_state'.
> 
> The other option would be to make a little on-stack structure like:
> 
> 	struct xsave_update {
> 		int feature;
> 		struct xregs_state *xregs;
> 	};
> 
> Then you do:
> 
> 	struct xsave_update xsu;
> 	...
> 	start_update_xsave_msrs(&xsu, XFEATURE_FOO);
> 
> and then pass it along to each of the other operations:
> 
> 	r = xsave_rdmsrl(xsu, MSR_IA32_FOO_1, &val)
> 
> It's slightly less likely to get type confused as a 'void *';

The 'void *' is actually a pointer to the specific xfeature in the
buffer. So the read/writes don't have to re-compute the offset every
time. It's not too much work though. I'm really surprised by the desire
to obfuscate the pointer, but I guess if we really want to, I'd rather
do that and keep the regular read/write operations.

If we don't care about the extra lookups this can totally drop the
caller-side state. The feature nr can be looked up from the MSR along
with the struct offset. Then it doesn't expose the pointer to the
buffer, since it's all recomputed on every operation.

So like:
start_xfeature_update();
r = xsave_readl(MSR_IA32_FOO_1, &val)
if (r)
        xsave_writel(MSR_IA32_FOO_2, FOO_ENABLE);
end_xfeature_update();

The WARNs then happen in the read/writes. An early iteration looked
like that. I liked this version with caller side state, but thought it
might be worth revisiting if there really is a strong desire to hide
the pointer.

> 
> > +static u64 *__get_xsave_member(void *xstate, u32 msr)
> > +{
> > +	switch (msr) {
> > +	/* Currently there are no MSR's supported */
> > +	default:
> > +		WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n",
> > msr);
> > +		return NULL;
> > +	}
> > +}
> 
> Just to get an idea what this is doing, it's OK to include the shadow
> stack MSRs in here.

Ok.

> 
> Are you sure this should return a u64*?  We have lots of <=64-bit
> XSAVE
> fields.

I thought it should only be used with 64-bit MSRs. Maybe it needs a
better name?

> 
> > +/*
> > + * Return a pointer to the xstate for the feature if it should be
> > used, or NULL
> > + * if the MSRs should be written to directly. To do this safely,
> > using the
> > + * associated read/write helpers is required.
> > + */
> > +void *start_update_xsave_msrs(int xfeature_nr)
> > +{
> > +	void *xstate;
> > +
> > +	/*
> > +	 * fpregs_lock() only disables preemption (mostly). So modifing
> > state
> 
> 							 modifying ^
> 	
> > +	 * in an interrupt could screw up some in progress fpregs
> > operation,
> 
> 						^ in-progress

I swear I ran checkpatch...

> 
> > +	 * but appear to work. Warn about it.
> > +	 */
> > +	WARN_ON_ONCE(!in_task());
> > +	WARN_ON_ONCE(current->flags & PF_KTHREAD);
> 
> This might also be a good spot to check that xfeature_nr is in
> fpstate.xfeatures.

Hmm, good idea.

> 
> > +	fpregs_lock();
> > +
> > +	fpregs_assert_state_consistent();
> > +
> > +	/*
> > +	 * If the registers don't need to be reloaded. Go ahead and
> > operate on the
> > +	 * registers.
> > +	 */
> > +	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
> > +		return NULL;
> > +
> > +	xstate = get_xsave_addr(&current->thread.fpu.fpstate-
> > >regs.xsave, xfeature_nr);
> > +
> > +	/*
> > +	 * If regs are in the init state, they can't be retrieved from
> > +	 * init_fpstate due to the init optimization, but are not
> > nessarily
> 
> 							necessarily ^

Oof, thanks.

> 
> Spell checker time.  ":set spell" in vim works for me nicely.
> 
> > +	 * zero. The only option is to restore to make everything live
> > and
> > +	 * operate on registers. This will clear TIF_NEED_FPU_LOAD.
> > +	 *
> > +	 * Otherwise, if not in the init state but TIF_NEED_FPU_LOAD is
> > set,
> > +	 * operate on the buffer. The registers will be restored before
> > going
> > +	 * to userspace in any case, but the task might get preempted
> > before
> > +	 * then, so this possibly saves an xsave.
> > +	 */
> > +	if (!xstate)
> > +		fpregs_restore_userregs();
> 
> Won't fpregs_restore_userregs() end up setting TIF_NEED_FPU_LOAD=0?
> Isn't that a case where a "return NULL" is needed?

This is for the case when the feature is in the init state. For CET's 
case this could just zero the buffer and return the pointer to it, but
for other features the init state wasn't always zero. So this just
makes all the features "active" and TIF_NEED_FPU_LOAD is cleared. It
then returns NULL and the read/writes go to the MSRs. It still looks
correct to me, am I missing something?

> 
> In any case, this makes me think this code should start out stupid
> and
> slow.  Keep the API as-is, but make the first patch unconditionally
> do
> the WRMSR.  Leave the "fast" buffer modifications for a follow-on
> patch.

Ok. Should I drop the optimized versions from the series or just split
them out? The optimizations were trying to address Boris' comments:
https://lore.kernel.org/lkml/YS95VzrNhDhFpsop@zn.tnic/


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size'
  2022-01-30 21:18 ` [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size' Rick Edgecombe
  2022-02-08  8:38   ` Thomas Gleixner
@ 2022-02-14 12:33   ` Jann Horn
  2022-02-15  1:22     ` Edgecombe, Rick P
  1 sibling, 1 reply; 154+ messages in thread
From: Jann Horn @ 2022-02-14 12:33 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: x86, H . Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H . J . Lu,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V . Shankar, Dave Martin, Weijiang Yang,
	Kirill A . Shutemov, joao.moreira, John Allen, kcc, eranian,
	Yu-cheng Yu

On Sun, Jan 30, 2022 at 10:22 PM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>
> The single call site of copy_thread() passes stack size in 'arg'.  To make
> this clear and in preparation of using this argument for shadow stack
> allocation, change 'arg' to 'stack_size'.  No functional changes.

Actually that name is misleading - the single caller copy_process() indeed does:

retval = copy_thread(clone_flags, args->stack, args->stack_size, p, args->tls);

but the member "stack_size" of "struct kernel_clone_args" can actually
also be a pointer argument given to a kthread, see create_io_thread()
and kernel_thread():

pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
{
  struct kernel_clone_args args = {
    .flags    = ((lower_32_bits(flags) | CLONE_VM |
            CLONE_UNTRACED) & ~CSIGNAL),
    .exit_signal  = (lower_32_bits(flags) & CSIGNAL),
    .stack    = (unsigned long)fn,
    .stack_size  = (unsigned long)arg,
  };

  return kernel_clone(&args);
}

And then in copy_thread(), we have:

kthread_frame_init(frame, sp, arg)


So I'm not sure whether this name change really makes sense, or
whether it just adds to the confusion.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size'
  2022-02-14 12:33   ` Jann Horn
@ 2022-02-15  1:22     ` Edgecombe, Rick P
  2022-02-15  8:49       ` Christian Brauner
  0 siblings, 1 reply; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-02-15  1:22 UTC (permalink / raw)
  To: jannh
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Yu, Yu-cheng, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, linux-arch, kcc, bp,
	oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, dave.martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov

On Mon, 2022-02-14 at 13:33 +0100, Jann Horn wrote:
> On Sun, Jan 30, 2022 at 10:22 PM Rick Edgecombe
> <rick.p.edgecombe@intel.com> wrote:
> > 
> > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > 
> > The single call site of copy_thread() passes stack size in
> > 'arg'.  To make
> > this clear and in preparation of using this argument for shadow
> > stack
> > allocation, change 'arg' to 'stack_size'.  No functional changes.
> 
> Actually that name is misleading - the single caller copy_process()
> indeed does:
> 
> retval = copy_thread(clone_flags, args->stack, args->stack_size, p,
> args->tls);
> 
> but the member "stack_size" of "struct kernel_clone_args" can
> actually
> also be a pointer argument given to a kthread, see create_io_thread()
> and kernel_thread():
> 
> pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long
> flags)
> {
>   struct kernel_clone_args args = {
>     .flags    = ((lower_32_bits(flags) | CLONE_VM |
>             CLONE_UNTRACED) & ~CSIGNAL),
>     .exit_signal  = (lower_32_bits(flags) & CSIGNAL),
>     .stack    = (unsigned long)fn,
>     .stack_size  = (unsigned long)arg,
>   };
> 
>   return kernel_clone(&args);
> }
> 
> And then in copy_thread(), we have:
> 
> kthread_frame_init(frame, sp, arg)
> 
> 
> So I'm not sure whether this name change really makes sense, or
> whether it just adds to the confusion.

Thanks Jann. Yea I guess this makes it worse.

Reading a bit of the history, it seems there used to be unwieldy
argument lists which were replaced by a big "struct kernel_clone_args"
to be passed around. The re-use of the stack_size argument is from
before there was the struct. And then the struct just inherited the
confusion when it was introduced.

So if a separate *data member was added to kernel_clone_args for
kernel_thread() and create_io_thread() to use, they wouldn't need to
re-use stack_size. And copy_thread() could just take a struct
kernel_clone_args pointer. It might make it more clear.

But copy_thread() has a ton of arch specific implementations. So they
would all need to be updated to do this.

Just playing around with it, I did this which only makes the change on
x86. Duplicating it for 21 copy_thread()s seems perfectly doable, but
I'm not sure how much people care vs renaming arg to
stacksize_or_data...

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 11bf09b60f9d..cfbba5b14609 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -132,9 +132,8 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
                 return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
-int copy_thread(unsigned long clone_flags, unsigned long sp,
-               unsigned long stack_size, struct task_struct *p,
-               unsigned long tls)
+int copy_thread(unsigned long clone_flags, struct kernel_clone_args *args,
+               struct task_struct *p)
 {
        struct inactive_task_frame *frame;
        struct fork_frame *fork_frame;
@@ -178,7 +177,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
        if (unlikely(p->flags & PF_KTHREAD)) {
                p->thread.pkru = pkru_get_init_value();
                memset(childregs, 0, sizeof(struct pt_regs));
-               kthread_frame_init(frame, sp, stack_size);
+               kthread_frame_init(frame, args->stack, (unsigned long)args->data);
                return 0;
        }
 
@@ -191,8 +190,8 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
        frame->bx = 0;
        *childregs = *current_pt_regs();
        childregs->ax = 0;
-       if (sp)
-               childregs->sp = sp;
+       if (args->stack)
+               childregs->sp = args->stack;
 
 #ifdef CONFIG_X86_32
        task_user_gs(p) = get_user_gs(current_pt_regs());
@@ -211,17 +210,17 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
                 */
                childregs->sp = 0;
                childregs->ip = 0;
-               kthread_frame_init(frame, sp, stack_size);
+               kthread_frame_init(frame, args->stack, (unsigned long)args->data);
                return 0;
        }
 
        /* Set a new TLS for the child thread? */
        if (clone_flags & CLONE_SETTLS)
-               ret = set_new_tls(p, tls);
+               ret = set_new_tls(p, args->tls);
 
        /* Allocate a new shadow stack for pthread */
        if (!ret)
-               ret = shstk_alloc_thread_stack(p, clone_flags, stack_size);
+               ret = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
 
        if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
                io_bitmap_share(p);
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index b9198a1b3a84..f138b23aee50 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -34,6 +34,7 @@ struct kernel_clone_args {
         int io_thread;
         struct cgroup *cgrp;
         struct css_set *cset;
+       void *data;
 };
 
 /*
@@ -67,8 +68,8 @@ extern void fork_init(void);
 
 extern void release_task(struct task_struct * p);
 
-extern int copy_thread(unsigned long, unsigned long, unsigned long,
-                      struct task_struct *, unsigned long);
+extern int copy_thread(unsigned long clone_flags, struct kernel_clone_args *args,
+                      struct task_struct *p);
 
 extern void flush_thread(void);
 
@@ -85,10 +86,10 @@ extern void exit_files(struct task_struct *);
 extern void exit_itimers(struct signal_struct *);
 
 extern pid_t kernel_clone(struct kernel_clone_args *kargs);
-struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
+struct task_struct *create_io_thread(int (*fn)(void *), void *data, int node);
 struct task_struct *fork_idle(int);
 struct mm_struct *copy_init_mm(void);
-extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
+extern pid_t kernel_thread(int (*fn)(void *), void *data, unsigned long flags);
 extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
 int kernel_wait(pid_t pid, int *stat);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..8af202e5651e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2170,7 +2170,7 @@ static __latent_entropy struct task_struct *copy_process(
        retval = copy_io(clone_flags, p);
        if (retval)
                goto bad_fork_cleanup_namespaces;
-       retval = copy_thread(clone_flags, args->stack, args->stack_size, p, args->tls);
+       retval = copy_thread(clone_flags, args, p);
        if (retval)
                goto bad_fork_cleanup_io;
 
@@ -2487,7 +2487,7 @@ struct mm_struct *copy_init_mm(void)
  * The returned task is inactive, and the caller must fire it up through
  * wake_up_new_task(p). All signals are blocked in the created task.
  */
-struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
+struct task_struct *create_io_thread(int (*fn)(void *), void *data, int node)
 {
        unsigned long flags = CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
                                CLONE_IO;
@@ -2496,7 +2496,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
                                    CLONE_UNTRACED) & ~CSIGNAL),
                .exit_signal    = (lower_32_bits(flags) & CSIGNAL),
                .stack          = (unsigned long)fn,
-               .stack_size     = (unsigned long)arg,
+               .data           = data,
                .io_thread      = 1,
        };
 
@@ -2594,14 +2594,14 @@ pid_t kernel_clone(struct kernel_clone_args *args)
 /*
  * Create a kernel thread.
  */
-pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+pid_t kernel_thread(int (*fn)(void *), void *data, unsigned long flags)
 {
        struct kernel_clone_args args = {
                .flags          = ((lower_32_bits(flags) | CLONE_VM |
                                    CLONE_UNTRACED) & ~CSIGNAL),
                .exit_signal    = (lower_32_bits(flags) & CSIGNAL),
                .stack          = (unsigned long)fn,
-               .stack_size     = (unsigned long)arg,
+               .data           = data,
        };
 
        return kernel_clone(&args);



^ permalink raw reply related	[flat|nested] 154+ messages in thread

* Re: [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size'
  2022-02-15  1:22     ` Edgecombe, Rick P
@ 2022-02-15  8:49       ` Christian Brauner
  0 siblings, 0 replies; 154+ messages in thread
From: Christian Brauner @ 2022-02-15  8:49 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: jannh, bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, Yu, Yu-cheng, dave.hansen, kirill.shutemov, Eranian,
	Stephane, linux-mm, fweimer, nadav.amit, linux-arch, kcc, bp,
	oleg, hjl.tools, Yang, Weijiang, Lutomirski, Andy, pavel, arnd,
	Moreira, Joao, tglx, mike.kravetz, x86, linux-doc, dave.martin,
	john.allen, mingo, Shankar, Ravi V, corbet, linux-kernel,
	linux-api, gorcunov

On Tue, Feb 15, 2022 at 01:22:04AM +0000, Edgecombe, Rick P wrote:
> On Mon, 2022-02-14 at 13:33 +0100, Jann Horn wrote:
> > On Sun, Jan 30, 2022 at 10:22 PM Rick Edgecombe
> > <rick.p.edgecombe@intel.com> wrote:
> > > 
> > > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > > 
> > > The single call site of copy_thread() passes stack size in
> > > 'arg'.  To make
> > > this clear and in preparation of using this argument for shadow
> > > stack
> > > allocation, change 'arg' to 'stack_size'.  No functional changes.
> > 
> > Actually that name is misleading - the single caller copy_process()
> > indeed does:
> > 
> > retval = copy_thread(clone_flags, args->stack, args->stack_size, p,
> > args->tls);
> > 
> > but the member "stack_size" of "struct kernel_clone_args" can
> > actually
> > also be a pointer argument given to a kthread, see create_io_thread()
> > and kernel_thread():
> > 
> > pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long
> > flags)
> > {
> >   struct kernel_clone_args args = {
> >     .flags    = ((lower_32_bits(flags) | CLONE_VM |
> >             CLONE_UNTRACED) & ~CSIGNAL),
> >     .exit_signal  = (lower_32_bits(flags) & CSIGNAL),
> >     .stack    = (unsigned long)fn,
> >     .stack_size  = (unsigned long)arg,
> >   };
> > 
> >   return kernel_clone(&args);
> > }
> > 
> > And then in copy_thread(), we have:
> > 
> > kthread_frame_init(frame, sp, arg)
> > 
> > 
> > So I'm not sure whether this name change really makes sense, or
> > whether it just adds to the confusion.
> 
> Thanks Jann. Yea I guess this makes it worse.
> 
> Reading a bit of the history, it seems there used to be unwieldy
> argument lists which were replaced by a big "struct kernel_clone_args"
> to be passed around. The re-use of the stack_size argument is from
> before there was the struct. And then the struct just inherited the
> confusion when it was introduced.
> 
> So if a separate *data member was added to kernel_clone_args for
> kernel_thread() and create_io_thread() to use, they wouldn't need to
> re-use stack_size. And copy_thread() could just take a struct
> kernel_clone_args pointer. It might make it more clear.

I'm honestly not sure it makes things that much better, but I don't
feel strongly about it either way.

> 
> But copy_thread() has a ton of arch specific implementations. So they
> would all need to be updated to do this.

When struct kernel_clone_args was introduced I also removed the
copy_thread_tls() and copy_thread() split. So now we're only left with
copy_thread(). That already allowed us to get rid of some arch-specific
code. I didn't go further in trying to unify more arch-specific code. It
might be worth it but I didn't come to a clear conclusion.

> 
> Just playing around with it, I did this which only makes the change on
> x86. Duplicating it for 21 copy_thread()s seems perfectly doable, but
> I'm not sure how much people care vs renaming arg to
> stacksize_or_data...


> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 11bf09b60f9d..cfbba5b14609 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -132,9 +132,8 @@ static int set_new_tls(struct task_struct *p,
> unsigned long tls)
>                 return do_set_thread_area_64(p, ARCH_SET_FS, tls);
>  }
>  
> -int copy_thread(unsigned long clone_flags, unsigned long sp,
> -               unsigned long stack_size, struct task_struct *p,
> -               unsigned long tls)
> +int copy_thread(unsigned long clone_flags, struct kernel_clone_args
> *args,
> +               struct task_struct *p)
>  {
>         struct inactive_task_frame *frame;
>         struct fork_frame *fork_frame;
> @@ -178,7 +177,7 @@ int copy_thread(unsigned long clone_flags, unsigned
> long sp,
>         if (unlikely(p->flags & PF_KTHREAD)) {
>                 p->thread.pkru = pkru_get_init_value();
>                 memset(childregs, 0, sizeof(struct pt_regs));
> -               kthread_frame_init(frame, sp, stack_size);
> +               kthread_frame_init(frame, args->stack, (unsigned
> long)args->data);
>                 return 0;
>         }
>  
> @@ -191,8 +190,8 @@ int copy_thread(unsigned long clone_flags, unsigned
> long sp,
>         frame->bx = 0;
>         *childregs = *current_pt_regs();
>         childregs->ax = 0;
> -       if (sp)
> -               childregs->sp = sp;
> +       if (args->stack)
> +               childregs->sp = args->stack;
>  
>  #ifdef CONFIG_X86_32
>         task_user_gs(p) = get_user_gs(current_pt_regs());
> @@ -211,17 +210,17 @@ int copy_thread(unsigned long clone_flags,
> unsigned long sp,
>                  */
>                 childregs->sp = 0;
>                 childregs->ip = 0;
> -               kthread_frame_init(frame, sp, stack_size);
> +               kthread_frame_init(frame, args->stack, (unsigned
> long)args->data);
>                 return 0;
>         }
>  
>         /* Set a new TLS for the child thread? */
>         if (clone_flags & CLONE_SETTLS)
> -               ret = set_new_tls(p, tls);
> +               ret = set_new_tls(p, args->tls);
>  
>         /* Allocate a new shadow stack for pthread */
>         if (!ret)
> -               ret = shstk_alloc_thread_stack(p, clone_flags,
> stack_size);
> +               ret = shstk_alloc_thread_stack(p, clone_flags, args-
> >stack_size);
>  
>         if (!ret && unlikely(test_tsk_thread_flag(current,
> TIF_IO_BITMAP)))
>                 io_bitmap_share(p);
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index b9198a1b3a84..f138b23aee50 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -34,6 +34,7 @@ struct kernel_clone_args {
>         int io_thread;
>         struct cgroup *cgrp;
>         struct css_set *cset;
> +       void *data;
>  };
>  
>  /*
> @@ -67,8 +68,8 @@ extern void fork_init(void);
>  
>  extern void release_task(struct task_struct * p);
>  
> -extern int copy_thread(unsigned long, unsigned long, unsigned long,
> -                      struct task_struct *, unsigned long);
> +extern int copy_thread(unsigned long clone_flags, struct
> kernel_clone_args *args,
> +                      struct task_struct *p);
>  
>  extern void flush_thread(void);
>  
> @@ -85,10 +86,10 @@ extern void exit_files(struct task_struct *);
>  extern void exit_itimers(struct signal_struct *);
>  
>  extern pid_t kernel_clone(struct kernel_clone_args *kargs);
> -struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int
> node);
> +struct task_struct *create_io_thread(int (*fn)(void *), void *data,
> int node);
>  struct task_struct *fork_idle(int);
>  struct mm_struct *copy_init_mm(void);
> -extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long
> flags);
> +extern pid_t kernel_thread(int (*fn)(void *), void *data, unsigned
> long flags);
>  extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
>  int kernel_wait(pid_t pid, int *stat);
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d75a528f7b21..8af202e5651e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2170,7 +2170,7 @@ static __latent_entropy struct task_struct
> *copy_process(
>         retval = copy_io(clone_flags, p);
>         if (retval)
>                 goto bad_fork_cleanup_namespaces;
> -       retval = copy_thread(clone_flags, args->stack, args-
> >stack_size, p, args->tls);
> +       retval = copy_thread(clone_flags, args, p);
>         if (retval)
>                 goto bad_fork_cleanup_io;
>  
> @@ -2487,7 +2487,7 @@ struct mm_struct *copy_init_mm(void)
>   * The returned task is inactive, and the caller must fire it up
> through
>   * wake_up_new_task(p). All signals are blocked in the created task.
>   */
> -struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int
> node)
> +struct task_struct *create_io_thread(int (*fn)(void *), void *data,
> int node)
>  {
>         unsigned long flags =
> CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
>                                 CLONE_IO;
> @@ -2496,7 +2496,7 @@ struct task_struct *create_io_thread(int
> (*fn)(void *), void *arg, int node)
>                                     CLONE_UNTRACED) & ~CSIGNAL),
>                 .exit_signal    = (lower_32_bits(flags) & CSIGNAL),
>                 .stack          = (unsigned long)fn,
> -               .stack_size     = (unsigned long)arg,
> +               .data           = data,
>                 .io_thread      = 1,
>         };
>  
> @@ -2594,14 +2594,14 @@ pid_t kernel_clone(struct kernel_clone_args
> *args)
>  /*
>   * Create a kernel thread.
>   */
> -pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
> +pid_t kernel_thread(int (*fn)(void *), void *data, unsigned long
> flags)
>  {
>         struct kernel_clone_args args = {
>                 .flags          = ((lower_32_bits(flags) | CLONE_VM |
>                                     CLONE_UNTRACED) & ~CSIGNAL),
>                 .exit_signal    = (lower_32_bits(flags) & CSIGNAL),
>                 .stack          = (unsigned long)fn,
> -               .stack_size     = (unsigned long)arg,
> +               .data           = data,
>         };
>  
>         return kernel_clone(&args);
> 
> 

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-10  2:37                 ` Andy Lutomirski
  2022-02-10  2:53                   ` H.J. Lu
  2022-02-11  7:41                   ` avagin
@ 2022-02-28 20:27                   ` Mike Rapoport
  2022-02-28 20:30                     ` Andy Lutomirski
  2 siblings, 1 reply; 154+ messages in thread
From: Mike Rapoport @ 2022-02-28 20:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Edgecombe, Rick P, gorcunov, bsingharora, hpa, Syromiatnikov,
	Eugene, peterz, rdunlap, keescook, 0x7f454c46, dave.hansen,
	kirill.shutemov, Eranian, Stephane, linux-mm, adrian, fweimer,
	nadav.amit, jannh, avagin, linux-arch, kcc, bp, oleg, hjl.tools,
	pavel, linux-doc, arnd, Moreira, Joao, tglx, mike.kravetz, x86,
	Yang, Weijiang, Dave.Martin, john.allen, mingo, Hansen, Dave,
	corbet, linux-kernel, linux-api, Shankar, Ravi V

On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > 
> > Still wrapping my head around the CRIU save and restore steps, but
> > another general approach might be to give ptrace the ability to
> > temporarily pause/resume/set CET enablement and SSP for a stopped
> > thread. Then injected code doesn't need to jump through any hoops or
> > possibly run into road blocks. I'm not sure how much this opens things
> > up if the thread has to be stopped...
> 
> Hmm, that's maybe not insane.
> 
> An alternative would be to add a bona fide ptrace call-a-function mechanism.
> I can think of two potentially usable variants:
> 
> 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> shadow stack push and all.
> 
> 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> frame just like a real signal is being delivered with the specified handler.
> There could be a variant to opt-in to also using a specified altstack and
> altshadowstack.

Using ptrace() will not solve CRIU's issue with sigreturn because sigreturn
is called from the victim context rather than from the criu process that
controls the dump and uses ptrace().

Even with the current shadow stack interface Rick proposed, CRIU can restore
the victim using ptrace without any additional knobs, but we lose an
important ability to "self-cure" the victim from the parasite in case
anything goes wrong with the criu control process.

Moreover, the issue with backward compatibility is not with ptrace but with
sigreturn and it seems that criu is not its only user.

So I think we need a way to allow direct calls to sigreturn that will
bypass the check and restore of the shadow stack.

I only know that there are sigreturn users other than criu that show up in
Debian codesearch, and I don't know how they use it, but for full
backward compatibility we'd need to have no-CET sigreturn as the default and
add a new flag, say UC_CHECK_SHSTK, to rt_sigframe->uc.uc_flags, or even a
new syscall for libc signal handling.
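
To make the opt-in concrete, a rough sketch (UC_CHECK_SHSTK, its value
and the helper below are all made up here, none of this exists today):

	#define UC_CHECK_SHSTK	0x8	/* made-up uc_flags bit */

	int restore_signal_shadow_stack(struct pt_regs *regs); /* hypothetical */

	static int sigreturn_maybe_restore_shstk(struct pt_regs *regs,
						 unsigned long uc_flags)
	{
		/* Default: no-CET sigreturn, existing users keep working */
		if (!(uc_flags & UC_CHECK_SHSTK))
			return 0;

		return restore_signal_shadow_stack(regs);
	}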
 
> 2 would be more expensive but would avoid the need for much in the way of
> asm magic.  The injected code could be plain C (or Rust or Zig or whatever).
> 
> All of this only really handles save, not restore.  I don't understand
> restore enough to fully understand the issue.

Restore is more complex, will get to it later.
 
> --Andy

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-28 20:27                   ` Mike Rapoport
@ 2022-02-28 20:30                     ` Andy Lutomirski
  2022-02-28 21:30                       ` Mike Rapoport
  0 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-28 20:30 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Rick P Edgecombe, Cyrill Gorcunov, Balbir Singh, H. Peter Anvin,
	Eugene Syromiatnikov, Peter Zijlstra (Intel),
	Randy Dunlap, Kees Cook, Dmitry Safonov, Dave Hansen,
	Kirill A. Shutemov, Eranian, Stephane, linux-mm, Adrian Reber,
	Florian Weimer, Nadav Amit, Jann Horn, Andrei Vagin, linux-arch,
	kcc, Borislav Petkov, Oleg Nesterov, H.J. Lu, Pavel Machek,
	linux-doc, Arnd Bergmann, Moreira, Joao, Thomas Gleixner,
	Mike Kravetz, the arch/x86 maintainers, Weijiang Yang,
	Dave Martin, john.allen, Ingo Molnar, Dave Hansen,
	Jonathan Corbet, Linux Kernel Mailing List, Linux API, Shankar,
	Ravi V



On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote:
> On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
>> On 2/8/22 18:18, Edgecombe, Rick P wrote:
>> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
>> > 
>> > Still wrapping my head around the CRIU save and restore steps, but
>> > another general approach might be to give ptrace the ability to
>> > temporarily pause/resume/set CET enablement and SSP for a stopped
>> > thread. Then injected code doesn't need to jump through any hoops or
>> > possibly run into road blocks. I'm not sure how much this opens things
>> > up if the thread has to be stopped...
>> 
>> Hmm, that's maybe not insane.
>> 
>> An alternative would be to add a bona fide ptrace call-a-function mechanism.
>> I can think of two potentially usable variants:
>> 
>> 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
>> shadow stack push and all.
>> 
>> 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
>> frame just like a real signal is being delivered with the specified handler.
>> There could be a variant to opt-in to also using a specified altstack and
>> altshadowstack.
>
> Using ptrace() will not solve CRIU's issue with sigreturn because sigreturn
> is called from the victim context rather than from the criu process that
> controls the dump and uses ptrace().

I'm not sure I follow.

>
> Even with the current shadow stack interface Rick proposed, CRIU can restore
> the victim using ptrace without any additional knobs, but we lose an
> important ability to "self-cure" the victim from the parasite in case
> anything goes wrong with criu control process.
>
> Moreover, the issue with backward compatibility is not with ptrace but with
> sigreturn and it seems that criu is not its only user.

So we need an ability for a tracer to cause the tracee to call a function and to return successfully.  Apparently a gdb branch can already do this with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the trick.  I don't see why we need a sigreturn-but-don't-verify -- we just need this mechanism to create a frame such that sigreturn actually works.

--Andy

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-28 20:30                     ` Andy Lutomirski
@ 2022-02-28 21:30                       ` Mike Rapoport
  2022-02-28 22:55                         ` Andy Lutomirski
  0 siblings, 1 reply; 154+ messages in thread
From: Mike Rapoport @ 2022-02-28 21:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Rick P Edgecombe, Cyrill Gorcunov, Balbir Singh, H. Peter Anvin,
	Eugene Syromiatnikov, Peter Zijlstra (Intel),
	Randy Dunlap, Kees Cook, Dmitry Safonov, Dave Hansen,
	Kirill A. Shutemov, Eranian, Stephane, linux-mm, Adrian Reber,
	Florian Weimer, Nadav Amit, Jann Horn, Andrei Vagin, linux-arch,
	kcc, Borislav Petkov, Oleg Nesterov, H.J. Lu, Pavel Machek,
	linux-doc, Arnd Bergmann, Moreira, Joao, Thomas Gleixner,
	Mike Kravetz, the arch/x86 maintainers, Weijiang Yang,
	Dave Martin, john.allen, Ingo Molnar, Dave Hansen,
	Jonathan Corbet, Linux Kernel Mailing List, Linux API, Shankar,
	Ravi V

On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote:
> 
> 
> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote:
> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> >> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> >> > 
> >
> > Even with the current shadow stack interface Rick proposed, CRIU can restore
> > the victim using ptrace without any additional knobs, but we loose an
> > important ability to "self-cure" the victim from the parasite in case
> > anything goes wrong with criu control process.
> >
> > Moreover, the issue with backward compatibility is not with ptrace but with
> > sigreturn and it seems that criu is not its only user.
> 
> So we need an ability for a tracer to cause the tracee to call a function
> and to return successfully.  Apparently a gdb branch can already do this
> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the
> trick.  I don't see why we need a sigreturn-but-don't-verify -- we just
> need this mechanism to create a frame such that sigreturn actually works.

If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame
into the tracee and makes the tracee call sigreturn.
I.e. the tracee is stopped and this is used pretty much like PTRACE_CONT or
PTRACE_SYSCALL.

In such a case this defeats the purpose of sigreturn in CRIU, because it is
called asynchronously by the tracee when the tracer is about to detach or
has even already detached.

For the synchronous use case PTRACE_SETREGSET would be enough; the rest of
the sigframe can be restored by other means.

And with 'criu restore' there may even be no tracer by the time sigreturn
is called.

> --Andy

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-28 21:30                       ` Mike Rapoport
@ 2022-02-28 22:55                         ` Andy Lutomirski
  2022-03-03 19:40                           ` Mike Rapoport
  0 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-02-28 22:55 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Rick P Edgecombe, Cyrill Gorcunov, Balbir Singh, H. Peter Anvin,
	Eugene Syromiatnikov, Peter Zijlstra (Intel),
	Randy Dunlap, Kees Cook, Dmitry Safonov, Dave Hansen,
	Kirill A. Shutemov, Eranian, Stephane, linux-mm, Adrian Reber,
	Florian Weimer, Nadav Amit, Jann Horn, Andrei Vagin, linux-arch,
	kcc, Borislav Petkov, Oleg Nesterov, H.J. Lu, Pavel Machek,
	linux-doc, Arnd Bergmann, Moreira, Joao, Thomas Gleixner,
	Mike Kravetz, the arch/x86 maintainers, Weijiang Yang,
	Dave Martin, john.allen, Ingo Molnar, Dave Hansen,
	Jonathan Corbet, Linux Kernel Mailing List, Linux API, Shankar,
	Ravi V



On Mon, Feb 28, 2022, at 1:30 PM, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote:
>> 
>> 
>> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote:
>> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
>> >> On 2/8/22 18:18, Edgecombe, Rick P wrote:
>> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
>> >> > 
>> >
>> > Even with the current shadow stack interface Rick proposed, CRIU can restore
>> > the victim using ptrace without any additional knobs, but we loose an
>> > important ability to "self-cure" the victim from the parasite in case
>> > anything goes wrong with criu control process.
>> >
>> > Moreover, the issue with backward compatibility is not with ptrace but with
>> > sigreturn and it seems that criu is not its only user.
>> 
>> So we need an ability for a tracer to cause the tracee to call a function
>> and to return successfully.  Apparently a gdb branch can already do this
>> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the
>> trick.  I don't see why we need a sigretur-but-dont-verify -- we just
>> need this mechanism to create a frame such that sigreturn actually works.
>
> If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame
> into the tracee and makes the tracee call sigreturn.
> I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or
> PTRACE_SYSCALL.
>
> In such case this defeats the purpose of sigreturn in CRIU because it is
> called asynchronously by the tracee when the tracer is about to detach or
> even already detached.

The intent of PTRACE_CALL_FUNCTION_SIGFRAME is to push a signal frame onto the stack and call a function.  That function should then be able to call sigreturn just like any normal signal handler.  There should be no requirement that the tracer still be attached when this happens, although the code calling sigreturn still needs to be mapped.

(Specifically, on modern arches, the user runtime is expected to provide a "restorer" that calls sigreturn.  A hypothetical PTRACE_CALL_FUNCTION_SIGFRAME would either need to call sigreturn directly or provide a restorer.)
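
(For reference, a restorer is a tiny stub.  Below is a self-contained x86-64
sketch of one, roughly what glibc's __restore_rt does; it registers the
handler through the raw rt_sigaction syscall because glibc's sigaction()
substitutes its own restorer.  Only the mechanism is the point here.)

#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel's x86-64 sigaction layout (not glibc's struct sigaction). */
struct kernel_sigaction {
	void (*handler)(int);
	unsigned long flags;
	void (*restorer)(void);
	unsigned long mask;
};

#ifndef SA_RESTORER
#define SA_RESTORER 0x04000000UL
#endif

void my_restore_rt(void);
__asm__(".globl my_restore_rt\n"
	"my_restore_rt:\n"
	"	movq $15, %rax\n"	/* __NR_rt_sigreturn on x86-64 */
	"	syscall\n");

static void handler(int sig)
{
	const char msg[] = "handler ran\n";
	write(1, msg, sizeof(msg) - 1);
}

int main(void)
{
	struct kernel_sigaction ksa = {
		.handler  = handler,
		.flags    = SA_RESTORER,
		.restorer = my_restore_rt,
	};

	/* Raw rt_sigaction so our restorer is actually used. */
	syscall(SYS_rt_sigaction, SIGUSR1, &ksa, NULL, sizeof(unsigned long));
	raise(SIGUSR1);
	puts("back in main after sigreturn");
	return 0;
}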

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-02-28 22:55                         ` Andy Lutomirski
@ 2022-03-03 19:40                           ` Mike Rapoport
  2022-03-03 23:00                             ` Andy Lutomirski
  0 siblings, 1 reply; 154+ messages in thread
From: Mike Rapoport @ 2022-03-03 19:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Rick P Edgecombe, Cyrill Gorcunov, Balbir Singh, H. Peter Anvin,
	Eugene Syromiatnikov, Peter Zijlstra (Intel),
	Randy Dunlap, Kees Cook, Dmitry Safonov, Dave Hansen,
	Kirill A. Shutemov, Eranian, Stephane, linux-mm, Adrian Reber,
	Florian Weimer, Nadav Amit, Jann Horn, Andrei Vagin, linux-arch,
	kcc, Borislav Petkov, Oleg Nesterov, H.J. Lu, Pavel Machek,
	linux-doc, Arnd Bergmann, Moreira, Joao, Thomas Gleixner,
	Mike Kravetz, the arch/x86 maintainers, Weijiang Yang,
	Dave Martin, john.allen, Ingo Molnar, Dave Hansen,
	Jonathan Corbet, Linux Kernel Mailing List, Linux API, Shankar,
	Ravi V

On Mon, Feb 28, 2022 at 02:55:30PM -0800, Andy Lutomirski wrote:
> 
> 
> On Mon, Feb 28, 2022, at 1:30 PM, Mike Rapoport wrote:
> > On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote:
> >> 
> >> 
> >> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote:
> >> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> >> >> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> >> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> >> >> > 
> >> >
> >> > Even with the current shadow stack interface Rick proposed, CRIU can restore
> >> > the victim using ptrace without any additional knobs, but we loose an
> >> > important ability to "self-cure" the victim from the parasite in case
> >> > anything goes wrong with criu control process.
> >> >
> >> > Moreover, the issue with backward compatibility is not with ptrace but with
> >> > sigreturn and it seems that criu is not its only user.
> >> 
> >> So we need an ability for a tracer to cause the tracee to call a function
> >> and to return successfully.  Apparently a gdb branch can already do this
> >> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the
> >> trick.  I don't see why we need a sigretur-but-dont-verify -- we just
> >> need this mechanism to create a frame such that sigreturn actually works.
> >
> > If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame
> > into the tracee and makes the tracee call sigreturn.
> > I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or
> > PTRACE_SYSCALL.
> >
> > In such case this defeats the purpose of sigreturn in CRIU because it is
> > called asynchronously by the tracee when the tracer is about to detach or
> > even already detached.
> 
> The intent of PTRACE_CALL_FUNCTION_SIGFRAME is to push a signal frame onto
> the stack and call a function.  That function should then be able to call
> sigreturn just like any normal signal handler.  

Ok, let me reiterate.

We have a seized and stopped tracee; we use PTRACE_CALL_FUNCTION_SIGFRAME
to push a signal frame onto the tracee's stack so that sigreturn can use
that frame, then set the tracee's %rip to the function we'd like to call
and PTRACE_CONT the tracee. The tracee continues to execute the parasite
code, which calls sigreturn to clean up and restore the tracee process.

PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the shadow
stack, just like setup_rt_frame() does, so that sys_rt_sigreturn() won't
bail out at restore_signal_shadow_stack().

The only thing that CRIU actually needs is to push a restore token to the
shadow stack, so for us a ptrace call that does that would be ideal.
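
Roughly, the tracer side could look like the sketch below
(PTRACE_CALL_FUNCTION_SIGFRAME and its request number are hypothetical,
nothing like it exists today; the rest is plain ptrace):

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

#define PTRACE_CALL_FUNCTION_SIGFRAME	0x4300	/* made-up request number */

/* pid: seized and stopped tracee; parasite_fn: injected code that ends in
 * a call to rt_sigreturn(). */
static int run_parasite(pid_t pid, unsigned long parasite_fn)
{
	struct user_regs_struct regs;

	/* Hypothetical: the kernel builds an rt_sigframe on the tracee's
	 * stack and pushes a restore token on its shadow stack. */
	if (ptrace(PTRACE_CALL_FUNCTION_SIGFRAME, pid, NULL, NULL))
		return -1;

	if (ptrace(PTRACE_GETREGS, pid, NULL, &regs))
		return -1;
	regs.rip = parasite_fn;		/* run the injected code next */
	if (ptrace(PTRACE_SETREGS, pid, NULL, &regs))
		return -1;

	/* The parasite does its work and finishes with rt_sigreturn(),
	 * which consumes the frame and token even if we detach meanwhile. */
	return ptrace(PTRACE_CONT, pid, NULL, 0);
}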

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-03-03 19:40                           ` Mike Rapoport
@ 2022-03-03 23:00                             ` Andy Lutomirski
  2022-03-04  1:30                               ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-03-03 23:00 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andy Lutomirski, Rick P Edgecombe, Cyrill Gorcunov, Balbir Singh,
	H. Peter Anvin, Eugene Syromiatnikov, Peter Zijlstra (Intel),
	Randy Dunlap, Kees Cook, Dmitry Safonov, Dave Hansen,
	Kirill A. Shutemov, Eranian, Stephane, linux-mm, Adrian Reber,
	Florian Weimer, Nadav Amit, Jann Horn, Andrei Vagin, linux-arch,
	kcc, Borislav Petkov, Oleg Nesterov, H.J. Lu, Pavel Machek,
	linux-doc, Arnd Bergmann, Moreira, Joao, Thomas Gleixner,
	Mike Kravetz, the arch/x86 maintainers, Weijiang Yang,
	Dave Martin, john.allen, Ingo Molnar, Dave Hansen,
	Jonathan Corbet, Linux Kernel Mailing List, Linux API, Shankar,
	Ravi V

On Thu, Mar 3, 2022 at 11:43 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Feb 28, 2022 at 02:55:30PM -0800, Andy Lutomirski wrote:
> >
> >
> > On Mon, Feb 28, 2022, at 1:30 PM, Mike Rapoport wrote:
> > > On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote:
> > >>
> > >>
> > >> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote:
> > >> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> > >> >> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > >> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > >> >> >
> > >> >
> > >> > Even with the current shadow stack interface Rick proposed, CRIU can restore
> > >> > the victim using ptrace without any additional knobs, but we loose an
> > >> > important ability to "self-cure" the victim from the parasite in case
> > >> > anything goes wrong with criu control process.
> > >> >
> > >> > Moreover, the issue with backward compatibility is not with ptrace but with
> > >> > sigreturn and it seems that criu is not its only user.
> > >>
> > >> So we need an ability for a tracer to cause the tracee to call a function
> > >> and to return successfully.  Apparently a gdb branch can already do this
> > >> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the
> > >> trick.  I don't see why we need a sigretur-but-dont-verify -- we just
> > >> need this mechanism to create a frame such that sigreturn actually works.
> > >
> > > If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame
> > > into the tracee and makes the tracee call sigreturn.
> > > I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or
> > > PTRACE_SYSCALL.
> > >
> > > In such case this defeats the purpose of sigreturn in CRIU because it is
> > > called asynchronously by the tracee when the tracer is about to detach or
> > > even already detached.
> >
> > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal frame onto
> > the stack and call a function.  That function should then be able to call
> > sigreturn just like any normal signal handler.
>
> Ok, let me reiterate.
>
> We have a seized and stopped tracee, use PTRACE_CALL_FUNCTION_SIGFRAME
> to push a signal frame onto the tracee's stack so that sigreturn could use
> that frame, then set the tracee %rip to the function we'd like to call and
> then we PTRACE_CONT the tracee. Tracee continues to execute the parasite
> code that calls sigreturn to clean up and restore the tracee process.
>
> PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the shadow
> stack, just like setup_rt_frame() does, so that sys_rt_sigreturn() won't
> bail out at restore_signal_shadow_stack().

That is the intent.

>
> The only thing that CRIU actually needs is to push a restore token to the
> shadow stack, so for us a ptrace call that does that would be ideal.
>

That seems fine too.  The main benefit of the SIGFRAME approach is
that, AIUI, CRIU eventually constructs a signal frame anyway, and
getting one ready-made seems plausibly helpful.  But if it's not
actually that useful, then there's no need to do it.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-03-03 23:00                             ` Andy Lutomirski
@ 2022-03-04  1:30                               ` Edgecombe, Rick P
  2022-03-04 19:13                                 ` Andy Lutomirski
  0 siblings, 1 reply; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-03-04  1:30 UTC (permalink / raw)
  To: Lutomirski, Andy, rppt
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, hjl.tools, bp, linux-doc,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, Yang, Weijiang,
	dave.martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Thu, 2022-03-03 at 15:00 -0800, Andy Lutomirski wrote:
> > > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal
> > > frame onto
> > > the stack and call a function.  That function should then be able
> > > to call
> > > sigreturn just like any normal signal handler.
> > 
> > Ok, let me reiterate.
> > 
> > We have a seized and stopped tracee, use
> > PTRACE_CALL_FUNCTION_SIGFRAME
> > to push a signal frame onto the tracee's stack so that sigreturn
> > could use
> > that frame, then set the tracee %rip to the function we'd like to
> > call and
> > then we PTRACE_CONT the tracee. Tracee continues to execute the
> > parasite
> > code that calls sigreturn to clean up and restore the tracee
> > process.
> > 
> > PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the
> > shadow
> > stack, just like setup_rt_frame() does, so that sys_rt_sigreturn()
> > won't
> > bail out at restore_signal_shadow_stack().
> 
> That is the intent.
> 
> > 
> > The only thing that CRIU actually needs is to push a restore token
> > to the
> > shadow stack, so for us a ptrace call that does that would be
> > ideal.
> > 
> 
> That seems fine too.  The main benefit of the SIGFRAME approach is
> that, AIUI, CRIU eventually constructs a signal frame anyway, and
> getting one ready-made seems plausibly helpful.  But if it's not
> actually that useful, then there's no need to do it.

I guess pushing a token to the shadow stack could be done like GDB does
calls, with just the basic CET ptrace support. So do we even need a
specific push token operation?
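
Roughly, with a CET regset exposed to ptrace, a tracer could do something
like the sketch below.  The NT_X86_CET note number, the regset layout and
the token format are assumptions based on the out-of-tree CET ptrace work
and the SDM, not a settled ABI:

#include <elf.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_X86_CET
#define NT_X86_CET 0x203		/* assumed note type */
#endif

struct cet_user_state {			/* assumed regset layout */
	uint64_t user_cet;
	uint64_t user_ssp;
};

static int push_restore_token(pid_t pid)
{
	struct cet_user_state cet;
	struct iovec iov = { &cet, sizeof(cet) };
	uint64_t token_addr, token;

	if (ptrace(PTRACE_GETREGSET, pid, NT_X86_CET, &iov))
		return -1;

	token_addr = cet.user_ssp - 8;	/* free slot just below current SSP */
	token = cet.user_ssp | 1;	/* assumed: SSP to restore, bit 0 = 64-bit */

	/* ptrace pokes can write the tracee's shadow stack pages. */
	if (ptrace(PTRACE_POKEDATA, pid, (void *)token_addr, (void *)token))
		return -1;

	cet.user_ssp = token_addr;	/* leave SSP pointing at the token */
	return ptrace(PTRACE_SETREGSET, pid, NT_X86_CET, &iov);
}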

I suppose if CRIU already used some kernel encapsulation of a seized
call/return operation it would have been easier to make CRIU work with
the introduction of CET. But the design of CRIU seems to be to have the
kernel expose just enough and then tie it all together in userspace.

Andy, did you have any other uses for PTRACE_CALL_FUNCTION in mind? I
couldn't find any other CRIU-like users of sigreturn in the Debian
source search (but didn't read all 819 pages that come up with
"sigreturn"). The hits seemed to be mostly seccomp sandbox references.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-03-04  1:30                               ` Edgecombe, Rick P
@ 2022-03-04 19:13                                 ` Andy Lutomirski
  2022-03-07 18:56                                   ` Mike Rapoport
  0 siblings, 1 reply; 154+ messages in thread
From: Andy Lutomirski @ 2022-03-04 19:13 UTC (permalink / raw)
  To: Edgecombe, Rick P, rppt
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, hjl.tools, bp, linux-doc,
	arnd, Moreira, Joao, tglx, mike.kravetz, x86, Yang, Weijiang,
	dave.martin, john.allen, mingo, Hansen, Dave, corbet,
	linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On 3/3/22 17:30, Edgecombe, Rick P wrote:
> On Thu, 2022-03-03 at 15:00 -0800, Andy Lutomirski wrote:
>>>> The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal
>>>> frame onto
>>>> the stack and call a function.  That function should then be able
>>>> to call
>>>> sigreturn just like any normal signal handler.
>>>
>>> Ok, let me reiterate.
>>>
>>> We have a seized and stopped tracee, use
>>> PTRACE_CALL_FUNCTION_SIGFRAME
>>> to push a signal frame onto the tracee's stack so that sigreturn
>>> could use
>>> that frame, then set the tracee %rip to the function we'd like to
>>> call and
>>> then we PTRACE_CONT the tracee. Tracee continues to execute the
>>> parasite
>>> code that calls sigreturn to clean up and restore the tracee
>>> process.
>>>
>>> PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the
>>> shadow
>>> stack, just like setup_rt_frame() does, so that sys_rt_sigreturn()
>>> won't
>>> bail out at restore_signal_shadow_stack().
>>
>> That is the intent.
>>
>>>
>>> The only thing that CRIU actually needs is to push a restore token
>>> to the
>>> shadow stack, so for us a ptrace call that does that would be
>>> ideal.
>>>
>>
>> That seems fine too.  The main benefit of the SIGFRAME approach is
>> that, AIUI, CRIU eventually constructs a signal frame anyway, and
>> getting one ready-made seems plausibly helpful.  But if it's not
>> actually that useful, then there's no need to do it.
> 
> I guess pushing a token to the shadow stack could be done like GDB does
> calls, with just the basic CET ptrace support. So do we even need a
> specific push token operation?
> 
> I suppose if CRIU already used some kernel encapsulation of a seized
> call/return operation it would have been easier to make CRIU work with
> the introduction of CET. But the design of CRIU seems to be to have the
> kernel expose just enough and then tie it all together in userspace.
> 
> Andy, did you have any other usages for PTRACE_CALL_FUNCTION in mind? I
> couldn't find any other CRIU-like users of sigreturn in the debian
> source search (but didn't read all 819 pages that come up with
> "sigreturn"). It seemed to be mostly seccomp sandbox references.

I don't see a benefit compelling enough to justify the added complexity, 
given that existing mechanisms can do it.

The sigframe thing, OTOH, seems genuinely useful if CRIU would actually 
use it to save the full register state.  Generating a signal frame from 
scratch is a pain.  That being said, if CRIU isn't excited, then don't 
bother.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-03-04 19:13                                 ` Andy Lutomirski
@ 2022-03-07 18:56                                   ` Mike Rapoport
  2022-03-07 19:07                                     ` H.J. Lu
  2022-03-07 22:21                                     ` David Laight
  0 siblings, 2 replies; 154+ messages in thread
From: Mike Rapoport @ 2022-03-07 18:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Edgecombe, Rick P, bsingharora, hpa, Syromiatnikov, Eugene,
	peterz, rdunlap, keescook, 0x7f454c46, Eranian, Stephane,
	kirill.shutemov, dave.hansen, linux-mm, adrian, fweimer,
	nadav.amit, jannh, avagin, kcc, linux-arch, pavel, oleg,
	hjl.tools, bp, linux-doc, arnd, Moreira, Joao, tglx,
	mike.kravetz, x86, Yang, Weijiang, dave.martin, john.allen,
	mingo, Hansen, Dave, corbet, linux-kernel, gorcunov, Shankar,
	Ravi V, linux-api

On Fri, Mar 04, 2022 at 11:13:19AM -0800, Andy Lutomirski wrote:
> On 3/3/22 17:30, Edgecombe, Rick P wrote:
> > On Thu, 2022-03-03 at 15:00 -0800, Andy Lutomirski wrote:
> > > > > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal
> > > > > frame onto
> > > > > the stack and call a function.  That function should then be able
> > > > > to call
> > > > > sigreturn just like any normal signal handler.
> > > > 
> > > > Ok, let me reiterate.
> > > > 
> > > > We have a seized and stopped tracee, use
> > > > PTRACE_CALL_FUNCTION_SIGFRAME
> > > > to push a signal frame onto the tracee's stack so that sigreturn
> > > > could use
> > > > that frame, then set the tracee %rip to the function we'd like to
> > > > call and
> > > > then we PTRACE_CONT the tracee. Tracee continues to execute the
> > > > parasite
> > > > code that calls sigreturn to clean up and restore the tracee
> > > > process.
> > > > 
> > > > PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the
> > > > shadow
> > > > stack, just like setup_rt_frame() does, so that sys_rt_sigreturn()
> > > > won't
> > > > bail out at restore_signal_shadow_stack().
> > > 
> > > That is the intent.
> > > 
> > > > 
> > > > The only thing that CRIU actually needs is to push a restore token
> > > > to the
> > > > shadow stack, so for us a ptrace call that does that would be
> > > > ideal.
> > > > 
> > > 
> > > That seems fine too.  The main benefit of the SIGFRAME approach is
> > > that, AIUI, CRIU eventually constructs a signal frame anyway, and
> > > getting one ready-made seems plausibly helpful.  But if it's not
> > > actually that useful, then there's no need to do it.
> > 
> > I guess pushing a token to the shadow stack could be done like GDB does
> > calls, with just the basic CET ptrace support. So do we even need a
> > specific push token operation?

I've tried to follow the gdb CET push implementation, but got lost.
What is "basic CET ptrace support"? I don't see any ptrace changes in this
series.
 
> > I suppose if CRIU already used some kernel encapsulation of a seized
> > call/return operation it would have been easier to make CRIU work with
> > the introduction of CET. But the design of CRIU seems to be to have the
> > kernel expose just enough and then tie it all together in userspace.
> > 
> > Andy, did you have any other usages for PTRACE_CALL_FUNCTION in mind? I
> > couldn't find any other CRIU-like users of sigreturn in the debian
> > source search (but didn't read all 819 pages that come up with
> > "sigreturn"). It seemed to be mostly seccomp sandbox references.
> 
> I don't see a benefit compelling enough to justify the added complexity,
> given that existing mechanisms can do it.
> 
> The sigframe thing, OTOH, seems genuinely useful if CRIU would actually use
> it to save the full register state.  Generating a signal frame from scratch
> is a pain.  That being said, if CRIU isn't excited, then don't bother.

CRIU is excited :)

I was just looking for the minimal possible interface that will allow us to
call sigreturn. Rick is right: CRIU does try to expose as little as
possible and handle the pain in userspace.

The SIGFRAME approach is indeed very helpful, especially if we can make it
work on other architectures eventually. 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-03-07 18:56                                   ` Mike Rapoport
@ 2022-03-07 19:07                                     ` H.J. Lu
  2022-05-31 11:59                                       ` Mike Rapoport
  2022-03-07 22:21                                     ` David Laight
  1 sibling, 1 reply; 154+ messages in thread
From: H.J. Lu @ 2022-03-07 19:07 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andy Lutomirski, Edgecombe, Rick P, bsingharora, hpa,
	Syromiatnikov, Eugene, peterz, rdunlap, keescook, 0x7f454c46,
	Eranian, Stephane, kirill.shutemov, dave.hansen, linux-mm,
	adrian, fweimer, nadav.amit, jannh, avagin, kcc, linux-arch,
	pavel, oleg, bp, linux-doc, arnd, Moreira, Joao, tglx,
	mike.kravetz, x86, Yang, Weijiang, dave.martin, john.allen,
	mingo, Hansen, Dave, corbet, linux-kernel, gorcunov, Shankar,
	Ravi V, linux-api

On Mon, Mar 7, 2022 at 10:57 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Fri, Mar 04, 2022 at 11:13:19AM -0800, Andy Lutomirski wrote:
> > On 3/3/22 17:30, Edgecombe, Rick P wrote:
> > > On Thu, 2022-03-03 at 15:00 -0800, Andy Lutomirski wrote:
> > > > > > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal
> > > > > > frame onto
> > > > > > the stack and call a function.  That function should then be able
> > > > > > to call
> > > > > > sigreturn just like any normal signal handler.
> > > > >
> > > > > Ok, let me reiterate.
> > > > >
> > > > > We have a seized and stopped tracee, use
> > > > > PTRACE_CALL_FUNCTION_SIGFRAME
> > > > > to push a signal frame onto the tracee's stack so that sigreturn
> > > > > could use
> > > > > that frame, then set the tracee %rip to the function we'd like to
> > > > > call and
> > > > > then we PTRACE_CONT the tracee. Tracee continues to execute the
> > > > > parasite
> > > > > code that calls sigreturn to clean up and restore the tracee
> > > > > process.
> > > > >
> > > > > PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the
> > > > > shadow
> > > > > stack, just like setup_rt_frame() does, so that sys_rt_sigreturn()
> > > > > won't
> > > > > bail out at restore_signal_shadow_stack().
> > > >
> > > > That is the intent.
> > > >
> > > > >
> > > > > The only thing that CRIU actually needs is to push a restore token
> > > > > to the
> > > > > shadow stack, so for us a ptrace call that does that would be
> > > > > ideal.
> > > > >
> > > >
> > > > That seems fine too.  The main benefit of the SIGFRAME approach is
> > > > that, AIUI, CRIU eventually constructs a signal frame anyway, and
> > > > getting one ready-made seems plausibly helpful.  But if it's not
> > > > actually that useful, then there's no need to do it.
> > >
> > > I guess pushing a token to the shadow stack could be done like GDB does
> > > calls, with just the basic CET ptrace support. So do we even need a
> > > specific push token operation?
>
> I've tried to follow gdb CET push implementation, but got lost.
> What is "basic CET ptrace support"? I don't see any ptrace changes in this
> series.

Here is the CET ptrace patch on the CET 5.16 kernel branch:

https://github.com/hjl-tools/linux/commit/3a43ec29ddac56f87807161b5aeafa80f632363d

> > > I suppose if CRIU already used some kernel encapsulation of a seized
> > > call/return operation it would have been easier to make CRIU work with
> > > the introduction of CET. But the design of CRIU seems to be to have the
> > > kernel expose just enough and then tie it all together in userspace.
> > >
> > > Andy, did you have any other usages for PTRACE_CALL_FUNCTION in mind? I
> > > couldn't find any other CRIU-like users of sigreturn in the debian
> > > source search (but didn't read all 819 pages that come up with
> > > "sigreturn"). It seemed to be mostly seccomp sandbox references.
> >
> > I don't see a benefit compelling enough to justify the added complexity,
> > given that existing mechanisms can do it.
> >
> > The sigframe thing, OTOH, seems genuinely useful if CRIU would actually use
> > it to save the full register state.  Generating a signal frame from scratch
> > is a pain.  That being said, if CRIU isn't excited, then don't bother.
>
> CRIU is excited :)
>
> I just was looking for the minimal possible interface that will allow us to
> call sigreturn. Rick is right and CRIU does try to expose as little as
> possible and handle the pain in the userspace.
>
> The SIGFRAME approach is indeed very helpful, especially if we can make it
> work on other architectures eventually.
>
> --
> Sincerely yours,
> Mike.



-- 
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* RE: [PATCH 00/35] Shadow stacks for userspace
  2022-03-07 18:56                                   ` Mike Rapoport
  2022-03-07 19:07                                     ` H.J. Lu
@ 2022-03-07 22:21                                     ` David Laight
  1 sibling, 0 replies; 154+ messages in thread
From: David Laight @ 2022-03-07 22:21 UTC (permalink / raw)
  To: 'Mike Rapoport', Andy Lutomirski
  Cc: Edgecombe, Rick P, bsingharora, hpa, Syromiatnikov, Eugene,
	peterz, rdunlap, keescook, 0x7f454c46, Eranian, Stephane,
	kirill.shutemov, dave.hansen, linux-mm, adrian, fweimer,
	nadav.amit, jannh, avagin, kcc, linux-arch, pavel, oleg,
	hjl.tools, bp, linux-doc, arnd, Moreira, Joao, tglx,
	mike.kravetz, x86, Yang, Weijiang, dave.martin, john.allen,
	mingo, Hansen, Dave, corbet, linux-kernel, gorcunov, Shankar,
	Ravi V, linux-api

From: Mike Rapoport
> Sent: 07 March 2022 18:57
...
> > The sigframe thing, OTOH, seems genuinely useful if CRIU would actually use
> > it to save the full register state.  Generating a signal frame from scratch
> > is a pain.  That being said, if CRIU isn't excited, then don't bother.
> 
> CRIU is excited :)
> 
> I just was looking for the minimal possible interface that will allow us to
> call sigreturn. Rick is right and CRIU does try to expose as little as
> possible and handle the pain in the userspace.
> 
> The SIGFRAME approach is indeed very helpful, especially if we can make it
> work on other architectures eventually.

I thought the full sigframe layout depends very much on what the kernel
decides it needs to save?
Some parts are exposed to the signal handler, but there are large
blocks of data saved by XSAVE (etc.) that have to be put onto the
signal stack.
Is it even vaguely feasible to replicate what a specific kernel
generates on specific hardware in a userspace library?
The size of this data is getting bigger and bigger - causing
issues with the SIGALTSTACK (and even thread stack) minimum sizes.

	David



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-03-07 19:07                                     ` H.J. Lu
@ 2022-05-31 11:59                                       ` Mike Rapoport
  2022-05-31 16:25                                         ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Mike Rapoport @ 2022-05-31 11:59 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Andy Lutomirski, Edgecombe, Rick P, bsingharora, hpa,
	Syromiatnikov, Eugene, peterz, rdunlap, keescook, 0x7f454c46,
	Eranian, Stephane, kirill.shutemov, dave.hansen, linux-mm,
	adrian, fweimer, nadav.amit, jannh, avagin, kcc, linux-arch,
	pavel, oleg, bp, linux-doc, arnd, Moreira, Joao, tglx,
	mike.kravetz, x86, Yang, Weijiang, dave.martin, john.allen,
	mingo, Hansen, Dave, corbet, linux-kernel, gorcunov, Shankar,
	Ravi V, linux-api

Hi all,

On Mon, Mar 07, 2022 at 11:07:01AM -0800, H.J. Lu wrote:
> On Mon, Mar 7, 2022 at 10:57 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Fri, Mar 04, 2022 at 11:13:19AM -0800, Andy Lutomirski wrote:
> > > On 3/3/22 17:30, Edgecombe, Rick P wrote:
> 
> Here is the CET ptrace patch on CET 5.16 kernel branch:
> 
> https://github.com/hjl-tools/linux/commit/3a43ec29ddac56f87807161b5aeafa80f632363d

It took me a while, but at last I have a version of CRIU that knows how to
handle shadow stacks. For the shadow stack manipulation during dump and for
the creation of the sigframe for sigreturn I used the CET ptrace patch for
5.16 (thanks H.J.).

For the restore I had to add two modifications to the kernel APIs on top of
this version of the shadow stack series:

* add an address parameter to map_shadow_stack() so that it will call mmap()
with MAP_FIXED if an address is requested. This is required to restore the
shadow stack at the same address it had at dump time (see the sketch after
this list).

* add the ability to unlock shadow stack features using ptrace. This is
required because the current glibc (or at least the version I used for
tests) locks the shadow stack state when it loads a program. This locking
means that a process will either have shadow stack disabled with no ability
to enable it, or have shadow stack enabled with WRSS disabled and, again,
no way to re-enable WRSS. With that, ptrace looked like the most sensible
interface to interfere with the shadow stack locking.
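
A minimal sketch of the first change from the restore side (the syscall
number below is a placeholder and the exact signature is whatever the
series ends up with; a non-zero addr is assumed to give MAP_FIXED
behaviour, as described in the first item above):

#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack 451	/* placeholder syscall number */
#endif

/* Recreate the task's shadow stack at the address recorded at dump time. */
static void *restore_shadow_stack(uint64_t dump_addr, uint64_t dump_size)
{
	return (void *)syscall(__NR_map_shadow_stack, dump_addr, dump_size, 0UL);
}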

I've pushed the kernel modifications here:

https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=cet/kvm

and CRIU modifications here:

https://github.com/rppt/criu/tree/cet/v0.1

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-05-31 11:59                                       ` Mike Rapoport
@ 2022-05-31 16:25                                         ` Edgecombe, Rick P
  2022-05-31 16:36                                           ` Mike Rapoport
  0 siblings, 1 reply; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-05-31 16:25 UTC (permalink / raw)
  To: hjl.tools, rppt
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, Yang, Weijiang, Lutomirski,
	Andy, bp, arnd, Moreira, Joao, linux-doc, tglx, x86,
	mike.kravetz, john.allen, dave.martin, mingo, Hansen, Dave,
	corbet, linux-kernel, gorcunov, Shankar, Ravi V, linux-api

Mike,

Thanks for doing this. Glad to hear this is solvable with the current
paradigm.

On Tue, 2022-05-31 at 14:59 +0300, Mike Rapoport wrote:
> * add ability to unlock shadow stack features using ptrace. This is
> required because the current glibc (or at least in the version I used
> for
> tests) locks shadow stack state when it loads a program. This locking
> means
> that a process will either have shadow stack disabled without an
> ability to
> enable it or it will have shadow stack enabled with WRSS disabled and
> again, there is no way to re-enable WRSS. With that, ptrace looked
> like the
> most sensible interface to interfere with the shadow stack locking.

So whatever glibc you have locks features even if it doesn't enable
shadow stack? Hmm, I've not encountered this. Which glibc is it?

WRSS is a feature where you would usually want to lock it as disabled,
but WRSS cannot be enabled if shadow stack is not enabled. Locking
shadow stack and WRSS off together doesn't have any security benefits
in theory, so I'm thinking glibc doesn't need to do this. The kernel
could even refuse to lock WRSS without shadow stack being enabled.
Could we avoid the extra ptrace functionality then?
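
A rough model of that rule (the feature bit values are assumed for
illustration; this is a model of the check, not the actual kernel code):

#include <errno.h>

#define LINUX_X86_FEATURE_SHSTK	0x1	/* values assumed for illustration */
#define LINUX_X86_FEATURE_WRSS	0x2

static int feature_lock_check(unsigned long enabled, unsigned long to_lock)
{
	/* Locking WRSS while shadow stack is off buys nothing, and only
	 * gets in the way of tools like CRIU, so reject it. */
	if ((to_lock & LINUX_X86_FEATURE_WRSS) &&
	    !(enabled & LINUX_X86_FEATURE_SHSTK))
		return -EINVAL;
	return 0;
}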

Rick

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-05-31 16:25                                         ` Edgecombe, Rick P
@ 2022-05-31 16:36                                           ` Mike Rapoport
  2022-05-31 17:34                                             ` Edgecombe, Rick P
  0 siblings, 1 reply; 154+ messages in thread
From: Mike Rapoport @ 2022-05-31 16:36 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: hjl.tools, bsingharora, hpa, Syromiatnikov, Eugene, peterz,
	rdunlap, keescook, 0x7f454c46, Eranian, Stephane,
	kirill.shutemov, dave.hansen, linux-mm, adrian, fweimer,
	nadav.amit, jannh, avagin, kcc, linux-arch, pavel, oleg, Yang,
	Weijiang, Lutomirski, Andy, bp, arnd, Moreira, Joao, linux-doc,
	tglx, x86, mike.kravetz, john.allen, dave.martin, mingo, Hansen,
	Dave, corbet, linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Tue, May 31, 2022 at 04:25:13PM +0000, Edgecombe, Rick P wrote:
> Mike,
> 
> Thanks for doing this. Glad to hear this is solvable with the current
> paradigm.
> 
> On Tue, 2022-05-31 at 14:59 +0300, Mike Rapoport wrote:
> > * add ability to unlock shadow stack features using ptrace. This is
> > required because the current glibc (or at least in the version I used
> > for
> > tests) locks shadow stack state when it loads a program. This locking
> > means
> > that a process will either have shadow stack disabled without an
> > ability to
> > enable it or it will have shadow stack enabled with WRSS disabled and
> > again, there is no way to re-enable WRSS. With that, ptrace looked
> > like the
> > most sensible interface to interfere with the shadow stack locking.
> 
> So whatever glibc you have lock's features even if it doesn't enable
> shadow stack? Hmm, I've not encountered this. Which glibc is it?

I use glibc from here:
https://gitlab.com/x86-glibc/glibc/, commit b6f9a22a00c1f8ae8c0991886f0a714f2f5da002

AFAIU, it's H.J cet work.

 
> WRSS is a feature where you would usually want to lock it as disabled,
> but WRSS cannot be enabled if shadow stack is not enabled. Locking
> shadow stack and WRSS off together doesn't have any security benefits
> in theory. so I'm thinking glibc doesn't need to do this. The kernel
> could even refuse to lock WRSS without shadow stack being enabled.
> Could we avoid the extra ptrace functionality then?

What I see is that for a program that supports shadow stack, glibc enables
shadow stack, does not enable WRSS and then calls

	arch_prctl(ARCH_X86_FEATURE_LOCK,
		   LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS);

so that WRSS cannot be re-enabled.

For programs that do not support shadow stack, both SHSTK and WRSS are
disabled, but there is still the same call to
arch_prctl(ARCH_X86_FEATURE_LOCK, ...), and then neither shadow stack nor
WRSS can be enabled.

My original plan was to run CRIU with no shadow stack, enable shadow stack
and WRSS in the restored tasks using arch_prctl(), and after the shadow
stack contents are restored, disable WRSS.
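
Roughly like this sketch (ARCH_X86_FEATURE_LOCK is the constant shown
above; the ENABLE/DISABLE operations and all numeric values here are my
assumptions about the series' interface, not a settled ABI):

#include <sys/syscall.h>
#include <unistd.h>

#define ARCH_X86_FEATURE_ENABLE		0x3001	/* placeholder values */
#define ARCH_X86_FEATURE_DISABLE	0x3002
#define LINUX_X86_FEATURE_SHSTK		0x1
#define LINUX_X86_FEATURE_WRSS		0x2

static long x86_feature(int op, unsigned long features)
{
	return syscall(SYS_arch_prctl, op, features);
}

static void restore_task_shadow_stack(void)
{
	/* 1. Turn on shadow stack and WRSS in the restored task. */
	x86_feature(ARCH_X86_FEATURE_ENABLE,
		    LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS);

	/* 2. ... write the saved shadow stack contents with WRSS ... */

	/* 3. Drop WRSS again once the contents are in place. */
	x86_feature(ARCH_X86_FEATURE_DISABLE, LINUX_X86_FEATURE_WRSS);
}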

Obviously, this didn't work with the glibc I have :)

On the bright side, having a ptrace call to unlock shadow stack and wrss
allows running CRIU itself with shadow stack.
 
> Rick

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-05-31 16:36                                           ` Mike Rapoport
@ 2022-05-31 17:34                                             ` Edgecombe, Rick P
  2022-05-31 18:00                                               ` H.J. Lu
  2022-06-01  8:06                                               ` Mike Rapoport
  0 siblings, 2 replies; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-05-31 17:34 UTC (permalink / raw)
  To: rppt
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, hjl.tools, bp, Lutomirski,
	Andy, Yang, Weijiang, arnd, Moreira, Joao, tglx, x86,
	mike.kravetz, linux-doc, john.allen, dave.martin, mingo, Hansen,
	Dave, corbet, linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Tue, 2022-05-31 at 19:36 +0300, Mike Rapoport wrote:
> > WRSS is a feature where you would usually want to lock it as
> > disabled,
> > but WRSS cannot be enabled if shadow stack is not enabled. Locking
> > shadow stack and WRSS off together doesn't have any security
> > benefits
> > in theory. so I'm thinking glibc doesn't need to do this. The
> > kernel
> > could even refuse to lock WRSS without shadow stack being enabled.
> > Could we avoid the extra ptrace functionality then?
> 
> What I see for is that a program can support shadow stack, glibc
> enables
> shadow stack, does not enable WRSS and than calls
> 
>         arch_prctl(ARCH_X86_FEATURE_LOCK,
>                    LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS);

I see the logic is that glibc will lock SHSTK|IBT if either is enabled in
the ELF header. I guess that is why I didn't see the locking happening
for me, because my manual enablement test doesn't have either set in
the header.

I can't see where that glibc knows about WRSS though...

The glibc logic seems wrong to me also, because shadow stack or IBT
could be force-disabled via glibc tunables. I don't see why the ELF
header bit should exclusively control the feature locking. Or why both
should be locked if only one is in the header.

> 
> so that WRSS cannot be re-enabled.
> 
> For the programs that do not support shadow stack, both SHSTK and
> WRSS are
> disabled, but still there is the same call to
> arch_prctl(ARCH_X86_FEATURE_LOCK, ...) and then neither shadow stack
> nor
> WRSS can be enabled.
> 
> My original plan was to run CRIU with no shadow stack, enable shadow
> stack
> and WRSS in the restored tasks using arch_prct() and after the shadow
> stack
> contents is restored disable WRSS.
> 
> Obviously, this didn't work with glibc I have :)

Were you disabling shadow stack via a glibc tunable? Or was the ELF
header marked for IBT? If it was a plain old binary, the code looks to
me like it should not lock any features.

> 
> On the bright side, having a ptrace call to unlock shadow stack and
> wrss
> allows running CRIU itself with shadow stack.
>  

Yea, having something working is really great. My only hesitancy is
that, per a discussion on the LAM patchset, we are going to make this
enabling API CET-only (with the same semantics, though). I suppose the
locking API arch_prctl() could still support other arch features,
but it might be a second CET-only regset. It's not the end of the
world.

I guess the other consideration is tying CRIU to glibc peculiarities.
Even if we fix glibc, CRIU may not work with some other libc or app
that force-disables the features for some weird reason. Is it supposed
to be libc-agnostic?


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-05-31 17:34                                             ` Edgecombe, Rick P
@ 2022-05-31 18:00                                               ` H.J. Lu
  2022-06-01 17:27                                                 ` Edgecombe, Rick P
  2022-06-01  8:06                                               ` Mike Rapoport
  1 sibling, 1 reply; 154+ messages in thread
From: H.J. Lu @ 2022-05-31 18:00 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: rppt, bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, bp, Lutomirski, Andy, Yang,
	Weijiang, arnd, Moreira, Joao, tglx, x86, mike.kravetz,
	linux-doc, john.allen, dave.martin, mingo, Hansen, Dave, corbet,
	linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Tue, May 31, 2022 at 10:34 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2022-05-31 at 19:36 +0300, Mike Rapoport wrote:
> > > WRSS is a feature where you would usually want to lock it as
> > > disabled,
> > > but WRSS cannot be enabled if shadow stack is not enabled. Locking
> > > shadow stack and WRSS off together doesn't have any security
> > > benefits
> > > in theory. so I'm thinking glibc doesn't need to do this. The
> > > kernel
> > > could even refuse to lock WRSS without shadow stack being enabled.
> > > Could we avoid the extra ptrace functionality then?
> >
> > What I see for is that a program can support shadow stack, glibc
> > enables
> > shadow stack, does not enable WRSS and than calls
> >
> >         arch_prctl(ARCH_X86_FEATURE_LOCK,
> >                    LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS);
>
> I see the logic is glibc will lock SHSTK|IBT if either is enabled in
> the elf header. I guess that is why I didn't see the locking happening
> for me, because my manual enablement test doesn't have either set in
> the header.
>
> I can't see where that glibc knows about WRSS though...
>
> The glibc logic seems wrong to me also, because shadow stack or IBT
> could be force-disabled via glibc tunables. I don't see why the elf
> header bit should exclusively control the feature locking. Or why both
> should be locked if only one is in the header.

glibc locks SHSTK and IBT only if they are enabled at run-time. It doesn't
enable/disable/lock WRSS at the moment.  If WRSS can be enabled
via arch_prctl at any time, we can't lock it.  If WRSS should be locked
early, how should it be enabled in an application?  Also, can WRSS be
enabled from a dlopened object?

> >
> > so that WRSS cannot be re-enabled.
> >
> > For the programs that do not support shadow stack, both SHSTK and
> > WRSS are
> > disabled, but still there is the same call to
> > arch_prctl(ARCH_X86_FEATURE_LOCK, ...) and then neither shadow stack
> > nor
> > WRSS can be enabled.
> >
> > My original plan was to run CRIU with no shadow stack, enable shadow
> > stack
> > and WRSS in the restored tasks using arch_prct() and after the shadow
> > stack
> > contents is restored disable WRSS.
> >
> > Obviously, this didn't work with glibc I have :)
>
> Were you disabling shadow stack via a glibc tunable? Or was the elf
> header marked for IBT? If it was a plain old binary, the code looks to
> me like it should not lock any features.
>
> >
> > On the bright side, having a ptrace call to unlock shadow stack and
> > wrss
> > allows running CRIU itself with shadow stack.
> >
>
> Yea, having something working is really great. My only hesitancy is
> that, per a discussion on the LAM patchset, we are going to make this
> enabling API CET only (same semantics for though). I suppose the
> locking API arch_prctl() could still support other arch features,
> but it might be a second CET only regset. It's not the end of the
> world.
>
> I guess the other consideration is tying CRIU to glibc peculiarities.
> Like even if we fix glibc, then CRIU may not work with some other libc
> or app that force disables for some weird reason. Is it supposed to be
> libc-agnostic?
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-05-31 17:34                                             ` Edgecombe, Rick P
  2022-05-31 18:00                                               ` H.J. Lu
@ 2022-06-01  8:06                                               ` Mike Rapoport
  2022-06-01 17:24                                                 ` Edgecombe, Rick P
  1 sibling, 1 reply; 154+ messages in thread
From: Mike Rapoport @ 2022-06-01  8:06 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, hjl.tools, bp, Lutomirski,
	Andy, Yang, Weijiang, arnd, Moreira, Joao, tglx, x86,
	mike.kravetz, linux-doc, john.allen, dave.martin, mingo, Hansen,
	Dave, corbet, linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Tue, May 31, 2022 at 05:34:50PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2022-05-31 at 19:36 +0300, Mike Rapoport wrote:
> > > WRSS is a feature where you would usually want to lock it as
> > > disabled,
> > > but WRSS cannot be enabled if shadow stack is not enabled. Locking
> > > shadow stack and WRSS off together doesn't have any security
> > > benefits
> > > in theory. so I'm thinking glibc doesn't need to do this. The
> > > kernel
> > > could even refuse to lock WRSS without shadow stack being enabled.
> > > Could we avoid the extra ptrace functionality then?
> > 
> > What I see for is that a program can support shadow stack, glibc
> > enables
> > shadow stack, does not enable WRSS and than calls
> > 
> >         arch_prctl(ARCH_X86_FEATURE_LOCK,
> >                    LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS);
> 
> I see the logic is glibc will lock SHSTK|IBT if either is enabled in
> the elf header. I guess that is why I didn't see the locking happening
> for me, because my manual enablement test doesn't have either set in
> the header.

The locking was quite a surprise for me when I moved from a standalone test
to a system with CET-enabled glibc :)
 
> I can't see where that glibc knows about WRSS though...

Right, it was my mistake; as H.J. said, glibc locks SHSTK and IBT.
 
> The glibc logic seems wrong to me also, because shadow stack or IBT
> could be force-disabled via glibc tunables. I don't see why the elf
> header bit should exclusively control the feature locking. Or why both
> should be locked if only one is in the header.
> 
> > 
> > so that WRSS cannot be re-enabled.
> > 
> > For the programs that do not support shadow stack, both SHSTK and
> > WRSS are
> > disabled, but still there is the same call to
> > arch_prctl(ARCH_X86_FEATURE_LOCK, ...) and then neither shadow stack
> > nor
> > WRSS can be enabled.
> > 
> > My original plan was to run CRIU with no shadow stack, enable shadow
> > stack
> > and WRSS in the restored tasks using arch_prct() and after the shadow
> > stack
> > contents is restored disable WRSS.
> > 
> > Obviously, this didn't work with glibc I have :)
> 
> Were you disabling shadow stack via a glibc tunable? Or was the elf
> header marked for IBT? If it was a plain old binary, the code looks to
> me like it should not lock any features.

I built criu as a plain old binary; there were no SHSTK or IBT markers. And
I've seen that there was a call to arch_prctl that locked the features as
disabled.
 
> > On the bright side, having a ptrace call to unlock shadow stack and
> > wrss
> > allows running CRIU itself with shadow stack.
> 
> Yea, having something working is really great. My only hesitancy is
> that, per a discussion on the LAM patchset, we are going to make this
> enabling API CET only (same semantics for though). I suppose the
> locking API arch_prctl() could still be support other arch features,
> but it might be a second CET only regset. It's not the end of the
> world.

The support for CET in criu is experimental for now anyway; if the kernel
API ends up slightly different, we'll update criu.
The important things are the ability to control the tracee's shadow stack
from ptrace, the ability to map the shadow stack at a fixed address, and
the ability to control the features, at least from ptrace.
As long as we have APIs that provide those, it should be Ok.
 
> I guess the other consideration is tying CRIU to glibc peculiarities.
> Like even if we fix glibc, then CRIU may not work with some other libc
> or app that force disables for some weird reason. Is it supposed to be
> libc-agnostic?

Actually, using ptrace to control the CET features does not tie criu to
glibc. The current proposal for arch_prctl() allows libc to lock CET
features, and having a ptrace call to control the lock makes criu agnostic
to libc behaviour.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-06-01  8:06                                               ` Mike Rapoport
@ 2022-06-01 17:24                                                 ` Edgecombe, Rick P
  2022-06-09 18:04                                                   ` Mike Rapoport
  0 siblings, 1 reply; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-06-01 17:24 UTC (permalink / raw)
  To: rppt
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, hjl.tools, linux-doc,
	Lutomirski, Andy, bp, arnd, Moreira, Joao, Yang, Weijiang, tglx,
	mike.kravetz, x86, john.allen, dave.martin, mingo, Hansen, Dave,
	corbet, linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Wed, 2022-06-01 at 11:06 +0300, Mike Rapoport wrote:
> > Yea, having something working is really great. My only hesitancy is
> > that, per a discussion on the LAM patchset, we are going to make
> > this
> > enabling API CET only (same semantics for though). I suppose the
> > locking API arch_prctl() could still be support other arch
> > features,
> > but it might be a second CET only regset. It's not the end of the
> > world.
> 
> The support for CET in criu is anyway experimental for now, if the
> kernel
> API will be slightly different in the end, we'll update criu.
> The important things are the ability to control tracee shadow stack
> from ptrace, the ability to map the shadow stack at fixed address and
> the
> ability to control the features at least from ptrace.
> As long as we have APIs that provide those, it should be Ok.
>  
> > I guess the other consideration is tieing CRIU to glibc
> > peculiarities.
> > Like even if we fix glibc, then CRIU may not work with some other
> > libc
> > or app that force disables for some weird reason. Is it supposed to
> > be
> > libc-agnostic?
> 
> Actually using the ptrace to control the CET features does not tie
> criu to
> glibc. The current proposal for the arch_prctl() allows libc to lock
> CET
> features and having a ptrace call to control the lock makes criu
> agnostic
> to libc behaviour.

From staring at the glibc code, I'm suspicious that something was weird
with your test setup, as I don't think it should be locking. But I guess
to be completely proper you would need to save and restore the lock state
anyway. So, OK, yeah, on balance it is probably better to have an extra
interface.

Should we make it a GET/SET interface?

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-05-31 18:00                                               ` H.J. Lu
@ 2022-06-01 17:27                                                 ` Edgecombe, Rick P
  2022-06-01 19:27                                                   ` H.J. Lu
  0 siblings, 1 reply; 154+ messages in thread
From: Edgecombe, Rick P @ 2022-06-01 17:27 UTC (permalink / raw)
  To: hjl.tools
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, bp, Lutomirski, Andy, Yang,
	Weijiang, arnd, Moreira, Joao, linux-doc, tglx, mike.kravetz,
	x86, rppt, john.allen, dave.martin, mingo, Hansen, Dave, corbet,
	linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Tue, 2022-05-31 at 11:00 -0700, H.J. Lu wrote:
> > The glibc logic seems wrong to me also, because shadow stack or IBT
> > could be force-disabled via glibc tunables. I don't see why the elf
> > header bit should exclusively control the feature locking. Or why
> > both
> > should be locked if only one is in the header.
> 
> glibc locks SHSTK and IBT only if they are enabled at run-time.

It's not what I saw in the code. Somehow Mike saw something different
as well.

> It doesn't enable/disable/lock WRSS at the moment.  If WRSS can be
> enabled via arch_prctl at any time, we can't lock it.  If WRSS should
> be locked early, how should it be enabled in an application?  Also,
> can WRSS be enabled from a dlopened object?

I think in the past we discussed having another ELF header bit that
behaved differently (OR vs AND).
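
For reference, the AND-style marking exists today as GNU property bits
in .note.gnu.property. A rough sketch of the two aggregation policies,
using the GNU_PROPERTY_X86_FEATURE_1_* bits from <elf.h> on a
reasonably recent glibc; the OR-style property is made up purely for
illustration and is not a real ABI value:

	#include <elf.h>
	#include <stdbool.h>
	#include <stdint.h>

	struct obj_props {
		uint32_t feature_1_and;	/* GNU_PROPERTY_X86_FEATURE_1_AND bits */
		uint32_t feature_1_or;	/* hypothetical OR-style property */
	};

	/* AND semantics (today): every loaded object must carry the bit. */
	static bool want_shstk(const struct obj_props *objs, int n)
	{
		uint32_t bits = ~0u;
		for (int i = 0; i < n; i++)
			bits &= objs[i].feature_1_and;
		return bits & GNU_PROPERTY_X86_FEATURE_1_SHSTK;
	}

	/* OR semantics (hypothetical): one object asking is enough, e.g. WRSS. */
	static bool want_wrss(const struct obj_props *objs, int n, uint32_t wrss_bit)
	{
		uint32_t bits = 0;
		for (int i = 0; i < n; i++)
			bits |= objs[i].feature_1_or;
		return bits & wrss_bit;
	}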

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-06-01 17:27                                                 ` Edgecombe, Rick P
@ 2022-06-01 19:27                                                   ` H.J. Lu
  0 siblings, 0 replies; 154+ messages in thread
From: H.J. Lu @ 2022-06-01 19:27 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, bp, Lutomirski, Andy, Yang,
	Weijiang, arnd, Moreira, Joao, linux-doc, tglx, mike.kravetz,
	x86, rppt, john.allen, dave.martin, mingo, Hansen, Dave, corbet,
	linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Wed, Jun 1, 2022 at 10:27 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2022-05-31 at 11:00 -0700, H.J. Lu wrote:
> > > The glibc logic seems wrong to me also, because shadow stack or IBT
> > > could be force-disabled via glibc tunables. I don't see why the ELF
> > > header bit should exclusively control the feature locking, or why
> > > both should be locked if only one is in the header.
> >
> > glibc locks SHSTK and IBT only if they are enabled at run-time.
>
> It's not what I saw in the code. Somehow Mike saw something different
> as well.

The current glibc cet branch:

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/cet/master

locks only the available CET features.  Since only SHSTK is available,
I saw

arch_prctl(0x3003 /* ARCH_??? */, 0x2)  = 0

CET features are always enabled early in ld.so to allow function calls
in the CET-enabled ld.so.  ld.so always locks CET features, even if
they end up disabled because the program or a dependency library isn't
CET-enabled.
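
A minimal sketch of what that call presumably is under the proposed
API; the constant names and values below are read off the trace above
(0x3003 as the proposed feature-lock arch_prctl, bit 0x2 as SHSTK) and
are assumptions, not the final ABI:

	#include <sys/syscall.h>
	#include <unistd.h>

	/* Assumed values matching the strace output above; not the final ABI. */
	#define ARCH_X86_CET_LOCK	0x3003
	#define CET_SHSTK		0x2UL

	/* Lock the currently enabled CET features so they can't change later. */
	static long cet_lock_features(unsigned long features)
	{
		return syscall(SYS_arch_prctl, ARCH_X86_CET_LOCK, features);
	}

	/* ld.so-style usage: lock whatever ended up enabled, here just SHSTK. */
	/* cet_lock_features(CET_SHSTK); */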

> > It doesn't enable/disable/lock WRSS at the moment.  If WRSS can be
> > enabled via arch_prctl at any time, we can't lock it.  If WRSS should
> > be locked early, how should it be enabled in an application?  Also,
> > can WRSS be enabled from a dlopened object?
>
> I think in the past we discussed having another ELF header bit that
> behaved differently (OR vs AND).

We should have a complete list of use cases and design a way to
support them.

-- 
H.J.

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 00/35] Shadow stacks for userspace
  2022-06-01 17:24                                                 ` Edgecombe, Rick P
@ 2022-06-09 18:04                                                   ` Mike Rapoport
  0 siblings, 0 replies; 154+ messages in thread
From: Mike Rapoport @ 2022-06-09 18:04 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: bsingharora, hpa, Syromiatnikov, Eugene, peterz, rdunlap,
	keescook, 0x7f454c46, Eranian, Stephane, kirill.shutemov,
	dave.hansen, linux-mm, adrian, fweimer, nadav.amit, jannh,
	avagin, kcc, linux-arch, pavel, oleg, hjl.tools, linux-doc,
	Lutomirski, Andy, bp, arnd, Moreira, Joao, Yang, Weijiang, tglx,
	mike.kravetz, x86, john.allen, dave.martin, mingo, Hansen, Dave,
	corbet, linux-kernel, gorcunov, Shankar, Ravi V, linux-api

On Wed, Jun 01, 2022 at 05:24:26PM +0000, Edgecombe, Rick P wrote:
> On Wed, 2022-06-01 at 11:06 +0300, Mike Rapoport wrote:
> > > Yea, having something working is really great. My only hesitancy is
> > > that, per a discussion on the LAM patchset, we are going to make this
> > > enabling API CET only (same semantics though). I suppose the locking
> > > API arch_prctl() could still support other arch features, but it
> > > might be a second CET-only regset. It's not the end of the world.
> > 
> > Support for CET in criu is experimental for now anyway; if the kernel
> > API ends up slightly different, we'll update criu.
> > The important things are the ability to control the tracee's shadow
> > stack from ptrace, the ability to map the shadow stack at a fixed
> > address, and the ability to control the features at least from ptrace.
> > As long as we have APIs that provide those, it should be OK.
> >  
> > > I guess the other consideration is tying CRIU to glibc peculiarities.
> > > Even if we fix glibc, CRIU may not work with some other libc or app
> > > that force-disables the features for some weird reason. Is it
> > > supposed to be libc-agnostic?
> > 
> > Actually, using ptrace to control the CET features does not tie criu
> > to glibc. The current proposal for the arch_prctl() allows libc to
> > lock CET features, and having a ptrace call to control the lock makes
> > criu agnostic to libc behaviour.
> 
> From staring at the glibc code, I suspect something was off in your
> test setup, as I don't think it should be locking. But I guess to be
> completely proper you would need to save and restore the lock state
> anyway. So, OK, on balance it's probably better to have an extra
> interface.
> 
> Should we make it a GET/SET interface?

Yes, I think so.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 154+ messages in thread

end of thread, other threads:[~2022-06-09 18:05 UTC | newest]

Thread overview: 154+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-30 21:18 [PATCH 00/35] Shadow stacks for userspace Rick Edgecombe
2022-01-30 21:18 ` [PATCH 01/35] Documentation/x86: Add CET description Rick Edgecombe
2022-01-30 21:18 ` [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack Rick Edgecombe
2022-02-07 22:39   ` Dave Hansen
2022-02-08  8:41     ` Thomas Gleixner
2022-02-08 20:20       ` Edgecombe, Rick P
2022-02-08  8:39   ` Thomas Gleixner
2022-01-30 21:18 ` [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Rick Edgecombe
2022-02-07 22:45   ` Dave Hansen
2022-02-08 20:23     ` Edgecombe, Rick P
2022-02-09  1:10   ` Kees Cook
2022-01-30 21:18 ` [PATCH 04/35] x86/cpufeatures: Introduce CPU setup and option parsing for CET Rick Edgecombe
2022-02-07 22:49   ` Dave Hansen
2022-02-08 20:29     ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Rick Edgecombe
2022-02-07 23:28   ` Dave Hansen
2022-02-08 21:36     ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 06/35] x86/cet: Add control-protection fault handler Rick Edgecombe
2022-02-07 23:56   ` Dave Hansen
2022-02-08 22:23     ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 07/35] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Rick Edgecombe
2022-02-08  0:13   ` Dave Hansen
2022-02-08 22:52     ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 08/35] x86/mm: Move pmd_write(), pud_write() up in the file Rick Edgecombe
2022-01-30 21:18 ` [PATCH 09/35] x86/mm: Introduce _PAGE_COW Rick Edgecombe
2022-02-08  1:05   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Rick Edgecombe
2022-02-09 16:58   ` Dave Hansen
2022-02-11  1:39     ` Edgecombe, Rick P
2022-02-11  7:13       ` Wang, Zhi A
2022-02-12  1:45         ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 11/35] x86/mm: Update pte_modify for _PAGE_COW Rick Edgecombe
2022-02-09 18:00   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 12/35] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Rick Edgecombe
2022-02-09 18:30   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 13/35] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Rick Edgecombe
2022-01-30 21:18 ` [PATCH 14/35] mm: Introduce VM_SHADOW_STACK for shadow stack memory Rick Edgecombe
2022-02-09 21:55   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 15/35] x86/mm: Check Shadow Stack page fault errors Rick Edgecombe
2022-02-09 19:06   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 16/35] x86/mm: Update maybe_mkwrite() for shadow stack Rick Edgecombe
2022-02-09 21:16   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 17/35] mm: Fixup places that call pte_mkwrite() directly Rick Edgecombe
2022-02-09 21:51   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 18/35] mm: Add guard pages around a shadow stack Rick Edgecombe
2022-02-09 22:23   ` Dave Hansen
2022-02-10 22:38     ` David Laight
2022-02-10 23:42       ` Edgecombe, Rick P
2022-02-11  9:08         ` David Laight
2022-02-10 22:43   ` Dave Hansen
2022-02-10 23:07     ` Andy Lutomirski
2022-02-10 23:40       ` Edgecombe, Rick P
2022-02-11 17:54         ` Andy Lutomirski
2022-02-12  0:10           ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 19/35] mm/mmap: Add shadow stack pages to memory accounting Rick Edgecombe
2022-02-09 22:27   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack Rick Edgecombe
2022-02-09 22:50   ` Dave Hansen
2022-02-09 22:52   ` Dave Hansen
2022-02-10 22:45     ` David Laight
2022-01-30 21:18 ` [PATCH 21/35] mm/mprotect: Exclude shadow stack from preserve_write Rick Edgecombe
2022-02-10 19:27   ` Dave Hansen
2022-01-30 21:18 ` [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks Rick Edgecombe
2022-02-11 22:19   ` Dave Hansen
2022-02-12  1:44     ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate Rick Edgecombe
2022-02-08  8:51   ` Thomas Gleixner
2022-02-09 19:55     ` Edgecombe, Rick P
2022-02-12  0:27   ` Dave Hansen
2022-02-12  2:31     ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 24/35] mm: Re-introduce vm_flags to do_mmap() Rick Edgecombe
2022-01-30 21:18 ` [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support Rick Edgecombe
2022-02-11 23:37   ` Dave Hansen
2022-02-12  0:07     ` Andy Lutomirski
2022-02-12  0:11       ` Dave Hansen
2022-02-12  0:12     ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size' Rick Edgecombe
2022-02-08  8:38   ` Thomas Gleixner
2022-02-11  2:09     ` Edgecombe, Rick P
2022-02-14 12:33   ` Jann Horn
2022-02-15  1:22     ` Edgecombe, Rick P
2022-02-15  8:49       ` Christian Brauner
2022-01-30 21:18 ` [PATCH 27/35] x86/fpu: Add unsafe xsave buffer helpers Rick Edgecombe
2022-01-30 21:18 ` [PATCH 28/35] x86/cet/shstk: Handle thread shadow stack Rick Edgecombe
2022-01-30 21:18 ` [PATCH 29/35] x86/cet/shstk: Introduce shadow stack token setup/verify routines Rick Edgecombe
2022-01-30 21:18 ` [PATCH 30/35] x86/cet/shstk: Handle signals for shadow stack Rick Edgecombe
2022-01-30 21:18 ` [PATCH 31/35] x86/cet/shstk: Add arch_prctl elf feature functions Rick Edgecombe
2022-01-30 21:18 ` [PATCH 32/35] x86/cet/shstk: Introduce map_shadow_stack syscall Rick Edgecombe
2022-01-30 21:18 ` [PATCH 33/35] selftests/x86: Add map_shadow_stack syscall test Rick Edgecombe
2022-02-03 22:42   ` Dave Hansen
2022-02-04  1:22     ` Edgecombe, Rick P
2022-01-30 21:18 ` [PATCH 34/35] x86/cet/shstk: Support wrss for userspace Rick Edgecombe
2022-01-31  7:56   ` Florian Weimer
2022-01-31 18:26     ` H.J. Lu
2022-01-31 18:45       ` Florian Weimer
2022-01-30 21:18 ` [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs Rick Edgecombe
2022-02-03 21:58   ` John Allen
2022-02-03 22:23     ` H.J. Lu
2022-02-04 22:21       ` John Allen
2022-02-03 21:07 ` [PATCH 00/35] Shadow stacks for userspace Thomas Gleixner
2022-02-04  1:08   ` Edgecombe, Rick P
2022-02-04  5:20     ` Andy Lutomirski
2022-02-04 20:23       ` Edgecombe, Rick P
2022-02-05 13:26     ` David Laight
2022-02-05 13:29       ` H.J. Lu
2022-02-05 20:15         ` Edgecombe, Rick P
2022-02-05 20:21           ` H.J. Lu
2022-02-06 13:19             ` Peter Zijlstra
2022-02-06 13:42           ` David Laight
2022-02-06 13:55             ` H.J. Lu
2022-02-07 10:22             ` Florian Weimer
2022-02-08  1:46             ` Edgecombe, Rick P
2022-02-08  1:31           ` Andy Lutomirski
2022-02-08  9:31             ` Thomas Gleixner
2022-02-08 16:15               ` Andy Lutomirski
2022-02-06 13:06     ` Peter Zijlstra
2022-02-06 18:42 ` Mike Rapoport
2022-02-07  7:20   ` Adrian Reber
2022-02-07 16:30     ` Dave Hansen
2022-02-08  9:16       ` Mike Rapoport
2022-02-08  9:29         ` Cyrill Gorcunov
2022-02-08 16:21           ` Andy Lutomirski
2022-02-08 17:02             ` Cyrill Gorcunov
2022-02-08 21:54               ` Dmitry Safonov
2022-02-09  6:37                 ` Cyrill Gorcunov
2022-02-09  2:18               ` Edgecombe, Rick P
2022-02-09  6:43                 ` Cyrill Gorcunov
2022-02-09 10:53                 ` Mike Rapoport
2022-02-10  2:37                 ` Andy Lutomirski
2022-02-10  2:53                   ` H.J. Lu
2022-02-10 13:52                     ` Willgerodt, Felix
2022-02-11  7:41                   ` avagin
2022-02-11  8:04                     ` Mike Rapoport
2022-02-28 20:27                   ` Mike Rapoport
2022-02-28 20:30                     ` Andy Lutomirski
2022-02-28 21:30                       ` Mike Rapoport
2022-02-28 22:55                         ` Andy Lutomirski
2022-03-03 19:40                           ` Mike Rapoport
2022-03-03 23:00                             ` Andy Lutomirski
2022-03-04  1:30                               ` Edgecombe, Rick P
2022-03-04 19:13                                 ` Andy Lutomirski
2022-03-07 18:56                                   ` Mike Rapoport
2022-03-07 19:07                                     ` H.J. Lu
2022-05-31 11:59                                       ` Mike Rapoport
2022-05-31 16:25                                         ` Edgecombe, Rick P
2022-05-31 16:36                                           ` Mike Rapoport
2022-05-31 17:34                                             ` Edgecombe, Rick P
2022-05-31 18:00                                               ` H.J. Lu
2022-06-01 17:27                                                 ` Edgecombe, Rick P
2022-06-01 19:27                                                   ` H.J. Lu
2022-06-01  8:06                                               ` Mike Rapoport
2022-06-01 17:24                                                 ` Edgecombe, Rick P
2022-06-09 18:04                                                   ` Mike Rapoport
2022-03-07 22:21                                     ` David Laight

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).