linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack
@ 2021-02-03 22:55 Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 01/25] Documentation/x86: Add CET description Yu-cheng Yu
                   ` (24 more replies)
  0 siblings, 25 replies; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Control-flow Enforcement (CET) is a new Intel processor feature that blocks
return/jump-oriented programming attacks.  Details are in "Intel 64 and
IA-32 Architectures Software Developer's Manual" [1].

CET can protect applications and the kernel.  This series enables only
application-level protection, and has three parts:

  - Shadow stack [2],
  - Indirect branch tracking [3], and
  - Selftests [4].

I have run tests on these patches for quite some time, and they have been
very stable.  Linux distributions with CET are available now, and Intel
processors with CET are already on the market.  It would be nice if CET
support can be accepted into the kernel.  I will be working to address any
issues should they come up.

Changes in v19:
- Patch #2: update Kconfig text, remove CPU_SUP_INTEL dependency.
- Patch #5: update commit log, add information for XSAVES supervisor states.
- Patch #21: update commit log, add information for shadow stack restore
  token and signal context extension.
- Patch #24: change IS_ENABLED() to cpu_feature_enabled(), EOPNOTSUPP to
  ENOTSUPP, copy_status_to_user() to cet_copy_status_to_user(), remove
  unnecessary type casts.
- Rebase to Linus tree v5.11-rc6.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual:

    https://software.intel.com/en-us/download/intel-64-and-ia-32-
    architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4

[2] CET Shadow Stack patches v18:

    https://lkml.kernel.org/r/20210127212524.10188-1-yu-cheng.yu@intel.com/

[3] Indirect Branch Tracking patches v18.

    https://lkml.kernel.org/r/20210127213028.11362-1-yu-cheng.yu@intel.com/

[4] I am holding off the selftests changes and working to get Reviewed-by's.
    The earlier version of the selftests patches:

    https://lkml.kernel.org/r/20200521211720.20236-1-yu-cheng.yu@intel.com/

[5] The kernel ptrace patch is tested with an Intel-internal updated GDB.
    I am holding off the kernel ptrace patch to re-test it with my earlier
    patch for fixing regset holes.

Yu-cheng Yu (25):
  Documentation/x86: Add CET description
  x86/cet/shstk: Add Kconfig option for user-mode control-flow
    protection
  x86/cpufeatures: Add CET CPU feature flags for Control-flow
    Enforcement Technology (CET)
  x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions
  x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  x86/cet: Add control-protection fault handler
  x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  x86/mm: Introduce _PAGE_COW
  drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  x86/mm: Update pte_modify for _PAGE_COW
  x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for
    transition from _PAGE_DIRTY to _PAGE_COW
  mm: Introduce VM_SHSTK for shadow stack memory
  x86/mm: Shadow Stack page fault error checking
  x86/mm: Update maybe_mkwrite() for shadow stack
  mm: Fixup places that call pte_mkwrite() directly
  mm: Add guard pages around a shadow stack.
  mm/mmap: Add shadow stack pages to memory accounting
  mm: Update can_follow_write_pte() for shadow stack
  mm: Re-introduce vm_flags to do_mmap()
  x86/cet/shstk: User-mode shadow stack support
  x86/cet/shstk: Handle signals for shadow stack
  ELF: Introduce arch_setup_elf_property()
  x86/cet/shstk: Handle thread shadow stack
  x86/cet/shstk: Add arch_prctl functions for shadow stack
  mm: Introduce PROT_SHSTK for shadow stack

 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/intel_cet.rst               | 136 +++++++
 arch/arm64/include/asm/elf.h                  |   5 +
 arch/x86/Kconfig                              |  28 ++
 arch/x86/Kconfig.assembler                    |   5 +
 arch/x86/ia32/ia32_signal.c                   |  17 +
 arch/x86/include/asm/cet.h                    |  42 +++
 arch/x86/include/asm/cpufeatures.h            |   4 +-
 arch/x86/include/asm/disabled-features.h      |  17 +-
 arch/x86/include/asm/elf.h                    |  13 +
 arch/x86/include/asm/fpu/internal.h           |  10 +
 arch/x86/include/asm/fpu/types.h              |  23 +-
 arch/x86/include/asm/fpu/xstate.h             |   6 +-
 arch/x86/include/asm/idtentry.h               |   4 +
 arch/x86/include/asm/mman.h                   |  85 +++++
 arch/x86/include/asm/mmu_context.h            |   3 +
 arch/x86/include/asm/msr-index.h              |  19 +
 arch/x86/include/asm/page_64_types.h          |  10 +
 arch/x86/include/asm/pgtable.h                | 194 +++++++++-
 arch/x86/include/asm/pgtable_types.h          |  48 ++-
 arch/x86/include/asm/processor.h              |   5 +
 arch/x86/include/asm/special_insns.h          |  32 ++
 arch/x86/include/asm/trap_pf.h                |   2 +
 arch/x86/include/uapi/asm/mman.h              |  28 +-
 arch/x86/include/uapi/asm/prctl.h             |   4 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/include/uapi/asm/sigcontext.h        |   9 +
 arch/x86/kernel/Makefile                      |   2 +
 arch/x86/kernel/cet.c                         | 343 ++++++++++++++++++
 arch/x86/kernel/cet_prctl.c                   |  60 +++
 arch/x86/kernel/cpu/common.c                  |  16 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/cpu/intel.c                   |   3 +
 arch/x86/kernel/fpu/signal.c                  | 100 +++++
 arch/x86/kernel/fpu/xstate.c                  |  10 +-
 arch/x86/kernel/idt.c                         |   4 +
 arch/x86/kernel/process.c                     |  14 +-
 arch/x86/kernel/process_64.c                  |  32 ++
 arch/x86/kernel/signal.c                      |  10 +
 arch/x86/kernel/signal_compat.c               |   2 +-
 arch/x86/kernel/traps.c                       |  60 +++
 arch/x86/mm/fault.c                           |  19 +
 arch/x86/mm/mmap.c                            |   2 +
 arch/x86/mm/pat/set_memory.c                  |   2 +-
 arch/x86/mm/pgtable.c                         |  25 ++
 drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
 fs/aio.c                                      |   2 +-
 fs/binfmt_elf.c                               |   4 +
 fs/proc/task_mmu.c                            |   3 +
 include/linux/elf.h                           |   6 +
 include/linux/mm.h                            |  38 +-
 include/linux/pgtable.h                       |  35 ++
 include/uapi/asm-generic/siginfo.h            |   3 +-
 include/uapi/linux/elf.h                      |   9 +
 ipc/shm.c                                     |   2 +-
 mm/gup.c                                      |   8 +-
 mm/huge_memory.c                              |  10 +-
 mm/memory.c                                   |   5 +-
 mm/migrate.c                                  |   3 +-
 mm/mmap.c                                     |  23 +-
 mm/mprotect.c                                 |   2 +-
 mm/nommu.c                                    |   4 +-
 mm/util.c                                     |   2 +-
 64 files changed, 1541 insertions(+), 84 deletions(-)
 create mode 100644 Documentation/x86/intel_cet.rst
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/include/asm/mman.h
 create mode 100644 arch/x86/kernel/cet.c
 create mode 100644 arch/x86/kernel/cet_prctl.c

-- 
2.21.0


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v19 01/25] Documentation/x86: Add CET description
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 19:52   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection Yu-cheng Yu
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Explain no_user_shstk/no_user_ibt kernel parameters, and introduce a new
document on Control-flow Enforcement Technology (CET).

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/intel_cet.rst               | 136 ++++++++++++++++++
 3 files changed, 143 insertions(+)
 create mode 100644 Documentation/x86/intel_cet.rst

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..96f65530768c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3202,6 +3202,12 @@
 			noexec=on: enable non-executable mappings (default)
 			noexec=off: disable non-executable mappings
 
+	no_user_shstk	[X86-64] Disable Shadow Stack for user-mode
+			applications
+
+	no_user_ibt	[X86-64] Disable Indirect Branch Tracking for user-mode
+			applications
+
 	nosmap		[X86,PPC]
 			Disable SMAP (Supervisor Mode Access Prevention)
 			even if it is supported by processor.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 4693e192b447..cf5250a3cc70 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -21,6 +21,7 @@ x86-specific Documentation
    tlb
    mtrr
    pat
+   intel_cet
    intel-iommu
    intel_txt
    amd-memory-encryption
diff --git a/Documentation/x86/intel_cet.rst b/Documentation/x86/intel_cet.rst
new file mode 100644
index 000000000000..93aa289e4439
--- /dev/null
+++ b/Documentation/x86/intel_cet.rst
@@ -0,0 +1,136 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Control-flow Enforcement Technology (CET)
+=========================================
+
+[1] Overview
+============
+
+Control-flow Enforcement Technology (CET) is an Intel processor feature
+that provides protection against return/jump-oriented programming (ROP)
+attacks.  It can be set up to protect both applications and the kernel.
+Only user-mode protection is implemented in the 64-bit kernel, including
+support for running legacy 32-bit applications.
+
+CET introduces Shadow Stack and Indirect Branch Tracking.  Shadow stack is
+a secondary stack allocated from memory and cannot be directly modified by
+applications.  When executing a CALL instruction, the processor pushes the
+return address to both the normal stack and the shadow stack.  Upon
+function return, the processor pops the shadow stack copy and compares it
+to the normal stack copy.  If the two differ, the processor raises a
+control-protection fault.  Indirect branch tracking verifies indirect
+CALL/JMP targets are intended as marked by the compiler with 'ENDBR'
+opcodes.
+
+There is a Kconfig option:
+
+    X86_CET.
+
+To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or LLVM v10.0.1
+or later are required.  To build a CET-enabled application, GLIBC v2.28 or
+later is also required.
+
+There are two command-line options for disabling CET features::
+
+    no_user_shstk - disables user shadow stack, and
+    no_user_ibt   - disables user indirect branch tracking.
+
+At run time, /proc/cpuinfo shows CET features if the processor supports
+CET.
+
+[2] Application Enabling
+========================
+
+An application's CET capability is marked in its ELF header and can be
+verified from readelf/llvm-readelf output:
+
+    readelf -n <application> | grep -a SHSTK
+        properties: x86 feature: IBT, SHSTK
+
+If an application supports CET and is statically linked, it will run with
+CET protection.  If the application needs any shared libraries, the loader
+checks all dependencies and enables CET when all requirements are met.
+
+[3] Backward Compatibility
+==========================
+
+GLIBC provides a few CET tunables via the GLIBC_TUNABLES environment
+variable:
+
+GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-IBT
+    Turn off SHSTK/IBT.
+
+GLIBC_TUNABLES=glibc.tune.x86_shstk=<on, permissive>
+    This controls how dlopen() handles SHSTK legacy libraries::
+
+        on         - continue with SHSTK enabled;
+        permissive - continue with SHSTK off.
+
+Details can be found in the GLIBC manual pages.
+
+[4] CET arch_prctl()'s
+======================
+
+Several arch_prctl()'s have been added for CET:
+
+arch_prctl(ARCH_X86_CET_STATUS, u64 *addr)
+    Return CET feature status.
+
+    The parameter 'addr' is a pointer to a user buffer.
+    On returning to the caller, the kernel fills the following
+    information::
+
+        *addr       = shadow stack/indirect branch tracking status
+        *(addr + 1) = shadow stack base address
+        *(addr + 2) = shadow stack size
+
+arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features)
+    Disable shadow stack and/or indirect branch tracking as specified in
+    'features'.  Return -EPERM if CET is locked.
+
+arch_prctl(ARCH_X86_CET_LOCK)
+    Lock in all CET features.  They cannot be turned off afterwards.
+
+Note:
+  There is no CET-enabling arch_prctl function.  By design, CET is enabled
+  automatically if the binary and the system can support it.
+
+[5] The implementation of the Shadow Stack
+==========================================
+
+Shadow Stack size
+-----------------
+
+A task's shadow stack is allocated from memory to a fixed size of
+MIN(RLIMIT_STACK, 4 GB).  In other words, the shadow stack is allocated to
+the maximum size of the normal stack, but capped to 4 GB.  However,
+a compat-mode application's address space is smaller, each of its thread's
+shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB).
+
+Signal
+------
+
+The main program and its signal handlers use the same shadow stack.
+Because the shadow stack stores only return addresses, a large shadow
+stack covers the condition that both the program stack and the signal
+alternate stack run out.
+
+The kernel creates a restore token for the shadow stack restoring address
+and verifies that token when restoring from the signal handler.
+
+Fork
+----
+
+The shadow stack's vma has VM_SHSTK flag set; its PTEs are required to be
+read-only and dirty.  When a shadow stack PTE is not RO and dirty, a
+shadow access triggers a page fault with the shadow stack access bit set
+in the page fault error code.
+
+When a task forks a child, its shadow stack PTEs are copied and both the
+parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+Upon the next shadow stack access, the resulting shadow stack page fault
+is handled by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new shadow stack
+for the new thread.
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 01/25] Documentation/x86: Add CET description Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 19:56   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 03/25] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Yu-cheng Yu
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Shadow Stack provides protection against function return address
corruption.  It is active when the processor supports it, the kernel has
CONFIG_X86_CET enabled, and the application is built for the feature.
This is only implemented for the 64-bit kernel.  When it is enabled, legacy
non-Shadow Stack applications continue to work, but without protection.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/Kconfig           | 22 ++++++++++++++++++++++
 arch/x86/Kconfig.assembler |  5 +++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..074b3c0e6bf6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1951,6 +1951,28 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config ARCH_HAS_SHADOW_STACK
+	def_bool n
+
+config X86_CET
+	prompt "Intel Control-flow protection for user-mode"
+	def_bool n
+	depends on X86_64
+	depends on AS_WRUSS
+	select ARCH_USES_HIGH_VMA_FLAGS
+	select ARCH_HAS_SHADOW_STACK
+	help
+	  Control-flow protection is a set of hardware features which place
+	  additional restrictions on indirect branches.  These help
+	  mitigate ROP attacks.  Applications must be enabled to use it,
+	  and old userspace does not get protection "for free".
+	  Support for this feature is present on Tiger Lake family of
+	  processors released in 2020 or later.  Enabling this feature
+	  increases kernel text size by 3.7 KB.
+	  See Documentation/x86/intel_cet.rst for more information.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 26b8c08e2fc4..00c79dd93651 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -19,3 +19,8 @@ config AS_TPAUSE
 	def_bool $(as-instr,tpause %ecx)
 	help
 	  Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7
+
+config AS_WRUSS
+	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
+	help
+	  Supported by binutils >= 2.31 and LLVM integrated assembler
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 03/25] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET)
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 01/25] Documentation/x86: Add CET description Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 19:57   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions Yu-cheng Yu
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Add CPU feature flags for Control-flow Enforcement Technology (CET).

CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect Branch Tracking

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h       |  2 ++
 arch/x86/include/asm/disabled-features.h | 12 ++++++++++--
 arch/x86/kernel/cpu/cpuid-deps.c         |  2 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..292fe87b26b3 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -345,6 +345,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* Shadow Stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
@@ -377,6 +378,7 @@
 #define X86_FEATURE_TSXLDTRK		(18*32+16) /* TSX Suspend Load Address Tracking */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR		(18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_IBT			(18*32+20) /* Indirect Branch Tracking */
 #define X86_FEATURE_AVX512_FP16		(18*32+23) /* AVX512 FP16 */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 7947cb1782da..a66e7f34b788 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -68,6 +68,14 @@
 # define DISABLE_SGX	(1 << (X86_FEATURE_SGX & 31))
 #endif
 
+#ifdef CONFIG_X86_CET
+#define DISABLE_SHSTK	0
+#define DISABLE_IBT	0
+#else
+#define DISABLE_SHSTK	(1 << (X86_FEATURE_SHSTK & 31))
+#define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -88,9 +96,9 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
-#define DISABLED_MASK18	0
+#define DISABLED_MASK18	(DISABLE_IBT)
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
 
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 42af31b64c2c..52d9a682a0e6 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -72,6 +72,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_PER_THREAD_MBA,		X86_FEATURE_MBA       },
+	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
+	{ X86_FEATURE_IBT,			X86_FEATURE_XSAVES    },
 	{}
 };
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (2 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 03/25] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 19:58   ` Kees Cook
  2021-02-05 13:43   ` Borislav Petkov
  2021-02-03 22:55 ` [PATCH v19 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yu-cheng Yu
                   ` (20 subsequent siblings)
  24 siblings, 2 replies; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Introduce a software-defined X86_FEATURE_CET, which indicates either Shadow
Stack or Indirect Branch Tracking (or both) is present.  Also introduce
related cpu init/setup functions.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h          |  2 +-
 arch/x86/include/asm/disabled-features.h    |  5 ++++-
 arch/x86/include/uapi/asm/processor-flags.h |  2 ++
 arch/x86/kernel/cpu/common.c                | 16 ++++++++++++++++
 arch/x86/kernel/cpu/intel.c                 |  3 +++
 5 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 292fe87b26b3..d1866659edbd 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -108,7 +108,7 @@
 #define X86_FEATURE_EXTD_APICID		( 3*32+26) /* Extended APICID (8 bits) */
 #define X86_FEATURE_AMD_DCM		( 3*32+27) /* AMD multi-node processor */
 #define X86_FEATURE_APERFMPERF		( 3*32+28) /* P-State hardware coordination feedback capability (APERF/MPERF MSRs) */
-/* free					( 3*32+29) */
+#define X86_FEATURE_CET			( 3*32+29) /* Control-flow enforcement */
 #define X86_FEATURE_NONSTOP_TSC_S3	( 3*32+30) /* TSC doesn't stop in S3 state */
 #define X86_FEATURE_TSC_KNOWN_FREQ	( 3*32+31) /* TSC has known frequency */
 
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index a66e7f34b788..ab28dc7b30d0 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -71,9 +71,11 @@
 #ifdef CONFIG_X86_CET
 #define DISABLE_SHSTK	0
 #define DISABLE_IBT	0
+#define DISABLE_CET	0
 #else
 #define DISABLE_SHSTK	(1 << (X86_FEATURE_SHSTK & 31))
 #define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
+#define DISABLE_CET	(1 << (X86_FEATURE_CET & 31))
 #endif
 
 /*
@@ -82,7 +84,8 @@
 #define DISABLED_MASK0	(DISABLE_VME)
 #define DISABLED_MASK1	0
 #define DISABLED_MASK2	0
-#define DISABLED_MASK3	(DISABLE_CYRIX_ARR|DISABLE_CENTAUR_MCR|DISABLE_K6_MTRR)
+#define DISABLED_MASK3	(DISABLE_CYRIX_ARR|DISABLE_CENTAUR_MCR|DISABLE_K6_MTRR| \
+			 DISABLE_CET)
 #define DISABLED_MASK4	(DISABLE_PCID)
 #define DISABLED_MASK5	0
 #define DISABLED_MASK6	0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index bcba3c643e63..a8df907e8017 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -130,6 +130,8 @@
 #define X86_CR4_SMAP		_BITUL(X86_CR4_SMAP_BIT)
 #define X86_CR4_PKE_BIT		22 /* enable Protection Keys support */
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
+#define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement */
+#define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 35ad8480c464..803d0749b302 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -510,6 +510,14 @@ static __init int setup_disable_pku(char *arg)
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_CET))
+		return;
+
+	cr4_set_bits(X86_CR4_CET);
+}
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
@@ -1252,6 +1260,13 @@ static void __init cpu_parse_early_param(void)
 	if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
 		setup_clear_cpu_cap(X86_FEATURE_XSAVES);
 
+#ifdef CONFIG_X86_CET
+	if (cmdline_find_option_bool(boot_command_line, "no_user_shstk"))
+		setup_clear_cpu_cap(X86_FEATURE_SHSTK);
+	if (cmdline_find_option_bool(boot_command_line, "no_user_ibt"))
+		setup_clear_cpu_cap(X86_FEATURE_IBT);
+#endif
+
 	arglen = cmdline_find_option(boot_command_line, "clearcpuid", arg, sizeof(arg));
 	if (arglen <= 0)
 		return;
@@ -1591,6 +1606,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 
 	x86_init_rdrand(c);
 	setup_pku(c);
+	setup_cet(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 59a1e3ce3f14..1784b9f95239 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -333,6 +333,9 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 
 static void bsp_init_intel(struct cpuinfo_x86 *c)
 {
+	if (cpu_has(c, X86_FEATURE_SHSTK) || cpu_has(c, X86_FEATURE_IBT))
+		setup_force_cpu_cap(X86_FEATURE_CET);
+
 	resctrl_cpu_detect(c);
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (3 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 19:59   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 06/25] x86/cet: Add control-protection fault handler Yu-cheng Yu
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Control-flow Enforcement Technology (CET) introduces these MSRs:

    MSR_IA32_U_CET (user-mode CET settings),
    MSR_IA32_PL3_SSP (user-mode shadow stack pointer),

    MSR_IA32_PL0_SSP (kernel-mode shadow stack pointer),
    MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack pointer),
    MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack pointer),
    MSR_IA32_S_CET (kernel-mode CET settings),
    MSR_IA32_INT_SSP_TAB (exception shadow stack table).

The two user-mode MSRs belong to XFEATURE_CET_USER.  The first three of
kernel-mode MSRs belong to XFEATURE_CET_KERNEL.  Both XSAVES states are
supervisor states.  This means that there is no direct, unprivileged access
to these states, making it harder for an attacker to subvert CET.

For sigreturn and future ptrace() support, shadow stack address and MSR
reserved bits are checked before written to the supervisor states.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/fpu/types.h  | 23 +++++++++++++++++++++--
 arch/x86/include/asm/fpu/xstate.h |  6 ++++--
 arch/x86/include/asm/msr-index.h  | 19 +++++++++++++++++++
 arch/x86/kernel/fpu/xstate.c      | 10 +++++++++-
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f5a38a5f3ae1..035eb0ec665e 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -135,6 +135,8 @@ enum xfeature {
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 
 #define XFEATURE_MASK_FPSSE		(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
@@ -237,6 +239,23 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	u64 user_cet;			/* user control-flow settings */
+	u64 user_ssp;			/* user shadow stack pointer */
+};
+
+/*
+ * State component 12 is Control-flow Enforcement kernel states
+ */
+struct cet_kernel_state {
+	u64 kernel_ssp;			/* kernel shadow stack */
+	u64 pl1_ssp;			/* privilege level 1 shadow stack */
+	u64 pl2_ssp;			/* privilege level 2 shadow stack */
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 47a92232d595..582f3575e0bd 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -35,7 +35,8 @@
 				      XFEATURE_MASK_BNDCSR)
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_CET_USER)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -62,7 +63,8 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+					      XFEATURE_MASK_CET_KERNEL)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 546d6ecf0a35..fae6b3ea1f6d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -933,4 +933,23 @@
 #define MSR_VM_IGNNE                    0xc0010115
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
+/* Control-flow Enforcement Technology MSRs */
+#define MSR_IA32_U_CET		0x6a0 /* user mode cet setting */
+#define MSR_IA32_S_CET		0x6a2 /* kernel mode cet setting */
+#define CET_SHSTK_EN		BIT_ULL(0)
+#define CET_WRSS_EN		BIT_ULL(1)
+#define CET_ENDBR_EN		BIT_ULL(2)
+#define CET_LEG_IW_EN		BIT_ULL(3)
+#define CET_NO_TRACK_EN		BIT_ULL(4)
+#define CET_SUPPRESS_DISABLE	BIT_ULL(5)
+#define CET_RESERVED		(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
+#define CET_SUPPRESS		BIT_ULL(10)
+#define CET_WAIT_ENDBR		BIT_ULL(11)
+
+#define MSR_IA32_PL0_SSP	0x6a4 /* kernel shadow stack pointer */
+#define MSR_IA32_PL1_SSP	0x6a5 /* ring-1 shadow stack pointer */
+#define MSR_IA32_PL2_SSP	0x6a6 /* ring-2 shadow stack pointer */
+#define MSR_IA32_PL3_SSP	0x6a7 /* user shadow stack pointer */
+#define MSR_IA32_INT_SSP_TAB	0x6a8 /* exception shadow stack table */
+
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 5d8047441a0a..22eedf8066bf 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -38,6 +38,8 @@ static const char *xfeature_names[] =
 	"Processor Trace (unused)"	,
 	"Protection Keys User registers",
 	"PASID state",
+	"Control-flow User registers"	,
+	"Control-flow Kernel registers"	,
 	"unknown xstate feature"	,
 };
 
@@ -53,6 +55,8 @@ static short xsave_cpuid_features[] __initdata = {
 	X86_FEATURE_INTEL_PT,
 	X86_FEATURE_PKU,
 	X86_FEATURE_ENQCMD,
+	X86_FEATURE_CET, /* XFEATURE_CET_USER */
+	X86_FEATURE_CET, /* XFEATURE_CET_KERNEL */
 };
 
 /*
@@ -321,6 +325,8 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
+	print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
 }
 
 /*
@@ -596,6 +602,8 @@ static void check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
+	XCHECK_SZ(sz, nr, XFEATURE_CET_USER,   struct cet_user_state);
+	XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct cet_kernel_state);
 
 	/*
 	 * Make *SURE* to add any feature numbers in below if
@@ -605,7 +613,7 @@ static void check_xstate_against_struct(int nr)
 	if ((nr < XFEATURE_YMM) ||
 	    (nr >= XFEATURE_MAX) ||
 	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) {
+	    ((nr >= XFEATURE_RSRVD_COMP_13) && (nr <= XFEATURE_LBR))) {
 		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
 	}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (4 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:09   ` Kees Cook
  2021-02-05 13:59   ` Borislav Petkov
  2021-02-03 22:55 ` [PATCH v19 07/25] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Yu-cheng Yu
                   ` (18 subsequent siblings)
  24 siblings, 2 replies; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu, Michael Kerrisk

A control-protection fault is triggered when a control-flow transfer
attempt violates Shadow Stack or Indirect Branch Tracking constraints.
For example, the return address for a RET instruction differs from the copy
on the shadow stack; or an indirect JMP instruction, without the NOTRACK
prefix, arrives at a non-ENDBR opcode.

The control-protection fault handler works in a similar way as the general
protection fault handler.  It provides the si_code SEGV_CPERR to the signal
handler.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
---
 arch/x86/include/asm/idtentry.h    |  4 ++
 arch/x86/kernel/idt.c              |  4 ++
 arch/x86/kernel/signal_compat.c    |  2 +-
 arch/x86/kernel/traps.c            | 60 ++++++++++++++++++++++++++++++
 include/uapi/asm-generic/siginfo.h |  3 +-
 5 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index f656aabd1545..ff4b3bf634da 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -574,6 +574,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS,	exc_stack_segment);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP,	exc_general_protection);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC,	exc_alignment_check);
 
+#ifdef CONFIG_X86_CET
+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
+#endif
+
 /* Raw exception entries which need extra work */
 DECLARE_IDTENTRY_RAW(X86_TRAP_UD,		exc_invalid_op);
 DECLARE_IDTENTRY_RAW(X86_TRAP_BP,		exc_int3);
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index ee1a283f8e96..e8166d9bbb10 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -105,6 +105,10 @@ static const __initconst struct idt_data def_idts[] = {
 #elif defined(CONFIG_X86_32)
 	SYSG(IA32_SYSCALL_VECTOR,	entry_INT80_32),
 #endif
+
+#ifdef CONFIG_X86_CET
+	INTG(X86_TRAP_CP,		asm_exc_control_protection),
+#endif
 };
 
 /*
diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index a5330ff498f0..dd92490b1e7f 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
 	 */
 	BUILD_BUG_ON(NSIGILL  != 11);
 	BUILD_BUG_ON(NSIGFPE  != 15);
-	BUILD_BUG_ON(NSIGSEGV != 9);
+	BUILD_BUG_ON(NSIGSEGV != 10);
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
 	BUILD_BUG_ON(NSIGCHLD != 6);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7f5aec758f0e..f5354c35df32 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -606,6 +606,66 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	cond_local_irq_disable(regs);
 }
 
+#ifdef CONFIG_X86_CET
+static const char * const control_protection_err[] = {
+	"unknown",
+	"near-ret",
+	"far-ret/iret",
+	"endbranch",
+	"rstorssp",
+	"setssbsy",
+};
+
+/*
+ * When a control protection exception occurs, send a signal to the responsible
+ * application.  Currently, control protection is only enabled for user mode.
+ * This exception should not come from kernel mode.
+ */
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	struct task_struct *tsk;
+
+	if (!user_mode(regs)) {
+		pr_emerg("PANIC: unexpected kernel control protection fault\n");
+		die("kernel control protection fault", regs, error_code);
+		panic("Machine halted.");
+	}
+
+	cond_local_irq_enable(regs);
+
+	if (!boot_cpu_has(X86_FEATURE_CET))
+		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
+
+	tsk = current;
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_CP;
+
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
+	    __ratelimit(&rs)) {
+		unsigned int max_err;
+		unsigned long ssp;
+
+		max_err = ARRAY_SIZE(control_protection_err) - 1;
+		if (error_code < 0 || error_code > max_err)
+			error_code = 0;
+
+		rdmsrl(MSR_IA32_PL3_SSP, ssp);
+		pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)",
+			 tsk->comm, task_pid_nr(tsk),
+			 regs->ip, regs->sp, ssp, error_code,
+			 control_protection_err[error_code]);
+		print_vma_addr(KERN_CONT " in ", regs->ip);
+		pr_cont("\n");
+	}
+
+	force_sig_fault(SIGSEGV, SEGV_CPERR,
+			(void __user *)uprobe_get_trap_addr(regs));
+	cond_local_irq_disable(regs);
+}
+#endif
+
 static bool do_int3(struct pt_regs *regs)
 {
 	int res;
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index d2597000407a..1c2ea91284a0 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -231,7 +231,8 @@ typedef struct siginfo {
 #define SEGV_ADIPERR	7	/* Precise MCD exception */
 #define SEGV_MTEAERR	8	/* Asynchronous ARM MTE error */
 #define SEGV_MTESERR	9	/* Synchronous ARM MTE exception */
-#define NSIGSEGV	9
+#define SEGV_CPERR	10	/* Control protection fault */
+#define NSIGSEGV	10
 
 /*
  * SIGBUS si_codes
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 07/25] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (5 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 06/25] x86/cet: Add control-protection fault handler Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:10   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW Yu-cheng Yu
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu, Christoph Hellwig

The x86 family of processors do not directly create read-only and Dirty
PTEs.  These PTEs are created by software.  One such case is that kernel
read-only pages are historically setup as Dirty.

New processors that support Shadow Stack regard read-only and Dirty PTEs as
shadow stack pages.  This results in ambiguity between shadow stack and
kernel read-only pages.  To resolve this, removed Dirty from kernel read-
only pages.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/pgtable_types.h | 6 +++---
 arch/x86/mm/pat/set_memory.c         | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 394757ee030a..b8b79d618bbc 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -194,10 +194,10 @@ enum page_cache_mode {
 #define _KERNPG_TABLE		 (__PP|__RW|   0|___A|   0|___D|   0|   0| _ENC)
 #define _PAGE_TABLE_NOENC	 (__PP|__RW|_USR|___A|   0|___D|   0|   0)
 #define _PAGE_TABLE		 (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
-#define __PAGE_KERNEL_RO	 (__PP|   0|   0|___A|__NX|___D|   0|___G)
-#define __PAGE_KERNEL_ROX	 (__PP|   0|   0|___A|   0|___D|   0|___G)
+#define __PAGE_KERNEL_RO	 (__PP|   0|   0|___A|__NX|   0|   0|___G)
+#define __PAGE_KERNEL_ROX	 (__PP|   0|   0|___A|   0|   0|   0|___G)
 #define __PAGE_KERNEL_NOCACHE	 (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
-#define __PAGE_KERNEL_VVAR	 (__PP|   0|_USR|___A|__NX|___D|   0|___G)
+#define __PAGE_KERNEL_VVAR	 (__PP|   0|_USR|___A|__NX|   0|   0|___G)
 #define __PAGE_KERNEL_LARGE	 (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
 #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW|   0|___A|   0|___D|_PSE|___G)
 #define __PAGE_KERNEL_WP	 (__PP|__RW|   0|___A|__NX|___D|   0|___G| __WP)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..6bebb95a6988 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1932,7 +1932,7 @@ int set_memory_nx(unsigned long addr, int numpages)
 
 int set_memory_ro(unsigned long addr, int numpages)
 {
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0);
 }
 
 int set_memory_rw(unsigned long addr, int numpages)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (6 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 07/25] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:19   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 09/25] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Yu-cheng Yu
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

There is essentially no room left in the x86 hardware PTEs on some OSes
(not Linux).  That left the hardware architects looking for a way to
represent a new memory type (shadow stack) within the existing bits.
They chose to repurpose a lightly-used state: Write=0, Dirty=1.

The reason it's lightly used is that Dirty=1 is normally set by hardware
and cannot normally be set by hardware on a Write=0 PTE.  Software must
normally be involved to create one of these PTEs, so software can simply
opt to not create them.

In places where Linux normally creates Write=0, Dirty=1, it can use the
software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY.  In other
words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1.  This
clearly separates shadow stack from other data, and results in the
following:

(a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
(b) A R/O page that has been COW'ed: (Write=0, Cow=1)
    The user page is in a R/O VMA, and get_user_pages() needs a writable
    copy.  The page fault handler creates a copy of the page and sets
    the new copy's PTE as Write=0 and Cow=1.
(c) A shadow stack PTE: (Write=0, Dirty=1)
(d) A shared shadow stack PTE: (Write=0, Cow=1)
    When a shadow stack page is being shared among processes (this happens
    at fork()), its PTE is made Dirty=0, so the next shadow stack access
    causes a fault, and the page is duplicated and Dirty=1 is set again.
    This is the COW equivalent for shadow stack pages, even though it's
    copy-on-access rather than copy-on-write.
(e) A page where the processor observed a Write=1 PTE, started a write, set
    Dirty=1, but then observed a Write=0 PTE.  That's possible today, but
    will not happen on processors that support shadow stack.

Define _PAGE_COW and update pte_*() helpers and apply the same changes to
pmd and pud.

After this, there are six free bits left in the 64-bit PTE, and no more
free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not
implemented for the 32-bit kernel.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/pgtable.h       | 125 ++++++++++++++++++++++++---
 arch/x86/include/asm/pgtable_types.h |  42 ++++++++-
 2 files changed, 154 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a02c67291cfc..4b0ec61510dc 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -121,9 +121,9 @@ extern pmdval_t early_pmd_flags;
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
-static inline int pte_dirty(pte_t pte)
+static inline bool pte_dirty(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_DIRTY;
+	return pte_flags(pte) & _PAGE_DIRTY_BITS;
 }
 
 
@@ -160,9 +160,9 @@ static inline int pte_young(pte_t pte)
 	return pte_flags(pte) & _PAGE_ACCESSED;
 }
 
-static inline int pmd_dirty(pmd_t pmd)
+static inline bool pmd_dirty(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_DIRTY;
+	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
 }
 
 static inline int pmd_young(pmd_t pmd)
@@ -170,9 +170,9 @@ static inline int pmd_young(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_ACCESSED;
 }
 
-static inline int pud_dirty(pud_t pud)
+static inline bool pud_dirty(pud_t pud)
 {
-	return pud_flags(pud) & _PAGE_DIRTY;
+	return pud_flags(pud) & _PAGE_DIRTY_BITS;
 }
 
 static inline int pud_young(pud_t pud)
@@ -182,7 +182,14 @@ static inline int pud_young(pud_t pud)
 
 static inline int pte_write(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_RW;
+	/*
+	 * If _PAGE_DIRTY is set, the PTE must either have _PAGE_RW or be
+	 * a shadow stack PTE, which is logically writable.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY);
+	else
+		return pte_flags(pte) & _PAGE_RW;
 }
 
 static inline int pte_huge(pte_t pte)
@@ -333,7 +340,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
 
 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
 }
 
 static inline pte_t pte_mkold(pte_t pte)
@@ -343,6 +350,18 @@ static inline pte_t pte_mkold(pte_t pte)
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PTE (RW=0, Dirty=1).  Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pte_flags(pte) & _PAGE_DIRTY) {
+			pte = pte_clear_flags(pte, _PAGE_DIRTY);
+			pte = pte_set_flags(pte, _PAGE_COW);
+		}
+	}
+
 	return pte_clear_flags(pte, _PAGE_RW);
 }
 
@@ -353,6 +372,18 @@ static inline pte_t pte_mkexec(pte_t pte)
 
 static inline pte_t pte_mkdirty(pte_t pte)
 {
+	pteval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PTEs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte))
+		dirty = _PAGE_COW;
+
+	return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	pte = pte_clear_flags(pte, _PAGE_COW);
 	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
 }
 
@@ -363,6 +394,13 @@ static inline pte_t pte_mkyoung(pte_t pte)
 
 static inline pte_t pte_mkwrite(pte_t pte)
 {
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pte_flags(pte) & _PAGE_COW) {
+			pte = pte_clear_flags(pte, _PAGE_COW);
+			pte = pte_set_flags(pte, _PAGE_DIRTY);
+		}
+	}
+
 	return pte_set_flags(pte, _PAGE_RW);
 }
 
@@ -434,16 +472,40 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
 
 static inline pmd_t pmd_mkclean(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
 }
 
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PMD (RW=0, Dirty=1).  Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pmd_flags(pmd) & _PAGE_DIRTY) {
+			pmd = pmd_clear_flags(pmd, _PAGE_DIRTY);
+			pmd = pmd_set_flags(pmd, _PAGE_COW);
+		}
+	}
+
 	return pmd_clear_flags(pmd, _PAGE_RW);
 }
 
 static inline pmd_t pmd_mkdirty(pmd_t pmd)
 {
+	pmdval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PMDs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pmd_flags(pmd) & _PAGE_RW))
+		dirty = _PAGE_COW;
+
+	return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
+{
+	pmd = pmd_clear_flags(pmd, _PAGE_COW);
 	return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
 }
 
@@ -464,6 +526,13 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 
 static inline pmd_t pmd_mkwrite(pmd_t pmd)
 {
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pmd_flags(pmd) & _PAGE_COW) {
+			pmd = pmd_clear_flags(pmd, _PAGE_COW);
+			pmd = pmd_set_flags(pmd, _PAGE_DIRTY);
+		}
+	}
+
 	return pmd_set_flags(pmd, _PAGE_RW);
 }
 
@@ -488,17 +557,35 @@ static inline pud_t pud_mkold(pud_t pud)
 
 static inline pud_t pud_mkclean(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
 }
 
 static inline pud_t pud_wrprotect(pud_t pud)
 {
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PUD (RW=0, Dirty=1).  Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pud_flags(pud) & _PAGE_DIRTY) {
+			pud = pud_clear_flags(pud, _PAGE_DIRTY);
+			pud = pud_set_flags(pud, _PAGE_COW);
+		}
+	}
+
 	return pud_clear_flags(pud, _PAGE_RW);
 }
 
 static inline pud_t pud_mkdirty(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pudval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PUDs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pud_flags(pud) & _PAGE_RW))
+		dirty = _PAGE_COW;
+
+	return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
 }
 
 static inline pud_t pud_mkdevmap(pud_t pud)
@@ -518,6 +605,13 @@ static inline pud_t pud_mkyoung(pud_t pud)
 
 static inline pud_t pud_mkwrite(pud_t pud)
 {
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pud_flags(pud) & _PAGE_COW) {
+			pud = pud_clear_flags(pud, _PAGE_COW);
+			pud = pud_set_flags(pud, _PAGE_DIRTY);
+		}
+	}
+
 	return pud_set_flags(pud, _PAGE_RW);
 }
 
@@ -1131,7 +1225,14 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
 #define pmd_write pmd_write
 static inline int pmd_write(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_RW;
+	/*
+	 * If _PAGE_DIRTY is set, then the PMD must either have _PAGE_RW or
+	 * be a shadow stack PMD, which is logically writable.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY);
+	else
+		return pmd_flags(pmd) & _PAGE_RW;
 }
 
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b8b79d618bbc..437d7ff0ae80 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -23,7 +23,8 @@
 #define _PAGE_BIT_SOFTW2	10	/* " */
 #define _PAGE_BIT_SOFTW3	11	/* " */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
-#define _PAGE_BIT_SOFTW4	58	/* available for programmer */
+#define _PAGE_BIT_SOFTW4	57	/* available for programmer */
+#define _PAGE_BIT_SOFTW5	58	/* available for programmer */
 #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
 #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
 #define _PAGE_BIT_PKEY_BIT2	61	/* Protection Keys, bit 3/4 */
@@ -36,6 +37,15 @@
 #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
 #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4
 
+/*
+ * Indicates a copy-on-write page.
+ */
+#ifdef CONFIG_X86_CET
+#define _PAGE_BIT_COW		_PAGE_BIT_SOFTW5 /* copy-on-write */
+#else
+#define _PAGE_BIT_COW		0
+#endif
+
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
 #define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
@@ -117,6 +127,36 @@
 #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
 #endif
 
+/*
+ * The hardware requires shadow stack to be read-only and Dirty.
+ * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs
+ * from shadow stack PTEs:
+ * (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
+ * (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
+ *     The user page is in a R/O VMA, and get_user_pages() needs a
+ *     writable copy.  The page fault handler creates a copy of the page
+ *     and sets the new copy's PTE as Write=0, Cow=1.
+ * (c) A shadow stack PTE: (Write=0, Dirty=1)
+ * (d) A shared (copy-on-access) shadow stack PTE: (Write=0, Cow=1)
+ *     When a shadow stack page is being shared among processes (this
+ *     happens at fork()), its PTE is cleared of _PAGE_DIRTY, so the next
+ *     shadow stack access causes a fault, and the page is duplicated and
+ *     _PAGE_DIRTY is set again.  This is the COW equivalent for shadow
+ *     stack pages, even though it's copy-on-access rather than
+ *     copy-on-write.
+ * (e) A page where the processor observed a Write=1 PTE, started a write,
+ *     set Dirty=1, but then observed a Write=0 PTE (changed by another
+ *     thread).  That's possible today, but will not happen on processors
+ *     that support shadow stack.
+ */
+#ifdef CONFIG_X86_CET
+#define _PAGE_COW	(_AT(pteval_t, 1) << _PAGE_BIT_COW)
+#else
+#define _PAGE_COW	(_AT(pteval_t, 0))
+#endif
+
+#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW)
+
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
 /*
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 09/25] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (7 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 10/25] x86/mm: Update pte_modify for _PAGE_COW Yu-cheng Yu
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu, David Airlie, Joonas Lahtinen, Jani Nikula,
	Daniel Vetter, Rodrigo Vivi, Zhenyu Wang, Zhi Wang

After the introduction of _PAGE_COW, a modified page's PTE can have either
_PAGE_DIRTY or _PAGE_COW.  Change _PAGE_DIRTY to _PAGE_DIRTY_BITS.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 897c007ea96a..937b6083b2dc 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1216,7 +1216,7 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
 	}
 
 	/* Clear dirty field. */
-	se->val64 &= ~_PAGE_DIRTY;
+	se->val64 &= ~_PAGE_DIRTY_BITS;
 
 	ops->clear_pse(se);
 	ops->clear_ips(se);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 10/25] x86/mm: Update pte_modify for _PAGE_COW
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (8 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 09/25] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:20   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 11/25] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Yu-cheng Yu
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

The read-only and Dirty PTE has been used to indicate copy-on-write pages.
However, newer x86 processors also regard a read-only and Dirty PTE as a
shadow stack page.  In order to separate the two, the software-defined
_PAGE_COW is created to replace _PAGE_DIRTY for the copy-on-write case, and
pte_*() are updated.

Pte_modify() changes a PTE to 'newprot', but it doesn't use the pte_*().
Introduce fixup_dirty_pte(), which sets a dirty PTE, based on _PAGE_RW,
to either _PAGE_DIRTY or _PAGE_COW.

Apply the same changes to pmd_modify().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/pgtable.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 4b0ec61510dc..4a7fd3e5b7df 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -726,6 +726,21 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 
 static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
 
+static inline pteval_t fixup_dirty_pte(pteval_t pteval)
+{
+	pte_t pte = __pte(pteval);
+
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && pte_dirty(pte)) {
+		pte = pte_mkclean(pte);
+
+		if (pte_flags(pte) & _PAGE_RW)
+			pte = pte_set_flags(pte, _PAGE_DIRTY);
+		else
+			pte = pte_set_flags(pte, _PAGE_COW);
+	}
+	return pte_val(pte);
+}
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pteval_t val = pte_val(pte), oldval = val;
@@ -736,16 +751,34 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	 */
 	val &= _PAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
+	val = fixup_dirty_pte(val);
 	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
 	return __pte(val);
 }
 
+static inline int pmd_write(pmd_t pmd);
+static inline pmdval_t fixup_dirty_pmd(pmdval_t pmdval)
+{
+	pmd_t pmd = __pmd(pmdval);
+
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && pmd_dirty(pmd)) {
+		pmd = pmd_mkclean(pmd);
+
+		if (pmd_flags(pmd) & _PAGE_RW)
+			pmd = pmd_set_flags(pmd, _PAGE_DIRTY);
+		else
+			pmd = pmd_set_flags(pmd, _PAGE_COW);
+	}
+	return pmd_val(pmd);
+}
+
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
 	pmdval_t val = pmd_val(pmd), oldval = val;
 
 	val &= _HPAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val = fixup_dirty_pmd(val);
 	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
 	return __pmd(val);
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 11/25] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (9 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 10/25] x86/mm: Update pte_modify for _PAGE_COW Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:21   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 12/25] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

When Shadow Stack is introduced, [R/O + _PAGE_DIRTY] PTE is reserved for
shadow stack.  Copy-on-write PTEs have [R/O + _PAGE_COW].

When a PTE goes from [R/W + _PAGE_DIRTY] to [R/O + _PAGE_COW], it could
become a transient shadow stack PTE in two cases:

The first case is that some processors can start a write but end up seeing
a read-only PTE by the time they get to the Dirty bit, creating a transient
shadow stack PTE.  However, this will not occur on processors supporting
Shadow Stack, and a TLB flush is not necessary.

The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non-
atomically, a transient shadow stack PTE can be created as a result.
Thus, prevent that with cmpxchg.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights to the issue.  Jann Horn provided the cmpxchg solution.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/pgtable.h | 36 ++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 4a7fd3e5b7df..2144a25ca2a3 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1229,6 +1229,24 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+	/*
+	 * If Shadow Stack is enabled, pte_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pte_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PTE and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PTE is RW=1, Dirty=1 now.  Use try_cmpxchg() to detect
+	 * PTE changes and update old_pte, then try again.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pte_t old_pte, new_pte;
+
+		old_pte = READ_ONCE(*ptep);
+		do {
+			new_pte = pte_wrprotect(old_pte);
+		} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+		return;
+	}
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }
 
@@ -1286,6 +1304,24 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+	/*
+	 * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pmd_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PMD and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PMD is RW=1, Dirty=1 now.  Use try_cmpxchg() to detect
+	 * PMD changes and update old_pmd, then try again.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pmd_t old_pmd, new_pmd;
+
+		old_pmd = READ_ONCE(*pmdp);
+		do {
+			new_pmd = pmd_wrprotect(old_pmd);
+		} while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd, pmd_val(new_pmd)));
+
+		return;
+	}
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 12/25] mm: Introduce VM_SHSTK for shadow stack memory
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (10 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 11/25] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:46   ` Cyrill Gorcunov
  2021-02-03 22:55 ` [PATCH v19 13/25] x86/mm: Shadow Stack page fault error checking Yu-cheng Yu
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

A shadow stack PTE must be read-only and have _PAGE_DIRTY set.  However,
read-only and Dirty PTEs also exist for copy-on-write (COW) pages.  These
two cases are handled differently for page faults.  Introduce VM_SHSTK to
track shadow stack VMAs.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/mm/mmap.c | 2 ++
 fs/proc/task_mmu.c | 3 +++
 include/linux/mm.h | 8 ++++++++
 3 files changed, 13 insertions(+)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c90c20904a60..a22c6b6fc607 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -165,6 +165,8 @@ unsigned long get_mmap_base(int is_legacy)
 
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
+	if (vma->vm_flags & VM_SHSTK)
+		return "[shadow stack]";
 	return NULL;
 }
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 602e3a52884d..59623dcd92bb 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -661,6 +661,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_PKEY_BIT4)]	= "",
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
+#ifdef CONFIG_X86_CET
+		[ilog2(VM_SHSTK)]	= "ss",
+#endif
 	};
 	size_t i;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecdf8a8cd6ae..7be5f8b874aa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -312,11 +312,13 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -332,6 +334,12 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
+#ifdef CONFIG_X86_CET
+# define VM_SHSTK	VM_HIGH_ARCH_5
+#else
+# define VM_SHSTK	VM_NONE
+#endif
+
 #if defined(CONFIG_X86)
 # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 13/25] x86/mm: Shadow Stack page fault error checking
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (11 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 12/25] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 14/25] x86/mm: Update maybe_mkwrite() for shadow stack Yu-cheng Yu
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Shadow stack accesses are those that are performed by the CPU where it
expects to encounter a shadow stack mapping.  These accesses are performed
implicitly by CALL/RET at the site of the shadow stack pointer.  These
accesses are made explicitly by shadow stack management instructions like
WRUSSQ.

Shadow stacks accesses to shadow-stack mapping can see faults in normal,
valid operation just like regular accesses to regular mappings.  Shadow
stacks need some of the same features like delayed allocation, swap and
copy-on-write.

Shadow stack accesses can also result in errors, such as when a shadow
stack overflows, or if a shadow stack access occurs to a non-shadow-stack
mapping.

In handling a shadow stack page fault, verify it occurs within a shadow
stack mapping.  It is always an error otherwise.  For valid shadow stack
accesses, set FAULT_FLAG_WRITE to effect copy-on-write.  Because clearing
_PAGE_DIRTY (vs. _PAGE_RW) is used to trigger the fault, shadow stack read
fault and shadow stack write fault are not differentiated and both are
handled as a write access.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/include/asm/trap_pf.h |  2 ++
 arch/x86/mm/fault.c            | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..afa524325e55 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -11,6 +11,7 @@
  *   bit 3 ==				1: use of reserved bit detected
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
+ *   bit 6 ==				1: shadow stack access fault
  *   bit 15 ==				1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@ enum x86_pf_error_code {
 	X86_PF_RSVD	=		1 << 3,
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
+	X86_PF_SHSTK	=		1 << 6,
 	X86_PF_SGX	=		1 << 15,
 };
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..4cfb8c474a82 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1131,6 +1131,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
+	/*
+	 * Verify a shadow stack access is within a shadow stack VMA.
+	 * It is always an error otherwise.  Normal data access to a
+	 * shadow stack area is checked in the case followed.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (!(vma->vm_flags & VM_SHSTK))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
@@ -1296,6 +1307,14 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	/*
+	 * Clearing _PAGE_DIRTY is used to detect shadow stack access.
+	 * This method cannot distinguish shadow stack read vs. write.
+	 * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect
+	 * copy-on-write.
+	 */
+	if (hw_error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (hw_error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (hw_error_code & X86_PF_INSTR)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 14/25] x86/mm: Update maybe_mkwrite() for shadow stack
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (12 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 13/25] x86/mm: Shadow Stack page fault error checking Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:22   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 15/25] mm: Fixup places that call pte_mkwrite() directly Yu-cheng Yu
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

When serving a page fault, maybe_mkwrite() makes a PTE writable if its vma
has VM_WRITE.

A shadow stack vma has VM_SHSTK.  Its PTEs have _PAGE_DIRTY, but not
_PAGE_WRITE.  In fork(), _PAGE_DIRTY is cleared to effect copy-on-write,
and in page fault, _PAGE_DIRTY is restored and the shadow stack page is
writable again.

Update maybe_mkwrite() by introducing arch_maybe_mkwrite(), which sets
_PAGE_DIRTY for a shadow stack PTE.

Apply the same changes to maybe_pmd_mkwrite().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/Kconfig        |  4 ++++
 arch/x86/mm/pgtable.c   | 18 ++++++++++++++++++
 include/linux/mm.h      |  2 ++
 include/linux/pgtable.h | 24 ++++++++++++++++++++++++
 mm/huge_memory.c        |  2 ++
 5 files changed, 50 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 074b3c0e6bf6..35de64559a59 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1954,6 +1954,9 @@ config X86_SGX
 config ARCH_HAS_SHADOW_STACK
 	def_bool n
 
+config ARCH_MAYBE_MKWRITE
+	def_bool n
+
 config X86_CET
 	prompt "Intel Control-flow protection for user-mode"
 	def_bool n
@@ -1961,6 +1964,7 @@ config X86_CET
 	depends on AS_WRUSS
 	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_SHADOW_STACK
+	select ARCH_MAYBE_MKWRITE
 	help
 	  Control-flow protection is a set of hardware features which place
 	  additional restrictions on indirect branches.  These help
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f6a9e2e36642..0f4fbf51a9fc 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -610,6 +610,24 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_ARCH_MAYBE_MKWRITE
+pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	if (likely(vma->vm_flags & VM_SHSTK))
+		pte = pte_mkwrite_shstk(pte);
+	return pte;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+{
+	if (likely(vma->vm_flags & VM_SHSTK))
+		pmd = pmd_mkwrite_shstk(pmd);
+	return pmd;
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_ARCH_MAYBE_MKWRITE */
+
 /**
  * reserve_top_address - reserves a hole in the top of kernel address space
  * @reserve - size of hole to reserve
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7be5f8b874aa..abd756e426fc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -993,6 +993,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
 		pte = pte_mkwrite(pte);
+	else
+		pte = arch_maybe_mkwrite(pte, vma);
 	return pte;
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8fcdfa52eb4b..d8452218d09b 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1439,6 +1439,30 @@ static inline bool arch_has_pfn_modify_check(void)
 }
 #endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */
 
+#ifdef CONFIG_MMU
+#ifdef CONFIG_ARCH_MAYBE_MKWRITE
+pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+#else /* !CONFIG_ARCH_MAYBE_MKWRITE */
+static inline pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	return pte;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+{
+	return pmd;
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+#endif /* CONFIG_ARCH_MAYBE_MKWRITE */
+#endif /* CONFIG_MMU */
+
 /*
  * Architecture PAGE_KERNEL_* fallbacks
  *
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9237976abe72..bfec65c9308b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -478,6 +478,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
 		pmd = pmd_mkwrite(pmd);
+	else
+		pmd = arch_maybe_pmd_mkwrite(pmd, vma);
 	return pmd;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 15/25] mm: Fixup places that call pte_mkwrite() directly
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (13 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 14/25] x86/mm: Update maybe_mkwrite() for shadow stack Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:22   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 16/25] mm: Add guard pages around a shadow stack Yu-cheng Yu
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

When serving a page fault, maybe_mkwrite() makes a PTE writable if it is in
a writable vma.  A shadow stack vma is writable, but its PTEs need
_PAGE_DIRTY to be set to become writable.  For this reason, maybe_mkwrite()
has been updated.

There are a few places that call pte_mkwrite() directly, but effect the
same result as from maybe_mkwrite().  These sites need to be updated for
shadow stack as well.  Thus, change them to maybe_mkwrite():

- do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly
  and call pte_mkwrite(), which is the same as maybe_mkwrite().  Change
  them to maybe_mkwrite().

- In do_numa_page(), if the numa entry 'was-writable', then pte_mkwrite()
  is called directly.  Fix it by doing maybe_mkwrite().

- In change_pte_range(), pte_mkwrite() is called directly.  Replace it with
  maybe_mkwrite().

  A shadow stack vma is writable but has different vma
flags, and handled accordingly in maybe_mkwrite().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/memory.c   | 5 ++---
 mm/migrate.c  | 3 +--
 mm/mprotect.c | 2 +-
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..1de649c61013 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3549,8 +3549,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	entry = mk_pte(page, vma->vm_page_prot);
 	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry));
+	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
@@ -4205,7 +4204,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	pte = pte_modify(old_pte, vma->vm_page_prot);
 	pte = pte_mkyoung(pte);
 	if (was_writable)
-		pte = pte_mkwrite(pte);
+		pte = maybe_mkwrite(pte, vma);
 	ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte);
 	update_mmu_cache(vma, vmf->address, vmf->pte);
 
diff --git a/mm/migrate.c b/mm/migrate.c
index c0efe921bca5..8878d75c0db7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2964,8 +2964,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 		}
 	} else {
 		entry = mk_pte(page, vma->vm_page_prot);
-		if (vma->vm_flags & VM_WRITE)
-			entry = pte_mkwrite(pte_mkdirty(entry));
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	}
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ab709023e9aa..ed26e675c423 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -135,7 +135,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			if (dirty_accountable && pte_dirty(ptent) &&
 					(pte_soft_dirty(ptent) ||
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
-				ptent = pte_mkwrite(ptent);
+				ptent = maybe_mkwrite(ptent, vma);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 16/25] mm: Add guard pages around a shadow stack.
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (14 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 15/25] mm: Fixup places that call pte_mkwrite() directly Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:24   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 17/25] mm/mmap: Add shadow stack pages to memory accounting Yu-cheng Yu
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the
first and the last elements in the range, effectively touches those memory
areas.

The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and
255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from PAGE_SIZE.
Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
CALL, and RET from going beyond.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/page_64_types.h | 10 ++++++++++
 include/linux/mm.h                   | 24 ++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 645bd1d0ee07..fda4c023e009 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -115,4 +115,14 @@
 #define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
 #endif
 
+/*
+ * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D).  INCSSPQ
+ * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and
+ * touches the first and the last element in the range, which triggers a
+ * page fault if the range is not in a shadow stack.  Because of this,
+ * creating 4-KB guard pages around a shadow stack prevents these
+ * instructions from going beyond.
+ */
+#define ARCH_SHADOW_STACK_GUARD_GAP PAGE_SIZE
+
 #endif /* _ASM_X86_PAGE_64_DEFS_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index abd756e426fc..c12b3d36dbd3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2640,6 +2640,10 @@ extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf);
 int __must_check write_one_page(struct page *page);
 void task_dirty_inc(struct task_struct *tsk);
 
+#ifndef ARCH_SHADOW_STACK_GUARD_GAP
+#define ARCH_SHADOW_STACK_GUARD_GAP 0
+#endif
+
 extern unsigned long stack_guard_gap;
 /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
@@ -2672,9 +2676,15 @@ static inline struct vm_area_struct * find_vma_intersection(struct mm_struct * m
 static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 {
 	unsigned long vm_start = vma->vm_start;
+	unsigned long gap = 0;
 
-	if (vma->vm_flags & VM_GROWSDOWN) {
-		vm_start -= stack_guard_gap;
+	if (vma->vm_flags & VM_GROWSDOWN)
+		gap = stack_guard_gap;
+	else if (vma->vm_flags & VM_SHSTK)
+		gap = ARCH_SHADOW_STACK_GUARD_GAP;
+
+	if (gap != 0) {
+		vm_start -= gap;
 		if (vm_start > vma->vm_start)
 			vm_start = 0;
 	}
@@ -2684,9 +2694,15 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 static inline unsigned long vm_end_gap(struct vm_area_struct *vma)
 {
 	unsigned long vm_end = vma->vm_end;
+	unsigned long gap = 0;
+
+	if (vma->vm_flags & VM_GROWSUP)
+		gap = stack_guard_gap;
+	else if (vma->vm_flags & VM_SHSTK)
+		gap = ARCH_SHADOW_STACK_GUARD_GAP;
 
-	if (vma->vm_flags & VM_GROWSUP) {
-		vm_end += stack_guard_gap;
+	if (gap != 0) {
+		vm_end += gap;
 		if (vm_end < vma->vm_end)
 			vm_end = -PAGE_SIZE;
 	}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 17/25] mm/mmap: Add shadow stack pages to memory accounting
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (15 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 16/25] mm: Add guard pages around a shadow stack Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:24   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 18/25] mm: Update can_follow_write_pte() for shadow stack Yu-cheng Yu
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Account shadow stack pages to stack memory.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/mm/pgtable.c   |  7 +++++++
 include/linux/pgtable.h | 11 +++++++++++
 mm/mmap.c               |  5 +++++
 3 files changed, 23 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0f4fbf51a9fc..948d28c29964 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -895,3 +895,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #endif /* CONFIG_X86_64 */
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
+#ifdef CONFIG_ARCH_HAS_SHADOW_STACK
+bool arch_shadow_stack_mapping(vm_flags_t vm_flags)
+{
+	return (vm_flags & VM_SHSTK);
+}
+#endif
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d8452218d09b..b888b246b8b1 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1463,6 +1463,17 @@ static inline pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma
 #endif /* CONFIG_ARCH_MAYBE_MKWRITE */
 #endif /* CONFIG_MMU */
 
+#ifdef CONFIG_MMU
+#ifdef CONFIG_ARCH_HAS_SHADOW_STACK
+bool arch_shadow_stack_mapping(vm_flags_t vm_flags);
+#else
+static inline bool arch_shadow_stack_mapping(vm_flags_t vm_flags)
+{
+	return false;
+}
+#endif /* CONFIG_ARCH_HAS_SHADOW_STACK */
+#endif /* CONFIG_MMU */
+
 /*
  * Architecture PAGE_KERNEL_* fallbacks
  *
diff --git a/mm/mmap.c b/mm/mmap.c
index dc7206032387..51200b821898 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1720,6 +1720,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 	if (file && is_file_hugepages(file))
 		return 0;
 
+	if (arch_shadow_stack_mapping(vm_flags))
+		return 1;
+
 	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
 }
 
@@ -3389,6 +3392,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
 		mm->stack_vm += npages;
 	else if (is_data_mapping(flags))
 		mm->data_vm += npages;
+	else if (arch_shadow_stack_mapping(flags))
+		mm->stack_vm += npages;
 }
 
 static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 18/25] mm: Update can_follow_write_pte() for shadow stack
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (16 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 17/25] mm/mmap: Add shadow stack pages to memory accounting Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:27   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 19/25] mm: Re-introduce vm_flags to do_mmap() Yu-cheng Yu
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Can_follow_write_pte() ensures a read-only page is COWed by checking the
FOLL_COW flag, and uses pte_dirty() to validate the flag is still valid.

Like a writable data page, a shadow stack page is writable, and becomes
read-only during copy-on-write, but it is always dirty.  Thus, in the
can_follow_write_pte() check, it belongs to the writable page case and
should be excluded from the read-only page pte_dirty() check.  Apply
the same changes to can_follow_write_pmd().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/gup.c         | 8 +++++---
 mm/huge_memory.c | 8 +++++---
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index e4c224cd9661..66ab67626f57 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -357,10 +357,12 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
  * FOLL_FORCE can write to even unwritable pte's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
+static inline bool can_follow_write_pte(pte_t pte, unsigned int flags,
+					struct vm_area_struct *vma)
 {
 	return pte_write(pte) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte) &&
+				  !arch_shadow_stack_mapping(vma->vm_flags));
 }
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -403,7 +405,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 	if ((flags & FOLL_NUMA) && pte_protnone(pte))
 		goto no_page;
-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
+	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, vma)) {
 		pte_unmap_unlock(ptep, ptl);
 		return NULL;
 	}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bfec65c9308b..eb64e2b56bc9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1337,10 +1337,12 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
  * FOLL_FORCE can write to even unwritable pmd's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags,
+					struct vm_area_struct *vma)
 {
 	return pmd_write(pmd) ||
-	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd) &&
+				  !arch_shadow_stack_mapping(vma->vm_flags));
 }
 
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
@@ -1353,7 +1355,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
+	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, vma))
 		goto out;
 
 	/* Avoid dumping huge zero page */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 19/25] mm: Re-introduce vm_flags to do_mmap()
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (17 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 18/25] mm: Update can_follow_write_pte() for shadow stack Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:28   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 20/25] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu, Peter Collingbourne, Andrew Morton

There was no more caller passing vm_flags to do_mmap(), and vm_flags was
removed from the function's input by:

    commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()").

There is a new user now.  Shadow stack allocation passes VM_SHSTK to
do_mmap().  Re-introduce vm_flags to do_mmap(), but without the old wrapper
do_mmap_pgoff().  Instead, make all callers of the wrapper pass a zero
vm_flags to do_mmap().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Peter Collingbourne <pcc@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-mm@kvack.org
---
 fs/aio.c           |  2 +-
 include/linux/mm.h |  3 ++-
 ipc/shm.c          |  2 +-
 mm/mmap.c          | 10 +++++-----
 mm/nommu.c         |  4 ++--
 mm/util.c          |  2 +-
 6 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 1f32da13d39e..b5d0586209a7 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -529,7 +529,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 
 	ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size,
 				 PROT_READ | PROT_WRITE,
-				 MAP_SHARED, 0, &unused, NULL);
+				 MAP_SHARED, 0, 0, &unused, NULL);
 	mmap_write_unlock(mm);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		ctx->mmap_size = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c12b3d36dbd3..06e9899e13b8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2585,7 +2585,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	struct list_head *uf);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
-	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
+	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
+	struct list_head *uf);
 extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
 		       struct list_head *uf, bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
diff --git a/ipc/shm.c b/ipc/shm.c
index febd88daba8c..b6370eb1eaab 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1556,7 +1556,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 			goto invalid;
 	}
 
-	addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL);
+	addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL);
 	*raddr = addr;
 	err = 0;
 	if (IS_ERR_VALUE(addr))
diff --git a/mm/mmap.c b/mm/mmap.c
index 51200b821898..95ce7cd68654 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1403,11 +1403,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode,
  */
 unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff,
-			unsigned long *populate, struct list_head *uf)
+			unsigned long flags, vm_flags_t vm_flags,
+			unsigned long pgoff, unsigned long *populate,
+			struct list_head *uf)
 {
 	struct mm_struct *mm = current->mm;
-	vm_flags_t vm_flags;
 	int pkey = 0;
 
 	*populate = 0;
@@ -1469,7 +1469,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	 * to. we assume access permissions have been handled by the open
 	 * of the memory object, so we don't do any here.
 	 */
-	vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
+	vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
 	if (flags & MAP_LOCKED)
@@ -3049,7 +3049,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 
 	file = get_file(vma->vm_file);
 	ret = do_mmap(vma->vm_file, start, size,
-			prot, flags, pgoff, &populate, NULL);
+			prot, flags, 0, pgoff, &populate, NULL);
 	fput(file);
 out:
 	mmap_write_unlock(mm);
diff --git a/mm/nommu.c b/mm/nommu.c
index 870fea12823e..dbfa8e50d2db 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1071,6 +1071,7 @@ unsigned long do_mmap(struct file *file,
 			unsigned long len,
 			unsigned long prot,
 			unsigned long flags,
+			vm_flags_t vm_flags,
 			unsigned long pgoff,
 			unsigned long *populate,
 			struct list_head *uf)
@@ -1078,7 +1079,6 @@ unsigned long do_mmap(struct file *file,
 	struct vm_area_struct *vma;
 	struct vm_region *region;
 	struct rb_node *rb;
-	vm_flags_t vm_flags;
 	unsigned long capabilities, result;
 	int ret;
 
@@ -1097,7 +1097,7 @@ unsigned long do_mmap(struct file *file,
 
 	/* we've determined that we can make the mapping, now translate what we
 	 * now know into VMA flags */
-	vm_flags = determine_vm_flags(file, prot, flags, capabilities);
+	vm_flags |= determine_vm_flags(file, prot, flags, capabilities);
 
 	/* we're going to need to record the mapping */
 	region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL);
diff --git a/mm/util.c b/mm/util.c
index 8c9b7d1e7c49..0f7ab91e672b 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -516,7 +516,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	if (!ret) {
 		if (mmap_write_lock_killable(mm))
 			return -EINTR;
-		ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate,
+		ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate,
 			      &uf);
 		mmap_write_unlock(mm);
 		userfaultfd_unmap_complete(mm, &uf);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 20/25] x86/cet/shstk: User-mode shadow stack support
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (18 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 19/25] mm: Re-introduce vm_flags to do_mmap() Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:29   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 21/25] x86/cet/shstk: Handle signals for shadow stack Yu-cheng Yu
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

Introduce basic shadow stack enabling/disabling/allocation routines.
A task's shadow stack is allocated from memory with VM_SHSTK flag and has
a fixed size of min(RLIMIT_STACK, 4GB).

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h       |  28 ++++++
 arch/x86/include/asm/processor.h |   5 ++
 arch/x86/kernel/Makefile         |   2 +
 arch/x86/kernel/cet.c            | 147 +++++++++++++++++++++++++++++++
 4 files changed, 182 insertions(+)
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/kernel/cet.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
new file mode 100644
index 000000000000..5750fbcbb952
--- /dev/null
+++ b/arch/x86/include/asm/cet.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CET_H
+#define _ASM_X86_CET_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+struct task_struct;
+/*
+ * Per-thread CET status
+ */
+struct cet_status {
+	unsigned long	shstk_base;
+	unsigned long	shstk_size;
+};
+
+#ifdef CONFIG_X86_CET
+int cet_setup_shstk(void);
+void cet_disable_shstk(void);
+void cet_free_shstk(struct task_struct *p);
+#else
+static inline void cet_disable_shstk(void) {}
+static inline void cet_free_shstk(struct task_struct *p) {}
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_X86_CET_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c20a52b5534b..51158c6beb4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -27,6 +27,7 @@ struct vm86;
 #include <asm/unwind_hints.h>
 #include <asm/vmxfeatures.h>
 #include <asm/vdso/processor.h>
+#include <asm/cet.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
@@ -536,6 +537,10 @@ struct thread_struct {
 
 	unsigned int		sig_on_uaccess_err:1;
 
+#ifdef CONFIG_X86_CET
+	struct cet_status	cet;
+#endif
+
 	/* Floating point and extended processor state */
 	struct fpu		fpu;
 	/*
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 5eeb808eb024..4a9a7e7d00dc 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -151,6 +151,8 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev-es.o
+obj-$(CONFIG_X86_CET)			+= cet.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
new file mode 100644
index 000000000000..f8b0a077594f
--- /dev/null
+++ b/arch/x86/kernel/cet.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * cet.c - Control-flow Enforcement (CET)
+ *
+ * Copyright (c) 2019, Intel Corporation.
+ * Yu-cheng Yu <yu-cheng.yu@intel.com>
+ */
+
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/sched/signal.h>
+#include <linux/compat.h>
+#include <asm/msr.h>
+#include <asm/user.h>
+#include <asm/fpu/internal.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/cet.h>
+
+static void start_update_msrs(void)
+{
+	fpregs_lock();
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		__fpregs_load_activate();
+}
+
+static void end_update_msrs(void)
+{
+	fpregs_unlock();
+}
+
+static unsigned long cet_get_shstk_addr(void)
+{
+	struct fpu *fpu = &current->thread.fpu;
+	unsigned long ssp = 0;
+
+	fpregs_lock();
+
+	if (fpregs_state_valid(fpu, smp_processor_id())) {
+		rdmsrl(MSR_IA32_PL3_SSP, ssp);
+	} else {
+		struct cet_user_state *p;
+
+		p = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
+		if (p)
+			ssp = p->user_ssp;
+	}
+
+	fpregs_unlock();
+	return ssp;
+}
+
+static unsigned long alloc_shstk(unsigned long size, int flags)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, populate;
+
+	/* VM_SHSTK requires MAP_ANONYMOUS, MAP_PRIVATE */
+	flags |= MAP_ANONYMOUS | MAP_PRIVATE;
+
+	mmap_write_lock(mm);
+	addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHSTK, 0,
+		       &populate, NULL);
+	mmap_write_unlock(mm);
+
+	if (populate)
+		mm_populate(addr, populate);
+
+	return addr;
+}
+
+int cet_setup_shstk(void)
+{
+	unsigned long addr, size;
+	struct cet_status *cet = &current->thread.cet;
+
+	if (!static_cpu_has(X86_FEATURE_SHSTK))
+		return -EOPNOTSUPP;
+
+	size = round_up(min(rlimit(RLIMIT_STACK), 1UL << 32), PAGE_SIZE);
+	addr = alloc_shstk(size, 0);
+
+	if (IS_ERR_VALUE(addr))
+		return PTR_ERR((void *)addr);
+
+	cet->shstk_base = addr;
+	cet->shstk_size = size;
+
+	start_update_msrs();
+	wrmsrl(MSR_IA32_PL3_SSP, addr + size);
+	wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN);
+	end_update_msrs();
+	return 0;
+}
+
+void cet_disable_shstk(void)
+{
+	struct cet_status *cet = &current->thread.cet;
+	u64 msr_val;
+
+	if (!static_cpu_has(X86_FEATURE_SHSTK) ||
+	    !cet->shstk_size || !cet->shstk_base)
+		return;
+
+	start_update_msrs();
+	rdmsrl(MSR_IA32_U_CET, msr_val);
+	wrmsrl(MSR_IA32_U_CET, msr_val & ~CET_SHSTK_EN);
+	wrmsrl(MSR_IA32_PL3_SSP, 0);
+	end_update_msrs();
+
+	cet_free_shstk(current);
+}
+
+void cet_free_shstk(struct task_struct *tsk)
+{
+	struct cet_status *cet = &tsk->thread.cet;
+
+	if (!static_cpu_has(X86_FEATURE_SHSTK) ||
+	    !cet->shstk_size || !cet->shstk_base)
+		return;
+
+	if (!tsk->mm || (tsk->mm != current->mm))
+		return;
+
+	while (1) {
+		int r;
+
+		r = vm_munmap(cet->shstk_base, cet->shstk_size);
+
+		/*
+		 * Retry if mmap_lock is not available.
+		 */
+		if (r == -EINTR) {
+			cond_resched();
+			continue;
+		}
+
+		WARN_ON_ONCE(r);
+		break;
+	}
+
+	cet->shstk_base = 0;
+	cet->shstk_size = 0;
+}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 21/25] x86/cet/shstk: Handle signals for shadow stack
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (19 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 20/25] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 22/25] ELF: Introduce arch_setup_elf_property() Yu-cheng Yu
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

To deliver a signal, create a shadow stack restore token and put the token
and the signal restorer address on the shadow stack.  For sigreturn, verify
the token and restore from it the shadow stack pointer.

A shadow stack restore token marks a restore point of the shadow stack.
The token is distinctively different from any shadow stack address.
In sigreturn, restoring from a token ensures the target address is the
location pointed by the token.

Introduce WRUSS, which is a kernel-mode instruction but writes directly to
user shadow stack.  It is used to construct the user signal stack as
described above.

Currently there is no systematic facility for extending a signal context.
Introduce a signal context extension 'struct sc_ext', which is used to save
shadow stack restore token address and WAIT_ENDBR status.  WAIT_ENDBR will
be introduced later in the Indirect Branch Tracking (IBT) series, but add
that into sc_ext now to keep the struct stable in case the IBT series is
applied later.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/ia32/ia32_signal.c            |  17 +++
 arch/x86/include/asm/cet.h             |   8 ++
 arch/x86/include/asm/fpu/internal.h    |  10 ++
 arch/x86/include/asm/special_insns.h   |  32 ++++++
 arch/x86/include/uapi/asm/sigcontext.h |   9 ++
 arch/x86/kernel/cet.c                  | 152 +++++++++++++++++++++++++
 arch/x86/kernel/fpu/signal.c           | 100 ++++++++++++++++
 arch/x86/kernel/signal.c               |  10 ++
 8 files changed, 338 insertions(+)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 5e3d9b7fd5fb..377ae8de511f 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -35,6 +35,7 @@
 #include <asm/sigframe.h>
 #include <asm/sighandling.h>
 #include <asm/smap.h>
+#include <asm/cet.h>
 
 static inline void reload_segments(struct sigcontext_32 *sc)
 {
@@ -205,6 +206,7 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 				 void __user **fpstate)
 {
 	unsigned long sp, fx_aligned, math_size;
+	void __user *restorer = NULL;
 
 	/* Default to using normal stack */
 	sp = regs->sp;
@@ -218,8 +220,23 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 		 ksig->ka.sa.sa_restorer)
 		sp = (unsigned long) ksig->ka.sa.sa_restorer;
 
+	if (ksig->ka.sa.sa_flags & SA_RESTORER) {
+		restorer = ksig->ka.sa.sa_restorer;
+	} else if (current->mm->context.vdso) {
+		if (ksig->ka.sa.sa_flags & SA_SIGINFO)
+			restorer = current->mm->context.vdso +
+				vdso_image_32.sym___kernel_rt_sigreturn;
+		else
+			restorer = current->mm->context.vdso +
+				vdso_image_32.sym___kernel_sigreturn;
+	}
+
 	sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
 	*fpstate = (struct _fpstate_32 __user *) sp;
+
+	if (save_cet_to_sigframe(1, *fpstate, (unsigned long)restorer))
+		return (void __user *)-1L;
+
 	if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
 				     math_size) < 0)
 		return (void __user *) -1L;
diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 5750fbcbb952..73435856ce54 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -6,6 +6,8 @@
 #include <linux/types.h>
 
 struct task_struct;
+struct sc_ext;
+
 /*
  * Per-thread CET status
  */
@@ -18,9 +20,15 @@ struct cet_status {
 int cet_setup_shstk(void);
 void cet_disable_shstk(void);
 void cet_free_shstk(struct task_struct *p);
+int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp);
+void cet_restore_signal(struct sc_ext *sc);
+int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
 #else
 static inline void cet_disable_shstk(void) {}
 static inline void cet_free_shstk(struct task_struct *p) {}
+static inline void cet_restore_signal(struct sc_ext *sc) { return; }
+static inline int cet_setup_signal(bool ia32, unsigned long rstor,
+				   struct sc_ext *sc) { return -EINVAL; }
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 8d33ad80704f..c1dedec2281b 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -443,6 +443,16 @@ static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate)
 	__copy_kernel_to_fpregs(fpstate, -1);
 }
 
+#ifdef CONFIG_X86_CET
+extern int save_cet_to_sigframe(int ia32, void __user *fp,
+				unsigned long restorer);
+#else
+static inline int save_cet_to_sigframe(int ia32, void __user *fp,
+				       unsigned long restorer)
+{
+	return 0;
+}
+#endif
 extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
 
 /*
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index cc177b4431ae..d979d0deb3ae 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -234,6 +234,38 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+#ifdef CONFIG_X86_CET
+#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	asm_volatile_goto("1: wrussd %1, (%0)\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: "r" (addr), "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EPERM;
+}
+#else
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	WARN_ONCE(1, "%s used but not supported.\n", __func__);
+	return -EFAULT;
+}
+#endif
+
+static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
+{
+	asm_volatile_goto("1: wrussq %1, (%0)\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: "r" (addr), "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EPERM;
+}
+#endif /* CONFIG_X86_CET */
+
 #define nop() asm volatile ("nop")
 
 static inline void serialize(void)
diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
index 844d60eb1882..cf2d55db3be4 100644
--- a/arch/x86/include/uapi/asm/sigcontext.h
+++ b/arch/x86/include/uapi/asm/sigcontext.h
@@ -196,6 +196,15 @@ struct _xstate {
 	/* New processor state extensions go here: */
 };
 
+/*
+ * Located at the end of sigcontext->fpstate, aligned to 8.
+ */
+struct sc_ext {
+	unsigned long total_size;
+	unsigned long ssp;
+	unsigned long wait_endbr;
+};
+
 /*
  * The 32-bit signal frame:
  */
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index f8b0a077594f..c3da4f59bd17 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -19,6 +19,8 @@
 #include <asm/fpu/xstate.h>
 #include <asm/fpu/types.h>
 #include <asm/cet.h>
+#include <asm/special_insns.h>
+#include <uapi/asm/sigcontext.h>
 
 static void start_update_msrs(void)
 {
@@ -72,6 +74,80 @@ static unsigned long alloc_shstk(unsigned long size, int flags)
 	return addr;
 }
 
+#define TOKEN_MODE_MASK	3UL
+#define TOKEN_MODE_64	1UL
+#define IS_TOKEN_64(token) (((token) & TOKEN_MODE_MASK) == TOKEN_MODE_64)
+#define IS_TOKEN_32(token) (((token) & TOKEN_MODE_MASK) == 0)
+
+/*
+ * Verify the restore token at the address of 'ssp' is
+ * valid and then set shadow stack pointer according to the
+ * token.
+ */
+int cet_verify_rstor_token(bool ia32, unsigned long ssp,
+			   unsigned long *new_ssp)
+{
+	unsigned long token;
+
+	*new_ssp = 0;
+
+	if (!IS_ALIGNED(ssp, 8))
+		return -EINVAL;
+
+	if (get_user(token, (unsigned long __user *)ssp))
+		return -EFAULT;
+
+	/* Is 64-bit mode flag correct? */
+	if (!ia32 && !IS_TOKEN_64(token))
+		return -EINVAL;
+	else if (ia32 && !IS_TOKEN_32(token))
+		return -EINVAL;
+
+	token &= ~TOKEN_MODE_MASK;
+
+	/*
+	 * Restore address properly aligned?
+	 */
+	if ((!ia32 && !IS_ALIGNED(token, 8)) || !IS_ALIGNED(token, 4))
+		return -EINVAL;
+
+	/*
+	 * Token was placed properly?
+	 */
+	if (((ALIGN_DOWN(token, 8) - 8) != ssp) || (token >= TASK_SIZE_MAX))
+		return -EINVAL;
+
+	*new_ssp = token;
+	return 0;
+}
+
+/*
+ * Create a restore token on the shadow stack.
+ * A token is always 8-byte and aligned to 8.
+ */
+static int create_rstor_token(bool ia32, unsigned long ssp,
+			      unsigned long *new_ssp)
+{
+	unsigned long addr;
+
+	*new_ssp = 0;
+
+	if ((!ia32 && !IS_ALIGNED(ssp, 8)) || !IS_ALIGNED(ssp, 4))
+		return -EINVAL;
+
+	addr = ALIGN_DOWN(ssp, 8) - 8;
+
+	/* Is the token for 64-bit? */
+	if (!ia32)
+		ssp |= TOKEN_MODE_64;
+
+	if (write_user_shstk_64(addr, ssp))
+		return -EFAULT;
+
+	*new_ssp = addr;
+	return 0;
+}
+
 int cet_setup_shstk(void)
 {
 	unsigned long addr, size;
@@ -145,3 +221,79 @@ void cet_free_shstk(struct task_struct *tsk)
 	cet->shstk_base = 0;
 	cet->shstk_size = 0;
 }
+
+/*
+ * Called from __fpu__restore_sig() and XSAVES buffer is protected by
+ * set_thread_flag(TIF_NEED_FPU_LOAD) in the slow path.
+ */
+void cet_restore_signal(struct sc_ext *sc_ext)
+{
+	struct cet_user_state *cet_user_state;
+	struct cet_status *cet = &current->thread.cet;
+	u64 msr_val = 0;
+
+	if (!static_cpu_has(X86_FEATURE_SHSTK))
+		return;
+
+	cet_user_state = get_xsave_addr(&current->thread.fpu.state.xsave,
+					XFEATURE_CET_USER);
+	if (!cet_user_state)
+		return;
+
+	if (cet->shstk_size) {
+		if (test_thread_flag(TIF_NEED_FPU_LOAD))
+			cet_user_state->user_ssp = sc_ext->ssp;
+		else
+			wrmsrl(MSR_IA32_PL3_SSP, sc_ext->ssp);
+
+		msr_val |= CET_SHSTK_EN;
+	}
+
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		cet_user_state->user_cet = msr_val;
+	else
+		wrmsrl(MSR_IA32_U_CET, msr_val);
+}
+
+/*
+ * Setup the shadow stack for the signal handler: first,
+ * create a restore token to keep track of the current ssp,
+ * and then the return address of the signal handler.
+ */
+int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext)
+{
+	struct cet_status *cet = &current->thread.cet;
+	unsigned long ssp = 0, new_ssp = 0;
+	int err;
+
+	if (cet->shstk_size) {
+		if (!rstor_addr)
+			return -EINVAL;
+
+		ssp = cet_get_shstk_addr();
+		err = create_rstor_token(ia32, ssp, &new_ssp);
+		if (err)
+			return err;
+
+		if (ia32) {
+			ssp = new_ssp - sizeof(u32);
+			err = write_user_shstk_32(ssp, (unsigned int)rstor_addr);
+		} else {
+			ssp = new_ssp - sizeof(u64);
+			err = write_user_shstk_64(ssp, rstor_addr);
+		}
+
+		if (err)
+			return err;
+
+		sc_ext->ssp = new_ssp;
+	}
+
+	if (ssp) {
+		start_update_msrs();
+		wrmsrl(MSR_IA32_PL3_SSP, ssp);
+		end_update_msrs();
+	}
+
+	return 0;
+}
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a4ec65317a7f..270e4649f435 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -52,6 +52,74 @@ static inline int check_for_xstate(struct fxregs_state __user *buf,
 	return 0;
 }
 
+#ifdef CONFIG_X86_CET
+int save_cet_to_sigframe(int ia32, void __user *fp, unsigned long restorer)
+{
+	int err = 0;
+
+	if (!current->thread.cet.shstk_size)
+		return 0;
+
+	if (fp) {
+		struct sc_ext ext = {};
+
+		err = cet_setup_signal(ia32, restorer, &ext);
+		if (!err) {
+			void __user *p = fp;
+
+			ext.total_size = sizeof(ext);
+
+			if (ia32)
+				p += sizeof(struct fregs_state);
+
+			p += fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE;
+			p = (void __user *)ALIGN((unsigned long)p, 8);
+
+			if (copy_to_user(p, &ext, sizeof(ext)))
+				return -EFAULT;
+		}
+	}
+
+	return err;
+}
+
+static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext)
+{
+	int err = 0;
+
+	memset(ext, 0, sizeof(*ext));
+
+	if (!current->thread.cet.shstk_size)
+		return 0;
+
+	if (fp) {
+		void __user *p = fp;
+
+		if (ia32)
+			p += sizeof(struct fregs_state);
+
+		p += fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE;
+		p = (void __user *)ALIGN((unsigned long)p, 8);
+
+		if (copy_from_user(ext, p, sizeof(*ext)))
+			return -EFAULT;
+
+		if (ext->total_size != sizeof(*ext))
+			return -EFAULT;
+
+		if (current->thread.cet.shstk_size)
+			err = cet_verify_rstor_token(ia32, ext->ssp, &ext->ssp);
+	}
+
+	return err;
+}
+#else
+static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext)
+{
+	return 0;
+}
+#endif
+
 /*
  * Signal frame handlers.
  */
@@ -295,6 +363,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	struct task_struct *tsk = current;
 	struct fpu *fpu = &tsk->thread.fpu;
 	struct user_i387_ia32_struct env;
+	struct sc_ext sc_ext;
 	u64 user_xfeatures = 0;
 	int fx_only = 0;
 	int ret = 0;
@@ -335,6 +404,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	if ((unsigned long)buf_fx % 64)
 		fx_only = 1;
 
+	ret = get_cet_from_sigframe(ia32_fxstate, buf, &sc_ext);
+	if (ret)
+		return ret;
+
 	if (!ia32_fxstate) {
 		/*
 		 * Attempt to restore the FPU registers directly from user
@@ -349,6 +422,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		pagefault_enable();
 		if (!ret) {
 
+			cet_restore_signal(&sc_ext);
+
 			/*
 			 * Restore supervisor states: previous context switch
 			 * etc has done XSAVES and saved the supervisor states
@@ -423,6 +498,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		if (unlikely(init_bv))
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
 
+		cet_restore_signal(&sc_ext);
+
 		/*
 		 * Restore previously saved supervisor xstates along with
 		 * copied-in user xstates.
@@ -491,12 +568,35 @@ int fpu__restore_sig(void __user *buf, int ia32_frame)
 	return __fpu__restore_sig(buf, buf_fx, size);
 }
 
+#ifdef CONFIG_X86_CET
+static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp)
+{
+	struct cet_status *cet = &current->thread.cet;
+
+	/*
+	 * sigcontext_ext is at: fpu + fpu_user_xstate_size +
+	 * FP_XSTATE_MAGIC2_SIZE, then aligned to 8.
+	 */
+	if (cet->shstk_size)
+		sp -= (sizeof(struct sc_ext) + 8);
+
+	return sp;
+}
+#else
+static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp)
+{
+	return sp;
+}
+#endif
+
 unsigned long
 fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
 		     unsigned long *buf_fx, unsigned long *size)
 {
 	unsigned long frame_size = xstate_sigframe_size();
 
+	sp = fpu__alloc_sigcontext_ext(sp);
+
 	*buf_fx = sp = round_down(sp - frame_size, 64);
 	if (ia32_frame && use_fxsr()) {
 		frame_size += sizeof(struct fregs_state);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index ea794a083c44..1807379f1d86 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -46,6 +46,7 @@
 #include <asm/syscall.h>
 #include <asm/sigframe.h>
 #include <asm/signal.h>
+#include <asm/cet.h>
 
 #ifdef CONFIG_X86_64
 /*
@@ -239,6 +240,9 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 	unsigned long buf_fx = 0;
 	int onsigstack = on_sig_stack(sp);
 	int ret;
+#ifdef CONFIG_X86_64
+	void __user *restorer = NULL;
+#endif
 
 	/* redzone */
 	if (IS_ENABLED(CONFIG_X86_64))
@@ -270,6 +274,12 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 	if (onsigstack && !likely(on_sig_stack(sp)))
 		return (void __user *)-1L;
 
+#ifdef CONFIG_X86_64
+	if (ka->sa.sa_flags & SA_RESTORER)
+		restorer = ka->sa.sa_restorer;
+	ret = save_cet_to_sigframe(0, *fpstate, (unsigned long)restorer);
+#endif
+
 	/* save i387 and extended state */
 	ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size);
 	if (ret < 0)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 22/25] ELF: Introduce arch_setup_elf_property()
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (20 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 21/25] x86/cet/shstk: Handle signals for shadow stack Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:33   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 23/25] x86/cet/shstk: Handle thread shadow stack Yu-cheng Yu
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu, Mark Brown, Catalin Marinas

An ELF file's .note.gnu.property indicates arch features supported by the
file.  These features are extracted by arch_parse_elf_property() and stored
in 'arch_elf_state'.

Introduce x86 feature definitions and arch_setup_elf_property(), which
enables such features.  The first use-case of this function is Shadow
Stack.

ARM64 is the other arch that has ARCH_USE_GNU_PROPERTY and arch_parse_elf_
property().  Add arch_setup_elf_property() for it.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/elf.h |  5 +++++
 arch/x86/Kconfig             |  2 ++
 arch/x86/include/asm/elf.h   | 13 +++++++++++++
 arch/x86/kernel/process_64.c | 32 ++++++++++++++++++++++++++++++++
 fs/binfmt_elf.c              |  4 ++++
 include/linux/elf.h          |  6 ++++++
 include/uapi/linux/elf.h     |  9 +++++++++
 7 files changed, 71 insertions(+)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 8d1c8dcb87fd..d37bc7915935 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -281,6 +281,11 @@ static inline int arch_parse_elf_property(u32 type, const void *data,
 	return 0;
 }
 
+static inline int arch_setup_elf_property(struct arch_elf_state *arch)
+{
+	return 0;
+}
+
 static inline int arch_elf_pt_proc(void *ehdr, void *phdr,
 				   struct file *f, bool is_interp,
 				   struct arch_elf_state *state)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 35de64559a59..4628d4dadd34 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1965,6 +1965,8 @@ config X86_CET
 	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_SHADOW_STACK
 	select ARCH_MAYBE_MKWRITE
+	select ARCH_USE_GNU_PROPERTY
+	select ARCH_BINFMT_ELF_STATE
 	help
 	  Control-flow protection is a set of hardware features which place
 	  additional restrictions on indirect branches.  These help
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 66bdfe838d61..13701eaa521c 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -390,6 +390,19 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
 
 extern bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs);
 
+#ifdef CONFIG_ARCH_BINFMT_ELF_STATE
+struct arch_elf_state {
+	unsigned int gnu_property;
+};
+
+#define INIT_ARCH_ELF_STATE {	\
+	.gnu_property = 0,	\
+}
+
+#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) (0)
+#define arch_check_elf(ehdr, interp, interp_ehdr, state) (0)
+#endif
+
 /* Do not change the values. See get_align_mask() */
 enum align_flags {
 	ALIGN_VA_32	= BIT(0),
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ad582f9ac5a6..19f138f7a209 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -835,3 +835,35 @@ unsigned long KSTK_ESP(struct task_struct *task)
 {
 	return task_pt_regs(task)->sp;
 }
+
+#ifdef CONFIG_ARCH_USE_GNU_PROPERTY
+int arch_parse_elf_property(u32 type, const void *data, size_t datasz,
+			    bool compat, struct arch_elf_state *state)
+{
+	if (type != GNU_PROPERTY_X86_FEATURE_1_AND)
+		return 0;
+
+	if (datasz != sizeof(unsigned int))
+		return -ENOEXEC;
+
+	state->gnu_property = *(unsigned int *)data;
+	return 0;
+}
+
+int arch_setup_elf_property(struct arch_elf_state *state)
+{
+	int r = 0;
+
+	if (!IS_ENABLED(CONFIG_X86_CET))
+		return r;
+
+	memset(&current->thread.cet, 0, sizeof(struct cet_status));
+
+	if (static_cpu_has(X86_FEATURE_SHSTK)) {
+		if (state->gnu_property & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+			r = cet_setup_shstk();
+	}
+
+	return r;
+}
+#endif
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 950bc177238a..5d5f6a54a035 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1245,6 +1245,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
 
 	set_binfmt(&elf_format);
 
+	retval = arch_setup_elf_property(&arch_state);
+	if (retval < 0)
+		goto out;
+
 #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
 	retval = ARCH_SETUP_ADDITIONAL_PAGES(bprm, elf_ex, !!interpreter);
 	if (retval < 0)
diff --git a/include/linux/elf.h b/include/linux/elf.h
index c9a46c4e183b..be04d15e937f 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -92,9 +92,15 @@ static inline int arch_parse_elf_property(u32 type, const void *data,
 {
 	return 0;
 }
+
+static inline int arch_setup_elf_property(struct arch_elf_state *arch)
+{
+	return 0;
+}
 #else
 extern int arch_parse_elf_property(u32 type, const void *data, size_t datasz,
 				   bool compat, struct arch_elf_state *arch);
+extern int arch_setup_elf_property(struct arch_elf_state *arch);
 #endif
 
 #ifdef CONFIG_ARCH_HAVE_ELF_PROT
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 30f68b42eeb5..24ba55ba8278 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -455,4 +455,13 @@ typedef struct elf64_note {
 /* Bits for GNU_PROPERTY_AARCH64_FEATURE_1_BTI */
 #define GNU_PROPERTY_AARCH64_FEATURE_1_BTI	(1U << 0)
 
+/* .note.gnu.property types for x86: */
+#define GNU_PROPERTY_X86_FEATURE_1_AND		0xc0000002
+
+/* Bits for GNU_PROPERTY_X86_FEATURE_1_AND */
+#define GNU_PROPERTY_X86_FEATURE_1_IBT		0x00000001
+#define GNU_PROPERTY_X86_FEATURE_1_SHSTK	0x00000002
+#define GNU_PROPERTY_X86_FEATURE_1_VALID (GNU_PROPERTY_X86_FEATURE_1_IBT | \
+					   GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+
 #endif /* _UAPI_LINUX_ELF_H */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 23/25] x86/cet/shstk: Handle thread shadow stack
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (21 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 22/25] ELF: Introduce arch_setup_elf_property() Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 24/25] x86/cet/shstk: Add arch_prctl functions for " Yu-cheng Yu
  2021-02-03 22:55 ` [PATCH v19 25/25] mm: Introduce PROT_SHSTK " Yu-cheng Yu
  24 siblings, 0 replies; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

The kernel allocates (and frees on thread exit) a new shadow stack for a
pthread child.

    It is possible for the kernel to complete the clone syscall and set the
    child's shadow stack pointer to NULL and let the child thread allocate
    a shadow stack for itself.  There are two issues in this approach: It
    is not compatible with existing code that does inline syscall and it
    cannot handle signals before the child can successfully allocate a
    shadow stack.

A 64-bit shadow stack has a size of min(RLIMIT_STACK, 4 GB).  A compat-mode
thread shadow stack has a size of 1/4 min(RLIMIT_STACK, 4 GB).  This allows
more threads to run in a 32-bit address space.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h         |  3 ++
 arch/x86/include/asm/mmu_context.h |  3 ++
 arch/x86/kernel/cet.c              | 44 ++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c          |  8 ++++++
 4 files changed, 58 insertions(+)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 73435856ce54..ec4b5e62d0ce 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -18,12 +18,15 @@ struct cet_status {
 
 #ifdef CONFIG_X86_CET
 int cet_setup_shstk(void);
+int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags);
 void cet_disable_shstk(void);
 void cet_free_shstk(struct task_struct *p);
 int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp);
 void cet_restore_signal(struct sc_ext *sc);
 int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
 #else
+static inline int cet_setup_thread_shstk(struct task_struct *p,
+					 unsigned long clone_flags) { return 0; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_free_shstk(struct task_struct *p) {}
 static inline void cet_restore_signal(struct sc_ext *sc) { return; }
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 27516046117a..e90bd2ee8498 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -11,6 +11,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
+#include <asm/cet.h>
 #include <asm/debugreg.h>
 
 extern atomic64_t last_mm_ctx_id;
@@ -146,6 +147,8 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		cet_free_shstk(tsk);		\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index c3da4f59bd17..feb466dc2ea8 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -172,6 +172,50 @@ int cet_setup_shstk(void)
 	return 0;
 }
 
+int cet_setup_thread_shstk(struct task_struct *tsk, unsigned long clone_flags)
+{
+	unsigned long addr, size;
+	struct cet_user_state *state;
+	struct cet_status *cet = &tsk->thread.cet;
+
+	if (!cet->shstk_size)
+		return 0;
+
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+		return 0;
+
+	state = get_xsave_addr(&tsk->thread.fpu.state.xsave,
+			       XFEATURE_CET_USER);
+
+	if (!state)
+		return -EINVAL;
+
+	/* Cap shadow stack size to 4 GB */
+	size = min(rlimit(RLIMIT_STACK), 1UL << 32);
+
+	/*
+	 * Compat-mode pthreads share a limited address space.
+	 * If each function call takes an average of four slots
+	 * stack space, allocate 1/4 of stack size for shadow stack.
+	 */
+	if (in_compat_syscall())
+		size /= 4;
+	size = round_up(size, PAGE_SIZE);
+	addr = alloc_shstk(size, 0);
+
+	if (IS_ERR_VALUE(addr)) {
+		cet->shstk_base = 0;
+		cet->shstk_size = 0;
+		return PTR_ERR((void *)addr);
+	}
+
+	fpu__prepare_write(&tsk->thread.fpu);
+	state->user_ssp = (u64)(addr + size);
+	cet->shstk_base = addr;
+	cet->shstk_size = size;
+	return 0;
+}
+
 void cet_disable_shstk(void)
 {
 	struct cet_status *cet = &current->thread.cet;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 145a7ac0c19a..3af6b36e1a5c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,6 +43,7 @@
 #include <asm/io_bitmap.h>
 #include <asm/proto.h>
 #include <asm/frame.h>
+#include <asm/cet.h>
 
 #include "process.h"
 
@@ -109,6 +110,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	cet_free_shstk(tsk);
 	fpu__drop(fpu);
 }
 
@@ -181,6 +183,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	if (clone_flags & CLONE_SETTLS)
 		ret = set_new_tls(p, tls);
 
+#ifdef CONFIG_X86_64
+	/* Allocate a new shadow stack for pthread */
+	if (!ret)
+		ret = cet_setup_thread_shstk(p, clone_flags);
+#endif
+
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (22 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 23/25] x86/cet/shstk: Handle thread shadow stack Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  2021-02-04 20:35   ` Kees Cook
  2021-02-03 22:55 ` [PATCH v19 25/25] mm: Introduce PROT_SHSTK " Yu-cheng Yu
  24 siblings, 1 reply; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

arch_prctl(ARCH_X86_CET_STATUS, u64 *args)
    Get CET feature status.

    The parameter 'args' is a pointer to a user buffer.  The kernel returns
    the following information:

    *args = shadow stack/IBT status
    *(args + 1) = shadow stack base address
    *(args + 2) = shadow stack size

    32-bit binaries use the same interface, but only lower 32-bits of each
    item.

arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features)
    Disable CET features specified in 'features'.  Return -EPERM if CET is
    locked.

arch_prctl(ARCH_X86_CET_LOCK)
    Lock in CET features.

Also change do_arch_prctl_common()'s parameter 'cpuid_enabled' to
'arg2', as it is now also passed to prctl_cet().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h        |  3 ++
 arch/x86/include/uapi/asm/prctl.h |  4 +++
 arch/x86/kernel/Makefile          |  2 +-
 arch/x86/kernel/cet_prctl.c       | 60 +++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c         |  6 ++--
 5 files changed, 71 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/cet_prctl.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index ec4b5e62d0ce..16870e5bc8eb 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -14,9 +14,11 @@ struct sc_ext;
 struct cet_status {
 	unsigned long	shstk_base;
 	unsigned long	shstk_size;
+	unsigned int	locked:1;
 };
 
 #ifdef CONFIG_X86_CET
+int prctl_cet(int option, u64 arg2);
 int cet_setup_shstk(void);
 int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags);
 void cet_disable_shstk(void);
@@ -25,6 +27,7 @@ int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp)
 void cet_restore_signal(struct sc_ext *sc);
 int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
 #else
+static inline int prctl_cet(int option, u64 arg2) { return -EINVAL; }
 static inline int cet_setup_thread_shstk(struct task_struct *p,
 					 unsigned long clone_flags) { return 0; }
 static inline void cet_disable_shstk(void) {}
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 5a6aac9fa41f..9245bf629120 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -14,4 +14,8 @@
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
 
+#define ARCH_X86_CET_STATUS		0x3001
+#define ARCH_X86_CET_DISABLE		0x3002
+#define ARCH_X86_CET_LOCK		0x3003
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4a9a7e7d00dc..2f60a28769f9 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -151,7 +151,7 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev-es.o
-obj-$(CONFIG_X86_CET)			+= cet.o
+obj-$(CONFIG_X86_CET)			+= cet.o cet_prctl.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c
new file mode 100644
index 000000000000..0030c63a08c0
--- /dev/null
+++ b/arch/x86/kernel/cet_prctl.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/errno.h>
+#include <linux/uaccess.h>
+#include <linux/prctl.h>
+#include <linux/compat.h>
+#include <linux/mman.h>
+#include <linux/elfcore.h>
+#include <linux/processor.h>
+#include <asm/prctl.h>
+#include <asm/cet.h>
+
+/* See Documentation/x86/intel_cet.rst. */
+
+static int cet_copy_status_to_user(struct cet_status *cet, u64 __user *ubuf)
+{
+	u64 buf[3] = {};
+
+	if (cet->shstk_size) {
+		buf[0] |= GNU_PROPERTY_X86_FEATURE_1_SHSTK;
+		buf[1] = cet->shstk_base;
+		buf[2] = cet->shstk_size;
+	}
+
+	return copy_to_user(ubuf, buf, sizeof(buf));
+}
+
+int prctl_cet(int option, u64 arg2)
+{
+	struct cet_status *cet;
+
+	if (!cpu_feature_enabled(X86_FEATURE_CET))
+		return -ENOTSUPP;
+
+	cet = &current->thread.cet;
+
+	if (option == ARCH_X86_CET_STATUS)
+		return cet_copy_status_to_user(cet, (u64 __user *)arg2);
+
+	switch (option) {
+	case ARCH_X86_CET_DISABLE:
+		if (cet->locked)
+			return -EPERM;
+
+		if (arg2 & ~GNU_PROPERTY_X86_FEATURE_1_VALID)
+			return -EINVAL;
+		if (arg2 & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+			cet_disable_shstk();
+		return 0;
+
+	case ARCH_X86_CET_LOCK:
+		if (arg2)
+			return -EINVAL;
+		cet->locked = 1;
+		return 0;
+
+	default:
+		return -ENOSYS;
+	}
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3af6b36e1a5c..9e11e5f589f3 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -979,14 +979,14 @@ unsigned long get_wchan(struct task_struct *p)
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled)
+			  unsigned long arg2)
 {
 	switch (option) {
 	case ARCH_GET_CPUID:
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
-		return set_cpuid_mode(task, cpuid_enabled);
+		return set_cpuid_mode(task, arg2);
 	}
 
-	return -EINVAL;
+	return prctl_cet(option, arg2);
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v19 25/25] mm: Introduce PROT_SHSTK for shadow stack
  2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (23 preceding siblings ...)
  2021-02-03 22:55 ` [PATCH v19 24/25] x86/cet/shstk: Add arch_prctl functions for " Yu-cheng Yu
@ 2021-02-03 22:55 ` Yu-cheng Yu
  24 siblings, 0 replies; 67+ messages in thread
From: Yu-cheng Yu @ 2021-02-03 22:55 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu
  Cc: Yu-cheng Yu

There are three possible options to create a shadow stack allocation API:
an arch_prctl, a new syscall, or adding PROT_SHSTK to mmap()/mprotect().
Each has its advantages and compromises.

An arch_prctl() is the least intrusive.  However, the existing x86
arch_prctl() takes only two parameters.  Multiple parameters must be
passed in a memory buffer.  There is a proposal to pass more parameters in
registers [1], but no active discussion on that.

A new syscall minimizes compatibility issues and offers an extensible frame
work to other architectures, but this will likely result in some overlap of
mmap()/mprotect().

The introduction of PROT_SHSTK to mmap()/mprotect() takes advantage of
existing APIs.  The x86-specific PROT_SHSTK is translated to VM_SHSTK and
a shadow stack mapping is created without reinventing the wheel.  There are
potential pitfalls though.  The most obvious one would be using this as a
bypass to shadow stack protection.  However, the attacker would have to get
to the syscall first.

Since arch_calc_vm_prot_bits() is modified, I have moved arch_vm_get_page
_prot() and arch_calc_vm_prot_bits() to x86/include/asm/mman.h.
This will be more consistent with other architectures.

[1] https://lore.kernel.org/lkml/20200828121624.108243-1-hjl.tools@gmail.com/

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/mman.h      | 85 ++++++++++++++++++++++++++++++++
 arch/x86/include/uapi/asm/mman.h | 28 ++---------
 include/linux/mm.h               |  1 +
 mm/mmap.c                        |  8 ++-
 4 files changed, 97 insertions(+), 25 deletions(-)
 create mode 100644 arch/x86/include/asm/mman.h

diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
new file mode 100644
index 000000000000..db897d14fc37
--- /dev/null
+++ b/arch/x86/include/asm/mman.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MMAN_H
+#define _ASM_X86_MMAN_H
+
+#include <linux/mm.h>
+#include <uapi/asm/mman.h>
+
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+/*
+ * Take the 4 protection key bits out of the vma->vm_flags
+ * value and turn them in to the bits that we can put in
+ * to a pte.
+ *
+ * Only override these if Protection Keys are available
+ * (which is only on 64-bit).
+ */
+#define arch_vm_get_page_prot(vm_flags)	__pgprot(	\
+		((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
+
+#define pkey_vm_prot_bits(prot, key) (			\
+		((key) & 0x1 ? VM_PKEY_BIT0 : 0) |      \
+		((key) & 0x2 ? VM_PKEY_BIT1 : 0) |      \
+		((key) & 0x4 ? VM_PKEY_BIT2 : 0) |      \
+		((key) & 0x8 ? VM_PKEY_BIT3 : 0))
+#else
+#define pkey_vm_prot_bits(prot, key) (0)
+#endif
+
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+	unsigned long pkey)
+{
+	unsigned long vm_prot_bits = pkey_vm_prot_bits(prot, pkey);
+
+	if (!(prot & PROT_WRITE) && (prot & PROT_SHSTK))
+		vm_prot_bits |= VM_SHSTK;
+
+	return vm_prot_bits;
+}
+
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
+
+#ifdef CONFIG_X86_CET
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+{
+	unsigned long valid = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
+
+	if (prot & ~(valid | PROT_SHSTK))
+		return false;
+
+	if (prot & PROT_SHSTK) {
+		struct vm_area_struct *vma;
+
+		if (!current->thread.cet.shstk_size)
+			return false;
+
+		/*
+		 * A shadow stack mapping is indirectly writable by only
+		 * the CALL and WRUSS instructions, but not other write
+		 * instructions).  PROT_SHSTK and PROT_WRITE are mutually
+		 * exclusive.
+		 */
+		if (prot & PROT_WRITE)
+			return false;
+
+		vma = find_vma(current->mm, addr);
+		if (!vma)
+			return false;
+
+		/*
+		 * Shadow stack cannot be backed by a file or shared.
+		 */
+		if (vma->vm_file || (vma->vm_flags & VM_SHARED))
+			return false;
+	}
+
+	return true;
+}
+
+#define arch_validate_prot arch_validate_prot
+#endif
+
+#endif /* _ASM_X86_MMAN_H */
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index d4a8d0424bfb..39bb7db344a6 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -1,31 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _ASM_X86_MMAN_H
-#define _ASM_X86_MMAN_H
+#ifndef _UAPI_ASM_X86_MMAN_H
+#define _UAPI_ASM_X86_MMAN_H
 
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
-/*
- * Take the 4 protection key bits out of the vma->vm_flags
- * value and turn them in to the bits that we can put in
- * to a pte.
- *
- * Only override these if Protection Keys are available
- * (which is only on 64-bit).
- */
-#define arch_vm_get_page_prot(vm_flags)	__pgprot(	\
-		((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
-
-#define arch_calc_vm_prot_bits(prot, key) (		\
-		((key) & 0x1 ? VM_PKEY_BIT0 : 0) |      \
-		((key) & 0x2 ? VM_PKEY_BIT1 : 0) |      \
-		((key) & 0x4 ? VM_PKEY_BIT2 : 0) |      \
-		((key) & 0x8 ? VM_PKEY_BIT3 : 0))
-#endif
+#define PROT_SHSTK	0x10		/* shadow stack pages */
 
 #include <asm-generic/mman.h>
 
-#endif /* _ASM_X86_MMAN_H */
+#endif /* _UAPI_ASM_X86_MMAN_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 06e9899e13b8..f3a3b6be97e2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -342,6 +342,7 @@ extern unsigned int kobjsize(const void *objp);
 
 #if defined(CONFIG_X86)
 # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
+# define VM_ARCH_CLEAR	VM_SHSTK
 #elif defined(CONFIG_PPC)
 # define VM_SAO		VM_ARCH_1	/* Strong Access Ordering (powerpc) */
 #elif defined(CONFIG_PARISC)
diff --git a/mm/mmap.c b/mm/mmap.c
index 95ce7cd68654..9434469db2ce 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1483,6 +1483,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		struct inode *inode = file_inode(file);
 		unsigned long flags_mask;
 
+		/*
+		 * Call stack cannot be backed by a file.
+		 */
+		if (vm_flags & VM_SHSTK)
+			return -EINVAL;
+
 		if (!file_mmap_ok(file, inode, pgoff, len))
 			return -EOVERFLOW;
 
@@ -1547,7 +1553,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	} else {
 		switch (flags & MAP_TYPE) {
 		case MAP_SHARED:
-			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
+			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP|VM_SHSTK))
 				return -EINVAL;
 			/*
 			 * Ignore pgoff.
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 01/25] Documentation/x86: Add CET description
  2021-02-03 22:55 ` [PATCH v19 01/25] Documentation/x86: Add CET description Yu-cheng Yu
@ 2021-02-04 19:52   ` Kees Cook
  2021-02-04 20:03     ` Yu, Yu-cheng
  0 siblings, 1 reply; 67+ messages in thread
From: Kees Cook @ 2021-02-04 19:52 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:23PM -0800, Yu-cheng Yu wrote:
> Explain no_user_shstk/no_user_ibt kernel parameters, and introduce a new
> document on Control-flow Enforcement Technology (CET).
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection
  2021-02-03 22:55 ` [PATCH v19 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection Yu-cheng Yu
@ 2021-02-04 19:56   ` Kees Cook
  2021-02-05  0:05     ` Yu, Yu-cheng
  0 siblings, 1 reply; 67+ messages in thread
From: Kees Cook @ 2021-02-04 19:56 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:24PM -0800, Yu-cheng Yu wrote:
> Shadow Stack provides protection against function return address
> corruption.  It is active when the processor supports it, the kernel has
> CONFIG_X86_CET enabled, and the application is built for the feature.
> This is only implemented for the 64-bit kernel.  When it is enabled, legacy
> non-Shadow Stack applications continue to work, but without protection.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/Kconfig           | 22 ++++++++++++++++++++++
>  arch/x86/Kconfig.assembler |  5 +++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..074b3c0e6bf6 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1951,6 +1951,28 @@ config X86_SGX
>  
>  	  If unsure, say N.
>  
> +config ARCH_HAS_SHADOW_STACK
> +	def_bool n
> +
> +config X86_CET
> +	prompt "Intel Control-flow protection for user-mode"
> +	def_bool n
> +	depends on X86_64
> +	depends on AS_WRUSS
> +	select ARCH_USES_HIGH_VMA_FLAGS
> +	select ARCH_HAS_SHADOW_STACK

This seems backwards to me? Shouldn't 'config X86_64' do the 'select
ARCH_HAS_SHADOW_STACK' and 'config X86_CET' do a 'depends on
ARCH_HAS_SHADOW_STACK' instead?

> +	help
> +	  Control-flow protection is a set of hardware features which place
> +	  additional restrictions on indirect branches.  These help
> +	  mitigate ROP attacks.  Applications must be enabled to use it,
> +	  and old userspace does not get protection "for free".
> +	  Support for this feature is present on Tiger Lake family of
> +	  processors released in 2020 or later.  Enabling this feature
> +	  increases kernel text size by 3.7 KB.
> +	  See Documentation/x86/intel_cet.rst for more information.
> +
> +	  If unsure, say N.
> +
>  config EFI
>  	bool "EFI runtime service support"
>  	depends on ACPI
> diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
> index 26b8c08e2fc4..00c79dd93651 100644
> --- a/arch/x86/Kconfig.assembler
> +++ b/arch/x86/Kconfig.assembler
> @@ -19,3 +19,8 @@ config AS_TPAUSE
>  	def_bool $(as-instr,tpause %ecx)
>  	help
>  	  Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7
> +
> +config AS_WRUSS
> +	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
> +	help
> +	  Supported by binutils >= 2.31 and LLVM integrated assembler
> -- 
> 2.21.0
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 03/25] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET)
  2021-02-03 22:55 ` [PATCH v19 03/25] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Yu-cheng Yu
@ 2021-02-04 19:57   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 19:57 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:25PM -0800, Yu-cheng Yu wrote:
> Add CPU feature flags for Control-flow Enforcement Technology (CET).
> 
> CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
> CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect Branch Tracking
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions
  2021-02-03 22:55 ` [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions Yu-cheng Yu
@ 2021-02-04 19:58   ` Kees Cook
  2021-02-05 13:43   ` Borislav Petkov
  1 sibling, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 19:58 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:26PM -0800, Yu-cheng Yu wrote:
> Introduce a software-defined X86_FEATURE_CET, which indicates either Shadow
> Stack or Indirect Branch Tracking (or both) is present.  Also introduce
> related cpu init/setup functions.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  2021-02-03 22:55 ` [PATCH v19 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yu-cheng Yu
@ 2021-02-04 19:59   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 19:59 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:27PM -0800, Yu-cheng Yu wrote:
> Control-flow Enforcement Technology (CET) introduces these MSRs:
> 
>     MSR_IA32_U_CET (user-mode CET settings),
>     MSR_IA32_PL3_SSP (user-mode shadow stack pointer),
> 
>     MSR_IA32_PL0_SSP (kernel-mode shadow stack pointer),
>     MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack pointer),
>     MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack pointer),
>     MSR_IA32_S_CET (kernel-mode CET settings),
>     MSR_IA32_INT_SSP_TAB (exception shadow stack table).
> 
> The two user-mode MSRs belong to XFEATURE_CET_USER.  The first three of
> kernel-mode MSRs belong to XFEATURE_CET_KERNEL.  Both XSAVES states are
> supervisor states.  This means that there is no direct, unprivileged access
> to these states, making it harder for an attacker to subvert CET.
> 
> For sigreturn and future ptrace() support, shadow stack address and MSR
> reserved bits are checked before written to the supervisor states.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 01/25] Documentation/x86: Add CET description
  2021-02-04 19:52   ` Kees Cook
@ 2021-02-04 20:03     ` Yu, Yu-cheng
  0 siblings, 0 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-04 20:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/4/2021 11:52 AM, Kees Cook wrote:
> On Wed, Feb 03, 2021 at 02:55:23PM -0800, Yu-cheng Yu wrote:
>> Explain no_user_shstk/no_user_ibt kernel parameters, and introduce a new
>> document on Control-flow Enforcement Technology (CET).
>>
>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> 

Thanks for reviewing the series.

--
Yu-cheng

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-03 22:55 ` [PATCH v19 06/25] x86/cet: Add control-protection fault handler Yu-cheng Yu
@ 2021-02-04 20:09   ` Kees Cook
  2021-02-05  0:10     ` Yu, Yu-cheng
  2021-02-05 13:59   ` Borislav Petkov
  1 sibling, 1 reply; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:09 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On Wed, Feb 03, 2021 at 02:55:28PM -0800, Yu-cheng Yu wrote:
> A control-protection fault is triggered when a control-flow transfer
> attempt violates Shadow Stack or Indirect Branch Tracking constraints.
> For example, the return address for a RET instruction differs from the copy
> on the shadow stack; or an indirect JMP instruction, without the NOTRACK
> prefix, arrives at a non-ENDBR opcode.
> 
> The control-protection fault handler works in a similar way as the general
> protection fault handler.  It provides the si_code SEGV_CPERR to the signal
> handler.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> ---
>  arch/x86/include/asm/idtentry.h    |  4 ++
>  arch/x86/kernel/idt.c              |  4 ++
>  arch/x86/kernel/signal_compat.c    |  2 +-
>  arch/x86/kernel/traps.c            | 60 ++++++++++++++++++++++++++++++
>  include/uapi/asm-generic/siginfo.h |  3 +-
>  5 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index f656aabd1545..ff4b3bf634da 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -574,6 +574,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS,	exc_stack_segment);
>  DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP,	exc_general_protection);
>  DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC,	exc_alignment_check);
>  
> +#ifdef CONFIG_X86_CET
> +DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
> +#endif
> +
>  /* Raw exception entries which need extra work */
>  DECLARE_IDTENTRY_RAW(X86_TRAP_UD,		exc_invalid_op);
>  DECLARE_IDTENTRY_RAW(X86_TRAP_BP,		exc_int3);
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index ee1a283f8e96..e8166d9bbb10 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -105,6 +105,10 @@ static const __initconst struct idt_data def_idts[] = {
>  #elif defined(CONFIG_X86_32)
>  	SYSG(IA32_SYSCALL_VECTOR,	entry_INT80_32),
>  #endif
> +
> +#ifdef CONFIG_X86_CET
> +	INTG(X86_TRAP_CP,		asm_exc_control_protection),
> +#endif
>  };
>  
>  /*
> diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
> index a5330ff498f0..dd92490b1e7f 100644
> --- a/arch/x86/kernel/signal_compat.c
> +++ b/arch/x86/kernel/signal_compat.c
> @@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
>  	 */
>  	BUILD_BUG_ON(NSIGILL  != 11);
>  	BUILD_BUG_ON(NSIGFPE  != 15);
> -	BUILD_BUG_ON(NSIGSEGV != 9);
> +	BUILD_BUG_ON(NSIGSEGV != 10);
>  	BUILD_BUG_ON(NSIGBUS  != 5);
>  	BUILD_BUG_ON(NSIGTRAP != 5);
>  	BUILD_BUG_ON(NSIGCHLD != 6);
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 7f5aec758f0e..f5354c35df32 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -606,6 +606,66 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>  	cond_local_irq_disable(regs);
>  }
>  
> +#ifdef CONFIG_X86_CET
> +static const char * const control_protection_err[] = {
> +	"unknown",
> +	"near-ret",
> +	"far-ret/iret",
> +	"endbranch",
> +	"rstorssp",
> +	"setssbsy",
> +};
> +
> +/*
> + * When a control protection exception occurs, send a signal to the responsible
> + * application.  Currently, control protection is only enabled for user mode.
> + * This exception should not come from kernel mode.
> + */
> +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> +{
> +	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> +				      DEFAULT_RATELIMIT_BURST);
> +	struct task_struct *tsk;
> +
> +	if (!user_mode(regs)) {
> +		pr_emerg("PANIC: unexpected kernel control protection fault\n");
> +		die("kernel control protection fault", regs, error_code);
> +		panic("Machine halted.");
> +	}
> +
> +	cond_local_irq_enable(regs);
> +
> +	if (!boot_cpu_has(X86_FEATURE_CET))
> +		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
> +
> +	tsk = current;
> +	tsk->thread.error_code = error_code;
> +	tsk->thread.trap_nr = X86_TRAP_CP;
> +
> +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
> +	    __ratelimit(&rs)) {
> +		unsigned int max_err;
> +		unsigned long ssp;
> +
> +		max_err = ARRAY_SIZE(control_protection_err) - 1;
> +		if (error_code < 0 || error_code > max_err)
> +			error_code = 0;

Do you want to mask the error_code here before printing its value?

> +
> +		rdmsrl(MSR_IA32_PL3_SSP, ssp);
> +		pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)",
> +			 tsk->comm, task_pid_nr(tsk),
> +			 regs->ip, regs->sp, ssp, error_code,
> +			 control_protection_err[error_code]);

Instead, you could clamp error_code to ARRAY_SIZE(control_protection_err),
and add another "unknown" to the end of the strings:

	control_protection_err[
		array_index_nospec(error_code,
				   ARRAY_SIZE(control_protection_err))]

Everything else looks good.

> +		print_vma_addr(KERN_CONT " in ", regs->ip);
> +		pr_cont("\n");
> +	}
> +
> +	force_sig_fault(SIGSEGV, SEGV_CPERR,
> +			(void __user *)uprobe_get_trap_addr(regs));
> +	cond_local_irq_disable(regs);
> +}
> +#endif
> +
>  static bool do_int3(struct pt_regs *regs)
>  {
>  	int res;
> diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
> index d2597000407a..1c2ea91284a0 100644
> --- a/include/uapi/asm-generic/siginfo.h
> +++ b/include/uapi/asm-generic/siginfo.h
> @@ -231,7 +231,8 @@ typedef struct siginfo {
>  #define SEGV_ADIPERR	7	/* Precise MCD exception */
>  #define SEGV_MTEAERR	8	/* Asynchronous ARM MTE error */
>  #define SEGV_MTESERR	9	/* Synchronous ARM MTE exception */
> -#define NSIGSEGV	9
> +#define SEGV_CPERR	10	/* Control protection fault */
> +#define NSIGSEGV	10
>  
>  /*
>   * SIGBUS si_codes
> -- 
> 2.21.0
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 07/25] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  2021-02-03 22:55 ` [PATCH v19 07/25] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Yu-cheng Yu
@ 2021-02-04 20:10   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:10 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Christoph Hellwig

On Wed, Feb 03, 2021 at 02:55:29PM -0800, Yu-cheng Yu wrote:
> The x86 family of processors do not directly create read-only and Dirty
> PTEs.  These PTEs are created by software.  One such case is that kernel
> read-only pages are historically setup as Dirty.
> 
> New processors that support Shadow Stack regard read-only and Dirty PTEs as
> shadow stack pages.  This results in ambiguity between shadow stack and
> kernel read-only pages.  To resolve this, removed Dirty from kernel read-
> only pages.

Does this have any other side-effects?

Otherwise, sure:

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/include/asm/pgtable_types.h | 6 +++---
>  arch/x86/mm/pat/set_memory.c         | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 394757ee030a..b8b79d618bbc 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -194,10 +194,10 @@ enum page_cache_mode {
>  #define _KERNPG_TABLE		 (__PP|__RW|   0|___A|   0|___D|   0|   0| _ENC)
>  #define _PAGE_TABLE_NOENC	 (__PP|__RW|_USR|___A|   0|___D|   0|   0)
>  #define _PAGE_TABLE		 (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
> -#define __PAGE_KERNEL_RO	 (__PP|   0|   0|___A|__NX|___D|   0|___G)
> -#define __PAGE_KERNEL_ROX	 (__PP|   0|   0|___A|   0|___D|   0|___G)
> +#define __PAGE_KERNEL_RO	 (__PP|   0|   0|___A|__NX|   0|   0|___G)
> +#define __PAGE_KERNEL_ROX	 (__PP|   0|   0|___A|   0|   0|   0|___G)
>  #define __PAGE_KERNEL_NOCACHE	 (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
> -#define __PAGE_KERNEL_VVAR	 (__PP|   0|_USR|___A|__NX|___D|   0|___G)
> +#define __PAGE_KERNEL_VVAR	 (__PP|   0|_USR|___A|__NX|   0|   0|___G)
>  #define __PAGE_KERNEL_LARGE	 (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
>  #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW|   0|___A|   0|___D|_PSE|___G)
>  #define __PAGE_KERNEL_WP	 (__PP|__RW|   0|___A|__NX|___D|   0|___G| __WP)
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 16f878c26667..6bebb95a6988 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -1932,7 +1932,7 @@ int set_memory_nx(unsigned long addr, int numpages)
>  
>  int set_memory_ro(unsigned long addr, int numpages)
>  {
> -	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
> +	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0);
>  }
>  
>  int set_memory_rw(unsigned long addr, int numpages)
> -- 
> 2.21.0
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW
  2021-02-03 22:55 ` [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW Yu-cheng Yu
@ 2021-02-04 20:19   ` Kees Cook
  2021-02-04 20:27     ` Dave Hansen
  2021-02-05 18:58     ` Yu, Yu-cheng
  0 siblings, 2 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:19 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:30PM -0800, Yu-cheng Yu wrote:
> There is essentially no room left in the x86 hardware PTEs on some OSes
> (not Linux).  That left the hardware architects looking for a way to
> represent a new memory type (shadow stack) within the existing bits.
> They chose to repurpose a lightly-used state: Write=0, Dirty=1.
> 
> The reason it's lightly used is that Dirty=1 is normally set by hardware
> and cannot normally be set by hardware on a Write=0 PTE.  Software must
> normally be involved to create one of these PTEs, so software can simply
> opt to not create them.
> 
> In places where Linux normally creates Write=0, Dirty=1, it can use the
> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY.  In other
> words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1.  This
> clearly separates shadow stack from other data, and results in the
> following:
> 
> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
> (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
>     The user page is in a R/O VMA, and get_user_pages() needs a writable
>     copy.  The page fault handler creates a copy of the page and sets
>     the new copy's PTE as Write=0 and Cow=1.
> (c) A shadow stack PTE: (Write=0, Dirty=1)
> (d) A shared shadow stack PTE: (Write=0, Cow=1)
>     When a shadow stack page is being shared among processes (this happens
>     at fork()), its PTE is made Dirty=0, so the next shadow stack access
>     causes a fault, and the page is duplicated and Dirty=1 is set again.
>     This is the COW equivalent for shadow stack pages, even though it's
>     copy-on-access rather than copy-on-write.
> (e) A page where the processor observed a Write=1 PTE, started a write, set
>     Dirty=1, but then observed a Write=0 PTE.  That's possible today, but
>     will not happen on processors that support shadow stack.

What happens for "e" with/without CET? It sounds like direct writes to
such pages will be (correctly) rejected by the MMU?

> 
> Define _PAGE_COW and update pte_*() helpers and apply the same changes to
> pmd and pud.
> 
> After this, there are six free bits left in the 64-bit PTE, and no more
> free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not
> implemented for the 32-bit kernel.

Are there selftests to validate this change?

I think it might be useful to more clearly describe what is considered
"dirty" and "writeable" in comments above the pte_helpers.

-Kees

> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/include/asm/pgtable.h       | 125 ++++++++++++++++++++++++---
>  arch/x86/include/asm/pgtable_types.h |  42 ++++++++-
>  2 files changed, 154 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index a02c67291cfc..4b0ec61510dc 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -121,9 +121,9 @@ extern pmdval_t early_pmd_flags;
>   * The following only work if pte_present() is true.
>   * Undefined behaviour if not..
>   */
> -static inline int pte_dirty(pte_t pte)
> +static inline bool pte_dirty(pte_t pte)
>  {
> -	return pte_flags(pte) & _PAGE_DIRTY;
> +	return pte_flags(pte) & _PAGE_DIRTY_BITS;
>  }
>  
>  
> @@ -160,9 +160,9 @@ static inline int pte_young(pte_t pte)
>  	return pte_flags(pte) & _PAGE_ACCESSED;
>  }
>  
> -static inline int pmd_dirty(pmd_t pmd)
> +static inline bool pmd_dirty(pmd_t pmd)
>  {
> -	return pmd_flags(pmd) & _PAGE_DIRTY;
> +	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
>  }
>  
>  static inline int pmd_young(pmd_t pmd)
> @@ -170,9 +170,9 @@ static inline int pmd_young(pmd_t pmd)
>  	return pmd_flags(pmd) & _PAGE_ACCESSED;
>  }
>  
> -static inline int pud_dirty(pud_t pud)
> +static inline bool pud_dirty(pud_t pud)
>  {
> -	return pud_flags(pud) & _PAGE_DIRTY;
> +	return pud_flags(pud) & _PAGE_DIRTY_BITS;
>  }
>  
>  static inline int pud_young(pud_t pud)
> @@ -182,7 +182,14 @@ static inline int pud_young(pud_t pud)
>  
>  static inline int pte_write(pte_t pte)
>  {
> -	return pte_flags(pte) & _PAGE_RW;
> +	/*
> +	 * If _PAGE_DIRTY is set, the PTE must either have _PAGE_RW or be
> +	 * a shadow stack PTE, which is logically writable.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY);
> +	else
> +		return pte_flags(pte) & _PAGE_RW;
>  }
>  
>  static inline int pte_huge(pte_t pte)
> @@ -333,7 +340,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
>  
>  static inline pte_t pte_mkclean(pte_t pte)
>  {
> -	return pte_clear_flags(pte, _PAGE_DIRTY);
> +	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
>  }
>  
>  static inline pte_t pte_mkold(pte_t pte)
> @@ -343,6 +350,18 @@ static inline pte_t pte_mkold(pte_t pte)
>  
>  static inline pte_t pte_wrprotect(pte_t pte)
>  {
> +	/*
> +	 * Blindly clearing _PAGE_RW might accidentally create
> +	 * a shadow stack PTE (RW=0, Dirty=1).  Move the hardware
> +	 * dirty value to the software bit.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		if (pte_flags(pte) & _PAGE_DIRTY) {
> +			pte = pte_clear_flags(pte, _PAGE_DIRTY);
> +			pte = pte_set_flags(pte, _PAGE_COW);
> +		}
> +	}
> +
>  	return pte_clear_flags(pte, _PAGE_RW);
>  }
>  
> @@ -353,6 +372,18 @@ static inline pte_t pte_mkexec(pte_t pte)
>  
>  static inline pte_t pte_mkdirty(pte_t pte)
>  {
> +	pteval_t dirty = _PAGE_DIRTY;
> +
> +	/* Avoid creating (HW)Dirty=1, Write=0 PTEs */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte))
> +		dirty = _PAGE_COW;
> +
> +	return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
> +}
> +
> +static inline pte_t pte_mkwrite_shstk(pte_t pte)
> +{
> +	pte = pte_clear_flags(pte, _PAGE_COW);
>  	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
>  }
>  
> @@ -363,6 +394,13 @@ static inline pte_t pte_mkyoung(pte_t pte)
>  
>  static inline pte_t pte_mkwrite(pte_t pte)
>  {
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		if (pte_flags(pte) & _PAGE_COW) {
> +			pte = pte_clear_flags(pte, _PAGE_COW);
> +			pte = pte_set_flags(pte, _PAGE_DIRTY);
> +		}
> +	}
> +
>  	return pte_set_flags(pte, _PAGE_RW);
>  }
>  
> @@ -434,16 +472,40 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
>  
>  static inline pmd_t pmd_mkclean(pmd_t pmd)
>  {
> -	return pmd_clear_flags(pmd, _PAGE_DIRTY);
> +	return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
>  }
>  
>  static inline pmd_t pmd_wrprotect(pmd_t pmd)
>  {
> +	/*
> +	 * Blindly clearing _PAGE_RW might accidentally create
> +	 * a shadow stack PMD (RW=0, Dirty=1).  Move the hardware
> +	 * dirty value to the software bit.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		if (pmd_flags(pmd) & _PAGE_DIRTY) {
> +			pmd = pmd_clear_flags(pmd, _PAGE_DIRTY);
> +			pmd = pmd_set_flags(pmd, _PAGE_COW);
> +		}
> +	}
> +
>  	return pmd_clear_flags(pmd, _PAGE_RW);
>  }
>  
>  static inline pmd_t pmd_mkdirty(pmd_t pmd)
>  {
> +	pmdval_t dirty = _PAGE_DIRTY;
> +
> +	/* Avoid creating (HW)Dirty=1, Write=0 PMDs */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pmd_flags(pmd) & _PAGE_RW))
> +		dirty = _PAGE_COW;
> +
> +	return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY);
> +}
> +
> +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
> +{
> +	pmd = pmd_clear_flags(pmd, _PAGE_COW);
>  	return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
>  }
>  
> @@ -464,6 +526,13 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
>  
>  static inline pmd_t pmd_mkwrite(pmd_t pmd)
>  {
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		if (pmd_flags(pmd) & _PAGE_COW) {
> +			pmd = pmd_clear_flags(pmd, _PAGE_COW);
> +			pmd = pmd_set_flags(pmd, _PAGE_DIRTY);
> +		}
> +	}
> +
>  	return pmd_set_flags(pmd, _PAGE_RW);
>  }
>  
> @@ -488,17 +557,35 @@ static inline pud_t pud_mkold(pud_t pud)
>  
>  static inline pud_t pud_mkclean(pud_t pud)
>  {
> -	return pud_clear_flags(pud, _PAGE_DIRTY);
> +	return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
>  }
>  
>  static inline pud_t pud_wrprotect(pud_t pud)
>  {
> +	/*
> +	 * Blindly clearing _PAGE_RW might accidentally create
> +	 * a shadow stack PUD (RW=0, Dirty=1).  Move the hardware
> +	 * dirty value to the software bit.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		if (pud_flags(pud) & _PAGE_DIRTY) {
> +			pud = pud_clear_flags(pud, _PAGE_DIRTY);
> +			pud = pud_set_flags(pud, _PAGE_COW);
> +		}
> +	}
> +
>  	return pud_clear_flags(pud, _PAGE_RW);
>  }
>  
>  static inline pud_t pud_mkdirty(pud_t pud)
>  {
> -	return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
> +	pudval_t dirty = _PAGE_DIRTY;
> +
> +	/* Avoid creating (HW)Dirty=1, Write=0 PUDs */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pud_flags(pud) & _PAGE_RW))
> +		dirty = _PAGE_COW;
> +
> +	return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
>  }
>  
>  static inline pud_t pud_mkdevmap(pud_t pud)
> @@ -518,6 +605,13 @@ static inline pud_t pud_mkyoung(pud_t pud)
>  
>  static inline pud_t pud_mkwrite(pud_t pud)
>  {
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		if (pud_flags(pud) & _PAGE_COW) {
> +			pud = pud_clear_flags(pud, _PAGE_COW);
> +			pud = pud_set_flags(pud, _PAGE_DIRTY);
> +		}
> +	}
> +
>  	return pud_set_flags(pud, _PAGE_RW);
>  }
>  
> @@ -1131,7 +1225,14 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
>  #define pmd_write pmd_write
>  static inline int pmd_write(pmd_t pmd)
>  {
> -	return pmd_flags(pmd) & _PAGE_RW;
> +	/*
> +	 * If _PAGE_DIRTY is set, then the PMD must either have _PAGE_RW or
> +	 * be a shadow stack PMD, which is logically writable.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY);
> +	else
> +		return pmd_flags(pmd) & _PAGE_RW;
>  }
>  
>  #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index b8b79d618bbc..437d7ff0ae80 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -23,7 +23,8 @@
>  #define _PAGE_BIT_SOFTW2	10	/* " */
>  #define _PAGE_BIT_SOFTW3	11	/* " */
>  #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
> -#define _PAGE_BIT_SOFTW4	58	/* available for programmer */
> +#define _PAGE_BIT_SOFTW4	57	/* available for programmer */
> +#define _PAGE_BIT_SOFTW5	58	/* available for programmer */
>  #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
>  #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
>  #define _PAGE_BIT_PKEY_BIT2	61	/* Protection Keys, bit 3/4 */
> @@ -36,6 +37,15 @@
>  #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
>  #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4
>  
> +/*
> + * Indicates a copy-on-write page.
> + */
> +#ifdef CONFIG_X86_CET
> +#define _PAGE_BIT_COW		_PAGE_BIT_SOFTW5 /* copy-on-write */
> +#else
> +#define _PAGE_BIT_COW		0
> +#endif
> +
>  /* If _PAGE_BIT_PRESENT is clear, we use these: */
>  /* - if the user mapped it with PROT_NONE; pte_present gives true */
>  #define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
> @@ -117,6 +127,36 @@
>  #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
>  #endif
>  
> +/*
> + * The hardware requires shadow stack to be read-only and Dirty.
> + * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs
> + * from shadow stack PTEs:
> + * (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
> + * (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
> + *     The user page is in a R/O VMA, and get_user_pages() needs a
> + *     writable copy.  The page fault handler creates a copy of the page
> + *     and sets the new copy's PTE as Write=0, Cow=1.
> + * (c) A shadow stack PTE: (Write=0, Dirty=1)
> + * (d) A shared (copy-on-access) shadow stack PTE: (Write=0, Cow=1)
> + *     When a shadow stack page is being shared among processes (this
> + *     happens at fork()), its PTE is cleared of _PAGE_DIRTY, so the next
> + *     shadow stack access causes a fault, and the page is duplicated and
> + *     _PAGE_DIRTY is set again.  This is the COW equivalent for shadow
> + *     stack pages, even though it's copy-on-access rather than
> + *     copy-on-write.
> + * (e) A page where the processor observed a Write=1 PTE, started a write,
> + *     set Dirty=1, but then observed a Write=0 PTE (changed by another
> + *     thread).  That's possible today, but will not happen on processors
> + *     that support shadow stack.
> + */
> +#ifdef CONFIG_X86_CET
> +#define _PAGE_COW	(_AT(pteval_t, 1) << _PAGE_BIT_COW)
> +#else
> +#define _PAGE_COW	(_AT(pteval_t, 0))
> +#endif
> +
> +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW)
> +
>  #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
>  
>  /*
> -- 
> 2.21.0
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 10/25] x86/mm: Update pte_modify for _PAGE_COW
  2021-02-03 22:55 ` [PATCH v19 10/25] x86/mm: Update pte_modify for _PAGE_COW Yu-cheng Yu
@ 2021-02-04 20:20   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:20 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:32PM -0800, Yu-cheng Yu wrote:
> The read-only and Dirty PTE has been used to indicate copy-on-write pages.
> However, newer x86 processors also regard a read-only and Dirty PTE as a
> shadow stack page.  In order to separate the two, the software-defined
> _PAGE_COW is created to replace _PAGE_DIRTY for the copy-on-write case, and
> pte_*() are updated.
> 
> Pte_modify() changes a PTE to 'newprot', but it doesn't use the pte_*().
> Introduce fixup_dirty_pte(), which sets a dirty PTE, based on _PAGE_RW,
> to either _PAGE_DIRTY or _PAGE_COW.
> 
> Apply the same changes to pmd_modify().
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 11/25] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW
  2021-02-03 22:55 ` [PATCH v19 11/25] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Yu-cheng Yu
@ 2021-02-04 20:21   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:21 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:33PM -0800, Yu-cheng Yu wrote:
> When Shadow Stack is introduced, [R/O + _PAGE_DIRTY] PTE is reserved for
> shadow stack.  Copy-on-write PTEs have [R/O + _PAGE_COW].
> 
> When a PTE goes from [R/W + _PAGE_DIRTY] to [R/O + _PAGE_COW], it could
> become a transient shadow stack PTE in two cases:
> 
> The first case is that some processors can start a write but end up seeing
> a read-only PTE by the time they get to the Dirty bit, creating a transient
> shadow stack PTE.  However, this will not occur on processors supporting
> Shadow Stack, and a TLB flush is not necessary.
> 
> The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non-
> atomically, a transient shadow stack PTE can be created as a result.
> Thus, prevent that with cmpxchg.
> 
> Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
> insights to the issue.  Jann Horn provided the cmpxchg solution.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 14/25] x86/mm: Update maybe_mkwrite() for shadow stack
  2021-02-03 22:55 ` [PATCH v19 14/25] x86/mm: Update maybe_mkwrite() for shadow stack Yu-cheng Yu
@ 2021-02-04 20:22   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:22 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:36PM -0800, Yu-cheng Yu wrote:
> When serving a page fault, maybe_mkwrite() makes a PTE writable if its vma
> has VM_WRITE.
> 
> A shadow stack vma has VM_SHSTK.  Its PTEs have _PAGE_DIRTY, but not
> _PAGE_WRITE.  In fork(), _PAGE_DIRTY is cleared to effect copy-on-write,
> and in page fault, _PAGE_DIRTY is restored and the shadow stack page is
> writable again.
> 
> Update maybe_mkwrite() by introducing arch_maybe_mkwrite(), which sets
> _PAGE_DIRTY for a shadow stack PTE.
> 
> Apply the same changes to maybe_pmd_mkwrite().
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 15/25] mm: Fixup places that call pte_mkwrite() directly
  2021-02-03 22:55 ` [PATCH v19 15/25] mm: Fixup places that call pte_mkwrite() directly Yu-cheng Yu
@ 2021-02-04 20:22   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:22 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:37PM -0800, Yu-cheng Yu wrote:
> When serving a page fault, maybe_mkwrite() makes a PTE writable if it is in
> a writable vma.  A shadow stack vma is writable, but its PTEs need
> _PAGE_DIRTY to be set to become writable.  For this reason, maybe_mkwrite()
> has been updated.
> 
> There are a few places that call pte_mkwrite() directly, but effect the
> same result as from maybe_mkwrite().  These sites need to be updated for
> shadow stack as well.  Thus, change them to maybe_mkwrite():
> 
> - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly
>   and call pte_mkwrite(), which is the same as maybe_mkwrite().  Change
>   them to maybe_mkwrite().
> 
> - In do_numa_page(), if the numa entry 'was-writable', then pte_mkwrite()
>   is called directly.  Fix it by doing maybe_mkwrite().
> 
> - In change_pte_range(), pte_mkwrite() is called directly.  Replace it with
>   maybe_mkwrite().
> 
>   A shadow stack vma is writable but has different vma
> flags, and handled accordingly in maybe_mkwrite().
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 16/25] mm: Add guard pages around a shadow stack.
  2021-02-03 22:55 ` [PATCH v19 16/25] mm: Add guard pages around a shadow stack Yu-cheng Yu
@ 2021-02-04 20:24   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:24 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:38PM -0800, Yu-cheng Yu wrote:
> INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the
> first and the last elements in the range, effectively touches those memory
> areas.
> 
> The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and
> 255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from PAGE_SIZE.
> Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> CALL, and RET from going beyond.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Yay guard pages! :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 17/25] mm/mmap: Add shadow stack pages to memory accounting
  2021-02-03 22:55 ` [PATCH v19 17/25] mm/mmap: Add shadow stack pages to memory accounting Yu-cheng Yu
@ 2021-02-04 20:24   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:24 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:39PM -0800, Yu-cheng Yu wrote:
> Account shadow stack pages to stack memory.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 18/25] mm: Update can_follow_write_pte() for shadow stack
  2021-02-03 22:55 ` [PATCH v19 18/25] mm: Update can_follow_write_pte() for shadow stack Yu-cheng Yu
@ 2021-02-04 20:27   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:27 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:40PM -0800, Yu-cheng Yu wrote:
> Can_follow_write_pte() ensures a read-only page is COWed by checking the
> FOLL_COW flag, and uses pte_dirty() to validate the flag is still valid.
> 
> Like a writable data page, a shadow stack page is writable, and becomes
> read-only during copy-on-write, but it is always dirty.  Thus, in the
> can_follow_write_pte() check, it belongs to the writable page case and
> should be excluded from the read-only page pte_dirty() check.  Apply
> the same changes to can_follow_write_pmd().

Does this need the vma passed down? Should it just pass vm_flags? I
suppose it doesn't really matter, though.

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  mm/gup.c         | 8 +++++---
>  mm/huge_memory.c | 8 +++++---
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index e4c224cd9661..66ab67626f57 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -357,10 +357,12 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
>   * FOLL_FORCE can write to even unwritable pte's, but only
>   * after we've gone through a COW cycle and they are dirty.
>   */
> -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags,
> +					struct vm_area_struct *vma)
>  {
>  	return pte_write(pte) ||
> -		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> +		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte) &&
> +				  !arch_shadow_stack_mapping(vma->vm_flags));
>  }
>  
>  static struct page *follow_page_pte(struct vm_area_struct *vma,
> @@ -403,7 +405,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  	}
>  	if ((flags & FOLL_NUMA) && pte_protnone(pte))
>  		goto no_page;
> -	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
> +	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, vma)) {
>  		pte_unmap_unlock(ptep, ptl);
>  		return NULL;
>  	}
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bfec65c9308b..eb64e2b56bc9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1337,10 +1337,12 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
>   * FOLL_FORCE can write to even unwritable pmd's, but only
>   * after we've gone through a COW cycle and they are dirty.
>   */
> -static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
> +static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags,
> +					struct vm_area_struct *vma)
>  {
>  	return pmd_write(pmd) ||
> -	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
> +	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd) &&
> +				  !arch_shadow_stack_mapping(vma->vm_flags));
>  }
>  
>  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> @@ -1353,7 +1355,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  
>  	assert_spin_locked(pmd_lockptr(mm, pmd));
>  
> -	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
> +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, vma))
>  		goto out;
>  
>  	/* Avoid dumping huge zero page */
> -- 
> 2.21.0
> 
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW
  2021-02-04 20:19   ` Kees Cook
@ 2021-02-04 20:27     ` Dave Hansen
  2021-02-05 18:41       ` Yu, Yu-cheng
  2021-02-05 18:58     ` Yu, Yu-cheng
  1 sibling, 1 reply; 67+ messages in thread
From: Dave Hansen @ 2021-02-04 20:27 UTC (permalink / raw)
  To: Kees Cook, Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/4/21 12:19 PM, Kees Cook wrote:
>> (e) A page where the processor observed a Write=1 PTE, started a write, set
>>     Dirty=1, but then observed a Write=0 PTE.  That's possible today, but
>>     will not happen on processors that support shadow stack.
> What happens for "e" with/without CET? It sounds like direct writes to
> such pages will be (correctly) rejected by the MMU?

A page fault would be generated regardless of CET support.

If CET were not around, the fault would be reported as a present, write
fault.

If this happened and CET were around (which shouldn't happen in
practice, it means we have a hardware issue) a page fault exception is
generated.  Yu-cheng, I'm not sure there's enough debugging around to
tell us if this happens.  Would we even notice?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 19/25] mm: Re-introduce vm_flags to do_mmap()
  2021-02-03 22:55 ` [PATCH v19 19/25] mm: Re-introduce vm_flags to do_mmap() Yu-cheng Yu
@ 2021-02-04 20:28   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:28 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Peter Collingbourne, Andrew Morton

On Wed, Feb 03, 2021 at 02:55:41PM -0800, Yu-cheng Yu wrote:
> There was no more caller passing vm_flags to do_mmap(), and vm_flags was
> removed from the function's input by:
> 
>     commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()").
> 
> There is a new user now.  Shadow stack allocation passes VM_SHSTK to
> do_mmap().  Re-introduce vm_flags to do_mmap(), but without the old wrapper
> do_mmap_pgoff().  Instead, make all callers of the wrapper pass a zero
> vm_flags to do_mmap().
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 20/25] x86/cet/shstk: User-mode shadow stack support
  2021-02-03 22:55 ` [PATCH v19 20/25] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
@ 2021-02-04 20:29   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:29 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:42PM -0800, Yu-cheng Yu wrote:
> Introduce basic shadow stack enabling/disabling/allocation routines.
> A task's shadow stack is allocated from memory with VM_SHSTK flag and has
> a fixed size of min(RLIMIT_STACK, 4GB).
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 22/25] ELF: Introduce arch_setup_elf_property()
  2021-02-03 22:55 ` [PATCH v19 22/25] ELF: Introduce arch_setup_elf_property() Yu-cheng Yu
@ 2021-02-04 20:33   ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:33 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Mark Brown, Catalin Marinas

On Wed, Feb 03, 2021 at 02:55:44PM -0800, Yu-cheng Yu wrote:
> An ELF file's .note.gnu.property indicates arch features supported by the
> file.  These features are extracted by arch_parse_elf_property() and stored
> in 'arch_elf_state'.
> 
> Introduce x86 feature definitions and arch_setup_elf_property(), which
> enables such features.  The first use-case of this function is Shadow
> Stack.
> 
> ARM64 is the other arch that has ARCH_USE_GNU_PROPERTY and arch_parse_elf_
> property().  Add arch_setup_elf_property() for it.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
  2021-02-03 22:55 ` [PATCH v19 24/25] x86/cet/shstk: Add arch_prctl functions for " Yu-cheng Yu
@ 2021-02-04 20:35   ` Kees Cook
  2021-02-04 23:41     ` Yu, Yu-cheng
  0 siblings, 1 reply; 67+ messages in thread
From: Kees Cook @ 2021-02-04 20:35 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:46PM -0800, Yu-cheng Yu wrote:
> arch_prctl(ARCH_X86_CET_STATUS, u64 *args)
>     Get CET feature status.
> 
>     The parameter 'args' is a pointer to a user buffer.  The kernel returns
>     the following information:
> 
>     *args = shadow stack/IBT status
>     *(args + 1) = shadow stack base address
>     *(args + 2) = shadow stack size

What happens if this needs to grow in the future? Should the first u64
contain the array size?

Otherwise, looks sensible.

-Kees

> 
>     32-bit binaries use the same interface, but only lower 32-bits of each
>     item.
> 
> arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features)
>     Disable CET features specified in 'features'.  Return -EPERM if CET is
>     locked.
> 
> arch_prctl(ARCH_X86_CET_LOCK)
>     Lock in CET features.
> 
> Also change do_arch_prctl_common()'s parameter 'cpuid_enabled' to
> 'arg2', as it is now also passed to prctl_cet().
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/include/asm/cet.h        |  3 ++
>  arch/x86/include/uapi/asm/prctl.h |  4 +++
>  arch/x86/kernel/Makefile          |  2 +-
>  arch/x86/kernel/cet_prctl.c       | 60 +++++++++++++++++++++++++++++++
>  arch/x86/kernel/process.c         |  6 ++--
>  5 files changed, 71 insertions(+), 4 deletions(-)
>  create mode 100644 arch/x86/kernel/cet_prctl.c
> 
> diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
> index ec4b5e62d0ce..16870e5bc8eb 100644
> --- a/arch/x86/include/asm/cet.h
> +++ b/arch/x86/include/asm/cet.h
> @@ -14,9 +14,11 @@ struct sc_ext;
>  struct cet_status {
>  	unsigned long	shstk_base;
>  	unsigned long	shstk_size;
> +	unsigned int	locked:1;
>  };
>  
>  #ifdef CONFIG_X86_CET
> +int prctl_cet(int option, u64 arg2);
>  int cet_setup_shstk(void);
>  int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags);
>  void cet_disable_shstk(void);
> @@ -25,6 +27,7 @@ int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp)
>  void cet_restore_signal(struct sc_ext *sc);
>  int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
>  #else
> +static inline int prctl_cet(int option, u64 arg2) { return -EINVAL; }
>  static inline int cet_setup_thread_shstk(struct task_struct *p,
>  					 unsigned long clone_flags) { return 0; }
>  static inline void cet_disable_shstk(void) {}
> diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
> index 5a6aac9fa41f..9245bf629120 100644
> --- a/arch/x86/include/uapi/asm/prctl.h
> +++ b/arch/x86/include/uapi/asm/prctl.h
> @@ -14,4 +14,8 @@
>  #define ARCH_MAP_VDSO_32	0x2002
>  #define ARCH_MAP_VDSO_64	0x2003
>  
> +#define ARCH_X86_CET_STATUS		0x3001
> +#define ARCH_X86_CET_DISABLE		0x3002
> +#define ARCH_X86_CET_LOCK		0x3003
> +
>  #endif /* _ASM_X86_PRCTL_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 4a9a7e7d00dc..2f60a28769f9 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -151,7 +151,7 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
>  obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
>  
>  obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev-es.o
> -obj-$(CONFIG_X86_CET)			+= cet.o
> +obj-$(CONFIG_X86_CET)			+= cet.o cet_prctl.o
>  
>  ###
>  # 64 bit specific files
> diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c
> new file mode 100644
> index 000000000000..0030c63a08c0
> --- /dev/null
> +++ b/arch/x86/kernel/cet_prctl.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/errno.h>
> +#include <linux/uaccess.h>
> +#include <linux/prctl.h>
> +#include <linux/compat.h>
> +#include <linux/mman.h>
> +#include <linux/elfcore.h>
> +#include <linux/processor.h>
> +#include <asm/prctl.h>
> +#include <asm/cet.h>
> +
> +/* See Documentation/x86/intel_cet.rst. */
> +
> +static int cet_copy_status_to_user(struct cet_status *cet, u64 __user *ubuf)
> +{
> +	u64 buf[3] = {};
> +
> +	if (cet->shstk_size) {
> +		buf[0] |= GNU_PROPERTY_X86_FEATURE_1_SHSTK;
> +		buf[1] = cet->shstk_base;
> +		buf[2] = cet->shstk_size;
> +	}
> +
> +	return copy_to_user(ubuf, buf, sizeof(buf));
> +}
> +
> +int prctl_cet(int option, u64 arg2)
> +{
> +	struct cet_status *cet;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_CET))
> +		return -ENOTSUPP;
> +
> +	cet = &current->thread.cet;
> +
> +	if (option == ARCH_X86_CET_STATUS)
> +		return cet_copy_status_to_user(cet, (u64 __user *)arg2);
> +
> +	switch (option) {
> +	case ARCH_X86_CET_DISABLE:
> +		if (cet->locked)
> +			return -EPERM;
> +
> +		if (arg2 & ~GNU_PROPERTY_X86_FEATURE_1_VALID)
> +			return -EINVAL;
> +		if (arg2 & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
> +			cet_disable_shstk();
> +		return 0;
> +
> +	case ARCH_X86_CET_LOCK:
> +		if (arg2)
> +			return -EINVAL;
> +		cet->locked = 1;
> +		return 0;
> +
> +	default:
> +		return -ENOSYS;
> +	}
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 3af6b36e1a5c..9e11e5f589f3 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -979,14 +979,14 @@ unsigned long get_wchan(struct task_struct *p)
>  }
>  
>  long do_arch_prctl_common(struct task_struct *task, int option,
> -			  unsigned long cpuid_enabled)
> +			  unsigned long arg2)
>  {
>  	switch (option) {
>  	case ARCH_GET_CPUID:
>  		return get_cpuid_mode();
>  	case ARCH_SET_CPUID:
> -		return set_cpuid_mode(task, cpuid_enabled);
> +		return set_cpuid_mode(task, arg2);
>  	}
>  
> -	return -EINVAL;
> +	return prctl_cet(option, arg2);
>  }
> -- 
> 2.21.0
> 
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 12/25] mm: Introduce VM_SHSTK for shadow stack memory
  2021-02-03 22:55 ` [PATCH v19 12/25] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
@ 2021-02-04 20:46   ` Cyrill Gorcunov
  2021-02-04 21:48     ` Yu, Yu-cheng
  0 siblings, 1 reply; 67+ messages in thread
From: Cyrill Gorcunov @ 2021-02-04 20:46 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:34PM -0800, Yu-cheng Yu wrote:
>  
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 602e3a52884d..59623dcd92bb 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -661,6 +661,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  		[ilog2(VM_PKEY_BIT4)]	= "",
>  #endif
>  #endif /* CONFIG_ARCH_HAS_PKEYS */
> +#ifdef CONFIG_X86_CET
> +		[ilog2(VM_SHSTK)]	= "ss",
> +#endif
>  	};

IIRC we've these abbreviations explained in documentaion
(proc.rst file). Could you please update it once time
permit? I think it can be done on top of the series.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 12/25] mm: Introduce VM_SHSTK for shadow stack memory
  2021-02-04 20:46   ` Cyrill Gorcunov
@ 2021-02-04 21:48     ` Yu, Yu-cheng
  0 siblings, 0 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-04 21:48 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/4/2021 12:46 PM, Cyrill Gorcunov wrote:
> On Wed, Feb 03, 2021 at 02:55:34PM -0800, Yu-cheng Yu wrote:
>>   
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 602e3a52884d..59623dcd92bb 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -661,6 +661,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>>   		[ilog2(VM_PKEY_BIT4)]	= "",
>>   #endif
>>   #endif /* CONFIG_ARCH_HAS_PKEYS */
>> +#ifdef CONFIG_X86_CET
>> +		[ilog2(VM_SHSTK)]	= "ss",
>> +#endif
>>   	};
> 
> IIRC we've these abbreviations explained in documentaion
> (proc.rst file). Could you please update it once time
> permit? I think it can be done on top of the series.
> 

I will add that.  Thanks!

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
  2021-02-04 20:35   ` Kees Cook
@ 2021-02-04 23:41     ` Yu, Yu-cheng
  2021-02-05 18:26       ` Kees Cook
  0 siblings, 1 reply; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-04 23:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/4/2021 12:35 PM, Kees Cook wrote:
> On Wed, Feb 03, 2021 at 02:55:46PM -0800, Yu-cheng Yu wrote:
>> arch_prctl(ARCH_X86_CET_STATUS, u64 *args)
>>      Get CET feature status.
>>
>>      The parameter 'args' is a pointer to a user buffer.  The kernel returns
>>      the following information:
>>
>>      *args = shadow stack/IBT status
>>      *(args + 1) = shadow stack base address
>>      *(args + 2) = shadow stack size
> 
> What happens if this needs to grow in the future? Should the first u64
> contain the array size?
> 
> Otherwise, looks sensible.
> 
> -Kees
> 

The first item is a bitmap, and there are two possible bits.  Should 
there be a need, we can then do things about it.  My thought at the 
moment is, we may not meet the situation.  Can we keep this for now?

--
Yu-cheng

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection
  2021-02-04 19:56   ` Kees Cook
@ 2021-02-05  0:05     ` Yu, Yu-cheng
  0 siblings, 0 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-05  0:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/4/2021 11:56 AM, Kees Cook wrote:
> On Wed, Feb 03, 2021 at 02:55:24PM -0800, Yu-cheng Yu wrote:
>> Shadow Stack provides protection against function return address
>> corruption.  It is active when the processor supports it, the kernel has
>> CONFIG_X86_CET enabled, and the application is built for the feature.
>> This is only implemented for the 64-bit kernel.  When it is enabled, legacy
>> non-Shadow Stack applications continue to work, but without protection.
>>
>> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
>> ---
>>   arch/x86/Kconfig           | 22 ++++++++++++++++++++++
>>   arch/x86/Kconfig.assembler |  5 +++++
>>   2 files changed, 27 insertions(+)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 21f851179ff0..074b3c0e6bf6 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1951,6 +1951,28 @@ config X86_SGX
>>   
>>   	  If unsure, say N.
>>   
>> +config ARCH_HAS_SHADOW_STACK
>> +	def_bool n
>> +
>> +config X86_CET
>> +	prompt "Intel Control-flow protection for user-mode"
>> +	def_bool n
>> +	depends on X86_64
>> +	depends on AS_WRUSS
>> +	select ARCH_USES_HIGH_VMA_FLAGS
>> +	select ARCH_HAS_SHADOW_STACK
> 
> This seems backwards to me? Shouldn't 'config X86_64' do the 'select
> ARCH_HAS_SHADOW_STACK' and 'config X86_CET' do a 'depends on
> ARCH_HAS_SHADOW_STACK' instead?

I will change it.  Thanks!

--
Yu-cheng

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-04 20:09   ` Kees Cook
@ 2021-02-05  0:10     ` Yu, Yu-cheng
  0 siblings, 0 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-05  0:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On 2/4/2021 12:09 PM, Kees Cook wrote:
> On Wed, Feb 03, 2021 at 02:55:28PM -0800, Yu-cheng Yu wrote:

[...]

>> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> index 7f5aec758f0e..f5354c35df32 100644
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>> @@ -606,6 +606,66 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>>   	cond_local_irq_disable(regs);
>>   }
>>   
>> +#ifdef CONFIG_X86_CET
>> +static const char * const control_protection_err[] = {
>> +	"unknown",
>> +	"near-ret",
>> +	"far-ret/iret",
>> +	"endbranch",
>> +	"rstorssp",
>> +	"setssbsy",
>> +};
>> +
>> +/*
>> + * When a control protection exception occurs, send a signal to the responsible
>> + * application.  Currently, control protection is only enabled for user mode.
>> + * This exception should not come from kernel mode.
>> + */
>> +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
>> +{
>> +	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
>> +				      DEFAULT_RATELIMIT_BURST);
>> +	struct task_struct *tsk;
>> +
>> +	if (!user_mode(regs)) {
>> +		pr_emerg("PANIC: unexpected kernel control protection fault\n");
>> +		die("kernel control protection fault", regs, error_code);
>> +		panic("Machine halted.");
>> +	}
>> +
>> +	cond_local_irq_enable(regs);
>> +
>> +	if (!boot_cpu_has(X86_FEATURE_CET))
>> +		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
>> +
>> +	tsk = current;
>> +	tsk->thread.error_code = error_code;
>> +	tsk->thread.trap_nr = X86_TRAP_CP;
>> +
>> +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
>> +	    __ratelimit(&rs)) {
>> +		unsigned int max_err;
>> +		unsigned long ssp;
>> +
>> +		max_err = ARRAY_SIZE(control_protection_err) - 1;
>> +		if (error_code < 0 || error_code > max_err)
>> +			error_code = 0;
> 
> Do you want to mask the error_code here before printing its value?
> 
>> +
>> +		rdmsrl(MSR_IA32_PL3_SSP, ssp);
>> +		pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)",
>> +			 tsk->comm, task_pid_nr(tsk),
>> +			 regs->ip, regs->sp, ssp, error_code,
>> +			 control_protection_err[error_code]);
> 
> Instead, you could clamp error_code to ARRAY_SIZE(control_protection_err),
> and add another "unknown" to the end of the strings:
> 
> 	control_protection_err[
> 		array_index_nospec(error_code,
> 				   ARRAY_SIZE(control_protection_err))]
> 
> Everything else looks good.
> 

I will update it.  Thanks!

[...]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions
  2021-02-03 22:55 ` [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions Yu-cheng Yu
  2021-02-04 19:58   ` Kees Cook
@ 2021-02-05 13:43   ` Borislav Petkov
  2021-02-05 16:15     ` Yu, Yu-cheng
  1 sibling, 1 reply; 67+ messages in thread
From: Borislav Petkov @ 2021-02-05 13:43 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Wed, Feb 03, 2021 at 02:55:26PM -0800, Yu-cheng Yu wrote:
> @@ -1252,6 +1260,13 @@ static void __init cpu_parse_early_param(void)
>  	if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
>  		setup_clear_cpu_cap(X86_FEATURE_XSAVES);
>  
> +#ifdef CONFIG_X86_CET
> +	if (cmdline_find_option_bool(boot_command_line, "no_user_shstk"))
> +		setup_clear_cpu_cap(X86_FEATURE_SHSTK);
> +	if (cmdline_find_option_bool(boot_command_line, "no_user_ibt"))
> +		setup_clear_cpu_cap(X86_FEATURE_IBT);
> +#endif

Any particular need for the ifdeffery?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-03 22:55 ` [PATCH v19 06/25] x86/cet: Add control-protection fault handler Yu-cheng Yu
  2021-02-04 20:09   ` Kees Cook
@ 2021-02-05 13:59   ` Borislav Petkov
  2021-02-05 18:00     ` Yu, Yu-cheng
  1 sibling, 1 reply; 67+ messages in thread
From: Borislav Petkov @ 2021-02-05 13:59 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On Wed, Feb 03, 2021 at 02:55:28PM -0800, Yu-cheng Yu wrote:
> +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> +{
> +	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> +				      DEFAULT_RATELIMIT_BURST);
> +	struct task_struct *tsk;
> +
> +	if (!user_mode(regs)) {
> +		pr_emerg("PANIC: unexpected kernel control protection fault\n");
> +		die("kernel control protection fault", regs, error_code);
> +		panic("Machine halted.");
> +	}
> +
> +	cond_local_irq_enable(regs);
> +
> +	if (!boot_cpu_has(X86_FEATURE_CET))
> +		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
> +
> +	tsk = current;
> +	tsk->thread.error_code = error_code;
> +	tsk->thread.trap_nr = X86_TRAP_CP;
> +
> +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
> +	    __ratelimit(&rs)) {

I can't find it written down anywhere why the ratelimiting is needed at
all?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions
  2021-02-05 13:43   ` Borislav Petkov
@ 2021-02-05 16:15     ` Yu, Yu-cheng
  0 siblings, 0 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-05 16:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/5/2021 5:43 AM, Borislav Petkov wrote:
> On Wed, Feb 03, 2021 at 02:55:26PM -0800, Yu-cheng Yu wrote:
>> @@ -1252,6 +1260,13 @@ static void __init cpu_parse_early_param(void)
>>   	if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
>>   		setup_clear_cpu_cap(X86_FEATURE_XSAVES);
>>   
>> +#ifdef CONFIG_X86_CET
>> +	if (cmdline_find_option_bool(boot_command_line, "no_user_shstk"))
>> +		setup_clear_cpu_cap(X86_FEATURE_SHSTK);
>> +	if (cmdline_find_option_bool(boot_command_line, "no_user_ibt"))
>> +		setup_clear_cpu_cap(X86_FEATURE_IBT);
>> +#endif
> 
> Any particular need for the ifdeffery?
> 

Right.  There is no need for that.  I will remove it.

--
Yu-cheng

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-05 13:59   ` Borislav Petkov
@ 2021-02-05 18:00     ` Yu, Yu-cheng
  2021-02-05 18:29       ` Kees Cook
  2021-02-08 18:20       ` Borislav Petkov
  0 siblings, 2 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-05 18:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On 2/5/2021 5:59 AM, Borislav Petkov wrote:
> On Wed, Feb 03, 2021 at 02:55:28PM -0800, Yu-cheng Yu wrote:
>> +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
>> +{
>> +	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
>> +				      DEFAULT_RATELIMIT_BURST);
>> +	struct task_struct *tsk;
>> +
>> +	if (!user_mode(regs)) {
>> +		pr_emerg("PANIC: unexpected kernel control protection fault\n");
>> +		die("kernel control protection fault", regs, error_code);
>> +		panic("Machine halted.");
>> +	}
>> +
>> +	cond_local_irq_enable(regs);
>> +
>> +	if (!boot_cpu_has(X86_FEATURE_CET))
>> +		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
>> +
>> +	tsk = current;
>> +	tsk->thread.error_code = error_code;
>> +	tsk->thread.trap_nr = X86_TRAP_CP;
>> +
>> +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
>> +	    __ratelimit(&rs)) {
> 
> I can't find it written down anywhere why the ratelimiting is needed at
> all?
> 

The ratelimit here is only for #CP, and its rate is not counted together 
with other types of faults.  If a task gets here, it will exit.  The 
only condition the ratelimit will trigger is when multiple tasks hit #CP 
at once, which is unlikely.  Are you suggesting that we do not need the 
ratelimit here?

Thanks!

--
Yu-cheng

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
  2021-02-04 23:41     ` Yu, Yu-cheng
@ 2021-02-05 18:26       ` Kees Cook
  0 siblings, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-05 18:26 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On Thu, Feb 04, 2021 at 03:41:59PM -0800, Yu, Yu-cheng wrote:
> On 2/4/2021 12:35 PM, Kees Cook wrote:
> > On Wed, Feb 03, 2021 at 02:55:46PM -0800, Yu-cheng Yu wrote:
> > > arch_prctl(ARCH_X86_CET_STATUS, u64 *args)
> > >      Get CET feature status.
> > > 
> > >      The parameter 'args' is a pointer to a user buffer.  The kernel returns
> > >      the following information:
> > > 
> > >      *args = shadow stack/IBT status
> > >      *(args + 1) = shadow stack base address
> > >      *(args + 2) = shadow stack size
> > 
> > What happens if this needs to grow in the future? Should the first u64
> > contain the array size?
> > 
> > Otherwise, looks sensible.
> > 
> > -Kees
> > 
> 
> The first item is a bitmap, and there are two possible bits.  Should there
> be a need, we can then do things about it.  My thought at the moment is, we
> may not meet the situation.  Can we keep this for now?

Ah, good point. Yes, since that's a bitmap it ends up describing what
follows. This is fine as-is. Thanks!

Reviewed-by: Kees Cook <keescook@chromium.org>


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-05 18:00     ` Yu, Yu-cheng
@ 2021-02-05 18:29       ` Kees Cook
  2021-02-08 18:20       ` Borislav Petkov
  1 sibling, 0 replies; 67+ messages in thread
From: Kees Cook @ 2021-02-05 18:29 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: Borislav Petkov, x86, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu,
	Michael Kerrisk

On Fri, Feb 05, 2021 at 10:00:21AM -0800, Yu, Yu-cheng wrote:
> On 2/5/2021 5:59 AM, Borislav Petkov wrote:
> > On Wed, Feb 03, 2021 at 02:55:28PM -0800, Yu-cheng Yu wrote:
> > > +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> > > +{
> > > +	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> > > +				      DEFAULT_RATELIMIT_BURST);
> > > +	struct task_struct *tsk;
> > > +
> > > +	if (!user_mode(regs)) {
> > > +		pr_emerg("PANIC: unexpected kernel control protection fault\n");
> > > +		die("kernel control protection fault", regs, error_code);
> > > +		panic("Machine halted.");
> > > +	}
> > > +
> > > +	cond_local_irq_enable(regs);
> > > +
> > > +	if (!boot_cpu_has(X86_FEATURE_CET))
> > > +		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
> > > +
> > > +	tsk = current;
> > > +	tsk->thread.error_code = error_code;
> > > +	tsk->thread.trap_nr = X86_TRAP_CP;
> > > +
> > > +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
> > > +	    __ratelimit(&rs)) {
> > 
> > I can't find it written down anywhere why the ratelimiting is needed at
> > all?
> > 
> 
> The ratelimit here is only for #CP, and its rate is not counted together
> with other types of faults.  If a task gets here, it will exit.  The only
> condition the ratelimit will trigger is when multiple tasks hit #CP at once,
> which is unlikely.  Are you suggesting that we do not need the ratelimit
> here?

Since this is a potentially unprivileged-userspace-triggerable
condition, I tend to prefer having a ratelimit. I don't feel _strongly_
about it, but I find it better to be defensive against log spamming
(whether malicious or accidental).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW
  2021-02-04 20:27     ` Dave Hansen
@ 2021-02-05 18:41       ` Yu, Yu-cheng
  2021-02-08 23:03         ` Yu, Yu-cheng
  0 siblings, 1 reply; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-05 18:41 UTC (permalink / raw)
  To: Dave Hansen, Kees Cook
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/4/2021 12:27 PM, Dave Hansen wrote:
> On 2/4/21 12:19 PM, Kees Cook wrote:
>>> (e) A page where the processor observed a Write=1 PTE, started a write, set
>>>      Dirty=1, but then observed a Write=0 PTE.  That's possible today, but
>>>      will not happen on processors that support shadow stack.
>> What happens for "e" with/without CET? It sounds like direct writes to
>> such pages will be (correctly) rejected by the MMU?
> 
> A page fault would be generated regardless of CET support.
> 
> If CET were not around, the fault would be reported as a present, write
> fault.
> 
> If this happened and CET were around (which shouldn't happen in
> practice, it means we have a hardware issue) a page fault exception is
> generated. 

Thanks for the clarification.  With or without CET, direct write to 
Write=0, Dirty=1 PTE triggers page fault.

> Yu-cheng, I'm not sure there's enough debugging around to
> tell us if this happens.  Would we even notice?

That potential hardware issue is, on a CET-capable system, a processor 
writes to a Write=1, Dirty=0 page, and then observes the PTE is Write=0, 
Dirty=1.  Let me think about it...

Thanks!

--
Yu-cheng

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW
  2021-02-04 20:19   ` Kees Cook
  2021-02-04 20:27     ` Dave Hansen
@ 2021-02-05 18:58     ` Yu, Yu-cheng
  1 sibling, 0 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-05 18:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/4/2021 12:19 PM, Kees Cook wrote:
> On Wed, Feb 03, 2021 at 02:55:30PM -0800, Yu-cheng Yu wrote:
>> There is essentially no room left in the x86 hardware PTEs on some OSes
>> (not Linux).  That left the hardware architects looking for a way to
>> represent a new memory type (shadow stack) within the existing bits.
>> They chose to repurpose a lightly-used state: Write=0, Dirty=1.
>>
>> The reason it's lightly used is that Dirty=1 is normally set by hardware
>> and cannot normally be set by hardware on a Write=0 PTE.  Software must
>> normally be involved to create one of these PTEs, so software can simply
>> opt to not create them.
>>
>> In places where Linux normally creates Write=0, Dirty=1, it can use the
>> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY.  In other
>> words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
>> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1.  This
>> clearly separates shadow stack from other data, and results in the
>> following:
>>
>> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
>> (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
>>      The user page is in a R/O VMA, and get_user_pages() needs a writable
>>      copy.  The page fault handler creates a copy of the page and sets
>>      the new copy's PTE as Write=0 and Cow=1.
>> (c) A shadow stack PTE: (Write=0, Dirty=1)
>> (d) A shared shadow stack PTE: (Write=0, Cow=1)
>>      When a shadow stack page is being shared among processes (this happens
>>      at fork()), its PTE is made Dirty=0, so the next shadow stack access
>>      causes a fault, and the page is duplicated and Dirty=1 is set again.
>>      This is the COW equivalent for shadow stack pages, even though it's
>>      copy-on-access rather than copy-on-write.
>> (e) A page where the processor observed a Write=1 PTE, started a write, set
>>      Dirty=1, but then observed a Write=0 PTE.  That's possible today, but
>>      will not happen on processors that support shadow stack.
> 
> What happens for "e" with/without CET? It sounds like direct writes to
> such pages will be (correctly) rejected by the MMU?
> 
>>
>> Define _PAGE_COW and update pte_*() helpers and apply the same changes to
>> pmd and pud.
>>
>> After this, there are six free bits left in the 64-bit PTE, and no more
>> free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not
>> implemented for the 32-bit kernel.
> 
> Are there selftests to validate this change?
> 

I have some tests to verify, for example,

- After clone(), shadow stack pages are indeed copy-on-write,
- Shadow stack pages (i.e. Write=0, Dirty=1) cannot be directly written to,
- Shadow stack guard pages exist.

These tests are now on github, but kind of messy.  I can gradually clean 
up them and submit as selftests separately.

If you are asking for the detection of the potential hardware issue 
(that Dave Hansen talked about), then maybe we need to detect it from 
the kernel.

> I think it might be useful to more clearly describe what is considered
> "dirty" and "writeable" in comments above the pte_helpers.
> 

Yes, I will update it.  Thanks!

[...]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-05 18:00     ` Yu, Yu-cheng
  2021-02-05 18:29       ` Kees Cook
@ 2021-02-08 18:20       ` Borislav Petkov
  2021-02-08 18:50         ` Yu, Yu-cheng
  1 sibling, 1 reply; 67+ messages in thread
From: Borislav Petkov @ 2021-02-08 18:20 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On Fri, Feb 05, 2021 at 10:00:21AM -0800, Yu, Yu-cheng wrote:
> The ratelimit here is only for #CP, and its rate is not counted together
> with other types of faults.  If a task gets here, it will exit.  The only
> condition the ratelimit will trigger is when multiple tasks hit #CP at once,
> which is unlikely.  Are you suggesting that we do not need the ratelimit
> here?

I'm trying to first find out why is it there.

Is this something you've hit during testing and thought, oh well, this
needs a ratelimit or was it added just because?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-08 18:20       ` Borislav Petkov
@ 2021-02-08 18:50         ` Yu, Yu-cheng
  2021-02-08 18:53           ` Borislav Petkov
  0 siblings, 1 reply; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-08 18:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On 2/8/2021 10:20 AM, Borislav Petkov wrote:
> On Fri, Feb 05, 2021 at 10:00:21AM -0800, Yu, Yu-cheng wrote:
>> The ratelimit here is only for #CP, and its rate is not counted together
>> with other types of faults.  If a task gets here, it will exit.  The only
>> condition the ratelimit will trigger is when multiple tasks hit #CP at once,
>> which is unlikely.  Are you suggesting that we do not need the ratelimit
>> here?
> 
> I'm trying to first find out why is it there.
> 
> Is this something you've hit during testing and thought, oh well, this
> needs a ratelimit or was it added just because?
> 

I have not run into the situation.  Initially it was there because other 
faults have it.  When you asked, I went through it and put out my 
reasoning.  I think it still makes sense to keep it.

--
Yu-cheng

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-08 18:50         ` Yu, Yu-cheng
@ 2021-02-08 18:53           ` Borislav Petkov
  2021-02-08 19:23             ` Yu, Yu-cheng
  0 siblings, 1 reply; 67+ messages in thread
From: Borislav Petkov @ 2021-02-08 18:53 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On Mon, Feb 08, 2021 at 10:50:07AM -0800, Yu, Yu-cheng wrote:
> I have not run into the situation.  Initially it was there because other
> faults have it.

Which other faults?

> When you asked, I went through it and put out my reasoning.

What does that mean?

> I think it still makes sense to keep it.

Because you have a hunch or you actually have an objective reason why?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-08 18:53           ` Borislav Petkov
@ 2021-02-08 19:23             ` Yu, Yu-cheng
  2021-02-08 19:48               ` Borislav Petkov
  0 siblings, 1 reply; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-08 19:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On 2/8/2021 10:53 AM, Borislav Petkov wrote:
> On Mon, Feb 08, 2021 at 10:50:07AM -0800, Yu, Yu-cheng wrote:
>> I have not run into the situation.  Initially it was there because other
>> faults have it.
> 
> Which other faults?

exc_general_protection() and do_trap() both call show_signal(), which 
then calls printk_ratelimit().

> 
>> When you asked, I went through it and put out my reasoning.
> 
> What does that mean?
> 

I went through my patch and check if ratelimit is necessary, and then 
describe the finding.

>> I think it still makes sense to keep it.
> 
> Because you have a hunch or you actually have an objective reason why?
> 

For example, if a shell script, in a loop re-starts an app when it 
exits, and the app is causing control-protection fault.  The log 
messages should be rate limited.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-08 19:23             ` Yu, Yu-cheng
@ 2021-02-08 19:48               ` Borislav Petkov
  2021-02-08 20:11                 ` Yu, Yu-cheng
  0 siblings, 1 reply; 67+ messages in thread
From: Borislav Petkov @ 2021-02-08 19:48 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On Mon, Feb 08, 2021 at 11:23:18AM -0800, Yu, Yu-cheng wrote:
> exc_general_protection() and do_trap() both call show_signal(), which
> then calls printk_ratelimit().

You could've done some git archeology and could've found

  abd4f7505baf ("x86: i386-show-unhandled-signals-v3")

which explains why that ratelimiting is needed.

> For example, if a shell script, in a loop re-starts an app when it
> exits, and the app is causing control-protection fault. The log
> messages should be rate limited.

I think you should be able to get where I'm going with this, by now:
please put a comment over the ratelimiting to explain why it is there,
just like the above commit explains.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 06/25] x86/cet: Add control-protection fault handler
  2021-02-08 19:48               ` Borislav Petkov
@ 2021-02-08 20:11                 ` Yu, Yu-cheng
  0 siblings, 0 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-08 20:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu, Michael Kerrisk

On 2/8/2021 11:48 AM, Borislav Petkov wrote:
> On Mon, Feb 08, 2021 at 11:23:18AM -0800, Yu, Yu-cheng wrote:
>> exc_general_protection() and do_trap() both call show_signal(), which
>> then calls printk_ratelimit().
> 
> You could've done some git archeology and could've found
> 
>    abd4f7505baf ("x86: i386-show-unhandled-signals-v3")
> 
> which explains why that ratelimiting is needed.
> 
>> For example, if a shell script, in a loop re-starts an app when it
>> exits, and the app is causing control-protection fault. The log
>> messages should be rate limited.
> 
> I think you should be able to get where I'm going with this, by now:
> please put a comment over the ratelimiting to explain why it is there,
> just like the above commit explains.

I will add that.

> 
> Thx.
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW
  2021-02-05 18:41       ` Yu, Yu-cheng
@ 2021-02-08 23:03         ` Yu, Yu-cheng
  0 siblings, 0 replies; 67+ messages in thread
From: Yu, Yu-cheng @ 2021-02-08 23:03 UTC (permalink / raw)
  To: Dave Hansen, Kees Cook
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
	Pengfei Xu

On 2/5/2021 10:41 AM, Yu, Yu-cheng wrote:
> On 2/4/2021 12:27 PM, Dave Hansen wrote:
>> On 2/4/21 12:19 PM, Kees Cook wrote:
>>>> (e) A page where the processor observed a Write=1 PTE, started a 
>>>> write, set
>>>>      Dirty=1, but then observed a Write=0 PTE.  That's possible 
>>>> today, but
>>>>      will not happen on processors that support shadow stack.
>>> What happens for "e" with/without CET? It sounds like direct writes to
>>> such pages will be (correctly) rejected by the MMU?
>>
>> A page fault would be generated regardless of CET support.
>>
>> If CET were not around, the fault would be reported as a present, write
>> fault.
>>
>> If this happened and CET were around (which shouldn't happen in
>> practice, it means we have a hardware issue) a page fault exception is
>> generated. 
> 
> Thanks for the clarification.  With or without CET, direct write to 
> Write=0, Dirty=1 PTE triggers page fault.
> 
>> Yu-cheng, I'm not sure there's enough debugging around to
>> tell us if this happens.  Would we even notice?
> 
> That potential hardware issue is, on a CET-capable system, a processor 
> writes to a Write=1, Dirty=0 page, and then observes the PTE is Write=0, 
> Dirty=1.  Let me think about it...
> 

One way to detect the potential issue is adding a check when a 
non-shadow stack page's PTE goes from RW=0 to RW=1, like the following...

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 29aa6f07e3c9..241b94a0fa77 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -405,6 +405,8 @@ static inline pte_t pte_mkyoung(pte_t pte)
  static inline pte_t pte_mkwrite(pte_t pte)
  {
  	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		WARN_ONCE((pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) ==
+			  _PAGE_DIRTY, "Found transient shadow stack PTE\n");
  		if (pte_flags(pte) & _PAGE_COW) {
  			pte = pte_clear_flags(pte, _PAGE_COW);
  			pte = pte_set_flags(pte, _PAGE_DIRTY);

I run all my routine stress tests with the changes, and do not see any 
warning triggered.  If this change is desirable, we can probably add 
#ifdef CONFIG_DEBUG_VM around it and make it a separate patch.

--
Yu-cheng

^ permalink raw reply related	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2021-02-08 23:04 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-03 22:55 [PATCH v19 00/25] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
2021-02-03 22:55 ` [PATCH v19 01/25] Documentation/x86: Add CET description Yu-cheng Yu
2021-02-04 19:52   ` Kees Cook
2021-02-04 20:03     ` Yu, Yu-cheng
2021-02-03 22:55 ` [PATCH v19 02/25] x86/cet/shstk: Add Kconfig option for user-mode control-flow protection Yu-cheng Yu
2021-02-04 19:56   ` Kees Cook
2021-02-05  0:05     ` Yu, Yu-cheng
2021-02-03 22:55 ` [PATCH v19 03/25] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Yu-cheng Yu
2021-02-04 19:57   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 04/25] x86/cpufeatures: Introduce X86_FEATURE_CET and setup functions Yu-cheng Yu
2021-02-04 19:58   ` Kees Cook
2021-02-05 13:43   ` Borislav Petkov
2021-02-05 16:15     ` Yu, Yu-cheng
2021-02-03 22:55 ` [PATCH v19 05/25] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yu-cheng Yu
2021-02-04 19:59   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 06/25] x86/cet: Add control-protection fault handler Yu-cheng Yu
2021-02-04 20:09   ` Kees Cook
2021-02-05  0:10     ` Yu, Yu-cheng
2021-02-05 13:59   ` Borislav Petkov
2021-02-05 18:00     ` Yu, Yu-cheng
2021-02-05 18:29       ` Kees Cook
2021-02-08 18:20       ` Borislav Petkov
2021-02-08 18:50         ` Yu, Yu-cheng
2021-02-08 18:53           ` Borislav Petkov
2021-02-08 19:23             ` Yu, Yu-cheng
2021-02-08 19:48               ` Borislav Petkov
2021-02-08 20:11                 ` Yu, Yu-cheng
2021-02-03 22:55 ` [PATCH v19 07/25] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Yu-cheng Yu
2021-02-04 20:10   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW Yu-cheng Yu
2021-02-04 20:19   ` Kees Cook
2021-02-04 20:27     ` Dave Hansen
2021-02-05 18:41       ` Yu, Yu-cheng
2021-02-08 23:03         ` Yu, Yu-cheng
2021-02-05 18:58     ` Yu, Yu-cheng
2021-02-03 22:55 ` [PATCH v19 09/25] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Yu-cheng Yu
2021-02-03 22:55 ` [PATCH v19 10/25] x86/mm: Update pte_modify for _PAGE_COW Yu-cheng Yu
2021-02-04 20:20   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 11/25] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Yu-cheng Yu
2021-02-04 20:21   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 12/25] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
2021-02-04 20:46   ` Cyrill Gorcunov
2021-02-04 21:48     ` Yu, Yu-cheng
2021-02-03 22:55 ` [PATCH v19 13/25] x86/mm: Shadow Stack page fault error checking Yu-cheng Yu
2021-02-03 22:55 ` [PATCH v19 14/25] x86/mm: Update maybe_mkwrite() for shadow stack Yu-cheng Yu
2021-02-04 20:22   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 15/25] mm: Fixup places that call pte_mkwrite() directly Yu-cheng Yu
2021-02-04 20:22   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 16/25] mm: Add guard pages around a shadow stack Yu-cheng Yu
2021-02-04 20:24   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 17/25] mm/mmap: Add shadow stack pages to memory accounting Yu-cheng Yu
2021-02-04 20:24   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 18/25] mm: Update can_follow_write_pte() for shadow stack Yu-cheng Yu
2021-02-04 20:27   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 19/25] mm: Re-introduce vm_flags to do_mmap() Yu-cheng Yu
2021-02-04 20:28   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 20/25] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
2021-02-04 20:29   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 21/25] x86/cet/shstk: Handle signals for shadow stack Yu-cheng Yu
2021-02-03 22:55 ` [PATCH v19 22/25] ELF: Introduce arch_setup_elf_property() Yu-cheng Yu
2021-02-04 20:33   ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 23/25] x86/cet/shstk: Handle thread shadow stack Yu-cheng Yu
2021-02-03 22:55 ` [PATCH v19 24/25] x86/cet/shstk: Add arch_prctl functions for " Yu-cheng Yu
2021-02-04 20:35   ` Kees Cook
2021-02-04 23:41     ` Yu, Yu-cheng
2021-02-05 18:26       ` Kees Cook
2021-02-03 22:55 ` [PATCH v19 25/25] mm: Introduce PROT_SHSTK " Yu-cheng Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).