linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack
@ 2019-06-06 20:06 Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 01/27] Documentation/x86: Add CET description Yu-cheng Yu
                   ` (26 more replies)
  0 siblings, 27 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Intel has published Control-flow Enforcement (CET) in the Architecture
Instruction Set Extensions Programming Reference:

  https://software.intel.com/en-us/download/intel-architecture-instruction-set-
  extensions-programming-reference

The previous version (v6) of CET Shadow Stack patches is here:

  https://lkml.org/lkml/2018/11/20/225

Summary of changes from v6:

  Rebase to v5.2-rc3.

  Adapt XSAVES system state changes to the recent FPU changes:
    - Add modify_fpu_regs_begin(), modify_fpu_regs_end() to protect
      CET MSR changes.
    - Update signal handling.

  Move ELF header-parsing out of x86.

  Other small fixes in response to comments.

This version passed GLIBC built-in tests.

Hi Maintainers,

What else is needed before this series can be merged?

Thanks,
Yu-cheng


Yu-cheng Yu (27):
  Documentation/x86: Add CET description
  x86/cpufeatures: Add CET CPU feature flags for Control-flow
    Enforcement Technology (CET)
  x86/fpu/xstate: Change names to separate XSAVES system and user states
  x86/fpu/xstate: Introduce XSAVES system states
  x86/fpu/xstate: Add XSAVES system states for shadow stack
  x86/cet: Add control protection exception handler
  x86/cet/shstk: Add Kconfig option for user-mode shadow stack
  mm: Introduce VM_SHSTK for shadow stack memory
  mm/mmap: Prevent Shadow Stack VMA merges
  x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW
  x86/mm: Introduce _PAGE_DIRTY_SW
  drm/i915/gvt: Update _PAGE_DIRTY to _PAGE_DIRTY_BITS
  x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for
    _PAGE_DIRTY_SW
  x86/mm: Shadow stack page fault error checking
  mm: Handle shadow stack page fault
  mm: Handle THP/HugeTLB shadow stack page fault
  mm: Update can_follow_write_pte/pmd for shadow stack
  mm: Introduce do_mmap_locked()
  x86/cet/shstk: User-mode shadow stack support
  x86/cet/shstk: Introduce WRUSS instruction
  x86/cet/shstk: Handle signals for shadow stack
  binfmt_elf: Extract .note.gnu.property from an ELF file
  x86/cet/shstk: ELF header parsing of Shadow Stack
  x86/cet/shstk: Handle thread shadow stack
  mm/mmap: Add Shadow stack pages to memory accounting
  x86/cet/shstk: Add arch_prctl functions for Shadow Stack
  x86/cet/shstk: Add Shadow Stack instructions to opcode map

 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/intel_cet.rst               | 268 +++++++++++++
 arch/x86/Kconfig                              |  26 ++
 arch/x86/Makefile                             |   7 +
 arch/x86/entry/entry_64.S                     |   2 +-
 arch/x86/ia32/ia32_signal.c                   |   8 +
 arch/x86/include/asm/cet.h                    |  48 +++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/fpu/internal.h           |  25 +-
 arch/x86/include/asm/fpu/signal.h             |   2 +
 arch/x86/include/asm/fpu/types.h              |  22 ++
 arch/x86/include/asm/fpu/xstate.h             |  26 +-
 arch/x86/include/asm/mmu_context.h            |   3 +
 arch/x86/include/asm/msr-index.h              |  18 +
 arch/x86/include/asm/pgtable.h                | 191 ++++++++--
 arch/x86/include/asm/pgtable_types.h          |  38 +-
 arch/x86/include/asm/processor.h              |   5 +
 arch/x86/include/asm/special_insns.h          |  32 ++
 arch/x86/include/asm/traps.h                  |   5 +
 arch/x86/include/uapi/asm/prctl.h             |   5 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/include/uapi/asm/sigcontext.h        |  15 +
 arch/x86/kernel/Makefile                      |   2 +
 arch/x86/kernel/cet.c                         | 327 ++++++++++++++++
 arch/x86/kernel/cet_prctl.c                   |  85 +++++
 arch/x86/kernel/cpu/common.c                  |  25 ++
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/fpu/core.c                    |  26 +-
 arch/x86/kernel/fpu/init.c                    |  10 -
 arch/x86/kernel/fpu/signal.c                  |  81 +++-
 arch/x86/kernel/fpu/xstate.c                  | 156 +++++---
 arch/x86/kernel/idt.c                         |   4 +
 arch/x86/kernel/process.c                     |   8 +-
 arch/x86/kernel/process_64.c                  |  31 ++
 arch/x86/kernel/relocate_kernel_64.S          |   2 +-
 arch/x86/kernel/signal.c                      |  10 +-
 arch/x86/kernel/signal_compat.c               |   2 +-
 arch/x86/kernel/traps.c                       |  57 +++
 arch/x86/kvm/vmx/vmx.c                        |   2 +-
 arch/x86/lib/x86-opcode-map.txt               |  26 +-
 arch/x86/mm/fault.c                           |  18 +
 arch/x86/mm/pgtable.c                         |  41 ++
 drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
 fs/Kconfig.binfmt                             |   3 +
 fs/Makefile                                   |   1 +
 fs/binfmt_elf.c                               |  13 +
 fs/gnu_property.c                             | 351 ++++++++++++++++++
 fs/proc/task_mmu.c                            |   3 +
 include/asm-generic/pgtable.h                 |  14 +
 include/linux/elf.h                           |  12 +
 include/linux/mm.h                            |  26 ++
 include/uapi/asm-generic/siginfo.h            |   3 +-
 include/uapi/linux/elf.h                      |  14 +
 mm/gup.c                                      |   8 +-
 mm/huge_memory.c                              |  12 +-
 mm/memory.c                                   |   7 +-
 mm/mmap.c                                     |  11 +
 .../arch/x86/include/asm/disabled-features.h  |   8 +-
 tools/objtool/arch/x86/lib/x86-opcode-map.txt |  26 +-
 61 files changed, 2029 insertions(+), 165 deletions(-)
 create mode 100644 Documentation/x86/intel_cet.rst
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/kernel/cet.c
 create mode 100644 arch/x86/kernel/cet_prctl.c
 create mode 100644 fs/gnu_property.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v7 01/27] Documentation/x86: Add CET description
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 02/27] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Yu-cheng Yu
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Explain how CET works and the no_cet_shstk/no_cet_ibt kernel
parameters.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/intel_cet.rst               | 268 ++++++++++++++++++
 3 files changed, 275 insertions(+)
 create mode 100644 Documentation/x86/intel_cet.rst

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..b9b054f6c732 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2906,6 +2906,12 @@
 			noexec=on: enable non-executable mappings (default)
 			noexec=off: disable non-executable mappings
 
+	no_cet_ibt	[X86-64] Disable indirect branch tracking for user-mode
+			applications
+
+	no_cet_shstk	[X86-64] Disable shadow stack support for user-mode
+			applications
+
 	nosmap		[X86,PPC]
 			Disable SMAP (Supervisor Mode Access Prevention)
 			even if it is supported by processor.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index ae36fc5fc649..47822ac02fe0 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -21,6 +21,7 @@ x86-specific Documentation
    pat
    protection-keys
    intel_mpx
+   intel_cet
    amd-memory-encryption
    pti
    mds
diff --git a/Documentation/x86/intel_cet.rst b/Documentation/x86/intel_cet.rst
new file mode 100644
index 000000000000..dac83bbf8a24
--- /dev/null
+++ b/Documentation/x86/intel_cet.rst
@@ -0,0 +1,268 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Control-flow Enforcement Technology (CET)
+=========================================
+
+[1] Overview
+============
+
+Control-flow Enforcement Technology (CET) provides protection against
+return/jump-oriented programming (ROP) attacks.  It can be setup to
+protect both the kernel and applications.  In the first phase,
+only the user-mode protection is implemented in 64-bit mode; 32-bit
+applications are supported in compatibility mode.
+
+CET introduces shadow stack (SHSTK) and indirect branch tracking
+(IBT).  SHSTK is a secondary stack allocated from memory and cannot
+be directly modified by applications.  When executing a CALL, the
+processor pushes a copy of the return address to SHSTK.  Upon
+function return, the processor pops the SHSTK copy and compares it
+to the one from the program stack.  If the two copies differ, the
+processor raises a control-protection exception.  IBT verifies all
+indirect CALL/JMP targets are intended as marked by the compiler
+with 'ENDBR' opcodes (see CET instructions below).
+
+There are two kernel configuration options:
+
+    INTEL_X86_SHADOW_STACK_USER, and
+    INTEL_X86_BRANCH_TRACKING_USER.
+
+To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or later
+are required.  To build a CET-enabled application, GLIBC v2.28 or
+later is also required.
+
+There are two command-line options for disabling CET features:
+
+    no_cet_shstk - disables SHSTK, and
+    no_cet_ibt - disables IBT.
+
+At run time, /proc/cpuinfo shows the availability of SHSTK and IBT.
+
+[2] CET assembly instructions
+=============================
+
+RDSSP %r
+    Read the SHSTK pointer into %r.
+
+INCSSP %r
+    Unwind (increment) the SHSTK pointer (0 ~ 255) steps as indicated
+    in the operand register.  The GLIBC longjmp uses INCSSP to unwind
+    the SHSTK until that matches the program stack.  When it is
+    necessary to unwind beyond 255 steps, longjmp divides and repeats
+    the process.
+
+RSTORSSP (%r)
+    Switch to the SHSTK indicated in the 'restore token' pointed by
+    the operand register and replace the 'restore token' with a new
+    token to be saved (with SAVEPREVSSP) for the outgoing SHSTK.
+
+::
+
+                               Before RSTORSSP
+
+             Incoming SHSTK                   Current/Outgoing SHSTK
+
+        |----------------------|             |----------------------|
+ addr=x |                      |       ssp-> |                      |
+        |----------------------|             |----------------------|
+ (%r)-> | rstor_token=(x|Lg)   |    addr=y-8 |                      |
+        |----------------------|             |----------------------|
+
+                               After RSTORSSP
+
+        |----------------------|             |----------------------|
+        |                      |             |                      |
+        |----------------------|             |----------------------|
+  ssp-> | rstor_token=(y|Bz|Lg)|    addr=y-8 |                      |
+        |----------------------|             |----------------------|
+
+    note:
+        1. Only valid addresses and restore tokens can be on the
+           user-mode SHSTK.
+        2. A token is always of type u64 and must align to u64.
+        3. The incoming SHSTK pointer in a rstor_token must point to
+           immediately above the token.
+        4. 'Lg' is bit[0] of a rstor_token indicating a 64-bit SHSTK.
+        5. 'Bz' is bit[1] of a rstor_token indicating the token is to
+           be used only for the next SAVEPREVSSP and invalid for the
+           RSTORSSP.
+
+SAVEPREVSSP
+    Store the SHSTK 'restore token' pointed by
+        (current_SHSTK_pointer + 8).
+
+::
+
+                             After SAVEPREVSSP
+
+        |----------------------|             |----------------------|
+  ssp-> |                      |             |                      |
+        |----------------------|             |----------------------|
+        | rstor_token=(y|Bz|Lg)|    addr=y-8 | rstor_token(y|Lg)    |
+        |----------------------|             |----------------------|
+
+WRUSS %r0, (%r1)
+    Write the value in %r0 to the SHSTK address pointed by (%r1).
+    This is a kernel-mode only instruction.
+
+ENDBR
+    The compiler inserts an ENDBR at all valid branch targets.  Any
+    CALL/JMP to a target without an ENDBR triggers a control
+    protection fault.
+
+[3] Application Enabling
+========================
+
+An application's CET capability is marked in its ELF header and can
+be verified from the following command output, in the
+NT_GNU_PROPERTY_TYPE_0 field:
+
+    readelf -n <application>
+
+If an application supports CET and is statically linked, it will run
+with CET protection.  If the application needs any shared libraries,
+the loader checks all dependencies and enables CET only when all
+requirements are met.
+
+[4] Legacy Libraries
+====================
+
+GLIBC provides a few tunables for backward compatibility.
+
+GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-IBT
+    Turn off SHSTK/IBT for the current shell.
+
+GLIBC_TUNABLES=glibc.tune.x86_shstk=<on, permissive>
+    This controls how dlopen() handles SHSTK legacy libraries:
+        on: continue with SHSTK enabled;
+        permissive: continue with SHSTK off.
+
+[5] CET system calls
+====================
+
+The following arch_prctl() system calls are added for CET:
+
+arch_prctl(ARCH_X86_CET_STATUS, unsigned long *addr)
+    Return CET feature status.
+
+    The parameter 'addr' is a pointer to a user buffer.
+    On returning to the caller, the kernel fills the following
+    information:
+
+    *addr = SHSTK/IBT status
+    *(addr + 1) = SHSTK base address
+    *(addr + 2) = SHSTK size
+
+arch_prctl(ARCH_X86_CET_DISABLE, unsigned long features)
+    Disable SHSTK and/or IBT specified in 'features'.  Return -EPERM
+    if CET is locked.
+
+arch_prctl(ARCH_X86_CET_LOCK)
+    Lock in CET feature.
+
+arch_prctl(ARCH_X86_CET_ALLOC_SHSTK, unsigned long *addr)
+    Allocate a new SHSTK and put a restore token at top.
+
+    The parameter 'addr' is a pointer to a user buffer and indicates
+    the desired SHSTK size to allocate.  On returning to the caller,
+    the kernel fills *addr with the base address of the new SHSTK.
+
+arch_prctl(ARCH_X86_CET_SET_LEGACY_BITMAP, unsigned long *addr)
+    Setup an IBT legacy code bitmap.
+
+    The parameter 'addr' is a pointer to a user buffer that has the
+    following information:
+
+    *addr = IBT bitmap base address
+    *(addr + 1) = IBT bitmap size
+
+Note:
+  There is no CET enabling arch_prctl function.  By design, CET is
+  enabled automatically if the binary and the system can support it.
+
+  The parameters passed are always unsigned 64-bit.  When an ia32
+  application passing pointers, it should only use the lower 32 bits.
+
+[6] The implementation of the SHSTK
+===================================
+
+SHSTK size
+----------
+
+A task's SHSTK is allocated from memory to a fixed size of
+RLIMIT_STACK.  A compat-mode thread's SHSTK size is 1/4 of
+RLIMIT_STACK.  The smaller 32-bit thread SHSTK allows more threads to
+share a 32-bit address space.
+
+Signal
+------
+
+The main program and its signal handlers use the same SHSTK.  Because
+the SHSTK stores only return addresses, a large SHSTK will cover the
+condition that both the program stack and the sigaltstack run out.
+
+The kernel creates a restore token at the SHSTK restoring address and
+verifies that token when restoring from the signal handler.
+
+Fork
+----
+
+The SHSTK's vma has VM_SHSTK flag set; its PTEs are required to be
+read-only and dirty.  When a SHSTK PTE is not present, RO, and dirty,
+a SHSTK access triggers a page fault with an additional SHSTK bit set
+in the page fault error code.
+
+When a task forks a child, its SHSTK PTEs are copied and both the
+parent's and the child's SHSTK PTEs are cleared of the dirty bit.
+Upon the next SHSTK access, the resulting SHSTK page fault is handled
+by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new SHSTK for
+the new thread.
+
+Setjmp/Longjmp
+--------------
+
+Longjmp unwinds SHSTK until it matches the program stack.
+
+Ucontext
+--------
+
+In GLIBC, getcontext/setcontext is implemented in similar way as
+setjmp/longjmp.
+
+When makecontext creates a new ucontext, a new SHSTK is allocated for
+that context with ARCH_X86_CET_ALLOC_SHSTK the syscall.  The kernel
+creates a restore token at the top of the new SHSTK and the user-mode
+code switches to the new SHSTK with the RSTORSSP instruction.
+
+[7] The management of read-only & dirty PTEs for SHSTK
+======================================================
+
+A RO and dirty PTE exists in the following cases:
+
+(a) A page is modified and then shared with a fork()'ed child;
+(b) A R/O page that has been COW'ed;
+(c) A SHSTK page.
+
+The processor only checks the dirty bit for (c).  To prevent the use
+of non-SHSTK memory as SHSTK, we use a spare bit of the 64-bit PTE as
+DIRTY_SW for (a) and (b) above.  This results to the following PTE
+settings:
+
+Modified PTE:             (R/W + DIRTY_HW)
+Modified and shared PTE:  (R/O + DIRTY_SW)
+R/O PTE, COW'ed:          (R/O + DIRTY_SW)
+SHSTK PTE:                (R/O + DIRTY_HW)
+SHSTK PTE, COW'ed:        (R/O + DIRTY_HW)
+SHSTK PTE, shared:        (R/O + DIRTY_SW)
+
+Note that DIRTY_SW is only used in R/O PTEs but not R/W PTEs.
+
+[8] The implementation of IBT
+=============================
+
+The kernel provides IBT support in mmap() of the legacy code bit map.
+However, the management of the bitmap is done in the GLIBC or the
+application.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 02/27] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET)
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 01/27] Documentation/x86: Add CET description Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 03/27] x86/fpu/xstate: Change names to separate XSAVES system and user states Yu-cheng Yu
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Add CPU feature flags for Control-flow Enforcement Technology (CET).

CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect branch tracking

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/cpufeatures.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..21b2d9497c0f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -323,6 +323,7 @@
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* Shadow Stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
@@ -347,6 +348,7 @@
 #define X86_FEATURE_MD_CLEAR		(18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
+#define X86_FEATURE_IBT			(18*32+20) /* Indirect Branch Tracking */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..68ef07175062 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -59,6 +59,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_4VNNIW,	X86_FEATURE_AVX512F   },
 	{ X86_FEATURE_AVX512_4FMAPS,	X86_FEATURE_AVX512F   },
 	{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_SHSTK,		X86_FEATURE_XSAVES    },
+	{ X86_FEATURE_IBT,		X86_FEATURE_XSAVES    },
 	{}
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 03/27] x86/fpu/xstate: Change names to separate XSAVES system and user states
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 01/27] Documentation/x86: Add CET description Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 02/27] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states Yu-cheng Yu
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Control-flow Enforcement (CET) MSR contents are XSAVES system states.
To support CET, introduce XSAVES system states first.

XSAVES is a "supervisor" instruction and, comparing to XSAVE, saves
additional "supervisor" states that can be modified only from CPL 0.
However, these states are per-task and not kernel's own.  Rename
"supervisor" states to "system" states to clearly separate them from
"user" states.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/fpu/internal.h |  4 +-
 arch/x86/include/asm/fpu/xstate.h   | 20 +++----
 arch/x86/kernel/fpu/init.c          |  2 +-
 arch/x86/kernel/fpu/signal.c        | 10 ++--
 arch/x86/kernel/fpu/xstate.c        | 86 ++++++++++++++---------------
 5 files changed, 60 insertions(+), 62 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 9e27fa05a7ae..bcaa4fc54eb5 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -92,7 +92,7 @@ static inline void fpstate_init_xstate(struct xregs_state *xsave)
 	 * XRSTORS requires these bits set in xcomp_bv, or it will
 	 * trigger #GP:
 	 */
-	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask;
+	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_user;
 }
 
 static inline void fpstate_init_fxstate(struct fxregs_state *fx)
@@ -225,7 +225,7 @@ static inline void copy_fxregs_to_kernel(struct fpu *fpu)
 
 /*
  * If XSAVES is enabled, it replaces XSAVEOPT because it supports a compact
- * format and supervisor states in addition to modified optimization in
+ * format and system states in addition to modified optimization in
  * XSAVEOPT.
  *
  * Otherwise, if XSAVEOPT is enabled, XSAVEOPT replaces XSAVE because XSAVEOPT
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7e42b285c856..a1e33ebfb13f 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -25,15 +25,15 @@
 #define XFEATURE_MASK_SUPERVISOR (XFEATURE_MASK_PT)
 
 /* All currently supported features */
-#define XCNTXT_MASK		(XFEATURE_MASK_FP | \
-				 XFEATURE_MASK_SSE | \
-				 XFEATURE_MASK_YMM | \
-				 XFEATURE_MASK_OPMASK | \
-				 XFEATURE_MASK_ZMM_Hi256 | \
-				 XFEATURE_MASK_Hi16_ZMM	 | \
-				 XFEATURE_MASK_PKRU | \
-				 XFEATURE_MASK_BNDREGS | \
-				 XFEATURE_MASK_BNDCSR)
+#define SUPPORTED_XFEATURES_MASK (XFEATURE_MASK_FP | \
+				  XFEATURE_MASK_SSE | \
+				  XFEATURE_MASK_YMM | \
+				  XFEATURE_MASK_OPMASK | \
+				  XFEATURE_MASK_ZMM_Hi256 | \
+				  XFEATURE_MASK_Hi16_ZMM | \
+				  XFEATURE_MASK_PKRU | \
+				  XFEATURE_MASK_BNDREGS | \
+				  XFEATURE_MASK_BNDCSR)
 
 #ifdef CONFIG_X86_64
 #define REX_PREFIX	"0x48, "
@@ -41,7 +41,7 @@
 #define REX_PREFIX
 #endif
 
-extern u64 xfeatures_mask;
+extern u64 xfeatures_mask_user;
 extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 
 extern void __init update_regset_xstate_info(unsigned int size,
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index ef0030e3fe6b..3271fd8b0322 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -230,7 +230,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
  */
 u64 __init fpu__get_supported_xfeatures_mask(void)
 {
-	return XCNTXT_MASK;
+	return SUPPORTED_XFEATURES_MASK;
 }
 
 /* Legacy code to initialize eager fpu mode. */
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 5a8d118bc423..2aecbeaaee25 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -256,11 +256,11 @@ static int copy_user_to_fpregs_zeroing(void __user *buf, u64 xbv, int fx_only)
 {
 	if (use_xsave()) {
 		if (fx_only) {
-			u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
+			u64 init_bv = xfeatures_mask_user & ~XFEATURE_MASK_FPSSE;
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
 			return copy_user_to_fxregs(buf);
 		} else {
-			u64 init_bv = xfeatures_mask & ~xbv;
+			u64 init_bv = xfeatures_mask_user & ~xbv;
 			if (unlikely(init_bv))
 				copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
 			return copy_user_to_xregs(buf, xbv);
@@ -359,7 +359,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 
 
 	if (use_xsave() && !fx_only) {
-		u64 init_bv = xfeatures_mask & ~xfeatures;
+		u64 init_bv = xfeatures_mask_user & ~xfeatures;
 
 		if (using_compacted_format()) {
 			ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
@@ -390,7 +390,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 
 		fpregs_lock();
 		if (use_xsave()) {
-			u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
+			u64 init_bv = xfeatures_mask_user & ~XFEATURE_MASK_FPSSE;
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
 		}
 
@@ -464,7 +464,7 @@ void fpu__init_prepare_fx_sw_frame(void)
 
 	fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
 	fx_sw_reserved.extended_size = size;
-	fx_sw_reserved.xfeatures = xfeatures_mask;
+	fx_sw_reserved.xfeatures = xfeatures_mask_user;
 	fx_sw_reserved.xstate_size = fpu_user_xstate_size;
 
 	if (IS_ENABLED(CONFIG_IA32_EMULATION) ||
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 3c36dd1784db..d503e1fa15e8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -52,13 +52,16 @@ static short xsave_cpuid_features[] __initdata = {
 };
 
 /*
- * Mask of xstate features supported by the CPU and the kernel:
+ * XSAVES system states can only be modified from CPL 0 and saved by
+ * XSAVES.  The rest are user states.  The following is a mask of
+ * supported user state features derived from boot_cpu_has() and
+ * SUPPORTED_XFEATURES_MASK.
  */
-u64 xfeatures_mask __read_mostly;
+u64 xfeatures_mask_user __read_mostly;
 
 static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};
-static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask)*8];
+static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask_user)*8];
 
 /*
  * The XSAVE area of kernel can be in standard or compacted format;
@@ -83,7 +86,7 @@ void fpu__xstate_clear_all_cpu_caps(void)
  */
 int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
 {
-	u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask;
+	u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask_user;
 
 	if (unlikely(feature_name)) {
 		long xfeature_idx, max_idx;
@@ -114,15 +117,12 @@ int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
 }
 EXPORT_SYMBOL_GPL(cpu_has_xfeatures);
 
-static int xfeature_is_supervisor(int xfeature_nr)
+static int xfeature_is_system(int xfeature_nr)
 {
 	/*
-	 * We currently do not support supervisor states, but if
-	 * we did, we could find out like this.
-	 *
 	 * SDM says: If state component 'i' is a user state component,
-	 * ECX[0] return 0; if state component i is a supervisor
-	 * state component, ECX[0] returns 1.
+	 * ECX[0] is 0; if state component i is a system state component,
+	 * ECX[0] is 1.
 	 */
 	u32 eax, ebx, ecx, edx;
 
@@ -132,7 +132,7 @@ static int xfeature_is_supervisor(int xfeature_nr)
 
 static int xfeature_is_user(int xfeature_nr)
 {
-	return !xfeature_is_supervisor(xfeature_nr);
+	return !xfeature_is_system(xfeature_nr);
 }
 
 /*
@@ -165,7 +165,7 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
 	 * None of the feature bits are in init state. So nothing else
 	 * to do for us, as the memory layout is up to date.
 	 */
-	if ((xfeatures & xfeatures_mask) == xfeatures_mask)
+	if ((xfeatures & xfeatures_mask_user) == xfeatures_mask_user)
 		return;
 
 	/*
@@ -192,7 +192,7 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
 	 * in a special way already:
 	 */
 	feature_bit = 0x2;
-	xfeatures = (xfeatures_mask & ~xfeatures) >> 2;
+	xfeatures = (xfeatures_mask_user & ~xfeatures) >> 2;
 
 	/*
 	 * Update all the remaining memory layouts according to their
@@ -220,20 +220,18 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
  */
 void fpu__init_cpu_xstate(void)
 {
-	if (!boot_cpu_has(X86_FEATURE_XSAVE) || !xfeatures_mask)
+	if (!boot_cpu_has(X86_FEATURE_XSAVE) || !xfeatures_mask_user)
 		return;
 	/*
-	 * Make it clear that XSAVES supervisor states are not yet
-	 * implemented should anyone expect it to work by changing
-	 * bits in XFEATURE_MASK_* macros and XCR0.
+	 * XCR_XFEATURE_ENABLED_MASK sets the features that are managed
+	 * by XSAVE{C, OPT} and XRSTOR.  Only XSAVE user states can be
+	 * set here.
 	 */
-	WARN_ONCE((xfeatures_mask & XFEATURE_MASK_SUPERVISOR),
-		"x86/fpu: XSAVES supervisor states are not yet implemented.\n");
 
-	xfeatures_mask &= ~XFEATURE_MASK_SUPERVISOR;
+	xfeatures_mask_user &= ~XFEATURE_MASK_SUPERVISOR;
 
 	cr4_set_bits(X86_CR4_OSXSAVE);
-	xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask);
+	xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask_user);
 }
 
 /*
@@ -243,7 +241,7 @@ void fpu__init_cpu_xstate(void)
  */
 static int xfeature_enabled(enum xfeature xfeature)
 {
-	return !!(xfeatures_mask & (1UL << xfeature));
+	return !!(xfeatures_mask_user & BIT_ULL(xfeature));
 }
 
 /*
@@ -273,7 +271,7 @@ static void __init setup_xstate_features(void)
 		cpuid_count(XSTATE_CPUID, i, &eax, &ebx, &ecx, &edx);
 
 		/*
-		 * If an xfeature is supervisor state, the offset
+		 * If an xfeature is a system state, the offset
 		 * in EBX is invalid. We leave it to -1.
 		 */
 		if (xfeature_is_user(i))
@@ -349,7 +347,7 @@ static int xfeature_is_aligned(int xfeature_nr)
  */
 static void __init setup_xstate_comp(void)
 {
-	unsigned int xstate_comp_sizes[sizeof(xfeatures_mask)*8];
+	unsigned int xstate_comp_sizes[sizeof(xfeatures_mask_user)*8];
 	int i;
 
 	/*
@@ -422,7 +420,7 @@ static void __init setup_init_fpu_buf(void)
 	print_xstate_features();
 
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		init_fpstate.xsave.header.xcomp_bv = (u64)1 << 63 | xfeatures_mask;
+		init_fpstate.xsave.header.xcomp_bv = BIT_ULL(63) | xfeatures_mask_user;
 
 	/*
 	 * Init all the features state with header.xfeatures being 0x0
@@ -441,8 +439,8 @@ static int xfeature_uncompacted_offset(int xfeature_nr)
 	u32 eax, ebx, ecx, edx;
 
 	/*
-	 * Only XSAVES supports supervisor states and it uses compacted
-	 * format. Checking a supervisor state's uncompacted offset is
+	 * Only XSAVES supports system states and it uses compacted
+	 * format. Checking a system state's uncompacted offset is
 	 * an error.
 	 */
 	if (XFEATURE_MASK_SUPERVISOR & BIT_ULL(xfeature_nr)) {
@@ -466,7 +464,7 @@ static int xfeature_size(int xfeature_nr)
 
 /*
  * 'XSAVES' implies two different things:
- * 1. saving of supervisor/system state
+ * 1. saving of system state
  * 2. using the compacted format
  *
  * Use this function when dealing with the compacted format so
@@ -481,8 +479,8 @@ int using_compacted_format(void)
 /* Validate an xstate header supplied by userspace (ptrace or sigreturn) */
 int validate_xstate_header(const struct xstate_header *hdr)
 {
-	/* No unknown or supervisor features may be set */
-	if (hdr->xfeatures & (~xfeatures_mask | XFEATURE_MASK_SUPERVISOR))
+	/* No unknown or system features may be set */
+	if (hdr->xfeatures & ~xfeatures_mask_user)
 		return -EINVAL;
 
 	/* Userspace must use the uncompacted format */
@@ -589,11 +587,11 @@ static void do_extra_xstate_size_checks(void)
 
 		check_xstate_against_struct(i);
 		/*
-		 * Supervisor state components can be managed only by
+		 * System state components can be managed only by
 		 * XSAVES, which is compacted-format only.
 		 */
 		if (!using_compacted_format())
-			XSTATE_WARN_ON(xfeature_is_supervisor(i));
+			XSTATE_WARN_ON(xfeature_is_system(i));
 
 		/* Align from the end of the previous feature */
 		if (xfeature_is_aligned(i))
@@ -617,7 +615,7 @@ static void do_extra_xstate_size_checks(void)
 
 
 /*
- * Get total size of enabled xstates in XCR0/xfeatures_mask.
+ * Get total size of enabled xstates in XCR0/xfeatures_mask_user.
  *
  * Note the SDM's wording here.  "sub-function 0" only enumerates
  * the size of the *user* states.  If we use it to size a buffer
@@ -707,7 +705,7 @@ static int __init init_xstate_size(void)
  */
 static void fpu__init_disable_system_xstate(void)
 {
-	xfeatures_mask = 0;
+	xfeatures_mask_user = 0;
 	cr4_clear_bits(X86_CR4_OSXSAVE);
 	fpu__xstate_clear_all_cpu_caps();
 }
@@ -743,15 +741,15 @@ void __init fpu__init_system_xstate(void)
 	}
 
 	cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
-	xfeatures_mask = eax + ((u64)edx << 32);
+	xfeatures_mask_user = eax + ((u64)edx << 32);
 
-	if ((xfeatures_mask & XFEATURE_MASK_FPSSE) != XFEATURE_MASK_FPSSE) {
+	if ((xfeatures_mask_user & XFEATURE_MASK_FPSSE) != XFEATURE_MASK_FPSSE) {
 		/*
 		 * This indicates that something really unexpected happened
 		 * with the enumeration.  Disable XSAVE and try to continue
 		 * booting without it.  This is too early to BUG().
 		 */
-		pr_err("x86/fpu: FP/SSE not present amongst the CPU's xstate features: 0x%llx.\n", xfeatures_mask);
+		pr_err("x86/fpu: FP/SSE not present amongst the CPU's xstate features: 0x%llx.\n", xfeatures_mask_user);
 		goto out_disable;
 	}
 
@@ -760,10 +758,10 @@ void __init fpu__init_system_xstate(void)
 	 */
 	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
 		if (!boot_cpu_has(xsave_cpuid_features[i]))
-			xfeatures_mask &= ~BIT(i);
+			xfeatures_mask_user &= ~BIT_ULL(i);
 	}
 
-	xfeatures_mask &= fpu__get_supported_xfeatures_mask();
+	xfeatures_mask_user &= fpu__get_supported_xfeatures_mask();
 
 	/* Enable xstate instructions to be able to continue with initialization: */
 	fpu__init_cpu_xstate();
@@ -773,9 +771,9 @@ void __init fpu__init_system_xstate(void)
 
 	/*
 	 * Update info used for ptrace frames; use standard-format size and no
-	 * supervisor xstates:
+	 * system xstates:
 	 */
-	update_regset_xstate_info(fpu_user_xstate_size,	xfeatures_mask & ~XFEATURE_MASK_SUPERVISOR);
+	update_regset_xstate_info(fpu_user_xstate_size, xfeatures_mask_user & ~XFEATURE_MASK_SUPERVISOR);
 
 	fpu__init_prepare_fx_sw_frame();
 	setup_init_fpu_buf();
@@ -783,7 +781,7 @@ void __init fpu__init_system_xstate(void)
 	print_xstate_offset_size();
 
 	pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n",
-		xfeatures_mask,
+		xfeatures_mask_user,
 		fpu_kernel_xstate_size,
 		boot_cpu_has(X86_FEATURE_XSAVES) ? "compacted" : "standard");
 	return;
@@ -802,7 +800,7 @@ void fpu__resume_cpu(void)
 	 * Restore XCR0 on xsave capable CPUs:
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVE))
-		xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask);
+		xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask_user);
 }
 
 /*
@@ -850,7 +848,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 	 * have not enabled.  Remember that pcntxt_mask is
 	 * what we write to the XCR0 register.
 	 */
-	WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr)),
+	WARN_ONCE(!(xfeatures_mask_user & BIT_ULL(xfeature_nr)),
 		  "get of unsupported state");
 	/*
 	 * This assumes the last 'xsave*' instruction to
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (2 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 03/27] x86/fpu/xstate: Change names to separate XSAVES system and user states Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 21:18   ` Dave Hansen
  2019-06-06 20:06 ` [PATCH v7 05/27] x86/fpu/xstate: Add XSAVES system states for shadow stack Yu-cheng Yu
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Control-flow Enforcement (CET) MSR contents are XSAVES system states.
To support CET, introduce XSAVES system states first.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/fpu/internal.h | 21 ++++++-
 arch/x86/include/asm/fpu/xstate.h   |  4 +-
 arch/x86/kernel/fpu/core.c          | 26 +++++++--
 arch/x86/kernel/fpu/init.c          | 10 ----
 arch/x86/kernel/fpu/signal.c        |  4 +-
 arch/x86/kernel/fpu/xstate.c        | 90 +++++++++++++++++++----------
 arch/x86/kernel/process.c           |  2 +-
 arch/x86/kernel/signal.c            |  2 +-
 8 files changed, 104 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index bcaa4fc54eb5..148a3d8c8c35 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -31,7 +31,8 @@ extern void fpu__save(struct fpu *fpu);
 extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
 extern void fpu__drop(struct fpu *fpu);
 extern int  fpu__copy(struct task_struct *dst, struct task_struct *src);
-extern void fpu__clear(struct fpu *fpu);
+extern void fpu__clear_user_states(struct fpu *fpu);
+extern void fpu__clear_all(struct fpu *fpu);
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 extern int  dump_fpu(struct pt_regs *ptregs, struct user_i387_struct *fpstate);
 
@@ -44,7 +45,6 @@ extern void fpu__init_cpu_xstate(void);
 extern void fpu__init_system(struct cpuinfo_x86 *c);
 extern void fpu__init_check_bugs(void);
 extern void fpu__resume_cpu(void);
-extern u64 fpu__get_supported_xfeatures_mask(void);
 
 /*
  * Debugging facility:
@@ -92,7 +92,7 @@ static inline void fpstate_init_xstate(struct xregs_state *xsave)
 	 * XRSTORS requires these bits set in xcomp_bv, or it will
 	 * trigger #GP:
 	 */
-	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_user;
+	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
 }
 
 static inline void fpstate_init_fxstate(struct fxregs_state *fx)
@@ -615,6 +615,21 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	__write_pkru(pkru_val);
 }
 
+/*
+ * Helpers for changing XSAVES system states.
+ */
+static inline void modify_fpu_regs_begin(void)
+{
+	fpregs_lock();
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		__fpregs_load_activate();
+}
+
+static inline void modify_fpu_regs_end(void)
+{
+	fpregs_unlock();
+}
+
 /*
  * MXCSR and XCR definitions:
  */
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index a1e33ebfb13f..2ec19415c58e 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -21,9 +21,6 @@
 #define XSAVE_YMM_SIZE	    256
 #define XSAVE_YMM_OFFSET    (XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET)
 
-/* Supervisor features */
-#define XFEATURE_MASK_SUPERVISOR (XFEATURE_MASK_PT)
-
 /* All currently supported features */
 #define SUPPORTED_XFEATURES_MASK (XFEATURE_MASK_FP | \
 				  XFEATURE_MASK_SSE | \
@@ -42,6 +39,7 @@
 #endif
 
 extern u64 xfeatures_mask_user;
+extern u64 xfeatures_mask_all;
 extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 
 extern void __init update_regset_xstate_info(unsigned int size,
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 466fca686fb9..7a9dcd69580a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -316,12 +316,16 @@ void fpu__drop(struct fpu *fpu)
  * Clear FPU registers by setting them up from
  * the init fpstate:
  */
-static inline void copy_init_fpstate_to_fpregs(void)
+static inline void copy_init_fpstate_to_fpregs(u64 features_mask)
 {
 	fpregs_lock();
 
+	/*
+	 * Only XSAVES user states are copied.
+	 * System states are preserved.
+	 */
 	if (use_xsave())
-		copy_kernel_to_xregs(&init_fpstate.xsave, -1);
+		copy_kernel_to_xregs(&init_fpstate.xsave, features_mask);
 	else if (static_cpu_has(X86_FEATURE_FXSR))
 		copy_kernel_to_fxregs(&init_fpstate.fxsave);
 	else
@@ -340,7 +344,21 @@ static inline void copy_init_fpstate_to_fpregs(void)
  * Called by sys_execve(), by the signal handler code and by various
  * error paths.
  */
-void fpu__clear(struct fpu *fpu)
+void fpu__clear_user_states(struct fpu *fpu)
+{
+	WARN_ON_FPU(fpu != &current->thread.fpu); /* Almost certainly an anomaly */
+
+	fpu__drop(fpu);
+
+	/*
+	 * Make sure fpstate is cleared and initialized.
+	 */
+	fpu__initialize(fpu);
+	if (static_cpu_has(X86_FEATURE_FPU))
+		copy_init_fpstate_to_fpregs(xfeatures_mask_user);
+}
+
+void fpu__clear_all(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu); /* Almost certainly an anomaly */
 
@@ -351,7 +369,7 @@ void fpu__clear(struct fpu *fpu)
 	 */
 	fpu__initialize(fpu);
 	if (static_cpu_has(X86_FEATURE_FPU))
-		copy_init_fpstate_to_fpregs();
+		copy_init_fpstate_to_fpregs(xfeatures_mask_all);
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 3271fd8b0322..d9b2e8ec4e4b 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -223,16 +223,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	fpu_user_xstate_size = fpu_kernel_xstate_size;
 }
 
-/*
- * Find supported xfeatures based on cpu features and command-line input.
- * This must be called after fpu__init_parse_early_param() is called and
- * xfeatures_mask is enumerated.
- */
-u64 __init fpu__get_supported_xfeatures_mask(void)
-{
-	return SUPPORTED_XFEATURES_MASK;
-}
-
 /* Legacy code to initialize eager fpu mode. */
 static void __init fpu__init_system_ctx_switch(void)
 {
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 2aecbeaaee25..e38b272793b1 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -287,7 +287,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 			 IS_ENABLED(CONFIG_IA32_EMULATION));
 
 	if (!buf) {
-		fpu__clear(fpu);
+		fpu__clear_user_states(fpu);
 		return 0;
 	}
 
@@ -409,7 +409,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 
 err_out:
 	if (ret)
-		fpu__clear(fpu);
+		fpu__clear_user_states(fpu);
 	return ret;
 }
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d503e1fa15e8..6b453455a4f0 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -59,9 +59,19 @@ static short xsave_cpuid_features[] __initdata = {
  */
 u64 xfeatures_mask_user __read_mostly;
 
+/*
+ * Supported XSAVES system states.
+ */
+static u64 xfeatures_mask_system __read_mostly;
+
+/*
+ * Combined XSAVES system and user states.
+ */
+u64 xfeatures_mask_all __read_mostly;
+
 static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
 static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};
-static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask_user)*8];
+static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask_all)*8];
 
 /*
  * The XSAVE area of kernel can be in standard or compacted format;
@@ -86,7 +96,7 @@ void fpu__xstate_clear_all_cpu_caps(void)
  */
 int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
 {
-	u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask_user;
+	u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask_all;
 
 	if (unlikely(feature_name)) {
 		long xfeature_idx, max_idx;
@@ -165,7 +175,7 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
 	 * None of the feature bits are in init state. So nothing else
 	 * to do for us, as the memory layout is up to date.
 	 */
-	if ((xfeatures & xfeatures_mask_user) == xfeatures_mask_user)
+	if ((xfeatures & xfeatures_mask_all) == xfeatures_mask_all)
 		return;
 
 	/*
@@ -220,28 +230,27 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
  */
 void fpu__init_cpu_xstate(void)
 {
-	if (!boot_cpu_has(X86_FEATURE_XSAVE) || !xfeatures_mask_user)
+	if (!boot_cpu_has(X86_FEATURE_XSAVE) || !xfeatures_mask_all)
 		return;
 	/*
 	 * XCR_XFEATURE_ENABLED_MASK sets the features that are managed
 	 * by XSAVE{C, OPT} and XRSTOR.  Only XSAVE user states can be
 	 * set here.
 	 */
-
-	xfeatures_mask_user &= ~XFEATURE_MASK_SUPERVISOR;
-
 	cr4_set_bits(X86_CR4_OSXSAVE);
 	xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask_user);
+
+	/*
+	 * MSR_IA32_XSS controls which system (not user) states are
+	 * to be managed by XSAVES.
+	 */
+	if (boot_cpu_has(X86_FEATURE_XSAVES))
+		wrmsrl(MSR_IA32_XSS, xfeatures_mask_system);
 }
 
-/*
- * Note that in the future we will likely need a pair of
- * functions here: one for user xstates and the other for
- * system xstates.  For now, they are the same.
- */
 static int xfeature_enabled(enum xfeature xfeature)
 {
-	return !!(xfeatures_mask_user & BIT_ULL(xfeature));
+	return !!(xfeatures_mask_all & BIT_ULL(xfeature));
 }
 
 /*
@@ -347,7 +356,7 @@ static int xfeature_is_aligned(int xfeature_nr)
  */
 static void __init setup_xstate_comp(void)
 {
-	unsigned int xstate_comp_sizes[sizeof(xfeatures_mask_user)*8];
+	unsigned int xstate_comp_sizes[sizeof(xfeatures_mask_all)*8];
 	int i;
 
 	/*
@@ -420,7 +429,7 @@ static void __init setup_init_fpu_buf(void)
 	print_xstate_features();
 
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		init_fpstate.xsave.header.xcomp_bv = BIT_ULL(63) | xfeatures_mask_user;
+		init_fpstate.xsave.header.xcomp_bv = BIT_ULL(63) | xfeatures_mask_all;
 
 	/*
 	 * Init all the features state with header.xfeatures being 0x0
@@ -443,7 +452,7 @@ static int xfeature_uncompacted_offset(int xfeature_nr)
 	 * format. Checking a system state's uncompacted offset is
 	 * an error.
 	 */
-	if (XFEATURE_MASK_SUPERVISOR & BIT_ULL(xfeature_nr)) {
+	if (~xfeatures_mask_user & BIT_ULL(xfeature_nr)) {
 		WARN_ONCE(1, "No fixed offset for xstate %d\n", xfeature_nr);
 		return -1;
 	}
@@ -615,15 +624,12 @@ static void do_extra_xstate_size_checks(void)
 
 
 /*
- * Get total size of enabled xstates in XCR0/xfeatures_mask_user.
+ * Get total size of enabled xstates in XCR0 | IA32_XSS.
  *
  * Note the SDM's wording here.  "sub-function 0" only enumerates
  * the size of the *user* states.  If we use it to size a buffer
  * that we use 'XSAVES' on, we could potentially overflow the
  * buffer because 'XSAVES' saves system states too.
- *
- * Note that we do not currently set any bits on IA32_XSS so
- * 'XCR0 | IA32_XSS == XCR0' for now.
  */
 static unsigned int __init get_xsaves_size(void)
 {
@@ -705,6 +711,7 @@ static int __init init_xstate_size(void)
  */
 static void fpu__init_disable_system_xstate(void)
 {
+	xfeatures_mask_all = 0;
 	xfeatures_mask_user = 0;
 	cr4_clear_bits(X86_CR4_OSXSAVE);
 	fpu__xstate_clear_all_cpu_caps();
@@ -740,10 +747,23 @@ void __init fpu__init_system_xstate(void)
 		return;
 	}
 
+	/*
+	 * Find user states supported by the processor.
+	 * Only these bits can be set in XCR0.
+	 */
 	cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
 	xfeatures_mask_user = eax + ((u64)edx << 32);
 
-	if ((xfeatures_mask_user & XFEATURE_MASK_FPSSE) != XFEATURE_MASK_FPSSE) {
+	/*
+	 * Find system states supported by the processor.
+	 * Only these bits can be set in IA32_XSS MSR.
+	 */
+	cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
+	xfeatures_mask_system = ecx + ((u64)edx << 32);
+
+	xfeatures_mask_all = xfeatures_mask_user | xfeatures_mask_system;
+
+	if ((xfeatures_mask_all & XFEATURE_MASK_FPSSE) != XFEATURE_MASK_FPSSE) {
 		/*
 		 * This indicates that something really unexpected happened
 		 * with the enumeration.  Disable XSAVE and try to continue
@@ -758,10 +778,12 @@ void __init fpu__init_system_xstate(void)
 	 */
 	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
 		if (!boot_cpu_has(xsave_cpuid_features[i]))
-			xfeatures_mask_user &= ~BIT_ULL(i);
+			xfeatures_mask_all &= ~BIT_ULL(i);
 	}
 
-	xfeatures_mask_user &= fpu__get_supported_xfeatures_mask();
+	xfeatures_mask_all &= SUPPORTED_XFEATURES_MASK;
+	xfeatures_mask_user &= xfeatures_mask_all;
+	xfeatures_mask_system &= xfeatures_mask_all;
 
 	/* Enable xstate instructions to be able to continue with initialization: */
 	fpu__init_cpu_xstate();
@@ -773,7 +795,7 @@ void __init fpu__init_system_xstate(void)
 	 * Update info used for ptrace frames; use standard-format size and no
 	 * system xstates:
 	 */
-	update_regset_xstate_info(fpu_user_xstate_size, xfeatures_mask_user & ~XFEATURE_MASK_SUPERVISOR);
+	update_regset_xstate_info(fpu_user_xstate_size, xfeatures_mask_user);
 
 	fpu__init_prepare_fx_sw_frame();
 	setup_init_fpu_buf();
@@ -781,7 +803,7 @@ void __init fpu__init_system_xstate(void)
 	print_xstate_offset_size();
 
 	pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n",
-		xfeatures_mask_user,
+		xfeatures_mask_all,
 		fpu_kernel_xstate_size,
 		boot_cpu_has(X86_FEATURE_XSAVES) ? "compacted" : "standard");
 	return;
@@ -801,6 +823,12 @@ void fpu__resume_cpu(void)
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVE))
 		xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask_user);
+
+	/*
+	 * Restore IA32_XSS
+	 */
+	if (boot_cpu_has(X86_FEATURE_XSAVES))
+		wrmsrl(MSR_IA32_XSS, xfeatures_mask_system);
 }
 
 /*
@@ -846,9 +874,9 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 	/*
 	 * We should not ever be requesting features that we
 	 * have not enabled.  Remember that pcntxt_mask is
-	 * what we write to the XCR0 register.
+	 * what we write to the XCR0 | IA32_XSS registers.
 	 */
-	WARN_ONCE(!(xfeatures_mask_user & BIT_ULL(xfeature_nr)),
+	WARN_ONCE(!(xfeatures_mask_all & BIT_ULL(xfeature_nr)),
 		  "get of unsupported state");
 	/*
 	 * This assumes the last 'xsave*' instruction to
@@ -996,7 +1024,7 @@ int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int of
 	 */
 	memset(&header, 0, sizeof(header));
 	header.xfeatures = xsave->header.xfeatures;
-	header.xfeatures &= ~XFEATURE_MASK_SUPERVISOR;
+	header.xfeatures &= xfeatures_mask_user;
 
 	/*
 	 * Copy xregs_state->header:
@@ -1080,7 +1108,7 @@ int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned i
 	 */
 	memset(&header, 0, sizeof(header));
 	header.xfeatures = xsave->header.xfeatures;
-	header.xfeatures &= ~XFEATURE_MASK_SUPERVISOR;
+	header.xfeatures &= xfeatures_mask_user;
 
 	/*
 	 * Copy xregs_state->header:
@@ -1173,7 +1201,7 @@ int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
 	 * The state that came in from userspace was user-state only.
 	 * Mask all the user states out of 'xfeatures':
 	 */
-	xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR;
+	xsave->header.xfeatures &= xfeatures_mask_system;
 
 	/*
 	 * Add back in the features that came in from userspace:
@@ -1229,7 +1257,7 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
 	 * The state that came in from userspace was user-state only.
 	 * Mask all the user states out of 'xfeatures':
 	 */
-	xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR;
+	xsave->header.xfeatures &= xfeatures_mask_system;
 
 	/*
 	 * Add back in the features that came in from userspace:
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 75fea0d48c0e..d360bf4d696b 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -139,7 +139,7 @@ void flush_thread(void)
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
-	fpu__clear(&tsk->thread.fpu);
+	fpu__clear_all(&tsk->thread.fpu);
 }
 
 void disable_TSC(void)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 364813cea647..3b0dcec597ce 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -763,7 +763,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 		/*
 		 * Ensure the signal handler starts with the new fpu state.
 		 */
-		fpu__clear(fpu);
+		fpu__clear_user_states(fpu);
 	}
 	signal_setup_done(failed, ksig, stepping);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 05/27] x86/fpu/xstate: Add XSAVES system states for shadow stack
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (3 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-07  7:07   ` Peter Zijlstra
  2019-06-06 20:06 ` [PATCH v7 06/27] x86/cet: Add control protection exception handler Yu-cheng Yu
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Intel Control-flow Enforcement Technology (CET) introduces the
following MSRs.

    MSR_IA32_U_CET (user-mode CET settings),
    MSR_IA32_PL3_SSP (user-mode shadow stack),
    MSR_IA32_PL0_SSP (kernel-mode shadow stack),
    MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack),
    MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack).

Introduce them into XSAVES system states.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/fpu/types.h            | 22 +++++++++++++++++++++
 arch/x86/include/asm/fpu/xstate.h           |  4 +++-
 arch/x86/include/uapi/asm/processor-flags.h |  2 ++
 arch/x86/kernel/fpu/xstate.c                | 10 ++++++++++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f098f6cab94b..d7ef4d9c7ad5 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -114,6 +114,9 @@ enum xfeature {
 	XFEATURE_Hi16_ZMM,
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
+	XFEATURE_RESERVED,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL,
 
 	XFEATURE_MAX,
 };
@@ -128,6 +131,8 @@ enum xfeature {
 #define XFEATURE_MASK_Hi16_ZMM		(1 << XFEATURE_Hi16_ZMM)
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL)
 
 #define XFEATURE_MASK_FPSSE		(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512		(XFEATURE_MASK_OPMASK \
@@ -229,6 +234,23 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	u64 user_cet;			/* user control-flow settings */
+	u64 user_ssp;			/* user shadow stack pointer */
+};
+
+/*
+ * State component 12 is Control-flow Enforcement kernel states
+ */
+struct cet_kernel_state {
+	u64 kernel_ssp;			/* kernel shadow stack */
+	u64 pl1_ssp;			/* privilege level 1 shadow stack */
+	u64 pl2_ssp;			/* privilege level 2 shadow stack */
+};
+
 struct xstate_header {
 	u64				xfeatures;
 	u64				xcomp_bv;
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 2ec19415c58e..9ac8a81e851d 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -30,7 +30,9 @@
 				  XFEATURE_MASK_Hi16_ZMM | \
 				  XFEATURE_MASK_PKRU | \
 				  XFEATURE_MASK_BNDREGS | \
-				  XFEATURE_MASK_BNDCSR)
+				  XFEATURE_MASK_BNDCSR | \
+				  XFEATURE_MASK_CET_USER | \
+				  XFEATURE_MASK_CET_KERNEL)
 
 #ifdef CONFIG_X86_64
 #define REX_PREFIX	"0x48, "
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index bcba3c643e63..a8df907e8017 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -130,6 +130,8 @@
 #define X86_CR4_SMAP		_BITUL(X86_CR4_SMAP_BIT)
 #define X86_CR4_PKE_BIT		22 /* enable Protection Keys support */
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
+#define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement */
+#define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6b453455a4f0..7f99878111d7 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -36,6 +36,9 @@ static const char *xfeature_names[] =
 	"Processor Trace (unused)"	,
 	"Protection Keys User registers",
 	"unknown xstate feature"	,
+	"Control-flow User registers"	,
+	"Control-flow Kernel registers"	,
+	"unknown xstate feature"	,
 };
 
 static short xsave_cpuid_features[] __initdata = {
@@ -49,6 +52,9 @@ static short xsave_cpuid_features[] __initdata = {
 	X86_FEATURE_AVX512F,
 	X86_FEATURE_INTEL_PT,
 	X86_FEATURE_PKU,
+	0,		   /* Unused */
+	X86_FEATURE_SHSTK, /* XFEATURE_CET_USER */
+	X86_FEATURE_SHSTK, /* XFEATURE_CET_KERNEL */
 };
 
 /*
@@ -320,6 +326,8 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
+	print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
 }
 
 /*
@@ -566,6 +574,8 @@ static void check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
 	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
+	XCHECK_SZ(sz, nr, XFEATURE_CET_USER,   struct cet_user_state);
+	XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct cet_kernel_state);
 
 	/*
 	 * Make *SURE* to add any feature numbers in below if
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 06/27] x86/cet: Add control protection exception handler
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (4 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 05/27] x86/fpu/xstate: Add XSAVES system states for shadow stack Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 07/27] x86/cet/shstk: Add Kconfig option for user-mode shadow stack Yu-cheng Yu
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

A control protection exception is triggered when a control flow transfer
attempt violated shadow stack or indirect branch tracking constraints.
For example, the return address for a RET instruction differs from the
safe copy on the shadow stack; or a JMP instruction arrives at a non-
ENDBR instruction.

The control protection exception handler works in a similar way as the
general protection fault handler.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/entry/entry_64.S          |  2 +-
 arch/x86/include/asm/traps.h       |  3 ++
 arch/x86/kernel/idt.c              |  4 +++
 arch/x86/kernel/signal_compat.c    |  2 +-
 arch/x86/kernel/traps.c            | 57 ++++++++++++++++++++++++++++++
 include/uapi/asm-generic/siginfo.h |  3 +-
 6 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 11aa3b2afa4d..595c2efbb893 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -993,7 +993,7 @@ idtentry spurious_interrupt_bug		do_spurious_interrupt_bug	has_error_code=0
 idtentry coprocessor_error		do_coprocessor_error		has_error_code=0
 idtentry alignment_check		do_alignment_check		has_error_code=1
 idtentry simd_coprocessor_error		do_simd_coprocessor_error	has_error_code=0
-
+idtentry control_protection		do_control_protection		has_error_code=1
 
 	/*
 	 * Reload gs selector with exception handling
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 7d6f3f3fad78..5906a22796b6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -26,6 +26,7 @@ asmlinkage void invalid_TSS(void);
 asmlinkage void segment_not_present(void);
 asmlinkage void stack_segment(void);
 asmlinkage void general_protection(void);
+asmlinkage void control_protection(void);
 asmlinkage void page_fault(void);
 asmlinkage void async_page_fault(void);
 asmlinkage void spurious_interrupt_bug(void);
@@ -81,6 +82,7 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
 void __init trap_init(void);
 #endif
 dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code);
+dotraplinkage void do_control_protection(struct pt_regs *regs, long error_code);
 dotraplinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code);
 dotraplinkage void do_spurious_interrupt_bug(struct pt_regs *regs, long error_code);
 dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code);
@@ -151,6 +153,7 @@ enum {
 	X86_TRAP_AC,		/* 17, Alignment Check */
 	X86_TRAP_MC,		/* 18, Machine Check */
 	X86_TRAP_XF,		/* 19, SIMD Floating-Point Exception */
+	X86_TRAP_CP = 21,	/* 21 Control Protection Fault */
 	X86_TRAP_IRET = 32,	/* 32, IRET Exception */
 };
 
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 6d8917875f44..588848d00fff 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -103,6 +103,10 @@ static const __initconst struct idt_data def_idts[] = {
 #elif defined(CONFIG_X86_32)
 	SYSG(IA32_SYSCALL_VECTOR,	entry_INT80_32),
 #endif
+
+#ifdef CONFIG_X86_64
+	INTG(X86_TRAP_CP,		control_protection),
+#endif
 };
 
 /*
diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf0576cd0..c572a3de1037 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
 	 */
 	BUILD_BUG_ON(NSIGILL  != 11);
 	BUILD_BUG_ON(NSIGFPE  != 15);
-	BUILD_BUG_ON(NSIGSEGV != 7);
+	BUILD_BUG_ON(NSIGSEGV != 8);
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
 	BUILD_BUG_ON(NSIGCHLD != 6);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8b6d03e55d2f..db143d447bba 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -570,6 +570,63 @@ do_general_protection(struct pt_regs *regs, long error_code)
 }
 NOKPROBE_SYMBOL(do_general_protection);
 
+static const char *control_protection_err[] = {
+	"unknown",
+	"near-ret",
+	"far-ret/iret",
+	"endbranch",
+	"rstorssp",
+	"setssbsy",
+};
+
+/*
+ * When a control protection exception occurs, send a signal
+ * to the responsible application.  Currently, control
+ * protection is only enabled for the user mode.  This
+ * exception should not come from the kernel mode.
+ */
+dotraplinkage void
+do_control_protection(struct pt_regs *regs, long error_code)
+{
+	struct task_struct *tsk;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+	if (notify_die(DIE_TRAP, "control protection fault", regs,
+		       error_code, X86_TRAP_CP, SIGSEGV) == NOTIFY_STOP)
+		return;
+	cond_local_irq_enable(regs);
+
+	if (!user_mode(regs))
+		die("kernel control protection fault", regs, error_code);
+
+	if (!static_cpu_has(X86_FEATURE_SHSTK) &&
+	    !static_cpu_has(X86_FEATURE_IBT))
+		WARN_ONCE(1, "CET is disabled but got control protection fault\n");
+
+	tsk = current;
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_CP;
+
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
+	    printk_ratelimit()) {
+		unsigned int max_err;
+
+		max_err = ARRAY_SIZE(control_protection_err) - 1;
+		if ((error_code < 0) || (error_code > max_err))
+			error_code = 0;
+		pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
+			tsk->comm, task_pid_nr(tsk),
+			regs->ip, regs->sp, error_code,
+			control_protection_err[error_code]);
+		print_vma_addr(KERN_CONT " in ", regs->ip);
+		pr_cont("\n");
+	}
+
+	force_sig_fault(SIGSEGV, SEGV_CPERR,
+			(void __user *)uprobe_get_trap_addr(regs), tsk);
+}
+NOKPROBE_SYMBOL(do_control_protection);
+
 dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
 {
 #ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index cb3d6c267181..693071dbe641 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -229,7 +229,8 @@ typedef struct siginfo {
 #define SEGV_ACCADI	5	/* ADI not enabled for mapped object */
 #define SEGV_ADIDERR	6	/* Disrupting MCD error */
 #define SEGV_ADIPERR	7	/* Precise MCD exception */
-#define NSIGSEGV	7
+#define SEGV_CPERR	8
+#define NSIGSEGV	8
 
 /*
  * SIGBUS si_codes
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 07/27] x86/cet/shstk: Add Kconfig option for user-mode shadow stack
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (5 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 06/27] x86/cet: Add control protection exception handler Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 08/27] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Introduce Kconfig option X86_INTEL_SHADOW_STACK_USER.

An application has shadow stack protection when all the following are
true:

  (1) The kernel has X86_INTEL_SHADOW_STACK_USER enabled,
  (2) The running processor supports the shadow stack,
  (3) The application is built with shadow stack enabled tools & libs
      and, and at runtime, all dependent shared libs can support
      shadow stack.

If this kernel config option is enabled, but (2) or (3) above is not
true, the application runs without the shadow stack protection.
Existing legacy applications will continue to work without the shadow
stack protection.

The user-mode shadow stack protection is only implemented for the
64-bit kernel.  Thirty-two bit applications are supported under the
compatibility mode.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/Kconfig  | 25 +++++++++++++++++++++++++
 arch/x86/Makefile |  7 +++++++
 2 files changed, 32 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2bbbd4d1ba31..1664918c2c1c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1915,6 +1915,31 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_INTEL_CET
+	def_bool n
+
+config ARCH_HAS_SHSTK
+	def_bool n
+
+config X86_INTEL_SHADOW_STACK_USER
+	prompt "Intel Shadow Stack for user-mode"
+	def_bool n
+	depends on CPU_SUP_INTEL && X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS
+	select X86_INTEL_CET
+	select ARCH_HAS_SHSTK
+	---help---
+	  Shadow stack provides hardware protection against program stack
+	  corruption.  Only when all the following are true will an application
+	  have the shadow stack protection: the kernel supports it (i.e. this
+	  feature is enabled), the application is compiled and linked with
+	  shadow stack enabled, and the processor supports this feature.
+	  When the kernel has this configuration enabled, existing non shadow
+	  stack applications will continue to work, but without shadow stack
+	  protection.
+
+	  If unsure, say y.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 56e748a7679f..0b2e9df48907 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -148,6 +148,13 @@ ifdef CONFIG_X86_X32
 endif
 export CONFIG_X86_X32_ABI
 
+# Check assembler shadow stack suppot
+ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+  ifeq ($(call as-instr, saveprevssp, y),)
+      $(error CONFIG_X86_INTEL_SHADOW_STACK_USER not supported by the assembler)
+  endif
+endif
+
 #
 # If the function graph tracer is used with mcount instead of fentry,
 # '-maccumulate-outgoing-args' is needed to prevent a GCC bug
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 08/27] mm: Introduce VM_SHSTK for shadow stack memory
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (6 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 07/27] x86/cet/shstk: Add Kconfig option for user-mode shadow stack Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 09/27] mm/mmap: Prevent Shadow Stack VMA merges Yu-cheng Yu
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

VM_SHSTK indicates a shadow stack memory area.
The shadow stack is implemented only for the 64-bit kernel.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 fs/proc/task_mmu.c | 3 +++
 include/linux/mm.h | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 01d4eb0e6bd1..dd8b0cd1ea36 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -661,6 +661,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_PKEY_BIT4)]	= "",
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+		[ilog2(VM_SHSTK)]	= "ss"
+#endif
 	};
 	size_t i;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..398f1e1c35e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -287,11 +287,13 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -329,6 +331,12 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MPX		VM_NONE
 #endif
 
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+# define VM_SHSTK	VM_HIGH_ARCH_5
+#else
+# define VM_SHSTK	VM_NONE
+#endif
+
 #ifndef VM_GROWSUP
 # define VM_GROWSUP	VM_NONE
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 09/27] mm/mmap: Prevent Shadow Stack VMA merges
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (7 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 08/27] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 10/27] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW Yu-cheng Yu
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

To prevent function call/return spills into the next shadow stack
area, do not merge shadow stack areas.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/mmap.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 7e8c3e8ae75f..b1a921c0de63 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1149,6 +1149,12 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 	if (vm_flags & VM_SPECIAL)
 		return NULL;
 
+	/*
+	 * Do not merge shadow stack areas.
+	 */
+	if (vm_flags & VM_SHSTK)
+		return NULL;
+
 	if (prev)
 		next = prev->vm_next;
 	else
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 10/27] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (8 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 09/27] mm/mmap: Prevent Shadow Stack VMA merges Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 11/27] x86/mm: Introduce _PAGE_DIRTY_SW Yu-cheng Yu
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Before introducing _PAGE_DIRTY_SW for non-hardware, memory management
purposes in the next patch, rename _PAGE_DIRTY to _PAGE_DIRTY_HW and
_PAGE_BIT_DIRTY to _PAGE_BIT_DIRTY_HW to make these PTE dirty bits
more clear.  There are no functional changes in this patch.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/pgtable.h       |  6 +++---
 arch/x86/include/asm/pgtable_types.h | 17 +++++++++--------
 arch/x86/kernel/relocate_kernel_64.S |  2 +-
 arch/x86/kvm/vmx/vmx.c               |  2 +-
 4 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5e0509b41986..5008dd2817e2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -332,7 +332,7 @@ static inline pte_t pte_mkexec(pte_t pte)
 
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
 }
 
 static inline pte_t pte_mkyoung(pte_t pte)
@@ -406,7 +406,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd)
 
 static inline pmd_t pmd_mkdirty(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
 }
 
 static inline pmd_t pmd_mkdevmap(pmd_t pmd)
@@ -460,7 +460,7 @@ static inline pud_t pud_wrprotect(pud_t pud)
 
 static inline pud_t pud_mkdirty(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
 }
 
 static inline pud_t pud_mkdevmap(pud_t pud)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d6ff0bbdb394..cebee656a250 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -15,7 +15,7 @@
 #define _PAGE_BIT_PWT		3	/* page write through */
 #define _PAGE_BIT_PCD		4	/* page cache disabled */
 #define _PAGE_BIT_ACCESSED	5	/* was accessed (raised by CPU) */
-#define _PAGE_BIT_DIRTY		6	/* was written to (raised by CPU) */
+#define _PAGE_BIT_DIRTY_HW	6	/* was written to (raised by CPU) */
 #define _PAGE_BIT_PSE		7	/* 4 MB (or 2MB) page */
 #define _PAGE_BIT_PAT		7	/* on 4KB pages */
 #define _PAGE_BIT_GLOBAL	8	/* Global TLB entry PPro+ */
@@ -45,7 +45,7 @@
 #define _PAGE_PWT	(_AT(pteval_t, 1) << _PAGE_BIT_PWT)
 #define _PAGE_PCD	(_AT(pteval_t, 1) << _PAGE_BIT_PCD)
 #define _PAGE_ACCESSED	(_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
-#define _PAGE_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
+#define _PAGE_DIRTY_HW	(_AT(pteval_t, 1) << _PAGE_BIT_DIRTY_HW)
 #define _PAGE_PSE	(_AT(pteval_t, 1) << _PAGE_BIT_PSE)
 #define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
 #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
@@ -73,7 +73,7 @@
 			 _PAGE_PKEY_BIT3)
 
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
-#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
+#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY_HW | _PAGE_ACCESSED)
 #else
 #define _PAGE_KNL_ERRATUM_MASK 0
 #endif
@@ -112,9 +112,9 @@
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
 #define _PAGE_TABLE_NOENC	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |\
-				 _PAGE_ACCESSED | _PAGE_DIRTY)
+				 _PAGE_ACCESSED | _PAGE_DIRTY_HW)
 #define _KERNPG_TABLE_NOENC	(_PAGE_PRESENT | _PAGE_RW |		\
-				 _PAGE_ACCESSED | _PAGE_DIRTY)
+				 _PAGE_ACCESSED | _PAGE_DIRTY_HW)
 
 /*
  * Set of bits not changed in pte_modify.  The pte's
@@ -123,7 +123,7 @@
  * pte_modify() does modify it.
  */
 #define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
-			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
+			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_HW |	\
 			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
 
@@ -168,7 +168,8 @@ enum page_cache_mode {
 					 _PAGE_ACCESSED)
 
 #define __PAGE_KERNEL_EXEC						\
-	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL)
+	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY_HW | _PAGE_ACCESSED | \
+	 _PAGE_GLOBAL)
 #define __PAGE_KERNEL		(__PAGE_KERNEL_EXEC | _PAGE_NX)
 
 #define __PAGE_KERNEL_RO		(__PAGE_KERNEL & ~_PAGE_RW)
@@ -187,7 +188,7 @@ enum page_cache_mode {
 #define _PAGE_ENC	(_AT(pteval_t, sme_me_mask))
 
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
-			 _PAGE_DIRTY | _PAGE_ENC)
+			 _PAGE_DIRTY_HW | _PAGE_ENC)
 #define _PAGE_TABLE	(_KERNPG_TABLE | _PAGE_USER)
 
 #define __PAGE_KERNEL_ENC	(__PAGE_KERNEL | _PAGE_ENC)
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 11eda21eb697..e7665a4767b3 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -17,7 +17,7 @@
  */
 
 #define PTR(x) (x << 3)
-#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY_HW)
 
 /*
  * control_page + KEXEC_CONTROL_CODE_MAX_SIZE
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b93e36ddee5e..9b3b229b5ef0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3364,7 +3364,7 @@ static int init_rmode_identity_map(struct kvm *kvm)
 	/* Set up identity-mapping pagetable for EPT in real mode */
 	for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
 		tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
-			_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+			_PAGE_ACCESSED | _PAGE_DIRTY_HW | _PAGE_PSE);
 		r = kvm_write_guest_page(kvm, identity_map_pfn,
 				&tmp, i * sizeof(tmp), sizeof(tmp));
 		if (r < 0)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 11/27] x86/mm: Introduce _PAGE_DIRTY_SW
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (9 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 10/27] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 12/27] drm/i915/gvt: Update _PAGE_DIRTY to _PAGE_DIRTY_BITS Yu-cheng Yu
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

A RO and dirty PTE exists in the following cases:

(a) A page is modified and then shared with a fork()'ed child;
(b) A R/O page that has been COW'ed;
(c) A SHSTK page.

The processor does not read the dirty bit for (a) and (b), but
checks the dirty bit for (c).  To prevent the use of non-SHSTK
memory as SHSTK, we introduce a spare bit of the 64-bit PTE as
_PAGE_BIT_DIRTY_SW and use that for (a) and (b).  This results
to the following possible PTE settings:

Modified PTE:             (R/W + DIRTY_HW)
Modified and shared PTE:  (R/O + DIRTY_SW)
R/O PTE COW'ed:           (R/O + DIRTY_SW)
SHSTK PTE:                (R/O + DIRTY_HW)
SHSTK PTE COW'ed:         (R/O + DIRTY_HW)
SHSTK PTE shared:         (R/O + DIRTY_SW)

Note that _PAGE_BIT_DRITY_SW is only used in R/O PTEs but
not R/W PTEs.

When this patch is applied, there are six free bits left in
the 64-bit PTE.  There is no more free bit in the 32-bit
PTE (except for PAE) and shadow stack is not implemented
for the 32-bit kernel.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/pgtable.h       | 129 ++++++++++++++++++++++-----
 arch/x86/include/asm/pgtable_types.h |  21 ++++-
 2 files changed, 128 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5008dd2817e2..0a3ce0a94d73 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -120,9 +120,9 @@ extern pmdval_t early_pmd_flags;
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
-static inline int pte_dirty(pte_t pte)
+static inline bool pte_dirty(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_DIRTY;
+	return pte_flags(pte) & _PAGE_DIRTY_BITS;
 }
 
 
@@ -159,9 +159,9 @@ static inline int pte_young(pte_t pte)
 	return pte_flags(pte) & _PAGE_ACCESSED;
 }
 
-static inline int pmd_dirty(pmd_t pmd)
+static inline bool pmd_dirty(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_DIRTY;
+	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
 }
 
 static inline int pmd_young(pmd_t pmd)
@@ -169,9 +169,9 @@ static inline int pmd_young(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_ACCESSED;
 }
 
-static inline int pud_dirty(pud_t pud)
+static inline bool pud_dirty(pud_t pud)
 {
-	return pud_flags(pud) & _PAGE_DIRTY;
+	return pud_flags(pud) & _PAGE_DIRTY_BITS;
 }
 
 static inline int pud_young(pud_t pud)
@@ -310,9 +310,23 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
 	return native_make_pte(v & ~clear);
 }
 
+#if defined(CONFIG_X86_INTEL_SHADOW_STACK_USER)
+static inline pte_t pte_move_flags(pte_t pte, pteval_t from, pteval_t to)
+{
+	if (pte_flags(pte) & from)
+		pte = pte_set_flags(pte_clear_flags(pte, from), to);
+	return pte;
+}
+#else
+static inline pte_t pte_move_flags(pte_t pte, pteval_t from, pteval_t to)
+{
+	return pte;
+}
+#endif
+
 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
 }
 
 static inline pte_t pte_mkold(pte_t pte)
@@ -322,6 +336,7 @@ static inline pte_t pte_mkold(pte_t pte)
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
+	pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
 	return pte_clear_flags(pte, _PAGE_RW);
 }
 
@@ -332,9 +347,24 @@ static inline pte_t pte_mkexec(pte_t pte)
 
 static inline pte_t pte_mkdirty(pte_t pte)
 {
+	pteval_t dirty = (!IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER) ||
+			   pte_write(pte)) ? _PAGE_DIRTY_HW:_PAGE_DIRTY_SW;
+	return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
+}
+
+#ifdef CONFIG_ARCH_HAS_SHSTK
+static inline pte_t pte_mkdirty_shstk(pte_t pte)
+{
+	pte = pte_clear_flags(pte, _PAGE_DIRTY_SW);
 	return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
 }
 
+static inline bool pte_dirty_hw(pte_t pte)
+{
+	return pte_flags(pte) & _PAGE_DIRTY_HW;
+}
+#endif
+
 static inline pte_t pte_mkyoung(pte_t pte)
 {
 	return pte_set_flags(pte, _PAGE_ACCESSED);
@@ -342,6 +372,7 @@ static inline pte_t pte_mkyoung(pte_t pte)
 
 static inline pte_t pte_mkwrite(pte_t pte)
 {
+	pte = pte_move_flags(pte, _PAGE_DIRTY_SW, _PAGE_DIRTY_HW);
 	return pte_set_flags(pte, _PAGE_RW);
 }
 
@@ -389,6 +420,20 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
 	return native_make_pmd(v & ~clear);
 }
 
+#if defined(CONFIG_X86_INTEL_SHADOW_STACK_USER)
+static inline pmd_t pmd_move_flags(pmd_t pmd, pmdval_t from, pmdval_t to)
+{
+	if (pmd_flags(pmd) & from)
+		pmd = pmd_set_flags(pmd_clear_flags(pmd, from), to);
+	return pmd;
+}
+#else
+static inline pmd_t pmd_move_flags(pmd_t pmd, pmdval_t from, pmdval_t to)
+{
+	return pmd;
+}
+#endif
+
 static inline pmd_t pmd_mkold(pmd_t pmd)
 {
 	return pmd_clear_flags(pmd, _PAGE_ACCESSED);
@@ -396,19 +441,36 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
 
 static inline pmd_t pmd_mkclean(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
 }
 
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
+	pmd = pmd_move_flags(pmd, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
 	return pmd_clear_flags(pmd, _PAGE_RW);
 }
 
 static inline pmd_t pmd_mkdirty(pmd_t pmd)
 {
+	pmdval_t dirty = (!IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER) ||
+			  (pmd_flags(pmd) & _PAGE_RW)) ?
+			  _PAGE_DIRTY_HW:_PAGE_DIRTY_SW;
+	return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY);
+}
+
+#ifdef CONFIG_ARCH_HAS_SHSTK
+static inline pmd_t pmd_mkdirty_shstk(pmd_t pmd)
+{
+	pmd = pmd_clear_flags(pmd, _PAGE_DIRTY_SW);
 	return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
 }
 
+static inline bool pmd_dirty_hw(pmd_t pmd)
+{
+	return  pmd_flags(pmd) & _PAGE_DIRTY_HW;
+}
+#endif
+
 static inline pmd_t pmd_mkdevmap(pmd_t pmd)
 {
 	return pmd_set_flags(pmd, _PAGE_DEVMAP);
@@ -426,6 +488,7 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 
 static inline pmd_t pmd_mkwrite(pmd_t pmd)
 {
+	pmd = pmd_move_flags(pmd, _PAGE_DIRTY_SW, _PAGE_DIRTY_HW);
 	return pmd_set_flags(pmd, _PAGE_RW);
 }
 
@@ -443,6 +506,20 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear)
 	return native_make_pud(v & ~clear);
 }
 
+#if defined(CONFIG_X86_INTEL_SHADOW_STACK_USER)
+static inline pud_t pud_move_flags(pud_t pud, pudval_t from, pudval_t to)
+{
+	if (pud_flags(pud) & from)
+		pud = pud_set_flags(pud_clear_flags(pud, from), to);
+	return pud;
+}
+#else
+static inline pud_t pud_move_flags(pud_t pud, pudval_t from, pudval_t to)
+{
+	return pud;
+}
+#endif
+
 static inline pud_t pud_mkold(pud_t pud)
 {
 	return pud_clear_flags(pud, _PAGE_ACCESSED);
@@ -450,17 +527,22 @@ static inline pud_t pud_mkold(pud_t pud)
 
 static inline pud_t pud_mkclean(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
 }
 
 static inline pud_t pud_wrprotect(pud_t pud)
 {
+	pud = pud_move_flags(pud, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
 	return pud_clear_flags(pud, _PAGE_RW);
 }
 
 static inline pud_t pud_mkdirty(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY);
+	pudval_t dirty = (!IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER) ||
+			  (pud_flags(pud) & _PAGE_RW)) ?
+			  _PAGE_DIRTY_HW:_PAGE_DIRTY_SW;
+
+	return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
 }
 
 static inline pud_t pud_mkdevmap(pud_t pud)
@@ -480,6 +562,7 @@ static inline pud_t pud_mkyoung(pud_t pud)
 
 static inline pud_t pud_mkwrite(pud_t pud)
 {
+	pud = pud_move_flags(pud, _PAGE_DIRTY_SW, _PAGE_DIRTY_HW);
 	return pud_set_flags(pud, _PAGE_RW);
 }
 
@@ -611,19 +694,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	val &= _PAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
 	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
+	if ((pte_write(pte) && !(pgprot_val(newprot) & _PAGE_RW)))
+		return pte_move_flags(__pte(val), _PAGE_DIRTY_HW,
+				      _PAGE_DIRTY_SW);
 	return __pte(val);
 }
 
-static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
-{
-	pmdval_t val = pmd_val(pmd), oldval = val;
-
-	val &= _HPAGE_CHG_MASK;
-	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
-	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
-	return __pmd(val);
-}
-
 /* mprotect needs to preserve PAT bits when updating vm_page_prot */
 #define pgprot_modify pgprot_modify
 static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
@@ -1178,6 +1254,19 @@ static inline int pmd_write(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_RW;
 }
 
+static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
+{
+	pmdval_t val = pmd_val(pmd), oldval = val;
+
+	val &= _HPAGE_CHG_MASK;
+	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
+	if ((pmd_write(pmd) && !(pgprot_val(newprot) & _PAGE_RW)))
+		return pmd_move_flags(__pmd(val), _PAGE_DIRTY_HW,
+				      _PAGE_DIRTY_SW);
+	return __pmd(val);
+}
+
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pmd_t *pmdp)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index cebee656a250..1f04551c55f6 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -23,6 +23,7 @@
 #define _PAGE_BIT_SOFTW2	10	/* " */
 #define _PAGE_BIT_SOFTW3	11	/* " */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
+#define _PAGE_BIT_SOFTW5	57	/* available for programmer */
 #define _PAGE_BIT_SOFTW4	58	/* available for programmer */
 #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
 #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
@@ -34,6 +35,7 @@
 #define _PAGE_BIT_CPA_TEST	_PAGE_BIT_SOFTW1
 #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
 #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4
+#define _PAGE_BIT_DIRTY_SW	_PAGE_BIT_SOFTW5 /* was written to */
 
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
@@ -109,6 +111,21 @@
 #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
 #endif
 
+/*
+ * _PAGE_DIRTY_HW: set by the processor when a page is written.
+ * _PAGE_DIRTY_SW: a spare bit tracking a written, but now R/O page.
+ *   [R/W + _PAGE_DIRTY_HW] <-> [R/O + _PAGE_DIRTY_SW].
+ * _PAGE_SOFT_DIRTY: a spare bit used to track written pages since a time point
+ * set by the system admin; see Documentation/admin-guide/mm/soft-dirty.rst.
+ */
+#if defined(CONFIG_X86_INTEL_SHADOW_STACK_USER)
+#define _PAGE_DIRTY_SW	(_AT(pteval_t, 1) << _PAGE_BIT_DIRTY_SW)
+#else
+#define _PAGE_DIRTY_SW	(_AT(pteval_t, 0))
+#endif
+
+#define _PAGE_DIRTY_BITS (_PAGE_DIRTY_HW | _PAGE_DIRTY_SW)
+
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
 #define _PAGE_TABLE_NOENC	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |\
@@ -122,9 +139,9 @@
  * instance, and is *not* included in this mask since
  * pte_modify() does modify it.
  */
-#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
+#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |			\
 			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_HW |	\
-			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
+			 _PAGE_DIRTY_SW | _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
 
 /*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 12/27] drm/i915/gvt: Update _PAGE_DIRTY to _PAGE_DIRTY_BITS
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (10 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 11/27] x86/mm: Introduce _PAGE_DIRTY_SW Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 13/27] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW Yu-cheng Yu
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Update _PAGE_DIRTY to _PAGE_DIRTY_BITS in split_2MB_gtt_entry().

In order to support Control-flow Enforcement (CET), _PAGE_DIRTY is
now _PAGE_DIRTY_HW or _PAGE_DIRTY_SW.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 244ad1729764..44f35880d771 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1185,7 +1185,7 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
 	}
 
 	/* Clear dirty field. */
-	se->val64 &= ~_PAGE_DIRTY;
+	se->val64 &= ~_PAGE_DIRTY_BITS;
 
 	ops->clear_pse(se);
 	ops->clear_ips(se);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 13/27] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (11 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 12/27] drm/i915/gvt: Update _PAGE_DIRTY to _PAGE_DIRTY_BITS Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 14/27] x86/mm: Shadow stack page fault error checking Yu-cheng Yu
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

When Shadow Stack is enabled, the [R/O + PAGE_DIRTY_HW] setting is
reserved only for the Shadow Stack.  Non-Shadow Stack R/O PTEs use
[R/O + PAGE_DIRTY_SW].

When a PTE goes from [R/W + PAGE_DIRTY_HW] to [R/O + PAGE_DIRTY_SW],
it could become a transient Shadow Stack PTE in two cases.

The first case is that some processors can start a write but end up
seeing a read-only PTE by the time they get to the Dirty bit,
creating a transient Shadow Stack PTE.  However, this will not occur
on processors supporting Shadow Stack therefore we don't need a TLB
flush here.

The second case is that when the software, without atomic, tests &
replaces PAGE_DIRTY_HW with PAGE_DIRTY_SW, a transient Shadow Stack
PTE can exist.  This is prevented with cmpxchg.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided
many insights to the issue.  Jann Horn provided the cmpxchg solution.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/pgtable.h | 58 ++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0a3ce0a94d73..2dd079080be2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1222,7 +1222,36 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+	pte_t new_pte, pte = READ_ONCE(*ptep);
+
+	/*
+	 * Some processors can start a write, but end up
+	 * seeing a read-only PTE by the time they get
+	 * to the Dirty bit.  In this case, they will
+	 * set the Dirty bit, leaving a read-only, Dirty
+	 * PTE which looks like a Shadow Stack PTE.
+	 *
+	 * However, this behavior has been improved and
+	 * will not occur on processors supporting
+	 * Shadow Stacks.  Without this guarantee, a
+	 * transition to a non-present PTE and flush the
+	 * TLB would be needed.
+	 *
+	 * When changing a writable PTE to read-only and
+	 * if the PTE has _PAGE_DIRTY_HW set, we move
+	 * that bit to _PAGE_DIRTY_SW so that the PTE is
+	 * not a valid Shadow Stack PTE.
+	 */
+	do {
+		new_pte = pte_wrprotect(pte);
+		new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >>
+				_PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
+		new_pte.pte &= ~_PAGE_DIRTY_HW;
+	} while (!try_cmpxchg(ptep, &pte, new_pte));
+#else
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+#endif
 }
 
 #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
@@ -1285,7 +1314,36 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+	pmd_t new_pmd, pmd = READ_ONCE(*pmdp);
+
+	/*
+	 * Some processors can start a write, but end up
+	 * seeing a read-only PMD by the time they get
+	 * to the Dirty bit.  In this case, they will
+	 * set the Dirty bit, leaving a read-only, Dirty
+	 * PMD which looks like a Shadow Stack PMD.
+	 *
+	 * However, this behavior has been improved and
+	 * will not occur on processors supporting
+	 * Shadow Stacks.  Without this guarantee, a
+	 * transition to a non-present PMD and flush the
+	 * TLB would be needed.
+	 *
+	 * When changing a writable PMD to read-only and
+	 * if the PMD has _PAGE_DIRTY_HW set, we move
+	 * that bit to _PAGE_DIRTY_SW so that the PMD is
+	 * not a valid Shadow Stack PMD.
+	 */
+	do {
+		new_pmd = pmd_wrprotect(pmd);
+		new_pmd.pmd |= (new_pmd.pmd & _PAGE_DIRTY_HW) >>
+				_PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
+		new_pmd.pmd &= ~_PAGE_DIRTY_HW;
+	} while (!try_cmpxchg(pmdp, &pmd, new_pmd));
+#else
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
+#endif
 }
 
 #define pud_write pud_write
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 14/27] x86/mm: Shadow stack page fault error checking
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (12 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 13/27] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 15/27] mm: Handle shadow stack page fault Yu-cheng Yu
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

If a page fault is triggered by a shadow stack access (e.g. call/ret)
or shadow stack management instructions (e.g. wrussq), then bit[6] of
the page fault error code is set.

In access_error(), verify a shadow stack page fault is within a
shadow stack memory area.  It is always an error otherwise.

For a valid shadow stack access, set FAULT_FLAG_WRITE to effect
copy-on-write.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/traps.h |  2 ++
 arch/x86/mm/fault.c          | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 5906a22796b6..292fb3a1b340 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -166,6 +166,7 @@ enum {
  *   bit 3 ==				1: use of reserved bit detected
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
+ *   bit 6 ==				1: shadow stack access fault
  */
 enum x86_pf_error_code {
 	X86_PF_PROT	=		1 << 0,
@@ -174,5 +175,6 @@ enum x86_pf_error_code {
 	X86_PF_RSVD	=		1 << 3,
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
+	X86_PF_SHSTK	=		1 << 6,
 };
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 46df4c6aae46..59f4f66e4f2e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1205,6 +1205,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
+	/*
+	 * Verify X86_PF_SHSTK is within a shadow stack VMA.
+	 * It is always an error if there is a shadow stack
+	 * fault outside a shadow stack VMA.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (!(vma->vm_flags & VM_SHSTK))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
@@ -1362,6 +1373,13 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	/*
+	 * If the fault is caused by a shadow stack access,
+	 * i.e. CALL/RET/SAVEPREVSSP/RSTORSSP, then set
+	 * FAULT_FLAG_WRITE to effect copy-on-write.
+	 */
+	if (hw_error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (hw_error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (hw_error_code & X86_PF_INSTR)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 15/27] mm: Handle shadow stack page fault
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (13 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 14/27] x86/mm: Shadow stack page fault error checking Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-07  7:30   ` Peter Zijlstra
  2019-06-06 20:06 ` [PATCH v7 16/27] mm: Handle THP/HugeTLB " Yu-cheng Yu
                   ` (11 subsequent siblings)
  26 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

When a task does fork(), its shadow stack (SHSTK) must be duplicated
for the child.  This patch implements a flow similar to copy-on-write
of an anonymous page, but for SHSTK.

A SHSTK PTE must be RO and dirty.  This dirty bit requirement is used
to effect the copying.  In copy_one_pte(), clear the dirty bit from a
SHSTK PTE to cause a page fault upon the next SHSTK access.  At that
time, fix the PTE and copy/re-use the page.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/mm/pgtable.c         | 15 +++++++++++++++
 include/asm-generic/pgtable.h |  8 ++++++++
 mm/memory.c                   |  7 ++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 1f67b1e15bf6..c2d754a780b3 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -891,3 +891,18 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #endif /* CONFIG_X86_64 */
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+inline pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHSTK)
+		return pte_mkdirty_shstk(pte);
+	else
+		return pte;
+}
+
+inline bool arch_copy_pte_mapping(vm_flags_t vm_flags)
+{
+	return (vm_flags & VM_SHSTK);
+}
+#endif /* CONFIG_X86_INTEL_SHADOW_STACK_USER */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 75d9d68a6de7..ffcc0be7cadc 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1188,4 +1188,12 @@ static inline bool arch_has_pfn_modify_check(void)
 #define mm_pmd_folded(mm)	__is_defined(__PAGETABLE_PMD_FOLDED)
 #endif
 
+#ifndef CONFIG_ARCH_HAS_SHSTK
+#define pte_set_vma_features(pte, vma) pte
+#define arch_copy_pte_mapping(vma_flags) false
+#else
+pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma);
+bool arch_copy_pte_mapping(vm_flags_t vm_flags);
+#endif
+
 #endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/mm/memory.c b/mm/memory.c
index ddf20bd0c317..51c97294f00f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -777,7 +777,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * If it's a COW mapping, write protect it both
 	 * in the parent and the child
 	 */
-	if (is_cow_mapping(vm_flags) && pte_write(pte)) {
+	if ((is_cow_mapping(vm_flags) && pte_write(pte)) ||
+	    arch_copy_pte_mapping(vm_flags)) {
 		ptep_set_wrprotect(src_mm, addr, src_pte);
 		pte = pte_wrprotect(pte);
 	}
@@ -2312,6 +2313,7 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 	entry = pte_mkyoung(vmf->orig_pte);
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	entry = pte_set_vma_features(entry, vma);
 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
 		update_mmu_cache(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -2387,6 +2389,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		entry = pte_set_vma_features(entry, vma);
 		/*
 		 * Clear the pte entry and flush it first, before updating the
 		 * pte with the new entry. This will avoid a race condition
@@ -2910,6 +2913,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	pte = mk_pte(page, vma->vm_page_prot);
 	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+		pte = pte_set_vma_features(pte, vma);
 		vmf->flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
 		exclusive = RMAP_EXCLUSIVE;
@@ -3052,6 +3056,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	entry = mk_pte(page, vma->vm_page_prot);
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
+	entry = pte_set_vma_features(entry, vma);
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 16/27] mm: Handle THP/HugeTLB shadow stack page fault
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (14 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 15/27] mm: Handle shadow stack page fault Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 17/27] mm: Update can_follow_write_pte/pmd for shadow stack Yu-cheng Yu
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

This patch implements THP shadow stack (SHSTK) copying in the same
way as in the previous patch for regular PTE.

In copy_huge_pmd(), clear the dirty bit from the PMD to cause a page
fault upon the next SHSTK access to the PMD.  At that time, fix the
PMD and copy/re-use the page.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/mm/pgtable.c         | 8 ++++++++
 include/asm-generic/pgtable.h | 2 ++
 mm/huge_memory.c              | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c2d754a780b3..8ff54bd978f3 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -901,6 +901,14 @@ inline pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma)
 		return pte;
 }
 
+inline pmd_t pmd_set_vma_features(pmd_t pmd, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHSTK)
+		return pmd_mkdirty_shstk(pmd);
+	else
+		return pmd;
+}
+
 inline bool arch_copy_pte_mapping(vm_flags_t vm_flags)
 {
 	return (vm_flags & VM_SHSTK);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index ffcc0be7cadc..4940411b8e1c 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1190,9 +1190,11 @@ static inline bool arch_has_pfn_modify_check(void)
 
 #ifndef CONFIG_ARCH_HAS_SHSTK
 #define pte_set_vma_features(pte, vma) pte
+#define pmd_set_vma_features(pmd, vma) pmd
 #define arch_copy_pte_mapping(vma_flags) false
 #else
 pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma);
+pmd_t pmd_set_vma_features(pmd_t pmd, struct vm_area_struct *vma);
 bool arch_copy_pte_mapping(vm_flags_t vm_flags);
 #endif
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..eac1ee2f8985 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -608,6 +608,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 
 		entry = mk_huge_pmd(page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+		entry = pmd_set_vma_features(entry, vma);
 		page_add_new_anon_rmap(page, vma, haddr, true);
 		mem_cgroup_commit_charge(page, memcg, false, true);
 		lru_cache_add_active_or_unevictable(page, vma);
@@ -1250,6 +1251,7 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 		pte_t entry;
 		entry = mk_pte(pages[i], vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		entry = pte_set_vma_features(entry, vma);
 		memcg = (void *)page_private(pages[i]);
 		set_page_private(pages[i], 0);
 		page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false);
@@ -1332,6 +1334,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+		entry = pmd_set_vma_features(entry, vma);
 		if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry,  1))
 			update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		ret |= VM_FAULT_WRITE;
@@ -1404,6 +1407,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 		pmd_t entry;
 		entry = mk_huge_pmd(new_page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+		entry = pmd_set_vma_features(entry, vma);
 		pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
 		page_add_new_anon_rmap(new_page, vma, haddr, true);
 		mem_cgroup_commit_charge(new_page, memcg, false, true);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 17/27] mm: Update can_follow_write_pte/pmd for shadow stack
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (15 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 16/27] mm: Handle THP/HugeTLB " Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 18/27] mm: Introduce do_mmap_locked() Yu-cheng Yu
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

can_follow_write_pte/pmd look for the (RO & DIRTY) PTE/PMD to
verify an exclusive RO page still exists after a broken COW.

A shadow stack PTE is RO & PAGE_DIRTY_SW when it is shared,
otherwise RO & PAGE_DIRTY_HW.

Introduce pte_exclusive() and pmd_exclusive() to also verify a
shadow stack PTE is exclusive.

Also rename can_follow_write_pte/pmd() to can_follow_write() to
make their meaning clear; i.e. "Can we write to the page?", not
"Is the PTE writable?"

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/mm/pgtable.c         | 18 ++++++++++++++++++
 include/asm-generic/pgtable.h |  4 ++++
 mm/gup.c                      |  8 +++++---
 mm/huge_memory.c              |  8 +++++---
 4 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 8ff54bd978f3..2a89c168df7b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -913,4 +913,22 @@ inline bool arch_copy_pte_mapping(vm_flags_t vm_flags)
 {
 	return (vm_flags & VM_SHSTK);
 }
+
+inline bool pte_exclusive(pte_t pte, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHSTK)
+		return pte_dirty_hw(pte);
+	else
+		return pte_dirty(pte);
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+inline bool pmd_exclusive(pmd_t pmd, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHSTK)
+		return pmd_dirty_hw(pmd);
+	else
+		return pmd_dirty(pmd);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif /* CONFIG_X86_INTEL_SHADOW_STACK_USER */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 4940411b8e1c..3324e30bb07f 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1192,10 +1192,14 @@ static inline bool arch_has_pfn_modify_check(void)
 #define pte_set_vma_features(pte, vma) pte
 #define pmd_set_vma_features(pmd, vma) pmd
 #define arch_copy_pte_mapping(vma_flags) false
+#define pte_exclusive(pte, vma) pte_dirty(pte)
+#define pmd_exclusive(pmd, vma) pmd_dirty(pmd)
 #else
 pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma);
 pmd_t pmd_set_vma_features(pmd_t pmd, struct vm_area_struct *vma);
 bool arch_copy_pte_mapping(vm_flags_t vm_flags);
+bool pte_exclusive(pte_t pte, struct vm_area_struct *vma);
+bool pmd_exclusive(pmd_t pmd, struct vm_area_struct *vma);
 #endif
 
 #endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/mm/gup.c b/mm/gup.c
index ddde097cf9e4..7d11fff1e8c3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -178,10 +178,12 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
  * FOLL_FORCE can write to even unwritable pte's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
+static inline bool can_follow_write(pte_t pte, unsigned int flags,
+				    struct vm_area_struct *vma)
 {
 	return pte_write(pte) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+		((flags & FOLL_FORCE) && (flags & FOLL_COW) &&
+		 pte_exclusive(pte, vma));
 }
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -219,7 +221,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 	if ((flags & FOLL_NUMA) && pte_protnone(pte))
 		goto no_page;
-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
+	if ((flags & FOLL_WRITE) && !can_follow_write(pte, flags, vma)) {
 		pte_unmap_unlock(ptep, ptl);
 		return NULL;
 	}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eac1ee2f8985..d65970b9ece6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1441,10 +1441,12 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
  * FOLL_FORCE can write to even unwritable pmd's, but only
  * after we've gone through a COW cycle and they are dirty.
  */
-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+static inline bool can_follow_write(pmd_t pmd, unsigned int flags,
+				    struct vm_area_struct *vma)
 {
 	return pmd_write(pmd) ||
-	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) &&
+		pmd_exclusive(pmd, vma));
 }
 
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
@@ -1457,7 +1459,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
+	if (flags & FOLL_WRITE && !can_follow_write(*pmd, flags, vma))
 		goto out;
 
 	/* Avoid dumping huge zero page */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 18/27] mm: Introduce do_mmap_locked()
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (16 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 17/27] mm: Update can_follow_write_pte/pmd for shadow stack Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-07  7:43   ` Peter Zijlstra
  2019-06-06 20:06 ` [PATCH v7 19/27] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

There are a few places that need do_mmap() with mm->mmap_sem held.
Create an in-line function for that.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 include/linux/mm.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 398f1e1c35e5..7cf014604848 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2411,6 +2411,24 @@ static inline void mm_populate(unsigned long addr, unsigned long len)
 static inline void mm_populate(unsigned long addr, unsigned long len) {}
 #endif
 
+static inline unsigned long do_mmap_locked(unsigned long addr,
+	unsigned long len, unsigned long prot, unsigned long flags,
+	vm_flags_t vm_flags)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long populate;
+
+	down_write(&mm->mmap_sem);
+	addr = do_mmap(NULL, addr, len, prot, flags, vm_flags, 0,
+		       &populate, NULL);
+	up_write(&mm->mmap_sem);
+
+	if (populate)
+		mm_populate(addr, populate);
+
+	return addr;
+}
+
 /* These take the mm semaphore themselves */
 extern int __must_check vm_brk(unsigned long, unsigned long);
 extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 19/27] x86/cet/shstk: User-mode shadow stack support
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (17 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 18/27] mm: Introduce do_mmap_locked() Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 20/27] x86/cet/shstk: Introduce WRUSS instruction Yu-cheng Yu
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

This patch adds basic shadow stack enabling/disabling routines.
A task's shadow stack is allocated from memory with VM_SHSTK flag set
and read-only protection.  It has a fixed size of RLIMIT_STACK.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h                    |  34 +++++
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/msr-index.h              |  18 +++
 arch/x86/include/asm/processor.h              |   5 +
 arch/x86/kernel/Makefile                      |   2 +
 arch/x86/kernel/cet.c                         | 116 ++++++++++++++++++
 arch/x86/kernel/cpu/common.c                  |  25 ++++
 arch/x86/kernel/process.c                     |   1 +
 .../arch/x86/include/asm/disabled-features.h  |   8 +-
 9 files changed, 215 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/kernel/cet.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
new file mode 100644
index 000000000000..c952a2ec65fe
--- /dev/null
+++ b/arch/x86/include/asm/cet.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CET_H
+#define _ASM_X86_CET_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+struct task_struct;
+/*
+ * Per-thread CET status
+ */
+struct cet_status {
+	unsigned long	shstk_base;
+	unsigned long	shstk_size;
+	unsigned int	shstk_enabled:1;
+};
+
+#ifdef CONFIG_X86_INTEL_CET
+int cet_setup_shstk(void);
+void cet_disable_shstk(void);
+void cet_disable_free_shstk(struct task_struct *p);
+#else
+static inline int cet_setup_shstk(void) { return -EINVAL; }
+static inline void cet_disable_shstk(void) {}
+static inline void cet_disable_free_shstk(struct task_struct *p) {}
+#endif
+
+#define cpu_x86_cet_enabled() \
+	(cpu_feature_enabled(X86_FEATURE_SHSTK) || \
+	 cpu_feature_enabled(X86_FEATURE_IBT))
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_X86_CET_H */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index a5ea841cc6d2..06323ebed643 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -62,6 +62,12 @@
 # define DISABLE_PTI		(1 << (X86_FEATURE_PTI & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+#define DISABLE_SHSTK	0
+#else
+#define DISABLE_SHSTK	(1<<(X86_FEATURE_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -81,7 +87,7 @@
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..30e9107974fa 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -839,4 +839,22 @@
 #define MSR_VM_IGNNE                    0xc0010115
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
+/* Control-flow Enforcement Technology MSRs */
+#define MSR_IA32_U_CET		0x6a0 /* user mode cet setting */
+#define MSR_IA32_S_CET		0x6a2 /* kernel mode cet setting */
+#define MSR_IA32_PL0_SSP	0x6a4 /* kernel shstk pointer */
+#define MSR_IA32_PL1_SSP	0x6a5 /* ring-1 shstk pointer */
+#define MSR_IA32_PL2_SSP	0x6a6 /* ring-2 shstk pointer */
+#define MSR_IA32_PL3_SSP	0x6a7 /* user shstk pointer */
+#define MSR_IA32_INT_SSP_TAB	0x6a8 /* exception shstk table */
+
+/* MSR_IA32_U_CET and MSR_IA32_S_CET bits */
+#define MSR_IA32_CET_SHSTK_EN		0x0000000000000001ULL
+#define MSR_IA32_CET_WRSS_EN		0x0000000000000002ULL
+#define MSR_IA32_CET_ENDBR_EN		0x0000000000000004ULL
+#define MSR_IA32_CET_LEG_IW_EN		0x0000000000000008ULL
+#define MSR_IA32_CET_NO_TRACK_EN	0x0000000000000010ULL
+#define MSR_IA32_CET_WAIT_ENDBR	0x00000000000000800UL
+#define MSR_IA32_CET_BITMAP_MASK	0xfffffffffffff000ULL
+
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c34a35c78618..2ae7c1bf4e43 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -24,6 +24,7 @@ struct vm86;
 #include <asm/special_insns.h>
 #include <asm/fpu/types.h>
 #include <asm/unwind_hints.h>
+#include <asm/cet.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
@@ -487,6 +488,10 @@ struct thread_struct {
 	unsigned int		sig_on_uaccess_err:1;
 	unsigned int		uaccess_err:1;	/* uaccess failed */
 
+#ifdef CONFIG_X86_INTEL_CET
+	struct cet_status	cet;
+#endif
+
 	/* Floating point and extended processor state */
 	struct fpu		fpu;
 	/*
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index ce1b5cc360a2..584ed7e9a599 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -140,6 +140,8 @@ obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
+obj-$(CONFIG_X86_INTEL_CET)		+= cet.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
new file mode 100644
index 000000000000..2a48634aa6ce
--- /dev/null
+++ b/arch/x86/kernel/cet.c
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * cet.c - Control-flow Enforcement (CET)
+ *
+ * Copyright (c) 2018, Intel Corporation.
+ * Yu-cheng Yu <yu-cheng.yu@intel.com>
+ */
+
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/sched/signal.h>
+#include <linux/compat.h>
+#include <asm/msr.h>
+#include <asm/user.h>
+#include <asm/fpu/internal.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/cet.h>
+
+static int set_shstk_ptr(unsigned long addr)
+{
+	u64 r;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return -1;
+
+	if ((addr >= TASK_SIZE_MAX) || (!IS_ALIGNED(addr, 4)))
+		return -1;
+
+	modify_fpu_regs_begin();
+	rdmsrl(MSR_IA32_U_CET, r);
+	wrmsrl(MSR_IA32_PL3_SSP, addr);
+	wrmsrl(MSR_IA32_U_CET, r | MSR_IA32_CET_SHSTK_EN);
+	modify_fpu_regs_end();
+	return 0;
+}
+
+static unsigned long get_shstk_addr(void)
+{
+	unsigned long ptr;
+
+	if (!current->thread.cet.shstk_enabled)
+		return 0;
+
+	modify_fpu_regs_begin();
+	rdmsrl(MSR_IA32_PL3_SSP, ptr);
+	modify_fpu_regs_end();
+	return ptr;
+}
+
+int cet_setup_shstk(void)
+{
+	unsigned long addr, size;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return -EOPNOTSUPP;
+
+	size = rlimit(RLIMIT_STACK);
+	addr = do_mmap_locked(0, size, PROT_READ,
+			      MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK);
+
+	/*
+	 * Return actual error from do_mmap().
+	 */
+	if (addr >= TASK_SIZE_MAX)
+		return addr;
+
+	set_shstk_ptr(addr + size - sizeof(u64));
+	current->thread.cet.shstk_base = addr;
+	current->thread.cet.shstk_size = size;
+	current->thread.cet.shstk_enabled = 1;
+	return 0;
+}
+
+void cet_disable_shstk(void)
+{
+	u64 r;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return;
+
+	modify_fpu_regs_begin();
+	rdmsrl(MSR_IA32_U_CET, r);
+	r &= ~(MSR_IA32_CET_SHSTK_EN);
+	wrmsrl(MSR_IA32_U_CET, r);
+	wrmsrl(MSR_IA32_PL3_SSP, 0);
+	modify_fpu_regs_end();
+	current->thread.cet.shstk_enabled = 0;
+}
+
+void cet_disable_free_shstk(struct task_struct *tsk)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
+	    !tsk->thread.cet.shstk_enabled)
+		return;
+
+	if (tsk->mm && (tsk == current))
+		cet_disable_shstk();
+
+	/*
+	 * Free only when tsk is current or shares mm
+	 * with current but has its own shstk.
+	 */
+	if (tsk->mm && (tsk->mm == current->mm) &&
+	    (tsk->thread.cet.shstk_base)) {
+		vm_munmap(tsk->thread.cet.shstk_base,
+			  tsk->thread.cet.shstk_size);
+		tsk->thread.cet.shstk_base = 0;
+		tsk->thread.cet.shstk_size = 0;
+	}
+
+	tsk->thread.cet.shstk_enabled = 0;
+}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2c57fffebf9b..b0780fe8717e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -53,6 +53,7 @@
 #include <asm/microcode_intel.h>
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
+#include <asm/cet.h>
 
 #ifdef CONFIG_X86_LOCAL_APIC
 #include <asm/uv/uv.h>
@@ -417,6 +418,29 @@ static __init int setup_disable_pku(char *arg)
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+	if (cpu_x86_cet_enabled())
+		cr4_set_bits(X86_CR4_CET);
+}
+
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+static __init int setup_disable_shstk(char *s)
+{
+	/* require an exact match without trailing characters */
+	if (s[0] != '\0')
+		return 0;
+
+	if (!boot_cpu_has(X86_FEATURE_SHSTK))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_SHSTK);
+	pr_info("x86: 'no_cet_shstk' specified, disabling Shadow Stack\n");
+	return 1;
+}
+__setup("no_cet_shstk", setup_disable_shstk);
+#endif
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
@@ -1393,6 +1417,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	x86_init_rdrand(c);
 	x86_init_cache_qos(c);
 	setup_pku(c);
+	setup_cet(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index d360bf4d696b..a4deb79b1089 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -42,6 +42,7 @@
 #include <asm/prctl.h>
 #include <asm/spec-ctrl.h>
 #include <asm/proto.h>
+#include <asm/cet.h>
 
 #include "process.h"
 
diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h
index a5ea841cc6d2..06323ebed643 100644
--- a/tools/arch/x86/include/asm/disabled-features.h
+++ b/tools/arch/x86/include/asm/disabled-features.h
@@ -62,6 +62,12 @@
 # define DISABLE_PTI		(1 << (X86_FEATURE_PTI & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+#define DISABLE_SHSTK	0
+#else
+#define DISABLE_SHSTK	(1<<(X86_FEATURE_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -81,7 +87,7 @@
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 20/27] x86/cet/shstk: Introduce WRUSS instruction
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (18 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 19/27] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 21/27] x86/cet/shstk: Handle signals for shadow stack Yu-cheng Yu
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

WRUSS is a new kernel-mode instruction but writes directly to user
shadow stack memory.  This is used to construct a return address on
the shadow stack for the signal handler.

This instruction can fault if the user shadow stack is invalid shadow
stack memory.  In that case, the kernel does a fixup.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 0a3c4cab39db..99530ab0a077 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -250,6 +250,38 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+#ifdef CONFIG_X86_INTEL_CET
+#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	asm_volatile_goto("1: wrussd %1, (%0)\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: "r" (addr), "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EPERM;
+}
+#else
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	WARN_ONCE(1, "%s used but not supported.\n", __func__);
+	return -EFAULT;
+}
+#endif
+
+static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
+{
+	asm_volatile_goto("1: wrussq %1, (%0)\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: "r" (addr), "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EPERM;
+}
+#endif /* CONFIG_X86_INTEL_CET */
+
 #define nop() asm volatile ("nop")
 
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 21/27] x86/cet/shstk: Handle signals for shadow stack
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (19 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 20/27] x86/cet/shstk: Introduce WRUSS instruction Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file Yu-cheng Yu
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

When setting up a signal, the kernel creates a shadow stack restore
token at the current SHSTK address and then stores the token's
address in the signal frame, right after the FPU state.  Before
restoring a signal, the kernel verifies and then uses the restore
token to set the SHSTK pointer.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/ia32/ia32_signal.c            |   8 ++
 arch/x86/include/asm/cet.h             |   7 ++
 arch/x86/include/asm/fpu/internal.h    |   2 +
 arch/x86/include/asm/fpu/signal.h      |   2 +
 arch/x86/include/uapi/asm/sigcontext.h |  15 +++
 arch/x86/kernel/cet.c                  | 141 +++++++++++++++++++++++++
 arch/x86/kernel/fpu/signal.c           |  67 ++++++++++++
 arch/x86/kernel/signal.c               |   8 ++
 8 files changed, 250 insertions(+)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 629d1ee05599..5216ca782262 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -34,6 +34,7 @@
 #include <asm/sigframe.h>
 #include <asm/sighandling.h>
 #include <asm/smap.h>
+#include <asm/cet.h>
 
 /*
  * Do a signal return; undo the signal stack.
@@ -235,6 +236,7 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 		 ksig->ka.sa.sa_restorer)
 		sp = (unsigned long) ksig->ka.sa.sa_restorer;
 
+	sp = fpu__alloc_sigcontext_ext(sp);
 	sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
 	*fpstate = (struct _fpstate_32 __user *) sp;
 	if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
@@ -295,6 +297,9 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
 			restorer = &frame->retcode;
 	}
 
+	if (setup_fpu_system_states(1, (unsigned long)restorer, fpstate))
+		return -EFAULT;
+
 	put_user_try {
 		put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
 
@@ -384,6 +389,9 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig,
 				     regs, set->sig[0]);
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
 
+	if (!err)
+		err = setup_fpu_system_states(1, (unsigned long)restorer, fpstate);
+
 	if (err)
 		return -EFAULT;
 
diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index c952a2ec65fe..422ccb8adbb7 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -6,6 +6,8 @@
 #include <linux/types.h>
 
 struct task_struct;
+struct sc_ext;
+
 /*
  * Per-thread CET status
  */
@@ -19,10 +21,15 @@ struct cet_status {
 int cet_setup_shstk(void);
 void cet_disable_shstk(void);
 void cet_disable_free_shstk(struct task_struct *p);
+int cet_restore_signal(bool ia32, struct sc_ext *sc);
+int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
 #else
 static inline int cet_setup_shstk(void) { return -EINVAL; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_disable_free_shstk(struct task_struct *p) {}
+static inline int cet_restore_signal(bool ia32, struct sc_ext *sc) { return -EINVAL; }
+static inline int cet_setup_signal(bool ia32, unsigned long rstor,
+				   struct sc_ext *sc) { return -EINVAL; }
 #endif
 
 #define cpu_x86_cet_enabled() \
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 148a3d8c8c35..7db8c19c9072 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -472,6 +472,8 @@ static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate)
 	__copy_kernel_to_fpregs(fpstate, -1);
 }
 
+extern int setup_fpu_system_states(int is_ia32, unsigned long restorer,
+				   void __user *fp);
 extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
 
 /*
diff --git a/arch/x86/include/asm/fpu/signal.h b/arch/x86/include/asm/fpu/signal.h
index 7fb516b6893a..630a658aeea3 100644
--- a/arch/x86/include/asm/fpu/signal.h
+++ b/arch/x86/include/asm/fpu/signal.h
@@ -25,6 +25,8 @@ extern void convert_from_fxsr(struct user_i387_ia32_struct *env,
 extern void convert_to_fxsr(struct fxregs_state *fxsave,
 			    const struct user_i387_ia32_struct *env);
 
+unsigned long fpu__alloc_sigcontext_ext(unsigned long sp);
+
 unsigned long
 fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
 		     unsigned long *buf_fx, unsigned long *size);
diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
index 844d60eb1882..e3b08d1c0d3b 100644
--- a/arch/x86/include/uapi/asm/sigcontext.h
+++ b/arch/x86/include/uapi/asm/sigcontext.h
@@ -196,6 +196,21 @@ struct _xstate {
 	/* New processor state extensions go here: */
 };
 
+/*
+ * Sigcontext extension (struct sc_ext) is located after
+ * sigcontext->fpstate.  Because currently only the shadow
+ * stack pointer is saved there and the shadow stack depends
+ * on XSAVES, we can find sc_ext from sigcontext->fpstate.
+ *
+ * The 64-bit fpstate has a size of fpu_user_xstate_size, plus
+ * FP_XSTATE_MAGIC2_SIZE when XSAVE* is used.  The struct sc_ext
+ * is located at the end of sigcontext->fpstate, aligned to 8.
+ */
+struct sc_ext {
+	unsigned long total_size;
+	unsigned long ssp;
+};
+
 /*
  * The 32-bit signal frame:
  */
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index 2a48634aa6ce..b247cd15c1e2 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -19,6 +19,8 @@
 #include <asm/fpu/xstate.h>
 #include <asm/fpu/types.h>
 #include <asm/cet.h>
+#include <asm/special_insns.h>
+#include <uapi/asm/sigcontext.h>
 
 static int set_shstk_ptr(unsigned long addr)
 {
@@ -51,6 +53,80 @@ static unsigned long get_shstk_addr(void)
 	return ptr;
 }
 
+#define TOKEN_MODE_MASK	3UL
+#define TOKEN_MODE_64	1UL
+#define IS_TOKEN_64(token) ((token & TOKEN_MODE_MASK) == TOKEN_MODE_64)
+#define IS_TOKEN_32(token) ((token & TOKEN_MODE_MASK) == 0)
+
+/*
+ * Verify the restore token at the address of 'ssp' is
+ * valid and then set shadow stack pointer according to the
+ * token.
+ */
+static int verify_rstor_token(bool ia32, unsigned long ssp,
+			      unsigned long *new_ssp)
+{
+	unsigned long token;
+
+	*new_ssp = 0;
+
+	if (!IS_ALIGNED(ssp, 8))
+		return -EINVAL;
+
+	if (get_user(token, (unsigned long __user *)ssp))
+		return -EFAULT;
+
+	/* Is 64-bit mode flag correct? */
+	if (!ia32 && !IS_TOKEN_64(token))
+		return -EINVAL;
+	else if (ia32 && !IS_TOKEN_32(token))
+		return -EINVAL;
+
+	token &= ~TOKEN_MODE_MASK;
+
+	/*
+	 * Restore address properly aligned?
+	 */
+	if ((!ia32 && !IS_ALIGNED(token, 8)) || !IS_ALIGNED(token, 4))
+		return -EINVAL;
+
+	/*
+	 * Token was placed properly?
+	 */
+	if ((ALIGN_DOWN(token, 8) - 8) != ssp)
+		return -EINVAL;
+
+	*new_ssp = token;
+	return 0;
+}
+
+/*
+ * Create a restore token on the shadow stack.
+ * A token is always 8-byte and aligned to 8.
+ */
+static int create_rstor_token(bool ia32, unsigned long ssp,
+			      unsigned long *new_ssp)
+{
+	unsigned long addr;
+
+	*new_ssp = 0;
+
+	if ((!ia32 && !IS_ALIGNED(ssp, 8)) || !IS_ALIGNED(ssp, 4))
+		return -EINVAL;
+
+	addr = ALIGN_DOWN(ssp, 8) - 8;
+
+	/* Is the token for 64-bit? */
+	if (!ia32)
+		ssp |= TOKEN_MODE_64;
+
+	if (write_user_shstk_64(addr, ssp))
+		return -EFAULT;
+
+	*new_ssp = addr;
+	return 0;
+}
+
 int cet_setup_shstk(void)
 {
 	unsigned long addr, size;
@@ -114,3 +190,68 @@ void cet_disable_free_shstk(struct task_struct *tsk)
 
 	tsk->thread.cet.shstk_enabled = 0;
 }
+
+/*
+ * Called from __fpu__restore_sig() under the protection
+ * of fpregs_lock().
+ */
+int cet_restore_signal(bool ia32, struct sc_ext *sc_ext)
+{
+	unsigned long new_ssp = 0;
+	u64 msr_ia32_u_cet = 0;
+	int err;
+
+	if (current->thread.cet.shstk_enabled) {
+		err = verify_rstor_token(ia32, sc_ext->ssp, &new_ssp);
+		if (err)
+			return err;
+
+		msr_ia32_u_cet |= MSR_IA32_CET_SHSTK_EN;
+	}
+
+	wrmsrl(MSR_IA32_PL3_SSP, new_ssp);
+	wrmsrl(MSR_IA32_U_CET, msr_ia32_u_cet);
+	return 0;
+}
+
+/*
+ * Setup the shadow stack for the signal handler: first,
+ * create a restore token to keep track of the current ssp,
+ * and then the return address of the signal handler.
+ */
+int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext)
+{
+	unsigned long ssp = 0, new_ssp = 0;
+	u64 msr_ia32_u_cet = 0;
+	int err;
+
+	msr_ia32_u_cet = 0;
+	ssp = 0;
+
+	if (current->thread.cet.shstk_enabled) {
+		ssp = get_shstk_addr();
+		err = create_rstor_token(ia32, ssp, &new_ssp);
+		if (err)
+			return err;
+
+		if (ia32) {
+			ssp = new_ssp - sizeof(u32);
+			err = write_user_shstk_32(ssp, (unsigned int)rstor_addr);
+		} else {
+			ssp = new_ssp - sizeof(u64);
+			err = write_user_shstk_64(ssp, rstor_addr);
+		}
+
+		if (err)
+			return err;
+
+		msr_ia32_u_cet |= MSR_IA32_CET_SHSTK_EN;
+		sc_ext->ssp = new_ssp;
+	}
+
+	modify_fpu_regs_begin();
+	wrmsrl(MSR_IA32_PL3_SSP, ssp);
+	wrmsrl(MSR_IA32_U_CET, msr_ia32_u_cet);
+	modify_fpu_regs_end();
+	return 0;
+}
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index e38b272793b1..0a67518c142f 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -51,6 +51,58 @@ static inline int check_for_xstate(struct fxregs_state __user *buf,
 	return 0;
 }
 
+int setup_fpu_system_states(int is_ia32, unsigned long restorer,
+				   void __user *fp)
+{
+	int err = 0;
+
+#ifdef CONFIG_X86_64
+	if (cpu_x86_cet_enabled() && fp) {
+		struct sc_ext ext = {0, 0};
+
+		err = cet_setup_signal(is_ia32, restorer, &ext);
+		if (!err) {
+			void __user *p;
+
+			ext.total_size = sizeof(ext);
+
+			p = fp + fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE;
+			p = (void __user *)ALIGN((unsigned long)p, 8);
+
+			if (copy_to_user(p, &ext, sizeof(ext)))
+				return -EFAULT;
+		}
+	}
+#endif
+
+	return err;
+}
+
+static int restore_fpu_system_states(int is_ia32, void __user *fp)
+{
+	int err = 0;
+
+#ifdef CONFIG_X86_64
+	if (cpu_x86_cet_enabled() && fp) {
+		struct sc_ext ext = {0, 0};
+		void __user *p;
+
+		p = fp + fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE;
+		p = (void __user *)ALIGN((unsigned long)p, 8);
+
+		if (copy_from_user(&ext, p, sizeof(ext)))
+			return -EFAULT;
+
+		if (ext.total_size != sizeof(ext))
+			return -EFAULT;
+
+		err = cet_restore_signal(is_ia32, &ext);
+	}
+#endif
+
+	return err;
+}
+
 /*
  * Signal frame handlers.
  */
@@ -349,6 +401,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		pagefault_disable();
 		ret = copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only);
 		pagefault_enable();
+
+		if (!ret)
+			ret = restore_fpu_system_states(0, buf);
+
 		if (!ret) {
 			fpregs_mark_activate();
 			fpregs_unlock();
@@ -435,6 +491,17 @@ int fpu__restore_sig(void __user *buf, int ia32_frame)
 	return __fpu__restore_sig(buf, buf_fx, size);
 }
 
+unsigned long fpu__alloc_sigcontext_ext(unsigned long sp)
+{
+	/*
+	 * sigcontext_ext is at: fpu + fpu_user_xstate_size +
+	 * FP_XSTATE_MAGIC2_SIZE, then aligned to 8.
+	 */
+	if (cpu_x86_cet_enabled())
+		sp -= (sizeof(struct sc_ext) + 8);
+	return sp;
+}
+
 unsigned long
 fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
 		     unsigned long *buf_fx, unsigned long *size)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 3b0dcec597ce..c81192bdc96c 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -46,6 +46,7 @@
 
 #include <asm/sigframe.h>
 #include <asm/signal.h>
+#include <asm/cet.h>
 
 #define COPY(x)			do {			\
 	get_user_ex(regs->x, &sc->x);			\
@@ -264,6 +265,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 		sp = (unsigned long) ka->sa.sa_restorer;
 	}
 
+	sp = fpu__alloc_sigcontext_ext(sp);
 	sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
 				  &buf_fx, &math_size);
 	*fpstate = (void __user *)sp;
@@ -493,6 +495,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
 	err |= setup_sigcontext(&frame->uc.uc_mcontext, fp, regs, set->sig[0]);
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
 
+	if (!err)
+		err = setup_fpu_system_states(0, (unsigned long)ksig->ka.sa.sa_restorer, fp);
+
 	if (err)
 		return -EFAULT;
 
@@ -579,6 +584,9 @@ static int x32_setup_rt_frame(struct ksignal *ksig,
 				regs, set->sig[0]);
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
 
+	if (!err)
+		err = setup_fpu_system_states(0, (unsigned long)ksig->ka.sa.sa_restorer, fpstate);
+
 	if (err)
 		return -EFAULT;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (20 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 21/27] x86/cet/shstk: Handle signals for shadow stack Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-07  7:58   ` Peter Zijlstra
  2019-06-07 18:01   ` Dave Martin
  2019-06-06 20:06 ` [PATCH v7 23/27] x86/cet/shstk: ELF header parsing of Shadow Stack Yu-cheng Yu
                   ` (4 subsequent siblings)
  26 siblings, 2 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

An ELF file's .note.gnu.property indicates features the executable file
can support.  For example, the property GNU_PROPERTY_X86_FEATURE_1_AND
indicates the file supports GNU_PROPERTY_X86_FEATURE_1_IBT and/or
GNU_PROPERTY_X86_FEATURE_1_SHSTK.

With this patch, if an arch needs to setup features from ELF properties,
it needs CONFIG_ARCH_USE_GNU_PROPERTY to be set, and a specific
arch_setup_property().

For example, for X86_64:

int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter)
{
	int r;
	uint32_t property;

	r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND,
			     &property);
	...
}

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 fs/Kconfig.binfmt        |   3 +
 fs/Makefile              |   1 +
 fs/binfmt_elf.c          |  13 ++
 fs/gnu_property.c        | 351 +++++++++++++++++++++++++++++++++++++++
 include/linux/elf.h      |  12 ++
 include/uapi/linux/elf.h |  14 ++
 6 files changed, 394 insertions(+)
 create mode 100644 fs/gnu_property.c

diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
index f87ddd1b6d72..397138ab305b 100644
--- a/fs/Kconfig.binfmt
+++ b/fs/Kconfig.binfmt
@@ -36,6 +36,9 @@ config COMPAT_BINFMT_ELF
 config ARCH_BINFMT_ELF_STATE
 	bool
 
+config ARCH_USE_GNU_PROPERTY
+	bool
+
 config BINFMT_ELF_FDPIC
 	bool "Kernel support for FDPIC ELF binaries"
 	default y if !BINFMT_ELF
diff --git a/fs/Makefile b/fs/Makefile
index c9aea23aba56..b69f18c14e09 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_BINFMT_ELF)	+= binfmt_elf.o
 obj-$(CONFIG_COMPAT_BINFMT_ELF)	+= compat_binfmt_elf.o
 obj-$(CONFIG_BINFMT_ELF_FDPIC)	+= binfmt_elf_fdpic.o
 obj-$(CONFIG_BINFMT_FLAT)	+= binfmt_flat.o
+obj-$(CONFIG_ARCH_USE_GNU_PROPERTY) += gnu_property.o
 
 obj-$(CONFIG_FS_MBCACHE)	+= mbcache.o
 obj-$(CONFIG_FS_POSIX_ACL)	+= posix_acl.o
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 8264b468f283..c3ea73787e93 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1080,6 +1080,19 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		goto out_free_dentry;
 	}
 
+	if (interpreter) {
+		retval = arch_setup_property(&loc->interp_elf_ex,
+					     interp_elf_phdata,
+					     interpreter, true);
+	} else {
+		retval = arch_setup_property(&loc->elf_ex,
+					     elf_phdata,
+					     bprm->file, false);
+	}
+
+	if (retval < 0)
+		goto out_free_dentry;
+
 	if (interpreter) {
 		unsigned long interp_map_addr = 0;
 
diff --git a/fs/gnu_property.c b/fs/gnu_property.c
new file mode 100644
index 000000000000..9c4d1d5ebf00
--- /dev/null
+++ b/fs/gnu_property.c
@@ -0,0 +1,351 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Extract an ELF file's .note.gnu.property.
+ *
+ * The path from the ELF header to the note section is the following:
+ * elfhdr->elf_phdr->elf_note->property[].
+ */
+
+#include <uapi/linux/elf-em.h>
+#include <linux/processor.h>
+#include <linux/binfmts.h>
+#include <linux/elf.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <linux/string.h>
+#include <linux/compat.h>
+
+/*
+ * The .note.gnu.property layout:
+ *
+ *	struct elf_note {
+ *		u32 n_namesz; --> sizeof(n_name[]); always (4)
+ *		u32 n_ndescsz;--> sizeof(property[])
+ *		u32 n_type;   --> always NT_GNU_PROPERTY_TYPE_0
+ *	};
+ *	char n_name[4]; --> always 'GNU\0'
+ *
+ *	struct {
+ *		struct gnu_property {
+ *			u32 pr_type;
+ *			u32 pr_datasz;
+ *		};
+ *		u8 pr_data[pr_datasz];
+ *	}[];
+ */
+
+#define BUF_SIZE (PAGE_SIZE / 4)
+
+typedef bool (test_item_fn)(void *buf, u32 *arg, u32 type);
+typedef void *(next_item_fn)(void *buf, u32 *arg, u32 type);
+
+static inline bool test_note_type(void *buf, u32 *align, u32 note_type)
+{
+	struct elf_note *n = buf;
+
+	return ((n->n_type == note_type) && (n->n_namesz == 4) &&
+		(memcmp(n + 1, "GNU", 4) == 0));
+}
+
+static inline void *next_note(void *buf, u32 *align, u32 note_type)
+{
+	struct elf_note *n = buf;
+	u64 size;
+
+	if (check_add_overflow((u64)sizeof(*n), (u64)n->n_namesz, &size))
+		return NULL;
+
+	size = round_up(size, *align);
+
+	if (check_add_overflow(size, (u64)n->n_descsz, &size))
+		return NULL;
+
+	size = round_up(size, *align);
+
+	if (buf + size < buf)
+		return NULL;
+	else
+		return (buf + size);
+}
+
+static inline bool test_property(void *buf, u32 *max_type, u32 pr_type)
+{
+	struct gnu_property *pr = buf;
+
+	/*
+	 * Property types must be in ascending order.
+	 * Keep track of the max when testing each.
+	 */
+	if (pr->pr_type > *max_type)
+		*max_type = pr->pr_type;
+
+	return (pr->pr_type == pr_type);
+}
+
+static inline void *next_property(void *buf, u32 *max_type, u32 pr_type)
+{
+	struct gnu_property *pr = buf;
+
+	if ((buf + sizeof(*pr) +  pr->pr_datasz < buf) ||
+	    (pr->pr_type > pr_type) ||
+	    (pr->pr_type > *max_type))
+		return NULL;
+	else
+		return (buf + sizeof(*pr) + pr->pr_datasz);
+}
+
+/*
+ * Scan 'buf' for a pattern; return true if found.
+ * *pos is the distance from the beginning of buf to where
+ * the searched item or the next item is located.
+ */
+static int scan(u8 *buf, u32 buf_size, int item_size, test_item_fn test_item,
+		next_item_fn next_item, u32 *arg, u32 type, u32 *pos)
+{
+	int found = 0;
+	u8 *p, *max;
+
+	max = buf + buf_size;
+	if (max < buf)
+		return 0;
+
+	p = buf;
+
+	while ((p + item_size < max) && (p + item_size > buf)) {
+		if (test_item(p, arg, type)) {
+			found = 1;
+			break;
+		}
+
+		p = next_item(p, arg, type);
+	}
+
+	*pos = (p + item_size <= buf) ? 0 : (u32)(p - buf);
+	return found;
+}
+
+/*
+ * Search an NT_GNU_PROPERTY_TYPE_0 for the property of 'pr_type'.
+ */
+static int find_property(struct file *file, unsigned long desc_size,
+			 loff_t file_offset, u8 *buf,
+			 u32 pr_type, u32 *property)
+{
+	u32 buf_pos;
+	unsigned long read_size;
+	unsigned long done;
+	int found = 0;
+	int ret = 0;
+	u32 last_pr = 0;
+
+	*property = 0;
+	buf_pos = 0;
+
+	for (done = 0; done < desc_size; done += buf_pos) {
+		read_size = desc_size - done;
+		if (read_size > BUF_SIZE)
+			read_size = BUF_SIZE;
+
+		ret = kernel_read(file, buf, read_size, &file_offset);
+
+		if (ret != read_size)
+			return (ret < 0) ? ret : -EIO;
+
+		ret = 0;
+		found = scan(buf, read_size, sizeof(struct gnu_property),
+			     test_property, next_property,
+			     &last_pr, pr_type, &buf_pos);
+
+		if ((!buf_pos) || found)
+			break;
+
+		file_offset += buf_pos - read_size;
+	}
+
+	if (found) {
+		struct gnu_property *pr =
+			(struct gnu_property *)(buf + buf_pos);
+
+		if (pr->pr_datasz == 4) {
+			u32 *max =  (u32 *)(buf + read_size);
+			u32 *data = (u32 *)((u8 *)pr + sizeof(*pr));
+
+			if (data + 1 <= max) {
+				*property = *data;
+			} else {
+				file_offset += buf_pos - read_size;
+				file_offset += sizeof(*pr);
+				ret = kernel_read(file, property, 4,
+						  &file_offset);
+			}
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * Search a PT_NOTE segment for NT_GNU_PROPERTY_TYPE_0.
+ */
+static int find_note_type_0(struct file *file, loff_t file_offset,
+			    unsigned long note_size, u32 align,
+			    u32 pr_type, u32 *property)
+{
+	u8 *buf;
+	u32 buf_pos;
+	unsigned long read_size;
+	unsigned long done;
+	int found = 0;
+	int ret = 0;
+
+	buf = kmalloc(BUF_SIZE, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	*property = 0;
+	buf_pos = 0;
+
+	for (done = 0; done < note_size; done += buf_pos) {
+		read_size = note_size - done;
+		if (read_size > BUF_SIZE)
+			read_size = BUF_SIZE;
+
+		ret = kernel_read(file, buf, read_size, &file_offset);
+
+		if (ret != read_size) {
+			ret = (ret < 0) ? ret : -EIO;
+			kfree(buf);
+			return ret;
+		}
+
+		/*
+		 * item_size = sizeof(struct elf_note) + elf_note.n_namesz.
+		 * n_namesz is 4 for the note type we look for.
+		 */
+		ret = scan(buf, read_size, sizeof(struct elf_note) + 4,
+			      test_note_type, next_note,
+			      &align, NT_GNU_PROPERTY_TYPE_0, &buf_pos);
+
+		file_offset += buf_pos - read_size;
+
+		if (ret && !found) {
+			struct elf_note *n =
+				(struct elf_note *)(buf + buf_pos);
+			u64 start = round_up(sizeof(*n) + n->n_namesz, align);
+			u64 total = 0;
+
+			if (check_add_overflow(start, (u64)n->n_descsz, &total)) {
+				ret = -EINVAL;
+				break;
+			}
+			total = round_up(total, align);
+
+			ret = find_property(file, n->n_descsz,
+					    file_offset + start,
+					    buf, pr_type, property);
+			found++;
+			file_offset += total;
+			buf_pos += total;
+		} else if (!buf_pos || ret) {
+			ret = 0;
+			*property = 0;
+			break;
+		}
+	}
+
+	kfree(buf);
+	return ret;
+}
+
+/*
+ * Look at an ELF file's PT_NOTE segments, then NT_GNU_PROPERTY_TYPE_0, then
+ * the property of pr_type.
+ *
+ * Input:
+ *	file: the file to search;
+ *	phdr: the file's elf header;
+ *	phnum: number of entries in phdr;
+ *	pr_type: the property type.
+ *
+ * Output:
+ *	The property found.
+ *
+ * Return:
+ *	Zero or error.
+ */
+static int scan_segments_64(struct file *file, struct elf64_phdr *phdr,
+			    int phnum, u32 pr_type, u32 *property)
+{
+	int i;
+	int err = 0;
+
+	for (i = 0; i < phnum; i++, phdr++) {
+		if ((phdr->p_type != PT_NOTE) || (phdr->p_align != 8))
+			continue;
+
+		/*
+		 * Search the PT_NOTE segment for NT_GNU_PROPERTY_TYPE_0.
+		 */
+		err = find_note_type_0(file, phdr->p_offset, phdr->p_filesz,
+				       phdr->p_align, pr_type, property);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int scan_segments_32(struct file *file, struct elf32_phdr *phdr,
+			    int phnum, u32 pr_type, u32 *property)
+{
+	int i;
+	int err = 0;
+
+	for (i = 0; i < phnum; i++, phdr++) {
+		if ((phdr->p_type != PT_NOTE) || (phdr->p_align != 4))
+			continue;
+
+		/*
+		 * Search the PT_NOTE segment for NT_GNU_PROPERTY_TYPE_0.
+		 */
+		err = find_note_type_0(file, phdr->p_offset, phdr->p_filesz,
+				       phdr->p_align, pr_type, property);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+int get_gnu_property(void *ehdr_p, void *phdr_p, struct file *f,
+		     u32 pr_type, u32 *property)
+{
+	struct elf64_hdr *ehdr64 = ehdr_p;
+	int err = 0;
+
+	*property = 0;
+
+	if (ehdr64->e_ident[EI_CLASS] == ELFCLASS64) {
+		struct elf64_phdr *phdr64 = phdr_p;
+
+		err = scan_segments_64(f, phdr64, ehdr64->e_phnum,
+				       pr_type, property);
+		if (err < 0)
+			goto out;
+	} else {
+		struct elf32_hdr *ehdr32 = ehdr_p;
+
+		if (ehdr32->e_ident[EI_CLASS] == ELFCLASS32) {
+			struct elf32_phdr *phdr32 = phdr_p;
+
+			err = scan_segments_32(f, phdr32, ehdr32->e_phnum,
+					       pr_type, property);
+			if (err < 0)
+				goto out;
+		}
+	}
+
+out:
+	return err;
+}
diff --git a/include/linux/elf.h b/include/linux/elf.h
index e3649b3e970e..c15febebe7f2 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -56,4 +56,16 @@ static inline int elf_coredump_extra_notes_write(struct coredump_params *cprm) {
 extern int elf_coredump_extra_notes_size(void);
 extern int elf_coredump_extra_notes_write(struct coredump_params *cprm);
 #endif
+
+#ifdef CONFIG_ARCH_USE_GNU_PROPERTY
+extern int arch_setup_property(void *ehdr, void *phdr, struct file *f,
+			       bool interp);
+extern int get_gnu_property(void *ehdr_p, void *phdr_p, struct file *f,
+			    u32 pr_type, u32 *feature);
+#else
+static inline int arch_setup_property(void *ehdr, void *phdr, struct file *f,
+				      bool interp) { return 0; }
+static inline int get_gnu_property(void *ehdr_p, void *phdr_p, struct file *f,
+				   u32 pr_type, u32 *feature) { return 0; }
+#endif
 #endif /* _LINUX_ELF_H */
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 34c02e4290fe..316177ce9e76 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -372,6 +372,7 @@ typedef struct elf64_shdr {
 #define NT_PRFPREG	2
 #define NT_PRPSINFO	3
 #define NT_TASKSTRUCT	4
+#define NT_GNU_PROPERTY_TYPE_0 5
 #define NT_AUXV		6
 /*
  * Note to userspace developers: size of NT_SIGINFO note may increase
@@ -443,4 +444,17 @@ typedef struct elf64_note {
   Elf64_Word n_type;	/* Content type */
 } Elf64_Nhdr;
 
+/* NT_GNU_PROPERTY_TYPE_0 header */
+struct gnu_property {
+  __u32 pr_type;
+  __u32 pr_datasz;
+};
+
+/* .note.gnu.property types */
+#define GNU_PROPERTY_X86_FEATURE_1_AND		(0xc0000002)
+
+/* Bits of GNU_PROPERTY_X86_FEATURE_1_AND */
+#define GNU_PROPERTY_X86_FEATURE_1_IBT		(0x00000001)
+#define GNU_PROPERTY_X86_FEATURE_1_SHSTK	(0x00000002)
+
 #endif /* _UAPI_LINUX_ELF_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 23/27] x86/cet/shstk: ELF header parsing of Shadow Stack
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (21 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-07  7:54   ` Peter Zijlstra
  2019-06-06 20:06 ` [PATCH v7 24/27] x86/cet/shstk: Handle thread shadow stack Yu-cheng Yu
                   ` (3 subsequent siblings)
  26 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Look in .note.gnu.property of an ELF file and check if Shadow Stack needs
to be enabled for the task.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/Kconfig             |  1 +
 arch/x86/kernel/process_64.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1664918c2c1c..df8b57de75b2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1928,6 +1928,7 @@ config X86_INTEL_SHADOW_STACK_USER
 	select ARCH_USES_HIGH_VMA_FLAGS
 	select X86_INTEL_CET
 	select ARCH_HAS_SHSTK
+	select ARCH_USE_GNU_PROPERTY
 	---help---
 	  Shadow stack provides hardware protection against program stack
 	  corruption.  Only when all the following are true will an application
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 250e4c4ac6d9..c51df5b4e116 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -828,3 +828,27 @@ unsigned long KSTK_ESP(struct task_struct *task)
 {
 	return task_pt_regs(task)->sp;
 }
+
+#ifdef CONFIG_ARCH_USE_GNU_PROPERTY
+int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter)
+{
+	int r;
+	uint32_t property;
+
+	r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND,
+			     &property);
+
+	memset(&current->thread.cet, 0, sizeof(struct cet_status));
+
+	if (r)
+		return r;
+
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (property & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+			r = cet_setup_shstk();
+		if (r < 0)
+			return r;
+	}
+	return r;
+}
+#endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 24/27] x86/cet/shstk: Handle thread shadow stack
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (22 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 23/27] x86/cet/shstk: ELF header parsing of Shadow Stack Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 25/27] mm/mmap: Add Shadow stack pages to memory accounting Yu-cheng Yu
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

The shadow stack for clone/fork is handled as the following:

(1) If ((clone_flags & (CLONE_VFORK | CLONE_VM)) == CLONE_VM),
    the kernel allocates (and frees on thread exit) a new SHSTK
    for the child.

    It is possible for the kernel to complete the clone syscall
    and set the child's SHSTK pointer to NULL and let the child
    thread allocate a SHSTK for itself.  There are two issues
    in this approach: It is not compatible with existing code
    that does inline syscall and it cannot handle signals before
    the child can successfully allocate a SHSTK.

(2) For (clone_flags & CLONE_VFORK), the child uses the existing
    SHSTK.

(3) For all other cases, the SHSTK is copied/reused whenever the
    parent or the child does a call/ret.

This patch handles cases (1) & (2).  Case (3) is handled in the
SHSTK page fault patches.

A 64-bit SHSTK has a fixed size of RLIMIT_STACK. A compat-mode
thread SHSTK has a fixed size of 1/4 RLIMIT_STACK.  This allows
more threads to share a 32-bit address space.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h         |  2 ++
 arch/x86/include/asm/mmu_context.h |  3 +++
 arch/x86/kernel/cet.c              | 41 ++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c          |  1 +
 arch/x86/kernel/process_64.c       |  7 +++++
 5 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 422ccb8adbb7..52c506a68848 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -19,12 +19,14 @@ struct cet_status {
 
 #ifdef CONFIG_X86_INTEL_CET
 int cet_setup_shstk(void);
+int cet_setup_thread_shstk(struct task_struct *p);
 void cet_disable_shstk(void);
 void cet_disable_free_shstk(struct task_struct *p);
 int cet_restore_signal(bool ia32, struct sc_ext *sc);
 int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
 #else
 static inline int cet_setup_shstk(void) { return -EINVAL; }
+static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_disable_free_shstk(struct task_struct *p) {}
 static inline int cet_restore_signal(bool ia32, struct sc_ext *sc) { return -EINVAL; }
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 9024236693d2..a9a768529540 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
 #include <asm/mpx.h>
+#include <asm/cet.h>
 #include <asm/debugreg.h>
 
 extern atomic64_t last_mm_ctx_id;
@@ -228,6 +229,8 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		cet_disable_free_shstk(tsk);	\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index b247cd15c1e2..9ef1af617d38 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -151,6 +151,47 @@ int cet_setup_shstk(void)
 	return 0;
 }
 
+int cet_setup_thread_shstk(struct task_struct *tsk)
+{
+	unsigned long addr, size;
+	struct cet_user_state *state;
+
+	if (!current->thread.cet.shstk_enabled)
+		return 0;
+
+	state = get_xsave_addr(&tsk->thread.fpu.state.xsave,
+			       XFEATURE_CET_USER);
+
+	if (!state)
+		return -EINVAL;
+
+	size = rlimit(RLIMIT_STACK);
+
+	/*
+	 * Compat-mode pthreads share a limited address space.
+	 * If each function call takes an average of four slots
+	 * stack space, we need 1/4 of stack size for shadow stack.
+	 */
+	if (in_compat_syscall())
+		size /= 4;
+
+	addr = do_mmap_locked(0, size, PROT_READ,
+			      MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK);
+
+	if (addr >= TASK_SIZE_MAX) {
+		tsk->thread.cet.shstk_base = 0;
+		tsk->thread.cet.shstk_size = 0;
+		tsk->thread.cet.shstk_enabled = 0;
+		return -ENOMEM;
+	}
+
+	fpu__prepare_write(&tsk->thread.fpu);
+	state->user_ssp = (u64)(addr + size - sizeof(u64));
+	tsk->thread.cet.shstk_base = addr;
+	tsk->thread.cet.shstk_size = size;
+	return 0;
+}
+
 void cet_disable_shstk(void)
 {
 	u64 r;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index a4deb79b1089..58b1c52b38b5 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -130,6 +130,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	cet_disable_free_shstk(tsk);
 	fpu__drop(fpu);
 }
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index c51df5b4e116..5fa0d9ab18f1 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -421,6 +421,13 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (sp)
 		childregs->sp = sp;
 
+	/* Allocate a new shadow stack for pthread */
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) == CLONE_VM) {
+		err = cet_setup_thread_shstk(p);
+		if (err)
+			goto out;
+	}
+
 	err = -ENOMEM;
 	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 25/27] mm/mmap: Add Shadow stack pages to memory accounting
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (23 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 24/27] x86/cet/shstk: Handle thread shadow stack Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-11 17:55   ` Dave Hansen
  2019-06-06 20:06 ` [PATCH v7 26/27] x86/cet/shstk: Add arch_prctl functions for Shadow Stack Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 27/27] x86/cet/shstk: Add Shadow Stack instructions to opcode map Yu-cheng Yu
  26 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Add shadow stack pages to memory accounting.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/mmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index b1a921c0de63..3b643ace2c49 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1703,6 +1703,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 	if (file && is_file_hugepages(file))
 		return 0;
 
+	if (arch_copy_pte_mapping(vm_flags))
+		return 1;
+
 	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
 }
 
@@ -3319,6 +3322,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
 		mm->stack_vm += npages;
 	else if (is_data_mapping(flags))
 		mm->data_vm += npages;
+	else if (arch_copy_pte_mapping(flags))
+		mm->data_vm += npages;
 }
 
 static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 26/27] x86/cet/shstk: Add arch_prctl functions for Shadow Stack
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (24 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 25/27] mm/mmap: Add Shadow stack pages to memory accounting Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-06-06 20:06 ` [PATCH v7 27/27] x86/cet/shstk: Add Shadow Stack instructions to opcode map Yu-cheng Yu
  26 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

arch_prctl(ARCH_X86_CET_STATUS, unsigned long *addr)
    Return CET feature status.

    The parameter 'addr' is a pointer to a user buffer.
    On returning to the caller, the kernel fills the following
    information:

    *addr = SHSTK/IBT status
    *(addr + 1) = SHSTK base address
    *(addr + 2) = SHSTK size

arch_prctl(ARCH_X86_CET_DISABLE, unsigned long features)
    Disable CET features specified in 'features'.  Return
    -EPERM if CET is locked.

arch_prctl(ARCH_X86_CET_LOCK)
    Lock in CET feature.

arch_prctl(ARCH_X86_CET_ALLOC_SHSTK, unsigned long *addr)
    Allocate a new SHSTK.

    The parameter 'addr' is a pointer to a user buffer and indicates
    the desired SHSTK size to allocate.  On returning to the caller
    the buffer contains the address of the new SHSTK.

There is no CET enabling arch_prctl function.  By design, CET is
enabled automatically if the binary and the system can support it.

The parameters passed are always unsigned 64-bit.  When an ia32
application passing pointers, it should only use the lower 32 bits.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h        |  5 ++
 arch/x86/include/uapi/asm/prctl.h |  5 ++
 arch/x86/kernel/Makefile          |  2 +-
 arch/x86/kernel/cet.c             | 29 +++++++++++
 arch/x86/kernel/cet_prctl.c       | 85 +++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c         |  4 +-
 6 files changed, 127 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kernel/cet_prctl.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 52c506a68848..2df357dffd24 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -14,19 +14,24 @@ struct sc_ext;
 struct cet_status {
 	unsigned long	shstk_base;
 	unsigned long	shstk_size;
+	unsigned int	locked:1;
 	unsigned int	shstk_enabled:1;
 };
 
 #ifdef CONFIG_X86_INTEL_CET
+int prctl_cet(int option, unsigned long arg2);
 int cet_setup_shstk(void);
 int cet_setup_thread_shstk(struct task_struct *p);
+int cet_alloc_shstk(unsigned long *arg);
 void cet_disable_shstk(void);
 void cet_disable_free_shstk(struct task_struct *p);
 int cet_restore_signal(bool ia32, struct sc_ext *sc);
 int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
 #else
+static inline int prctl_cet(int option, unsigned long arg2) { return -EINVAL; }
 static inline int cet_setup_shstk(void) { return -EINVAL; }
 static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; }
+static inline int cet_alloc_shstk(unsigned long *arg) { return -EINVAL; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_disable_free_shstk(struct task_struct *p) {}
 static inline int cet_restore_signal(bool ia32, struct sc_ext *sc) { return -EINVAL; }
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 5a6aac9fa41f..d962f0ec9ccf 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -14,4 +14,9 @@
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
 
+#define ARCH_X86_CET_STATUS		0x3001
+#define ARCH_X86_CET_DISABLE		0x3002
+#define ARCH_X86_CET_LOCK		0x3003
+#define ARCH_X86_CET_ALLOC_SHSTK	0x3004
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 584ed7e9a599..d908c95306fc 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -140,7 +140,7 @@ obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
-obj-$(CONFIG_X86_INTEL_CET)		+= cet.o
+obj-$(CONFIG_X86_INTEL_CET)		+= cet.o cet_prctl.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index 9ef1af617d38..0004333f8373 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -127,6 +127,35 @@ static int create_rstor_token(bool ia32, unsigned long ssp,
 	return 0;
 }
 
+int cet_alloc_shstk(unsigned long *arg)
+{
+	unsigned long len = *arg;
+	unsigned long addr;
+	unsigned long token;
+	unsigned long ssp;
+
+	addr = do_mmap_locked(0, len, PROT_READ,
+			      MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK);
+	if (addr >= TASK_SIZE_MAX)
+		return -ENOMEM;
+
+	/* Restore token is 8 bytes and aligned to 8 bytes */
+	ssp = addr + len;
+	token = ssp;
+
+	if (!in_ia32_syscall())
+		token |= TOKEN_MODE_64;
+	ssp -= 8;
+
+	if (write_user_shstk_64(ssp, token)) {
+		vm_munmap(addr, len);
+		return -EINVAL;
+	}
+
+	*arg = addr;
+	return 0;
+}
+
 int cet_setup_shstk(void)
 {
 	unsigned long addr, size;
diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c
new file mode 100644
index 000000000000..9c9d4262b07e
--- /dev/null
+++ b/arch/x86/kernel/cet_prctl.c
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/errno.h>
+#include <linux/uaccess.h>
+#include <linux/prctl.h>
+#include <linux/compat.h>
+#include <linux/mman.h>
+#include <linux/elfcore.h>
+#include <asm/processor.h>
+#include <asm/prctl.h>
+#include <asm/cet.h>
+
+/* See Documentation/x86/intel_cet.rst. */
+
+static int handle_get_status(unsigned long arg2)
+{
+	unsigned int features = 0;
+	unsigned long shstk_base, shstk_size;
+	unsigned long buf[3];
+
+	if (current->thread.cet.shstk_enabled)
+		features |= GNU_PROPERTY_X86_FEATURE_1_SHSTK;
+
+	shstk_base = current->thread.cet.shstk_base;
+	shstk_size = current->thread.cet.shstk_size;
+
+	buf[0] = (unsigned long)features;
+	buf[1] = shstk_base;
+	buf[2] = shstk_size;
+	return copy_to_user((unsigned long __user *)arg2, buf,
+			    sizeof(buf));
+}
+
+static int handle_alloc_shstk(unsigned long arg2)
+{
+	int err = 0;
+	unsigned long arg;
+	unsigned long addr = 0;
+	unsigned long size = 0;
+
+	if (get_user(arg, (unsigned long __user *)arg2))
+		return -EFAULT;
+
+	size = arg;
+	err = cet_alloc_shstk(&arg);
+	if (err)
+		return err;
+
+	addr = arg;
+	if (put_user(addr, (unsigned long __user *)arg2)) {
+		vm_munmap(addr, size);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+int prctl_cet(int option, unsigned long arg2)
+{
+	if (!cpu_x86_cet_enabled())
+		return -EINVAL;
+
+	switch (option) {
+	case ARCH_X86_CET_STATUS:
+		return handle_get_status(arg2);
+
+	case ARCH_X86_CET_DISABLE:
+		if (current->thread.cet.locked)
+			return -EPERM;
+		if (arg2 & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+			cet_disable_free_shstk(current);
+
+		return 0;
+
+	case ARCH_X86_CET_LOCK:
+		current->thread.cet.locked = 1;
+		return 0;
+
+	case ARCH_X86_CET_ALLOC_SHSTK:
+		return handle_alloc_shstk(arg2);
+
+	default:
+		return -EINVAL;
+	}
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 58b1c52b38b5..e0090f2790df 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -873,7 +873,7 @@ long do_arch_prctl_common(struct task_struct *task, int option,
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
 		return set_cpuid_mode(task, cpuid_enabled);
+	default:
+		return prctl_cet(option, cpuid_enabled);
 	}
-
-	return -EINVAL;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v7 27/27] x86/cet/shstk: Add Shadow Stack instructions to opcode map
  2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
                   ` (25 preceding siblings ...)
  2019-06-06 20:06 ` [PATCH v7 26/27] x86/cet/shstk: Add arch_prctl functions for Shadow Stack Yu-cheng Yu
@ 2019-06-06 20:06 ` Yu-cheng Yu
  2019-11-01 14:03   ` Adrian Hunter
  26 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 20:06 UTC (permalink / raw)
  To: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin
  Cc: Yu-cheng Yu

Add the following shadow stack management instructions.

INCSSP:
    Increment shadow stack pointer by the steps specified.

RDSSP:
    Read SSP register into a GPR.

SAVEPREVSSP:
    Use "prev ssp" token at top of current shadow stack to
    create a "restore token" on previous shadow stack.

RSTORSSP:
    Restore from a "restore token" pointed by a GPR to SSP.

WRSS:
    Write to kernel-mode shadow stack (kernel-mode instruction).

WRUSS:
    Write to user-mode shadow stack (kernel-mode instruction).

SETSSBSY:
    Verify the "supervisor token" pointed by IA32_PL0_SSP MSR,
    if valid, set the token to busy, and set SSP to the value
    of IA32_PL0_SSP MSR.

CLRSSBSY:
    Verify the "supervisor token" pointed by a GPR, if valid,
    clear the busy bit from the token.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/lib/x86-opcode-map.txt               | 26 +++++++++++++------
 tools/objtool/arch/x86/lib/x86-opcode-map.txt | 26 +++++++++++++------
 2 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index e0b85930dd77..c5e825d44766 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -366,7 +366,7 @@ AVXcode: 1
 1b: BNDCN Gv,Ev (F2) | BNDMOV Ev,Gv (66) | BNDMK Gv,Ev (F3) | BNDSTX Ev,Gv
 1c:
 1d:
-1e:
+1e: RDSSP Rd (F3),REX.W
 1f: NOP Ev
 # 0x0f 0x20-0x2f
 20: MOV Rd,Cd
@@ -610,7 +610,17 @@ fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
 ff: UD0
 EndTable
 
-Table: 3-byte opcode 1 (0x0f 0x38)
+Table: 3-byte opcode 1 (0x0f 0x01)
+Referrer:
+AVXcode:
+# Skip 0x00-0xe7
+e8: SETSSBSY (f3)
+e9:
+ea: SAVEPREVSSP (f3)
+# Skip 0xeb-0xff
+EndTable
+
+Table: 3-byte opcode 2 (0x0f 0x38)
 Referrer: 3-byte escape 1
 AVXcode: 2
 # 0x0f 0x38 0x00-0x0f
@@ -789,12 +799,12 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
 f3: Grp17 (1A)
-f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
-f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
+f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSS Pq,Qq (66),REX.W
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 EndTable
 
-Table: 3-byte opcode 2 (0x0f 0x3a)
+Table: 3-byte opcode 3 (0x0f 0x3a)
 Referrer: 3-byte escape 2
 AVXcode: 3
 # 0x0f 0x3a 0x00-0xff
@@ -948,7 +958,7 @@ GrpTable: Grp7
 2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B)
 3: LIDT Ms
 4: SMSW Mw/Rv
-5: rdpkru (110),(11B) | wrpkru (111),(11B)
+5: rdpkru (110),(11B) | wrpkru (111),(11B) | RSTORSSP Mq (F3)
 6: LMSW Ew
 7: INVLPG Mb | SWAPGS (o64),(000),(11B) | RDTSCP (001),(11B)
 EndTable
@@ -1019,8 +1029,8 @@ GrpTable: Grp15
 2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
 3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
-5: XRSTOR | lfence (11B)
-6: XSAVEOPT | clwb (66) | mfence (11B)
+5: XRSTOR | lfence (11B) | INCSSP Rd (F3),REX.W
+6: XSAVEOPT | clwb (66) | mfence (11B) | CLRSSBSY Mq (F3)
 7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
diff --git a/tools/objtool/arch/x86/lib/x86-opcode-map.txt b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
index e0b85930dd77..c5e825d44766 100644
--- a/tools/objtool/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
@@ -366,7 +366,7 @@ AVXcode: 1
 1b: BNDCN Gv,Ev (F2) | BNDMOV Ev,Gv (66) | BNDMK Gv,Ev (F3) | BNDSTX Ev,Gv
 1c:
 1d:
-1e:
+1e: RDSSP Rd (F3),REX.W
 1f: NOP Ev
 # 0x0f 0x20-0x2f
 20: MOV Rd,Cd
@@ -610,7 +610,17 @@ fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
 ff: UD0
 EndTable
 
-Table: 3-byte opcode 1 (0x0f 0x38)
+Table: 3-byte opcode 1 (0x0f 0x01)
+Referrer:
+AVXcode:
+# Skip 0x00-0xe7
+e8: SETSSBSY (f3)
+e9:
+ea: SAVEPREVSSP (f3)
+# Skip 0xeb-0xff
+EndTable
+
+Table: 3-byte opcode 2 (0x0f 0x38)
 Referrer: 3-byte escape 1
 AVXcode: 2
 # 0x0f 0x38 0x00-0x0f
@@ -789,12 +799,12 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
 f3: Grp17 (1A)
-f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
-f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
+f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSS Pq,Qq (66),REX.W
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 EndTable
 
-Table: 3-byte opcode 2 (0x0f 0x3a)
+Table: 3-byte opcode 3 (0x0f 0x3a)
 Referrer: 3-byte escape 2
 AVXcode: 3
 # 0x0f 0x3a 0x00-0xff
@@ -948,7 +958,7 @@ GrpTable: Grp7
 2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B)
 3: LIDT Ms
 4: SMSW Mw/Rv
-5: rdpkru (110),(11B) | wrpkru (111),(11B)
+5: rdpkru (110),(11B) | wrpkru (111),(11B) | RSTORSSP Mq (F3)
 6: LMSW Ew
 7: INVLPG Mb | SWAPGS (o64),(000),(11B) | RDTSCP (001),(11B)
 EndTable
@@ -1019,8 +1029,8 @@ GrpTable: Grp15
 2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
 3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
-5: XRSTOR | lfence (11B)
-6: XSAVEOPT | clwb (66) | mfence (11B)
+5: XRSTOR | lfence (11B) | INCSSP Rd (F3),REX.W
+6: XSAVEOPT | clwb (66) | mfence (11B) | CLRSSBSY Mq (F3)
 7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states
  2019-06-06 20:06 ` [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states Yu-cheng Yu
@ 2019-06-06 21:18   ` Dave Hansen
  2019-06-06 22:04     ` Andy Lutomirski
  0 siblings, 1 reply; 70+ messages in thread
From: Dave Hansen @ 2019-06-06 21:18 UTC (permalink / raw)
  To: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

> +/*
> + * Helpers for changing XSAVES system states.
> + */
> +static inline void modify_fpu_regs_begin(void)
> +{
> +	fpregs_lock();
> +	if (test_thread_flag(TIF_NEED_FPU_LOAD))
> +		__fpregs_load_activate();
> +}
> +
> +static inline void modify_fpu_regs_end(void)
> +{
> +	fpregs_unlock();
> +}

These are massively under-commented and under-changelogged.  This looks
like it's intended to ensure that we have supervisor FPU state for this
task loaded before we go and run the MSRs that might be modifying it.

But, that seems broken.  If we have supervisor state, we can't always
defer the load until return to userspace, so we'll never?? have
TIF_NEED_FPU_LOAD.  That would certainly be true for cet_kernel_state.

It seems like we actually need three classes of XSAVE states:
1. User state
2. Supervisor state that affects user mode
3. Supervisor state that affects kernel mode

We can delay the load of 1 and 2, but not 3.

But I don't see any infrastructure for this.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states
  2019-06-06 21:18   ` Dave Hansen
@ 2019-06-06 22:04     ` Andy Lutomirski
  2019-06-06 22:08       ` Dave Hansen
  0 siblings, 1 reply; 70+ messages in thread
From: Andy Lutomirski @ 2019-06-06 22:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin



On Jun 6, 2019, at 2:18 PM, Dave Hansen <dave.hansen@intel.com> wrote:

>> +/*
>> + * Helpers for changing XSAVES system states.
>> + */
>> +static inline void modify_fpu_regs_begin(void)
>> +{
>> +    fpregs_lock();
>> +    if (test_thread_flag(TIF_NEED_FPU_LOAD))
>> +        __fpregs_load_activate();
>> +}
>> +
>> +static inline void modify_fpu_regs_end(void)
>> +{
>> +    fpregs_unlock();
>> +}
> 
> These are massively under-commented and under-changelogged.  This looks
> like it's intended to ensure that we have supervisor FPU state for this
> task loaded before we go and run the MSRs that might be modifying it.
> 
> But, that seems broken.  If we have supervisor state, we can't always
> defer the load until return to userspace, so we'll never?? have
> TIF_NEED_FPU_LOAD.  That would certainly be true for cet_kernel_state.

Ugh. I was sort of imagining that we would treat supervisor state completely separately from user state.  But can you maybe give examples of exactly what you mean?

> 
> It seems like we actually need three classes of XSAVE states:
> 1. User state

This is FPU, XMM, etc, right?

> 2. Supervisor state that affects user mode

User CET?


> 3. Supervisor state that affects kernel mode

Like supervisor CET?  If we start doing supervisor shadow stack, the context switches will be real fun.  We may need to handle this in asm.

Where does PKRU fit in?  Maybe we can treat it as #3?

—Andy


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states
  2019-06-06 22:04     ` Andy Lutomirski
@ 2019-06-06 22:08       ` Dave Hansen
  2019-06-06 22:10         ` Yu-cheng Yu
  2019-06-07  1:54         ` Andy Lutomirski
  0 siblings, 2 replies; 70+ messages in thread
From: Dave Hansen @ 2019-06-06 22:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin



On 6/6/19 3:04 PM, Andy Lutomirski wrote:
>> But, that seems broken.  If we have supervisor state, we can't 
>> always defer the load until return to userspace, so we'll never?? 
>> have TIF_NEED_FPU_LOAD.  That would certainly be true for 
>> cet_kernel_state.
> 
> Ugh. I was sort of imagining that we would treat supervisor state
 completely separately from user state.  But can you maybe give
examples of exactly what you mean?
> 
>> It seems like we actually need three classes of XSAVE states: 1. 
>> User state
> 
> This is FPU, XMM, etc, right?

Yep.

>> 2. Supervisor state that affects user mode
> 
> User CET?

Yep.

>> 3. Supervisor state that affects kernel mode
> 
> Like supervisor CET?  If we start doing supervisor shadow stack, the 
> context switches will be real fun.  We may need to handle this in 
> asm.

Yeah, that's what I was thinking.

I have the feeling Yu-cheng's patches don't comprehend this since
Sebastian's patches went in after he started working on shadow stacks.

> Where does PKRU fit in?  Maybe we can treat it as #3?

I thought Sebastian added specific PKRU handling to make it always
eager.  It's actually user state that affect kernel mode. :)


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states
  2019-06-06 22:08       ` Dave Hansen
@ 2019-06-06 22:10         ` Yu-cheng Yu
  2019-06-07  1:54         ` Andy Lutomirski
  1 sibling, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-06 22:10 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin

On Thu, 2019-06-06 at 15:08 -0700, Dave Hansen wrote:
> 
> On 6/6/19 3:04 PM, Andy Lutomirski wrote:
> > > But, that seems broken.  If we have supervisor state, we can't 
> > > always defer the load until return to userspace, so we'll never?? 
> > > have TIF_NEED_FPU_LOAD.  That would certainly be true for 
> > > cet_kernel_state.
> > 
> > Ugh. I was sort of imagining that we would treat supervisor state
> 
>  completely separately from user state.  But can you maybe give
> examples of exactly what you mean?
> > 
> > > It seems like we actually need three classes of XSAVE states: 1. 
> > > User state
> > 
> > This is FPU, XMM, etc, right?
> 
> Yep.
> 
> > > 2. Supervisor state that affects user mode
> > 
> > User CET?
> 
> Yep.
> 
> > > 3. Supervisor state that affects kernel mode
> > 
> > Like supervisor CET?  If we start doing supervisor shadow stack, the 
> > context switches will be real fun.  We may need to handle this in 
> > asm.
> 
> Yeah, that's what I was thinking.
> 
> I have the feeling Yu-cheng's patches don't comprehend this since
> Sebastian's patches went in after he started working on shadow stacks.
> 
> > Where does PKRU fit in?  Maybe we can treat it as #3?
> 
> I thought Sebastian added specific PKRU handling to make it always
> eager.  It's actually user state that affect kernel mode. :)

For CET user states, we need to restore before making changes.  If they are not
being changed (i.e. signal handling and syscalls), then they are restored only
before going back to user-mode.

For CET kernel states, we only need to make small changes in the way similar to
the PKRU handling, right?  We'll address it when sending CET kernel-mode
patches.

I will put in more comments as suggested by Dave in earlier emails.

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states
  2019-06-06 22:08       ` Dave Hansen
  2019-06-06 22:10         ` Yu-cheng Yu
@ 2019-06-07  1:54         ` Andy Lutomirski
  1 sibling, 0 replies; 70+ messages in thread
From: Andy Lutomirski @ 2019-06-07  1:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue, Dave Martin



> On Jun 6, 2019, at 3:08 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> 
> 
> On 6/6/19 3:04 PM, Andy Lutomirski wrote:
>>> But, that seems broken.  If we have supervisor state, we can't 
>>> always defer the load until return to userspace, so we'll never?? 
>>> have TIF_NEED_FPU_LOAD.  That would certainly be true for 
>>> cet_kernel_state.
>> 
>> Ugh. I was sort of imagining that we would treat supervisor state
> completely separately from user state.  But can you maybe give
> examples of exactly what you mean?

I was imagining a completely separate area in memory for supervisor states.  I guess this might defeat the modified optimization and is probably a bad idea.

>> 
>>> It seems like we actually need three classes of XSAVE states: 1. 
>>> User state
>> 
>> This is FPU, XMM, etc, right?
> 
> Yep.
> 
>>> 2. Supervisor state that affects user mode
>> 
>> User CET?
> 
> Yep.
> 
>>> 3. Supervisor state that affects kernel mode
>> 
>> Like supervisor CET?  If we start doing supervisor shadow stack, the 
>> context switches will be real fun.  We may need to handle this in 
>> asm.
> 
> Yeah, that's what I was thinking.
> 
> I have the feeling Yu-cheng's patches don't comprehend this since
> Sebastian's patches went in after he started working on shadow stacks.

Do we need to have TIF_LOAD_FPU mean “we need to load *some* of the xsave state”?  If so, maybe a bunch of the accessors should have their interfaces reviewed to make sure they’re sill sensible.

> 
>> Where does PKRU fit in?  Maybe we can treat it as #3?
> 
> I thought Sebastian added specific PKRU handling to make it always
> eager.  It's actually user state that affect kernel mode. :)

Indeed.  But, if we document a taxonomy of states, we should make sure it fits in. I guess it’s like supervisor CET except that user code can can also read and write it.

We should probably have self tests that make sure that the correct states, and nothing else, show up in ptrace and signal states, and that trying to write supervisor CET via ptrace and sigreturn is properly rejected.

Just to double check my mental model: it’s okay to XSAVES twice to the same buffer with disjoint RFBM as long as we do something intelligent with XSTATE_BV afterwards, right?  Because, unless we split up the buffers, I think we will have to do this when we context switch while TIF_LOAD_FPU is set.

Are there performance numbers for how the time needed to XRSTORS everything versus the time to XRSTORS supervisor CET and then separately XRSTORS the FPU?  This may affect whether we want context switches to have the new task eagerly or lazily restored.

Hmm. I wonder if we need some way for a selftest to reliably trigger TIF_LOAD_FPU.

—Andy

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 05/27] x86/fpu/xstate: Add XSAVES system states for shadow stack
  2019-06-06 20:06 ` [PATCH v7 05/27] x86/fpu/xstate: Add XSAVES system states for shadow stack Yu-cheng Yu
@ 2019-06-07  7:07   ` Peter Zijlstra
  2019-06-07 16:14     ` Yu-cheng Yu
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-06-07  7:07 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Thu, Jun 06, 2019 at 01:06:24PM -0700, Yu-cheng Yu wrote:
> Intel Control-flow Enforcement Technology (CET) introduces the
> following MSRs.
> 
>     MSR_IA32_U_CET (user-mode CET settings),
>     MSR_IA32_PL3_SSP (user-mode shadow stack),
>     MSR_IA32_PL0_SSP (kernel-mode shadow stack),
>     MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack),
>     MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack).
> 
> Introduce them into XSAVES system states.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/include/asm/fpu/types.h            | 22 +++++++++++++++++++++
>  arch/x86/include/asm/fpu/xstate.h           |  4 +++-
>  arch/x86/include/uapi/asm/processor-flags.h |  2 ++
>  arch/x86/kernel/fpu/xstate.c                | 10 ++++++++++
>  4 files changed, 37 insertions(+), 1 deletion(-)

And yet, no changes to msr-index.h !?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 15/27] mm: Handle shadow stack page fault
  2019-06-06 20:06 ` [PATCH v7 15/27] mm: Handle shadow stack page fault Yu-cheng Yu
@ 2019-06-07  7:30   ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-06-07  7:30 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Thu, Jun 06, 2019 at 01:06:34PM -0700, Yu-cheng Yu wrote:

> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 75d9d68a6de7..ffcc0be7cadc 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -1188,4 +1188,12 @@ static inline bool arch_has_pfn_modify_check(void)
>  #define mm_pmd_folded(mm)	__is_defined(__PAGETABLE_PMD_FOLDED)
>  #endif
>  
> +#ifndef CONFIG_ARCH_HAS_SHSTK
> +#define pte_set_vma_features(pte, vma) pte
> +#define arch_copy_pte_mapping(vma_flags) false

static inline pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma)
{
	return pte;
}

static inline bool arch_copy_pte_mapping(unsigned long vm_flags)
{
	return false;
}

Please, this way we retain function prototype checking.

> +#else
> +pte_t pte_set_vma_features(pte_t pte, struct vm_area_struct *vma);
> +bool arch_copy_pte_mapping(vm_flags_t vm_flags);
> +#endif


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 18/27] mm: Introduce do_mmap_locked()
  2019-06-06 20:06 ` [PATCH v7 18/27] mm: Introduce do_mmap_locked() Yu-cheng Yu
@ 2019-06-07  7:43   ` Peter Zijlstra
  2019-06-07  7:47     ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-06-07  7:43 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Thu, Jun 06, 2019 at 01:06:37PM -0700, Yu-cheng Yu wrote:
> There are a few places that need do_mmap() with mm->mmap_sem held.
> Create an in-line function for that.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  include/linux/mm.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 398f1e1c35e5..7cf014604848 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2411,6 +2411,24 @@ static inline void mm_populate(unsigned long addr, unsigned long len)
>  static inline void mm_populate(unsigned long addr, unsigned long len) {}
>  #endif
>  
> +static inline unsigned long do_mmap_locked(unsigned long addr,
> +	unsigned long len, unsigned long prot, unsigned long flags,
> +	vm_flags_t vm_flags)
> +{
> +	struct mm_struct *mm = current->mm;
> +	unsigned long populate;
> +
> +	down_write(&mm->mmap_sem);
> +	addr = do_mmap(NULL, addr, len, prot, flags, vm_flags, 0,
> +		       &populate, NULL);

Funny thing how do_mmap() takes a file pointer as first argument and
this thing explicitly NULLs that. That more or less invalidates the name
do_mmap_locked().

> +	up_write(&mm->mmap_sem);
> +
> +	if (populate)
> +		mm_populate(addr, populate);
> +
> +	return addr;
> +}



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 18/27] mm: Introduce do_mmap_locked()
  2019-06-07  7:43   ` Peter Zijlstra
@ 2019-06-07  7:47     ` Peter Zijlstra
  2019-06-07 16:16       ` Yu-cheng Yu
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-06-07  7:47 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Fri, Jun 07, 2019 at 09:43:22AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 06, 2019 at 01:06:37PM -0700, Yu-cheng Yu wrote:
> > There are a few places that need do_mmap() with mm->mmap_sem held.
> > Create an in-line function for that.
> > 
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > ---
> >  include/linux/mm.h | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 398f1e1c35e5..7cf014604848 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2411,6 +2411,24 @@ static inline void mm_populate(unsigned long addr, unsigned long len)
> >  static inline void mm_populate(unsigned long addr, unsigned long len) {}
> >  #endif
> >  
> > +static inline unsigned long do_mmap_locked(unsigned long addr,
> > +	unsigned long len, unsigned long prot, unsigned long flags,
> > +	vm_flags_t vm_flags)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +	unsigned long populate;
> > +
> > +	down_write(&mm->mmap_sem);
> > +	addr = do_mmap(NULL, addr, len, prot, flags, vm_flags, 0,
> > +		       &populate, NULL);
> 
> Funny thing how do_mmap() takes a file pointer as first argument and
> this thing explicitly NULLs that. That more or less invalidates the name
> do_mmap_locked().
> 
> > +	up_write(&mm->mmap_sem);
> > +
> > +	if (populate)
> > +		mm_populate(addr, populate);
> > +
> > +	return addr;
> > +}

You also don't retain that last @uf argument.

I'm thikning you're better off adding a helper to the cet.c file; call
it cet_mmap() or whatever.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 23/27] x86/cet/shstk: ELF header parsing of Shadow Stack
  2019-06-06 20:06 ` [PATCH v7 23/27] x86/cet/shstk: ELF header parsing of Shadow Stack Yu-cheng Yu
@ 2019-06-07  7:54   ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2019-06-07  7:54 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Thu, Jun 06, 2019 at 01:06:42PM -0700, Yu-cheng Yu wrote:

> +#ifdef CONFIG_ARCH_USE_GNU_PROPERTY
> +int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter)
> +{
> +	int r;
> +	uint32_t property;

Flip those two lines around.

> +
> +	r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND,
> +			     &property);
> +
> +	memset(&current->thread.cet, 0, sizeof(struct cet_status));

It seems to me that memset would be better placed before
get_gnu_property().

> +	if (r)
> +		return r;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {

	if (r || !cpu_feature_enabled())
		return r;

> +		if (property & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
> +			r = cet_setup_shstk();
> +		if (r < 0)
> +			return r;
> +	}
> +	return r;

and loose the indent.

> +}
> +#endif
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-06 20:06 ` [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file Yu-cheng Yu
@ 2019-06-07  7:58   ` Peter Zijlstra
  2019-06-07 16:17     ` Yu-cheng Yu
  2019-06-07 18:01   ` Dave Martin
  1 sibling, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-06-07  7:58 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Thu, Jun 06, 2019 at 01:06:41PM -0700, Yu-cheng Yu wrote:
> An ELF file's .note.gnu.property indicates features the executable file
> can support.  For example, the property GNU_PROPERTY_X86_FEATURE_1_AND
> indicates the file supports GNU_PROPERTY_X86_FEATURE_1_IBT and/or
> GNU_PROPERTY_X86_FEATURE_1_SHSTK.
> 
> With this patch, if an arch needs to setup features from ELF properties,
> it needs CONFIG_ARCH_USE_GNU_PROPERTY to be set, and a specific
> arch_setup_property().
> 
> For example, for X86_64:
> 
> int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter)
> {
> 	int r;
> 	uint32_t property;
> 
> 	r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND,
> 			     &property);
> 	...
> }
> 
> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Did HJ write this patch as suggested by that SoB chain? If so, you lost
a From: line on top, if not, the SoB thing is invalid.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 05/27] x86/fpu/xstate: Add XSAVES system states for shadow stack
  2019-06-07  7:07   ` Peter Zijlstra
@ 2019-06-07 16:14     ` Yu-cheng Yu
  0 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-07 16:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Fri, 2019-06-07 at 09:07 +0200, Peter Zijlstra wrote:
> On Thu, Jun 06, 2019 at 01:06:24PM -0700, Yu-cheng Yu wrote:
> > Intel Control-flow Enforcement Technology (CET) introduces the
> > following MSRs.
> > 
> >     MSR_IA32_U_CET (user-mode CET settings),
> >     MSR_IA32_PL3_SSP (user-mode shadow stack),
> >     MSR_IA32_PL0_SSP (kernel-mode shadow stack),
> >     MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack),
> >     MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack).
> > 
> > Introduce them into XSAVES system states.
> > 
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > ---
> >  arch/x86/include/asm/fpu/types.h            | 22 +++++++++++++++++++++
> >  arch/x86/include/asm/fpu/xstate.h           |  4 +++-
> >  arch/x86/include/uapi/asm/processor-flags.h |  2 ++
> >  arch/x86/kernel/fpu/xstate.c                | 10 ++++++++++
> >  4 files changed, 37 insertions(+), 1 deletion(-)
> 
> And yet, no changes to msr-index.h !?

You are right.  I will move msr-index.h changes to here.

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 18/27] mm: Introduce do_mmap_locked()
  2019-06-07  7:47     ` Peter Zijlstra
@ 2019-06-07 16:16       ` Yu-cheng Yu
  0 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-07 16:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Fri, 2019-06-07 at 09:47 +0200, Peter Zijlstra wrote:
> On Fri, Jun 07, 2019 at 09:43:22AM +0200, Peter Zijlstra wrote:
> > On Thu, Jun 06, 2019 at 01:06:37PM -0700, Yu-cheng Yu wrote:
> > > There are a few places that need do_mmap() with mm->mmap_sem held.
> > > Create an in-line function for that.
> > > 
> > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > > ---
> > >  include/linux/mm.h | 18 ++++++++++++++++++
> > >  1 file changed, 18 insertions(+)
> > > 
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 398f1e1c35e5..7cf014604848 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -2411,6 +2411,24 @@ static inline void mm_populate(unsigned long addr,
> > > unsigned long len)
> > >  static inline void mm_populate(unsigned long addr, unsigned long len) {}
> > >  #endif
> > >  
> > > +static inline unsigned long do_mmap_locked(unsigned long addr,
> > > +	unsigned long len, unsigned long prot, unsigned long flags,
> > > +	vm_flags_t vm_flags)
> > > +{
> > > +	struct mm_struct *mm = current->mm;
> > > +	unsigned long populate;
> > > +
> > > +	down_write(&mm->mmap_sem);
> > > +	addr = do_mmap(NULL, addr, len, prot, flags, vm_flags, 0,
> > > +		       &populate, NULL);
> > 
> > Funny thing how do_mmap() takes a file pointer as first argument and
> > this thing explicitly NULLs that. That more or less invalidates the name
> > do_mmap_locked().
> > 
> > > +	up_write(&mm->mmap_sem);
> > > +
> > > +	if (populate)
> > > +		mm_populate(addr, populate);
> > > +
> > > +	return addr;
> > > +}
> 
> You also don't retain that last @uf argument.
> 
> I'm thikning you're better off adding a helper to the cet.c file; call
> it cet_mmap() or whatever.

Ok, I will fix that.

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-07  7:58   ` Peter Zijlstra
@ 2019-06-07 16:17     ` Yu-cheng Yu
  0 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-07 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Fri, 2019-06-07 at 09:58 +0200, Peter Zijlstra wrote:
> On Thu, Jun 06, 2019 at 01:06:41PM -0700, Yu-cheng Yu wrote:
> > An ELF file's .note.gnu.property indicates features the executable file
> > can support.  For example, the property GNU_PROPERTY_X86_FEATURE_1_AND
> > indicates the file supports GNU_PROPERTY_X86_FEATURE_1_IBT and/or
> > GNU_PROPERTY_X86_FEATURE_1_SHSTK.
> > 
> > With this patch, if an arch needs to setup features from ELF properties,
> > it needs CONFIG_ARCH_USE_GNU_PROPERTY to be set, and a specific
> > arch_setup_property().
> > 
> > For example, for X86_64:
> > 
> > int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter)
> > {
> > 	int r;
> > 	uint32_t property;
> > 
> > 	r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND,
> > 			     &property);
> > 	...
> > }
> > 
> > Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Did HJ write this patch as suggested by that SoB chain? If so, you lost
> a From: line on top, if not, the SoB thing is invalid.

I will fix that.

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-06 20:06 ` [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file Yu-cheng Yu
  2019-06-07  7:58   ` Peter Zijlstra
@ 2019-06-07 18:01   ` Dave Martin
  2019-06-10 16:29     ` Yu-cheng Yu
  1 sibling, 1 reply; 70+ messages in thread
From: Dave Martin @ 2019-06-07 18:01 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

On Thu, Jun 06, 2019 at 01:06:41PM -0700, Yu-cheng Yu wrote:
> An ELF file's .note.gnu.property indicates features the executable file
> can support.  For example, the property GNU_PROPERTY_X86_FEATURE_1_AND
> indicates the file supports GNU_PROPERTY_X86_FEATURE_1_IBT and/or
> GNU_PROPERTY_X86_FEATURE_1_SHSTK.
> 
> With this patch, if an arch needs to setup features from ELF properties,
> it needs CONFIG_ARCH_USE_GNU_PROPERTY to be set, and a specific
> arch_setup_property().
> 
> For example, for X86_64:
> 
> int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter)
> {
> 	int r;
> 	uint32_t property;
> 
> 	r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND,
> 			     &property);
> 	...
> }

Although this code works for the simple case, I have some concerns about
some aspects of the implementation here.  There appear to be some bounds
checking / buffer overrun issues, and the code seems quite complex.

Maybe this patch tries too hard to be compatible with toolchains that do
silly things such as embedding huge notes in an executable, or mixing
NT_GNU_PROPERTY_TYPE_0 in a single PT_NOTE with a load of junk not
relevant to the loader.  I wonder whether Linux can dictate what
interpretation(s) of the ELF specs it is prepared to support, rather than
trying to support absolutely anything.


I've commented on some potential issues below, but my review isn't
exhaustive -- I may also have simply not understood the code in some
cases, so I apologise in advance for that!

I've also marked a few coding style nits that make the code harder to
read than necessary (but this is partly a matter of taste).

Comments below.

Cheers
---Dave

> 
> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  fs/Kconfig.binfmt        |   3 +
>  fs/Makefile              |   1 +
>  fs/binfmt_elf.c          |  13 ++
>  fs/gnu_property.c        | 351 +++++++++++++++++++++++++++++++++++++++
>  include/linux/elf.h      |  12 ++
>  include/uapi/linux/elf.h |  14 ++
>  6 files changed, 394 insertions(+)
>  create mode 100644 fs/gnu_property.c
> 
> diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
> index f87ddd1b6d72..397138ab305b 100644
> --- a/fs/Kconfig.binfmt
> +++ b/fs/Kconfig.binfmt
> @@ -36,6 +36,9 @@ config COMPAT_BINFMT_ELF
>  config ARCH_BINFMT_ELF_STATE
>  	bool
>  
> +config ARCH_USE_GNU_PROPERTY
> +	bool
> +
>  config BINFMT_ELF_FDPIC
>  	bool "Kernel support for FDPIC ELF binaries"
>  	default y if !BINFMT_ELF
> diff --git a/fs/Makefile b/fs/Makefile
> index c9aea23aba56..b69f18c14e09 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -44,6 +44,7 @@ obj-$(CONFIG_BINFMT_ELF)	+= binfmt_elf.o
>  obj-$(CONFIG_COMPAT_BINFMT_ELF)	+= compat_binfmt_elf.o
>  obj-$(CONFIG_BINFMT_ELF_FDPIC)	+= binfmt_elf_fdpic.o
>  obj-$(CONFIG_BINFMT_FLAT)	+= binfmt_flat.o
> +obj-$(CONFIG_ARCH_USE_GNU_PROPERTY) += gnu_property.o
>  
>  obj-$(CONFIG_FS_MBCACHE)	+= mbcache.o
>  obj-$(CONFIG_FS_POSIX_ACL)	+= posix_acl.o
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 8264b468f283..c3ea73787e93 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1080,6 +1080,19 @@ static int load_elf_binary(struct linux_binprm *bprm)
>  		goto out_free_dentry;
>  	}
>  
> +	if (interpreter) {
> +		retval = arch_setup_property(&loc->interp_elf_ex,
> +					     interp_elf_phdata,
> +					     interpreter, true);
> +	} else {
> +		retval = arch_setup_property(&loc->elf_ex,
> +					     elf_phdata,
> +					     bprm->file, false);
> +	}
> +
> +	if (retval < 0)
> +		goto out_free_dentry;
> +
>  	if (interpreter) {
>  		unsigned long interp_map_addr = 0;
>  
> diff --git a/fs/gnu_property.c b/fs/gnu_property.c
> new file mode 100644
> index 000000000000..9c4d1d5ebf00
> --- /dev/null
> +++ b/fs/gnu_property.c
> @@ -0,0 +1,351 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Extract an ELF file's .note.gnu.property.
> + *
> + * The path from the ELF header to the note section is the following:
> + * elfhdr->elf_phdr->elf_note->property[].
> + */
> +
> +#include <uapi/linux/elf-em.h>
> +#include <linux/processor.h>
> +#include <linux/binfmts.h>
> +#include <linux/elf.h>
> +#include <linux/slab.h>
> +#include <linux/fs.h>
> +#include <linux/uaccess.h>
> +#include <linux/string.h>
> +#include <linux/compat.h>
> +
> +/*
> + * The .note.gnu.property layout:
> + *
> + *	struct elf_note {
> + *		u32 n_namesz; --> sizeof(n_name[]); always (4)
> + *		u32 n_ndescsz;--> sizeof(property[])
> + *		u32 n_type;   --> always NT_GNU_PROPERTY_TYPE_0
> + *	};
> + *	char n_name[4]; --> always 'GNU\0'
> + *
> + *	struct {
> + *		struct gnu_property {
> + *			u32 pr_type;
> + *			u32 pr_datasz;
> + *		};
> + *		u8 pr_data[pr_datasz];
> + *	}[];
> + */
> +
> +#define BUF_SIZE (PAGE_SIZE / 4)

Nit: magic number in disguise.  What does the size of ELF notes have
to do with the page size?

> +
> +typedef bool (test_item_fn)(void *buf, u32 *arg, u32 type);
> +typedef void *(next_item_fn)(void *buf, u32 *arg, u32 type);
> +
> +static inline bool test_note_type(void *buf, u32 *align, u32 note_type)
> +{
> +	struct elf_note *n = buf;
> +
> +	return ((n->n_type == note_type) && (n->n_namesz == 4) &&
> +		(memcmp(n + 1, "GNU", 4) == 0));
> +}
> +
> +static inline void *next_note(void *buf, u32 *align, u32 note_type)
> +{
> +	struct elf_note *n = buf;
> +	u64 size;
> +
> +	if (check_add_overflow((u64)sizeof(*n), (u64)n->n_namesz, &size))
> +		return NULL;

sizeof(*n) is a small integer under our control, and n->n_namesz is a
u32.

So, I'm not sure how we would overflow 64 bits here, although if we can
get arbitrarily close to ~(u64)0 then:

> +
> +	size = round_up(size, *align);

this can overflow too.

> +
> +	if (check_add_overflow(size, (u64)n->n_descsz, &size))
> +		return NULL;
> +
> +	size = round_up(size, *align);

Similarly here.

> +
> +	if (buf + size < buf)

Isn't this undefined behaviour of it overflows?  If so, the compiler can
probably delete the check entirely, making it useless.  Does UBSAN warn
about it?

> +		return NULL;
> +	else
> +		return (buf + size);

Nit: Unnecessary ()  (There are surplus () all over this patch; I won't
comment on them all.)

> +}
> +
> +static inline bool test_property(void *buf, u32 *max_type, u32 pr_type)
> +{
> +	struct gnu_property *pr = buf;
> +
> +	/*
> +	 * Property types must be in ascending order.
> +	 * Keep track of the max when testing each.
> +	 */
> +	if (pr->pr_type > *max_type)
> +		*max_type = pr->pr_type;

Is this worthwhile?  In general we don't try very hard to check that the
ELF file is well-formed.

Ideally we could search by binary chop, but the property size is
variable, so the sortedness is useless to us (yay).

> +
> +	return (pr->pr_type == pr_type);
> +}
> +
> +static inline void *next_property(void *buf, u32 *max_type, u32 pr_type)

Nit: does this need to be inline?  The compiler's guess is usually good
enough...

> +{
> +	struct gnu_property *pr = buf;
> +
> +	if ((buf + sizeof(*pr) +  pr->pr_datasz < buf) ||

Nit: random extra space, redundant (), etc.

> +	    (pr->pr_type > pr_type) ||
> +	    (pr->pr_type > *max_type))
> +		return NULL;
> +	else
> +		return (buf + sizeof(*pr) + pr->pr_datasz);

We can exceed the underlying buffer bounds here, which is technically
undefined behaviour.

I suspect we may be relying on similar tricks all over the kernel, but
IT MAy be best avoided anyway.


If we always pass in the buffer base pointer and the size of the buffer, say

	static int next_property(void *buf, size_t *offset,
						size_t bufsz, ...)

then we may be able to use direct comparisons that can't overflow
rather than relying on potentially undefined behaviour.  For example:

	size_t o = *offset;

	if (o > bufsz || sizeof (*pr) > bufsz - o)
		return -1;

	pr = buf + o;
	if (pr->pr_type > pr_type || pr->pr_type > *max_type)
		return -1;

	if (pr->pr_datasz > bufsz - o - sizeof (*pr))
		return -1;

	*offset = o + sizeof (*pr) + pr->pr_datasz;
	return 0;

(There may be neater ways to do this.)

> +}
> +
> +/*
> + * Scan 'buf' for a pattern; return true if found.
> + * *pos is the distance from the beginning of buf to where
> + * the searched item or the next item is located.
> + */
> +static int scan(u8 *buf, u32 buf_size, int item_size, test_item_fn test_item,
> +		next_item_fn next_item, u32 *arg, u32 type, u32 *pos)
> +{
> +	int found = 0;
> +	u8 *p, *max;
> +
> +	max = buf + buf_size;
> +	if (max < buf)

See comment about undefined behaviour above.

Also, I'm not sure this check adds anything.  We know buf_size is
<= BUF_SIZE (though we could stick a WARN_ON() here and bail out if we
want to make absolutely sure).

If buf is always the base pointer returned by kmalloc(BUF_SIZE), then
I think buf_size can never go outside its bounds?

> +		return 0;
> +
> +	p = buf;
> +
> +	while ((p + item_size < max) && (p + item_size > buf)) {

                           ^ <= ?                   ^ undefined behaviour?

> +		if (test_item(p, arg, type)) {
> +			found = 1;
> +			break;
> +		}
> +
> +		p = next_item(p, arg, type);
> +	}
> +
> +	*pos = (p + item_size <= buf) ? 0 : (u32)(p - buf);

Can this be written more simply, say:

	if (p + item_size > buf)
		*pos += p - buf;

Also, since next_property() adds pr_datasz onto buf, could we get
unlucky and wrap past (void *)~0UL?  Then (u32)(p - buf) may be giant.
Not sure whether this breaks code elsewhere.

> +	return found;
> +}
> +
> +/*
> + * Search an NT_GNU_PROPERTY_TYPE_0 for the property of 'pr_type'.
> + */
> +static int find_property(struct file *file, unsigned long desc_size,
> +			 loff_t file_offset, u8 *buf,
> +			 u32 pr_type, u32 *property)
> +{
> +	u32 buf_pos;
> +	unsigned long read_size;
> +	unsigned long done;
> +	int found = 0;
> +	int ret = 0;
> +	u32 last_pr = 0;
> +
> +	*property = 0;
> +	buf_pos = 0;
> +
> +	for (done = 0; done < desc_size; done += buf_pos) {
> +		read_size = desc_size - done;
> +		if (read_size > BUF_SIZE)
> +			read_size = BUF_SIZE;
> +
> +		ret = kernel_read(file, buf, read_size, &file_offset);
> +
> +		if (ret != read_size)
> +			return (ret < 0) ? ret : -EIO;
> +
> +		ret = 0;
> +		found = scan(buf, read_size, sizeof(struct gnu_property),
> +			     test_property, next_property,
> +			     &last_pr, pr_type, &buf_pos);
> +
> +		if ((!buf_pos) || found)
> +			break;
> +
> +		file_offset += buf_pos - read_size;
> +	}
> +
> +	if (found) {
> +		struct gnu_property *pr =
> +			(struct gnu_property *)(buf + buf_pos);
> +
> +		if (pr->pr_datasz == 4) {
> +			u32 *max =  (u32 *)(buf + read_size);
> +			u32 *data = (u32 *)((u8 *)pr + sizeof(*pr));
> +
> +			if (data + 1 <= max) {
> +				*property = *data;
> +			} else {
> +				file_offset += buf_pos - read_size;
> +				file_offset += sizeof(*pr);
> +				ret = kernel_read(file, property, 4,
> +						  &file_offset);
> +			}
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +/*
> + * Search a PT_NOTE segment for NT_GNU_PROPERTY_TYPE_0.
> + */
> +static int find_note_type_0(struct file *file, loff_t file_offset,
> +			    unsigned long note_size, u32 align,
> +			    u32 pr_type, u32 *property)
> +{
> +	u8 *buf;
> +	u32 buf_pos;
> +	unsigned long read_size;
> +	unsigned long done;
> +	int found = 0;
> +	int ret = 0;
> +
> +	buf = kmalloc(BUF_SIZE, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;

Do we really need to alloc/free this once per note?

> +
> +	*property = 0;
> +	buf_pos = 0;
> +
> +	for (done = 0; done < note_size; done += buf_pos) {
> +		read_size = note_size - done;
> +		if (read_size > BUF_SIZE)
> +			read_size = BUF_SIZE;
> +
> +		ret = kernel_read(file, buf, read_size, &file_offset);
> +
> +		if (ret != read_size) {
> +			ret = (ret < 0) ? ret : -EIO;
> +			kfree(buf);
> +			return ret;
> +		}
> +
> +		/*
> +		 * item_size = sizeof(struct elf_note) + elf_note.n_namesz.
> +		 * n_namesz is 4 for the note type we look for.
> +		 */
> +		ret = scan(buf, read_size, sizeof(struct elf_note) + 4,
> +			      test_note_type, next_note,
> +			      &align, NT_GNU_PROPERTY_TYPE_0, &buf_pos);
> +
> +		file_offset += buf_pos - read_size;
> +
> +		if (ret && !found) {
> +			struct elf_note *n =
> +				(struct elf_note *)(buf + buf_pos);
> +			u64 start = round_up(sizeof(*n) + n->n_namesz, align);
> +			u64 total = 0;
> +
> +			if (check_add_overflow(start, (u64)n->n_descsz, &total)) {
> +				ret = -EINVAL;
> +				break;
> +			}
> +			total = round_up(total, align);
> +
> +			ret = find_property(file, n->n_descsz,
> +					    file_offset + start,
> +					    buf, pr_type, property);
> +			found++;
> +			file_offset += total;
> +			buf_pos += total;
> +		} else if (!buf_pos || ret) {
> +			ret = 0;
> +			*property = 0;
> +			break;
> +		}
> +	}

Do we really need this complexity?  How big are the notes realistically
going to be?

Since a file with bloated notes is going to be inefficient to exec
anyway if we have to scan all the way through them, would it be better
just to choke on it and force the toolchain to do something more
sensible?

This in one reason why it would be good for the kernel to require
PT_GNU_PROPERTY if possible, so we know the precise offset and size
without having to search...

> +
> +	kfree(buf);
> +	return ret;
> +}
> +
> +/*
> + * Look at an ELF file's PT_NOTE segments, then NT_GNU_PROPERTY_TYPE_0, then
> + * the property of pr_type.
> + *
> + * Input:
> + *	file: the file to search;
> + *	phdr: the file's elf header;
> + *	phnum: number of entries in phdr;
> + *	pr_type: the property type.
> + *
> + * Output:
> + *	The property found.
> + *
> + * Return:
> + *	Zero or error.
> + */
> +static int scan_segments_64(struct file *file, struct elf64_phdr *phdr,
> +			    int phnum, u32 pr_type, u32 *property)
> +{
> +	int i;
> +	int err = 0;
> +
> +	for (i = 0; i < phnum; i++, phdr++) {
> +		if ((phdr->p_type != PT_NOTE) || (phdr->p_align != 8))
> +			continue;
> +
> +		/*
> +		 * Search the PT_NOTE segment for NT_GNU_PROPERTY_TYPE_0.
> +		 */
> +		err = find_note_type_0(file, phdr->p_offset, phdr->p_filesz,
> +				       phdr->p_align, pr_type, property);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static int scan_segments_32(struct file *file, struct elf32_phdr *phdr,
> +			    int phnum, u32 pr_type, u32 *property)
> +{
> +	int i;
> +	int err = 0;
> +
> +	for (i = 0; i < phnum; i++, phdr++) {
> +		if ((phdr->p_type != PT_NOTE) || (phdr->p_align != 4))
> +			continue;
> +
> +		/*
> +		 * Search the PT_NOTE segment for NT_GNU_PROPERTY_TYPE_0.
> +		 */
> +		err = find_note_type_0(file, phdr->p_offset, phdr->p_filesz,
> +				       phdr->p_align, pr_type, property);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +int get_gnu_property(void *ehdr_p, void *phdr_p, struct file *f,
> +		     u32 pr_type, u32 *property)
> +{
> +	struct elf64_hdr *ehdr64 = ehdr_p;
> +	int err = 0;
> +
> +	*property = 0;
> +
> +	if (ehdr64->e_ident[EI_CLASS] == ELFCLASS64) {
> +		struct elf64_phdr *phdr64 = phdr_p;
> +
> +		err = scan_segments_64(f, phdr64, ehdr64->e_phnum,
> +				       pr_type, property);
> +		if (err < 0)
> +			goto out;
> +	} else {
> +		struct elf32_hdr *ehdr32 = ehdr_p;
> +
> +		if (ehdr32->e_ident[EI_CLASS] == ELFCLASS32) {
> +			struct elf32_phdr *phdr32 = phdr_p;
> +
> +			err = scan_segments_32(f, phdr32, ehdr32->e_phnum,
> +					       pr_type, property);
> +			if (err < 0)
> +				goto out;
> +		}
> +	}
> +
> +out:
> +	return err;
> +}
> diff --git a/include/linux/elf.h b/include/linux/elf.h
> index e3649b3e970e..c15febebe7f2 100644
> --- a/include/linux/elf.h
> +++ b/include/linux/elf.h
> @@ -56,4 +56,16 @@ static inline int elf_coredump_extra_notes_write(struct coredump_params *cprm) {
>  extern int elf_coredump_extra_notes_size(void);
>  extern int elf_coredump_extra_notes_write(struct coredump_params *cprm);
>  #endif
> +
> +#ifdef CONFIG_ARCH_USE_GNU_PROPERTY
> +extern int arch_setup_property(void *ehdr, void *phdr, struct file *f,
> +			       bool interp);
> +extern int get_gnu_property(void *ehdr_p, void *phdr_p, struct file *f,
> +			    u32 pr_type, u32 *feature);
> +#else
> +static inline int arch_setup_property(void *ehdr, void *phdr, struct file *f,
> +				      bool interp) { return 0; }
> +static inline int get_gnu_property(void *ehdr_p, void *phdr_p, struct file *f,
> +				   u32 pr_type, u32 *feature) { return 0; }
> +#endif
>  #endif /* _LINUX_ELF_H */
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index 34c02e4290fe..316177ce9e76 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -372,6 +372,7 @@ typedef struct elf64_shdr {
>  #define NT_PRFPREG	2
>  #define NT_PRPSINFO	3
>  #define NT_TASKSTRUCT	4
> +#define NT_GNU_PROPERTY_TYPE_0 5

Should this be in a separate block.  This required n_name = "GNU",
whereas the rest are "LINUX" notes AFAIK: it's really a separate
namespace.

I think the gap between 4 and 6 may be just coincidence: glibc's elf.h
already has NT_PLATFORM here (whatever that is).

>  #define NT_AUXV		6
>  /*
>   * Note to userspace developers: size of NT_SIGINFO note may increase
> @@ -443,4 +444,17 @@ typedef struct elf64_note {
>    Elf64_Word n_type;	/* Content type */
>  } Elf64_Nhdr;
>  
> +/* NT_GNU_PROPERTY_TYPE_0 header */
> +struct gnu_property {
> +  __u32 pr_type;
> +  __u32 pr_datasz;
> +};
> +
> +/* .note.gnu.property types */
> +#define GNU_PROPERTY_X86_FEATURE_1_AND		(0xc0000002)
> +
> +/* Bits of GNU_PROPERTY_X86_FEATURE_1_AND */
> +#define GNU_PROPERTY_X86_FEATURE_1_IBT		(0x00000001)
> +#define GNU_PROPERTY_X86_FEATURE_1_SHSTK	(0x00000002)
> +

Redundant ().  The rest of the file doesn't have them; can we conform to
the prevailing style there?

>  #endif /* _UAPI_LINUX_ELF_H */
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-07 18:01   ` Dave Martin
@ 2019-06-10 16:29     ` Yu-cheng Yu
  2019-06-10 16:57       ` Dave Martin
  2019-06-10 17:24       ` Florian Weimer
  0 siblings, 2 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-10 16:29 UTC (permalink / raw)
  To: Dave Martin
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

On Fri, 2019-06-07 at 19:01 +0100, Dave Martin wrote:
> On Thu, Jun 06, 2019 at 01:06:41PM -0700, Yu-cheng Yu wrote:
> > An ELF file's .note.gnu.property indicates features the executable file
> > can support.  For example, the property GNU_PROPERTY_X86_FEATURE_1_AND
> > indicates the file supports GNU_PROPERTY_X86_FEATURE_1_IBT and/or
> > GNU_PROPERTY_X86_FEATURE_1_SHSTK.
> > 
> > With this patch, if an arch needs to setup features from ELF properties,
> > it needs CONFIG_ARCH_USE_GNU_PROPERTY to be set, and a specific
> > arch_setup_property().
> > 
> > For example, for X86_64:
> > 
> > int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter)
> > {
> > 	int r;
> > 	uint32_t property;
> > 
> > 	r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND,
> > 			     &property);
> > 	...
> > }
> 
> Although this code works for the simple case, I have some concerns about
> some aspects of the implementation here.  There appear to be some bounds
> checking / buffer overrun issues, and the code seems quite complex.
> 
> Maybe this patch tries too hard to be compatible with toolchains that do
> silly things such as embedding huge notes in an executable, or mixing
> NT_GNU_PROPERTY_TYPE_0 in a single PT_NOTE with a load of junk not
> relevant to the loader.  I wonder whether Linux can dictate what
> interpretation(s) of the ELF specs it is prepared to support, rather than
> trying to support absolutely anything.

To me, looking at PT_GNU_PROPERTY and not trying to support anything is a
logical choice.  And it breaks only a limited set of toolchains.

I will simplify the parser and leave this patch as-is for anyone who wants to
back-port.  Are there any objections or concerns?

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-10 16:29     ` Yu-cheng Yu
@ 2019-06-10 16:57       ` Dave Martin
  2019-06-10 17:24       ` Florian Weimer
  1 sibling, 0 replies; 70+ messages in thread
From: Dave Martin @ 2019-06-10 16:57 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

On Mon, Jun 10, 2019 at 09:29:04AM -0700, Yu-cheng Yu wrote:
> On Fri, 2019-06-07 at 19:01 +0100, Dave Martin wrote:
> > On Thu, Jun 06, 2019 at 01:06:41PM -0700, Yu-cheng Yu wrote:
> > > An ELF file's .note.gnu.property indicates features the executable file
> > > can support.  For example, the property GNU_PROPERTY_X86_FEATURE_1_AND
> > > indicates the file supports GNU_PROPERTY_X86_FEATURE_1_IBT and/or
> > > GNU_PROPERTY_X86_FEATURE_1_SHSTK.
> > > 
> > > With this patch, if an arch needs to setup features from ELF properties,
> > > it needs CONFIG_ARCH_USE_GNU_PROPERTY to be set, and a specific
> > > arch_setup_property().
> > > 
> > > For example, for X86_64:
> > > 
> > > int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter)
> > > {
> > > 	int r;
> > > 	uint32_t property;
> > > 
> > > 	r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND,
> > > 			     &property);
> > > 	...
> > > }
> > 
> > Although this code works for the simple case, I have some concerns about
> > some aspects of the implementation here.  There appear to be some bounds
> > checking / buffer overrun issues, and the code seems quite complex.
> > 
> > Maybe this patch tries too hard to be compatible with toolchains that do
> > silly things such as embedding huge notes in an executable, or mixing
> > NT_GNU_PROPERTY_TYPE_0 in a single PT_NOTE with a load of junk not
> > relevant to the loader.  I wonder whether Linux can dictate what
> > interpretation(s) of the ELF specs it is prepared to support, rather than
> > trying to support absolutely anything.
> 
> To me, looking at PT_GNU_PROPERTY and not trying to support anything is a
> logical choice.  And it breaks only a limited set of toolchains.
> 
> I will simplify the parser and leave this patch as-is for anyone who wants to
> back-port.  Are there any objections or concerns?

No objection from me ;)  But I'm biased.

Hopefully this change should allow substantial simplification.  For one
thing, PT_GNU_PROPERTY tells its file offset and size directly in its
phdrs entry.  That should save us a lot of effort on the kernel side.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-10 16:29     ` Yu-cheng Yu
  2019-06-10 16:57       ` Dave Martin
@ 2019-06-10 17:24       ` Florian Weimer
  2019-06-11 11:41         ` Dave Martin
  1 sibling, 1 reply; 70+ messages in thread
From: Florian Weimer @ 2019-06-10 17:24 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Dave Martin, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

* Yu-cheng Yu:

> To me, looking at PT_GNU_PROPERTY and not trying to support anything is a
> logical choice.  And it breaks only a limited set of toolchains.
>
> I will simplify the parser and leave this patch as-is for anyone who wants to
> back-port.  Are there any objections or concerns?

Red Hat Enterprise Linux 8 does not use PT_GNU_PROPERTY and is probably
the largest collection of CET-enabled binaries that exists today.

My hope was that we would backport the upstream kernel patches for CET,
port the glibc dynamic loader to the new kernel interface, and be ready
to run with CET enabled in principle (except that porting userspace
libraries such as OpenSSL has not really started upstream, so many
processes where CET is particularly desirable will still run without
it).

I'm not sure if it is a good idea to port the legacy support if it's not
part of the mainline kernel because it comes awfully close to creating
our own private ABI.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-10 17:24       ` Florian Weimer
@ 2019-06-11 11:41         ` Dave Martin
  2019-06-11 19:31           ` Yu-cheng Yu
  0 siblings, 1 reply; 70+ messages in thread
From: Dave Martin @ 2019-06-11 11:41 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

On Mon, Jun 10, 2019 at 07:24:43PM +0200, Florian Weimer wrote:
> * Yu-cheng Yu:
> 
> > To me, looking at PT_GNU_PROPERTY and not trying to support anything is a
> > logical choice.  And it breaks only a limited set of toolchains.
> >
> > I will simplify the parser and leave this patch as-is for anyone who wants to
> > back-port.  Are there any objections or concerns?
> 
> Red Hat Enterprise Linux 8 does not use PT_GNU_PROPERTY and is probably
> the largest collection of CET-enabled binaries that exists today.

For clarity, RHEL is actively parsing these properties today?

> My hope was that we would backport the upstream kernel patches for CET,
> port the glibc dynamic loader to the new kernel interface, and be ready
> to run with CET enabled in principle (except that porting userspace
> libraries such as OpenSSL has not really started upstream, so many
> processes where CET is particularly desirable will still run without
> it).
> 
> I'm not sure if it is a good idea to port the legacy support if it's not
> part of the mainline kernel because it comes awfully close to creating
> our own private ABI.

I guess we can aim to factor things so that PT_NOTE scanning is
available as a fallback on arches for which the absence of
PT_GNU_PROPERTY is not authoritative.

Can we argue that the lack of PT_GNU_PROPERTY is an ABI bug, fix it
for new binaries and hence limit the efforts we go to to support
theoretical binaries that lack the phdrs entry?

If we can make practical simplifications to the parsing, such as
limiting the maximum PT_NOTE size that we will search for the program
properties to 1K (say), or requiring NT_NOTE_GNU_PROPERTY_TYPE_0 to sit
by itself in a single PT_NOTE then that could help minimse the exec
overheads and the number of places for bugs to hide in the kernel.

What we can do here depends on what the tools currently do and what
binaries are out there.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 25/27] mm/mmap: Add Shadow stack pages to memory accounting
  2019-06-06 20:06 ` [PATCH v7 25/27] mm/mmap: Add Shadow stack pages to memory accounting Yu-cheng Yu
@ 2019-06-11 17:55   ` Dave Hansen
  2019-06-11 19:22     ` Yu-cheng Yu
  0 siblings, 1 reply; 70+ messages in thread
From: Dave Hansen @ 2019-06-11 17:55 UTC (permalink / raw)
  To: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On 6/6/19 1:06 PM, Yu-cheng Yu wrote:
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1703,6 +1703,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
>  	if (file && is_file_hugepages(file))
>  		return 0;
>  
> +	if (arch_copy_pte_mapping(vm_flags))
> +		return 1;
> +
>  	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
>  }
>  
> @@ -3319,6 +3322,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
>  		mm->stack_vm += npages;
>  	else if (is_data_mapping(flags))
>  		mm->data_vm += npages;
> +	else if (arch_copy_pte_mapping(flags))
> +		mm->data_vm += npages;
>  }

This classifies shadow stack as data instead of stack.  That seems a wee
bit counterintuitive.  Why did you make this choice?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 25/27] mm/mmap: Add Shadow stack pages to memory accounting
  2019-06-11 17:55   ` Dave Hansen
@ 2019-06-11 19:22     ` Yu-cheng Yu
  0 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-11 19:22 UTC (permalink / raw)
  To: Dave Hansen, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Tue, 2019-06-11 at 10:55 -0700, Dave Hansen wrote:
> On 6/6/19 1:06 PM, Yu-cheng Yu wrote:
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1703,6 +1703,9 @@ static inline int accountable_mapping(struct file
> > *file, vm_flags_t vm_flags)
> >  	if (file && is_file_hugepages(file))
> >  		return 0;
> >  
> > +	if (arch_copy_pte_mapping(vm_flags))
> > +		return 1;
> > +
> >  	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) ==
> > VM_WRITE;
> >  }
> >  
> > @@ -3319,6 +3322,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t
> > flags, long npages)
> >  		mm->stack_vm += npages;
> >  	else if (is_data_mapping(flags))
> >  		mm->data_vm += npages;
> > +	else if (arch_copy_pte_mapping(flags))
> > +		mm->data_vm += npages;
> >  }
> 
> This classifies shadow stack as data instead of stack.  That seems a wee
> bit counterintuitive.  Why did you make this choice?

I don't recall the reason; I will change it to stack and test it out.

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-11 11:41         ` Dave Martin
@ 2019-06-11 19:31           ` Yu-cheng Yu
  2019-06-12  9:32             ` Dave Martin
  0 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-11 19:31 UTC (permalink / raw)
  To: Dave Martin, Florian Weimer
  Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
	Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, H.J. Lu, Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, 2019-06-11 at 12:41 +0100, Dave Martin wrote:
> On Mon, Jun 10, 2019 at 07:24:43PM +0200, Florian Weimer wrote:
> > * Yu-cheng Yu:
> > 
> > > To me, looking at PT_GNU_PROPERTY and not trying to support anything is a
> > > logical choice.  And it breaks only a limited set of toolchains.
> > > 
> > > I will simplify the parser and leave this patch as-is for anyone who wants
> > > to
> > > back-port.  Are there any objections or concerns?
> > 
> > Red Hat Enterprise Linux 8 does not use PT_GNU_PROPERTY and is probably
> > the largest collection of CET-enabled binaries that exists today.
> 
> For clarity, RHEL is actively parsing these properties today?
> 
> > My hope was that we would backport the upstream kernel patches for CET,
> > port the glibc dynamic loader to the new kernel interface, and be ready
> > to run with CET enabled in principle (except that porting userspace
> > libraries such as OpenSSL has not really started upstream, so many
> > processes where CET is particularly desirable will still run without
> > it).
> > 
> > I'm not sure if it is a good idea to port the legacy support if it's not
> > part of the mainline kernel because it comes awfully close to creating
> > our own private ABI.
> 
> I guess we can aim to factor things so that PT_NOTE scanning is
> available as a fallback on arches for which the absence of
> PT_GNU_PROPERTY is not authoritative.

We can probably check PT_GNU_PROPERTY first, and fallback (based on ld-linux
version?) to PT_NOTE scanning?

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-11 19:31           ` Yu-cheng Yu
@ 2019-06-12  9:32             ` Dave Martin
  2019-06-12 19:04               ` Yu-cheng Yu
  2019-06-17 11:08               ` Florian Weimer
  0 siblings, 2 replies; 70+ messages in thread
From: Dave Martin @ 2019-06-12  9:32 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Florian Weimer, x86, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue

On Tue, Jun 11, 2019 at 12:31:34PM -0700, Yu-cheng Yu wrote:
> On Tue, 2019-06-11 at 12:41 +0100, Dave Martin wrote:
> > On Mon, Jun 10, 2019 at 07:24:43PM +0200, Florian Weimer wrote:
> > > * Yu-cheng Yu:
> > > 
> > > > To me, looking at PT_GNU_PROPERTY and not trying to support anything is a
> > > > logical choice.  And it breaks only a limited set of toolchains.
> > > > 
> > > > I will simplify the parser and leave this patch as-is for anyone who wants
> > > > to
> > > > back-port.  Are there any objections or concerns?
> > > 
> > > Red Hat Enterprise Linux 8 does not use PT_GNU_PROPERTY and is probably
> > > the largest collection of CET-enabled binaries that exists today.
> > 
> > For clarity, RHEL is actively parsing these properties today?
> > 
> > > My hope was that we would backport the upstream kernel patches for CET,
> > > port the glibc dynamic loader to the new kernel interface, and be ready
> > > to run with CET enabled in principle (except that porting userspace
> > > libraries such as OpenSSL has not really started upstream, so many
> > > processes where CET is particularly desirable will still run without
> > > it).
> > > 
> > > I'm not sure if it is a good idea to port the legacy support if it's not
> > > part of the mainline kernel because it comes awfully close to creating
> > > our own private ABI.
> > 
> > I guess we can aim to factor things so that PT_NOTE scanning is
> > available as a fallback on arches for which the absence of
> > PT_GNU_PROPERTY is not authoritative.
> 
> We can probably check PT_GNU_PROPERTY first, and fallback (based on ld-linux
> version?) to PT_NOTE scanning?

For arm64, we can check for PT_GNU_PROPERTY and then give up
unconditionally.

For x86, we would fall back to PT_NOTE scanning, but this will add a bit
of cost to binaries that don't have NT_GNU_PROPERTY_TYPE_0.  The ld.so
version doesn't tell you what ELF ABI a given executable conforms to.

Since this sounds like it's largely a distro-specific issue, maybe there
could be a Kconfig option to turn the fallback PT_NOTE scanning on?

Cheers
---Dave


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-12  9:32             ` Dave Martin
@ 2019-06-12 19:04               ` Yu-cheng Yu
  2019-06-13 13:26                 ` Dave Martin
  2019-06-17 11:08               ` Florian Weimer
  1 sibling, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-12 19:04 UTC (permalink / raw)
  To: Dave Martin
  Cc: Florian Weimer, x86, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue

On Wed, 2019-06-12 at 10:32 +0100, Dave Martin wrote:
> On Tue, Jun 11, 2019 at 12:31:34PM -0700, Yu-cheng Yu wrote:
> > On Tue, 2019-06-11 at 12:41 +0100, Dave Martin wrote:
> > > On Mon, Jun 10, 2019 at 07:24:43PM +0200, Florian Weimer wrote:
> > > > * Yu-cheng Yu:
> > > > 
> > > > > To me, looking at PT_GNU_PROPERTY and not trying to support anything
> > > > > is a
> > > > > logical choice.  And it breaks only a limited set of toolchains.
> > > > > 
> > > > > I will simplify the parser and leave this patch as-is for anyone who
> > > > > wants
> > > > > to
> > > > > back-port.  Are there any objections or concerns?
> > > > 
> > > > Red Hat Enterprise Linux 8 does not use PT_GNU_PROPERTY and is probably
> > > > the largest collection of CET-enabled binaries that exists today.
> > > 
> > > For clarity, RHEL is actively parsing these properties today?
> > > 
> > > > My hope was that we would backport the upstream kernel patches for CET,
> > > > port the glibc dynamic loader to the new kernel interface, and be ready
> > > > to run with CET enabled in principle (except that porting userspace
> > > > libraries such as OpenSSL has not really started upstream, so many
> > > > processes where CET is particularly desirable will still run without
> > > > it).
> > > > 
> > > > I'm not sure if it is a good idea to port the legacy support if it's not
> > > > part of the mainline kernel because it comes awfully close to creating
> > > > our own private ABI.
> > > 
> > > I guess we can aim to factor things so that PT_NOTE scanning is
> > > available as a fallback on arches for which the absence of
> > > PT_GNU_PROPERTY is not authoritative.
> > 
> > We can probably check PT_GNU_PROPERTY first, and fallback (based on ld-linux
> > version?) to PT_NOTE scanning?
> 
> For arm64, we can check for PT_GNU_PROPERTY and then give up
> unconditionally.
> 
> For x86, we would fall back to PT_NOTE scanning, but this will add a bit
> of cost to binaries that don't have NT_GNU_PROPERTY_TYPE_0.  The ld.so
> version doesn't tell you what ELF ABI a given executable conforms to.
> 
> Since this sounds like it's largely a distro-specific issue, maybe there
> could be a Kconfig option to turn the fallback PT_NOTE scanning on?

Yes, I will make it a Kconfig option.

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-12 19:04               ` Yu-cheng Yu
@ 2019-06-13 13:26                 ` Dave Martin
  0 siblings, 0 replies; 70+ messages in thread
From: Dave Martin @ 2019-06-13 13:26 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Florian Weimer, x86, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue

On Wed, Jun 12, 2019 at 12:04:01PM -0700, Yu-cheng Yu wrote:
> On Wed, 2019-06-12 at 10:32 +0100, Dave Martin wrote:
> > On Tue, Jun 11, 2019 at 12:31:34PM -0700, Yu-cheng Yu wrote:
> > > On Tue, 2019-06-11 at 12:41 +0100, Dave Martin wrote:
> > > > On Mon, Jun 10, 2019 at 07:24:43PM +0200, Florian Weimer wrote:
> > > > > * Yu-cheng Yu:
> > > > > 
> > > > > > To me, looking at PT_GNU_PROPERTY and not trying to support anything
> > > > > > is a
> > > > > > logical choice.  And it breaks only a limited set of toolchains.
> > > > > > 
> > > > > > I will simplify the parser and leave this patch as-is for anyone who
> > > > > > wants
> > > > > > to
> > > > > > back-port.  Are there any objections or concerns?
> > > > > 
> > > > > Red Hat Enterprise Linux 8 does not use PT_GNU_PROPERTY and is probably
> > > > > the largest collection of CET-enabled binaries that exists today.
> > > > 
> > > > For clarity, RHEL is actively parsing these properties today?
> > > > 
> > > > > My hope was that we would backport the upstream kernel patches for CET,
> > > > > port the glibc dynamic loader to the new kernel interface, and be ready
> > > > > to run with CET enabled in principle (except that porting userspace
> > > > > libraries such as OpenSSL has not really started upstream, so many
> > > > > processes where CET is particularly desirable will still run without
> > > > > it).
> > > > > 
> > > > > I'm not sure if it is a good idea to port the legacy support if it's not
> > > > > part of the mainline kernel because it comes awfully close to creating
> > > > > our own private ABI.
> > > > 
> > > > I guess we can aim to factor things so that PT_NOTE scanning is
> > > > available as a fallback on arches for which the absence of
> > > > PT_GNU_PROPERTY is not authoritative.
> > > 
> > > We can probably check PT_GNU_PROPERTY first, and fallback (based on ld-linux
> > > version?) to PT_NOTE scanning?
> > 
> > For arm64, we can check for PT_GNU_PROPERTY and then give up
> > unconditionally.
> > 
> > For x86, we would fall back to PT_NOTE scanning, but this will add a bit
> > of cost to binaries that don't have NT_GNU_PROPERTY_TYPE_0.  The ld.so
> > version doesn't tell you what ELF ABI a given executable conforms to.
> > 
> > Since this sounds like it's largely a distro-specific issue, maybe there
> > could be a Kconfig option to turn the fallback PT_NOTE scanning on?
> 
> Yes, I will make it a Kconfig option.

OK, that works for me.  This would also help keep the PT_NOTE scanning
separate from the rest of the code.

For arm64 we could then unconditionally select/deselect that option,
where x86 could leave it configurable either way.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-12  9:32             ` Dave Martin
  2019-06-12 19:04               ` Yu-cheng Yu
@ 2019-06-17 11:08               ` Florian Weimer
  2019-06-17 12:20                 ` Thomas Gleixner
  1 sibling, 1 reply; 70+ messages in thread
From: Florian Weimer @ 2019-06-17 11:08 UTC (permalink / raw)
  To: Dave Martin
  Cc: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

* Dave Martin:

> On Tue, Jun 11, 2019 at 12:31:34PM -0700, Yu-cheng Yu wrote:
>> On Tue, 2019-06-11 at 12:41 +0100, Dave Martin wrote:
>> > On Mon, Jun 10, 2019 at 07:24:43PM +0200, Florian Weimer wrote:
>> > > * Yu-cheng Yu:
>> > > 
>> > > > To me, looking at PT_GNU_PROPERTY and not trying to support anything is a
>> > > > logical choice.  And it breaks only a limited set of toolchains.
>> > > > 
>> > > > I will simplify the parser and leave this patch as-is for anyone who wants
>> > > > to
>> > > > back-port.  Are there any objections or concerns?
>> > > 
>> > > Red Hat Enterprise Linux 8 does not use PT_GNU_PROPERTY and is probably
>> > > the largest collection of CET-enabled binaries that exists today.
>> > 
>> > For clarity, RHEL is actively parsing these properties today?
>> > 
>> > > My hope was that we would backport the upstream kernel patches for CET,
>> > > port the glibc dynamic loader to the new kernel interface, and be ready
>> > > to run with CET enabled in principle (except that porting userspace
>> > > libraries such as OpenSSL has not really started upstream, so many
>> > > processes where CET is particularly desirable will still run without
>> > > it).
>> > > 
>> > > I'm not sure if it is a good idea to port the legacy support if it's not
>> > > part of the mainline kernel because it comes awfully close to creating
>> > > our own private ABI.
>> > 
>> > I guess we can aim to factor things so that PT_NOTE scanning is
>> > available as a fallback on arches for which the absence of
>> > PT_GNU_PROPERTY is not authoritative.
>> 
>> We can probably check PT_GNU_PROPERTY first, and fallback (based on ld-linux
>> version?) to PT_NOTE scanning?
>
> For arm64, we can check for PT_GNU_PROPERTY and then give up
> unconditionally.
>
> For x86, we would fall back to PT_NOTE scanning, but this will add a bit
> of cost to binaries that don't have NT_GNU_PROPERTY_TYPE_0.  The ld.so
> version doesn't tell you what ELF ABI a given executable conforms to.
>
> Since this sounds like it's largely a distro-specific issue, maybe there
> could be a Kconfig option to turn the fallback PT_NOTE scanning on?

I'm worried that this causes interop issues similarly to what we see
with VSYSCALL today.  If we need both and a way to disable it, it should
be something like a personality flag which can be configured for each
process tree separately.  Ideally, we'd settle on one correct approach
(i.e., either always process both, or only process PT_GNU_PROPERTY) and
enforce that.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-17 11:08               ` Florian Weimer
@ 2019-06-17 12:20                 ` Thomas Gleixner
  2019-06-18  9:12                   ` Dave Martin
  0 siblings, 1 reply; 70+ messages in thread
From: Thomas Gleixner @ 2019-06-17 12:20 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Dave Martin, Yu-cheng Yu, x86, H. Peter Anvin, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

On Mon, 17 Jun 2019, Florian Weimer wrote:
> * Dave Martin:
> > On Tue, Jun 11, 2019 at 12:31:34PM -0700, Yu-cheng Yu wrote:
> >> We can probably check PT_GNU_PROPERTY first, and fallback (based on ld-linux
> >> version?) to PT_NOTE scanning?
> >
> > For arm64, we can check for PT_GNU_PROPERTY and then give up
> > unconditionally.
> >
> > For x86, we would fall back to PT_NOTE scanning, but this will add a bit
> > of cost to binaries that don't have NT_GNU_PROPERTY_TYPE_0.  The ld.so
> > version doesn't tell you what ELF ABI a given executable conforms to.
> >
> > Since this sounds like it's largely a distro-specific issue, maybe there
> > could be a Kconfig option to turn the fallback PT_NOTE scanning on?
> 
> I'm worried that this causes interop issues similarly to what we see
> with VSYSCALL today.  If we need both and a way to disable it, it should
> be something like a personality flag which can be configured for each
> process tree separately.  Ideally, we'd settle on one correct approach
> (i.e., either always process both, or only process PT_GNU_PROPERTY) and
> enforce that.

Chose one and only the one which makes technically sense and is not some
horrible vehicle.

Everytime we did those 'oh we need to make x fly workarounds' we regretted
it sooner than later.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-17 12:20                 ` Thomas Gleixner
@ 2019-06-18  9:12                   ` Dave Martin
  2019-06-18 12:41                     ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Dave Martin @ 2019-06-18  9:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Florian Weimer, Yu-cheng Yu, x86, H. Peter Anvin, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Ravi V. Shankar, Vedvyas Shanbhogue

On Mon, Jun 17, 2019 at 02:20:40PM +0200, Thomas Gleixner wrote:
> On Mon, 17 Jun 2019, Florian Weimer wrote:
> > * Dave Martin:
> > > On Tue, Jun 11, 2019 at 12:31:34PM -0700, Yu-cheng Yu wrote:
> > >> We can probably check PT_GNU_PROPERTY first, and fallback (based on ld-linux
> > >> version?) to PT_NOTE scanning?
> > >
> > > For arm64, we can check for PT_GNU_PROPERTY and then give up
> > > unconditionally.
> > >
> > > For x86, we would fall back to PT_NOTE scanning, but this will add a bit
> > > of cost to binaries that don't have NT_GNU_PROPERTY_TYPE_0.  The ld.so
> > > version doesn't tell you what ELF ABI a given executable conforms to.
> > >
> > > Since this sounds like it's largely a distro-specific issue, maybe there
> > > could be a Kconfig option to turn the fallback PT_NOTE scanning on?
> > 
> > I'm worried that this causes interop issues similarly to what we see
> > with VSYSCALL today.  If we need both and a way to disable it, it should
> > be something like a personality flag which can be configured for each
> > process tree separately.  Ideally, we'd settle on one correct approach
> > (i.e., either always process both, or only process PT_GNU_PROPERTY) and
> > enforce that.
> 
> Chose one and only the one which makes technically sense and is not some
> horrible vehicle.
> 
> Everytime we did those 'oh we need to make x fly workarounds' we regretted
> it sooner than later.

So I guess that points to keeping PT_NOTE scanning always available as a
fallback on x86.  This sucks a bit, but if there are binaries already in
the wild that rely on this, I don't think we have much choice...

I'd still favour a Kconfig option to allow this support to be suppressed
by arches that don't have a similar legacy to be compatible with.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18  9:12                   ` Dave Martin
@ 2019-06-18 12:41                     ` Peter Zijlstra
  2019-06-18 12:47                       ` Florian Weimer
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-06-18 12:41 UTC (permalink / raw)
  To: Dave Martin
  Cc: Thomas Gleixner, Florian Weimer, Yu-cheng Yu, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, Jun 18, 2019 at 10:12:50AM +0100, Dave Martin wrote:
> On Mon, Jun 17, 2019 at 02:20:40PM +0200, Thomas Gleixner wrote:
> > On Mon, 17 Jun 2019, Florian Weimer wrote:
> > > * Dave Martin:
> > > > On Tue, Jun 11, 2019 at 12:31:34PM -0700, Yu-cheng Yu wrote:
> > > >> We can probably check PT_GNU_PROPERTY first, and fallback (based on ld-linux
> > > >> version?) to PT_NOTE scanning?
> > > >
> > > > For arm64, we can check for PT_GNU_PROPERTY and then give up
> > > > unconditionally.
> > > >
> > > > For x86, we would fall back to PT_NOTE scanning, but this will add a bit
> > > > of cost to binaries that don't have NT_GNU_PROPERTY_TYPE_0.  The ld.so
> > > > version doesn't tell you what ELF ABI a given executable conforms to.
> > > >
> > > > Since this sounds like it's largely a distro-specific issue, maybe there
> > > > could be a Kconfig option to turn the fallback PT_NOTE scanning on?
> > > 
> > > I'm worried that this causes interop issues similarly to what we see
> > > with VSYSCALL today.  If we need both and a way to disable it, it should
> > > be something like a personality flag which can be configured for each
> > > process tree separately.  Ideally, we'd settle on one correct approach
> > > (i.e., either always process both, or only process PT_GNU_PROPERTY) and
> > > enforce that.
> > 
> > Chose one and only the one which makes technically sense and is not some
> > horrible vehicle.
> > 
> > Everytime we did those 'oh we need to make x fly workarounds' we regretted
> > it sooner than later.
> 
> So I guess that points to keeping PT_NOTE scanning always available as a
> fallback on x86.  This sucks a bit, but if there are binaries already in
> the wild that rely on this, I don't think we have much choice...

I'm not sure I read Thomas' comment like that. In my reading keeping the
PT_NOTE fallback is exactly one of those 'fly workarounds'. By not
supporting PT_NOTE only the 'fine' people already shit^Hpping this out
of tree are affected, and we don't have to care about them at all.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 12:41                     ` Peter Zijlstra
@ 2019-06-18 12:47                       ` Florian Weimer
  2019-06-18 12:55                         ` Peter Zijlstra
  0 siblings, 1 reply; 70+ messages in thread
From: Florian Weimer @ 2019-06-18 12:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Martin, Thomas Gleixner, Yu-cheng Yu, x86, H. Peter Anvin,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

* Peter Zijlstra:

> I'm not sure I read Thomas' comment like that. In my reading keeping the
> PT_NOTE fallback is exactly one of those 'fly workarounds'. By not
> supporting PT_NOTE only the 'fine' people already shit^Hpping this out
> of tree are affected, and we don't have to care about them at all.

Just to be clear here: There was an ABI document that required PT_NOTE
parsing.  The Linux kernel does *not* define the x86-64 ABI, it only
implements it.  The authoritative source should be the ABI document.

In this particularly case, so far anyone implementing this ABI extension
tried to provide value by changing it, sometimes successfully.  Which
makes me wonder why we even bother to mainatain ABI documentation.  The
kernel is just very late to the party.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 12:47                       ` Florian Weimer
@ 2019-06-18 12:55                         ` Peter Zijlstra
  2019-06-18 13:32                           ` Dave Martin
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2019-06-18 12:55 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Dave Martin, Thomas Gleixner, Yu-cheng Yu, x86, H. Peter Anvin,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, Jun 18, 2019 at 02:47:00PM +0200, Florian Weimer wrote:
> * Peter Zijlstra:
> 
> > I'm not sure I read Thomas' comment like that. In my reading keeping the
> > PT_NOTE fallback is exactly one of those 'fly workarounds'. By not
> > supporting PT_NOTE only the 'fine' people already shit^Hpping this out
> > of tree are affected, and we don't have to care about them at all.
> 
> Just to be clear here: There was an ABI document that required PT_NOTE
> parsing.

URGH.

> The Linux kernel does *not* define the x86-64 ABI, it only
> implements it.  The authoritative source should be the ABI document.
>
> In this particularly case, so far anyone implementing this ABI extension
> tried to provide value by changing it, sometimes successfully.  Which
> makes me wonder why we even bother to mainatain ABI documentation.  The
> kernel is just very late to the party.

How can the kernel be late to the party if all of this is spinning
wheels without kernel support?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 12:55                         ` Peter Zijlstra
@ 2019-06-18 13:32                           ` Dave Martin
  2019-06-18 14:58                             ` Yu-cheng Yu
  0 siblings, 1 reply; 70+ messages in thread
From: Dave Martin @ 2019-06-18 13:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Florian Weimer, Thomas Gleixner, Yu-cheng Yu, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, Jun 18, 2019 at 02:55:12PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 18, 2019 at 02:47:00PM +0200, Florian Weimer wrote:
> > * Peter Zijlstra:
> > 
> > > I'm not sure I read Thomas' comment like that. In my reading keeping the
> > > PT_NOTE fallback is exactly one of those 'fly workarounds'. By not
> > > supporting PT_NOTE only the 'fine' people already shit^Hpping this out
> > > of tree are affected, and we don't have to care about them at all.
> > 
> > Just to be clear here: There was an ABI document that required PT_NOTE
> > parsing.
> 
> URGH.
> 
> > The Linux kernel does *not* define the x86-64 ABI, it only
> > implements it.  The authoritative source should be the ABI document.
> >
> > In this particularly case, so far anyone implementing this ABI extension
> > tried to provide value by changing it, sometimes successfully.  Which
> > makes me wonder why we even bother to mainatain ABI documentation.  The
> > kernel is just very late to the party.
> 
> How can the kernel be late to the party if all of this is spinning
> wheels without kernel support?

PT_GNU_PROPERTY is mentioned and allocated a p_type value in hjl's
spec [1], but otherwise seems underspecified.

In particular, it's not clear whether a PT_GNU_PROPERTY phdr _must_ be
emitted for NT_GNU_PROPERTY_TYPE_0.  While it seems a no-brainer to emit
it, RHEL's linker already doesn't IIUC, and there are binaries in the
wild.

Maybe this phdr type is a late addition -- I haven't attempted to dig
through the history.


For arm64 we don't have this out-of-tree legacy to support, so we can
avoid exhausitvely searching for the note: no PT_GNU_PROPERTY ->
no note.

So, can we do the same for x86, forcing RHEL to carry some code out of
tree to support their legacy binaries?  Or do we accept that there is
already a de facto ABI and try to be compatible with it?


From my side, I want to avoid duplication between x86 and arm64, and
keep unneeded complexity out of the ELF loader where possible.

Cheers
---Dave


[1] https://github.com/hjl-tools/linux-abi/wiki/Linux-Extensions-to-gABI


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 13:32                           ` Dave Martin
@ 2019-06-18 14:58                             ` Yu-cheng Yu
  2019-06-18 15:49                               ` Florian Weimer
  0 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-18 14:58 UTC (permalink / raw)
  To: Dave Martin, Peter Zijlstra
  Cc: Florian Weimer, Thomas Gleixner, x86, H. Peter Anvin,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Andy Lutomirski, Balbir Singh,
	Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, 2019-06-18 at 14:32 +0100, Dave Martin wrote:
> On Tue, Jun 18, 2019 at 02:55:12PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 18, 2019 at 02:47:00PM +0200, Florian Weimer wrote:
> > > * Peter Zijlstra:
> > > 
> > > > I'm not sure I read Thomas' comment like that. In my reading keeping the
> > > > PT_NOTE fallback is exactly one of those 'fly workarounds'. By not
> > > > supporting PT_NOTE only the 'fine' people already shit^Hpping this out
> > > > of tree are affected, and we don't have to care about them at all.
> > > 
> > > Just to be clear here: There was an ABI document that required PT_NOTE
> > > parsing.
> > 
> > URGH.
> > 
> > > The Linux kernel does *not* define the x86-64 ABI, it only
> > > implements it.  The authoritative source should be the ABI document.
> > > 
> > > In this particularly case, so far anyone implementing this ABI extension
> > > tried to provide value by changing it, sometimes successfully.  Which
> > > makes me wonder why we even bother to mainatain ABI documentation.  The
> > > kernel is just very late to the party.
> > 
> > How can the kernel be late to the party if all of this is spinning
> > wheels without kernel support?
> 
> PT_GNU_PROPERTY is mentioned and allocated a p_type value in hjl's
> spec [1], but otherwise seems underspecified.
> 
> In particular, it's not clear whether a PT_GNU_PROPERTY phdr _must_ be
> emitted for NT_GNU_PROPERTY_TYPE_0.  While it seems a no-brainer to emit
> it, RHEL's linker already doesn't IIUC, and there are binaries in the
> wild.
> 
> Maybe this phdr type is a late addition -- I haven't attempted to dig
> through the history.
> 
> 
> For arm64 we don't have this out-of-tree legacy to support, so we can
> avoid exhausitvely searching for the note: no PT_GNU_PROPERTY ->
> no note.
> 
> So, can we do the same for x86, forcing RHEL to carry some code out of
> tree to support their legacy binaries?  Or do we accept that there is
> already a de facto ABI and try to be compatible with it?
> 
> 
> From my side, I want to avoid duplication between x86 and arm64, and
> keep unneeded complexity out of the ELF loader where possible.

Hi Florian,

The kernel looks at only ld-linux.  Other applications are loaded by ld-linux. 
So the issues are limited to three versions of ld-linux's.  Can we somehow
update those??

Thanks,
Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 14:58                             ` Yu-cheng Yu
@ 2019-06-18 15:49                               ` Florian Weimer
  2019-06-18 15:53                                 ` Yu-cheng Yu
  0 siblings, 1 reply; 70+ messages in thread
From: Florian Weimer @ 2019-06-18 15:49 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Dave Martin, Peter Zijlstra, Thomas Gleixner, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

* Yu-cheng Yu:

> The kernel looks at only ld-linux.  Other applications are loaded by ld-linux. 
> So the issues are limited to three versions of ld-linux's.  Can we somehow
> update those??

I assumed that it would also parse the main executable and make
adjustments based on that.

ld.so can certainly provide whatever the kernel needs.  We need to tweak
the existing loader anyway.

No valid statically-linked binaries exist today, so this is not a
consideration at this point.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 15:49                               ` Florian Weimer
@ 2019-06-18 15:53                                 ` Yu-cheng Yu
  2019-06-18 16:05                                   ` Florian Weimer
  0 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-18 15:53 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Dave Martin, Peter Zijlstra, Thomas Gleixner, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, 2019-06-18 at 17:49 +0200, Florian Weimer wrote:
> * Yu-cheng Yu:
> 
> > The kernel looks at only ld-linux.  Other applications are loaded by ld-
> > linux. 
> > So the issues are limited to three versions of ld-linux's.  Can we somehow
> > update those??
> 
> I assumed that it would also parse the main executable and make
> adjustments based on that.

Yes, Linux also looks at the main executable's header, but not its
NT_GNU_PROPERTY_TYPE_0 if there is a loader.

> 
> ld.so can certainly provide whatever the kernel needs.  We need to tweak
> the existing loader anyway.
> 
> No valid statically-linked binaries exist today, so this is not a
> consideration at this point.

So from kernel, we look at only PT_GNU_PROPERTY?

Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 16:05                                   ` Florian Weimer
@ 2019-06-18 16:00                                     ` Yu-cheng Yu
  2019-06-18 16:20                                       ` Dave Martin
  0 siblings, 1 reply; 70+ messages in thread
From: Yu-cheng Yu @ 2019-06-18 16:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Dave Martin, Peter Zijlstra, Thomas Gleixner, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, 2019-06-18 at 18:05 +0200, Florian Weimer wrote:
> * Yu-cheng Yu:
> 
> > > I assumed that it would also parse the main executable and make
> > > adjustments based on that.
> > 
> > Yes, Linux also looks at the main executable's header, but not its
> > NT_GNU_PROPERTY_TYPE_0 if there is a loader.
> > 
> > > 
> > > ld.so can certainly provide whatever the kernel needs.  We need to tweak
> > > the existing loader anyway.
> > > 
> > > No valid statically-linked binaries exist today, so this is not a
> > > consideration at this point.
> > 
> > So from kernel, we look at only PT_GNU_PROPERTY?
> 
> If you don't parse notes/segments in the executable for CET, then yes.
> We can put PT_GNU_PROPERTY into the loader.

Thanks!


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 15:53                                 ` Yu-cheng Yu
@ 2019-06-18 16:05                                   ` Florian Weimer
  2019-06-18 16:00                                     ` Yu-cheng Yu
  0 siblings, 1 reply; 70+ messages in thread
From: Florian Weimer @ 2019-06-18 16:05 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Dave Martin, Peter Zijlstra, Thomas Gleixner, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

* Yu-cheng Yu:

>> I assumed that it would also parse the main executable and make
>> adjustments based on that.
>
> Yes, Linux also looks at the main executable's header, but not its
> NT_GNU_PROPERTY_TYPE_0 if there is a loader.
>
>> 
>> ld.so can certainly provide whatever the kernel needs.  We need to tweak
>> the existing loader anyway.
>> 
>> No valid statically-linked binaries exist today, so this is not a
>> consideration at this point.
>
> So from kernel, we look at only PT_GNU_PROPERTY?

If you don't parse notes/segments in the executable for CET, then yes.
We can put PT_GNU_PROPERTY into the loader.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 16:00                                     ` Yu-cheng Yu
@ 2019-06-18 16:20                                       ` Dave Martin
  2019-06-18 16:25                                         ` Florian Weimer
  0 siblings, 1 reply; 70+ messages in thread
From: Dave Martin @ 2019-06-18 16:20 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Florian Weimer, Peter Zijlstra, Thomas Gleixner, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, Jun 18, 2019 at 09:00:35AM -0700, Yu-cheng Yu wrote:
> On Tue, 2019-06-18 at 18:05 +0200, Florian Weimer wrote:
> > * Yu-cheng Yu:
> > 
> > > > I assumed that it would also parse the main executable and make
> > > > adjustments based on that.
> > > 
> > > Yes, Linux also looks at the main executable's header, but not its
> > > NT_GNU_PROPERTY_TYPE_0 if there is a loader.
> > > 
> > > > 
> > > > ld.so can certainly provide whatever the kernel needs.  We need to tweak
> > > > the existing loader anyway.
> > > > 
> > > > No valid statically-linked binaries exist today, so this is not a
> > > > consideration at this point.
> > > 
> > > So from kernel, we look at only PT_GNU_PROPERTY?
> > 
> > If you don't parse notes/segments in the executable for CET, then yes.
> > We can put PT_GNU_PROPERTY into the loader.
> 
> Thanks!

Would this require the kernel and ld.so to be updated in a particular
order to avoid breakage?  I don't know enough about RHEL to know how
controversial that might be.

Also:

What about static binaries distrubited as part of RHEL?

A user would also reasonably expect static binaries built using the
distro toolchain to work on top of the distro kernel...  which might
be broken by this.


(When I say "broken" I mean that the binary would run, but CET
protections would be silently turned off.)

Cheers
---Dave


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 16:20                                       ` Dave Martin
@ 2019-06-18 16:25                                         ` Florian Weimer
  2019-06-18 16:50                                           ` Dave Martin
  0 siblings, 1 reply; 70+ messages in thread
From: Florian Weimer @ 2019-06-18 16:25 UTC (permalink / raw)
  To: Dave Martin
  Cc: Yu-cheng Yu, Peter Zijlstra, Thomas Gleixner, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

* Dave Martin:

> On Tue, Jun 18, 2019 at 09:00:35AM -0700, Yu-cheng Yu wrote:
>> On Tue, 2019-06-18 at 18:05 +0200, Florian Weimer wrote:
>> > * Yu-cheng Yu:
>> > 
>> > > > I assumed that it would also parse the main executable and make
>> > > > adjustments based on that.
>> > > 
>> > > Yes, Linux also looks at the main executable's header, but not its
>> > > NT_GNU_PROPERTY_TYPE_0 if there is a loader.
>> > > 
>> > > > 
>> > > > ld.so can certainly provide whatever the kernel needs.  We need to tweak
>> > > > the existing loader anyway.
>> > > > 
>> > > > No valid statically-linked binaries exist today, so this is not a
>> > > > consideration at this point.
>> > > 
>> > > So from kernel, we look at only PT_GNU_PROPERTY?
>> > 
>> > If you don't parse notes/segments in the executable for CET, then yes.
>> > We can put PT_GNU_PROPERTY into the loader.
>> 
>> Thanks!
>
> Would this require the kernel and ld.so to be updated in a particular
> order to avoid breakage?  I don't know enough about RHEL to know how
> controversial that might be.

There is no official ld.so that will work with the current userspace
interface (in this patch submission).  Upstream glibc needs to be
updated anyway, so yet another change isn't much of an issue.  This is
not a problem; we knew that something like this might happen.

Sure, people need a new binutils with backports for PT_GNU_PROPERTY, but
given that only very few people will build CET binaries with older
binutils, I think that's not a real issue either.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
  2019-06-18 16:25                                         ` Florian Weimer
@ 2019-06-18 16:50                                           ` Dave Martin
  0 siblings, 0 replies; 70+ messages in thread
From: Dave Martin @ 2019-06-18 16:50 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Yu-cheng Yu, Peter Zijlstra, Thomas Gleixner, x86,
	H. Peter Anvin, Ingo Molnar, linux-kernel, linux-doc, linux-mm,
	linux-arch, linux-api, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, H.J. Lu, Jann Horn, Jonathan Corbet,
	Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Randy Dunlap, Ravi V. Shankar, Vedvyas Shanbhogue

On Tue, Jun 18, 2019 at 06:25:51PM +0200, Florian Weimer wrote:
> * Dave Martin:
> 
> > On Tue, Jun 18, 2019 at 09:00:35AM -0700, Yu-cheng Yu wrote:
> >> On Tue, 2019-06-18 at 18:05 +0200, Florian Weimer wrote:
> >> > * Yu-cheng Yu:
> >> > 
> >> > > > I assumed that it would also parse the main executable and make
> >> > > > adjustments based on that.
> >> > > 
> >> > > Yes, Linux also looks at the main executable's header, but not its
> >> > > NT_GNU_PROPERTY_TYPE_0 if there is a loader.
> >> > > 
> >> > > > 
> >> > > > ld.so can certainly provide whatever the kernel needs.  We need to tweak
> >> > > > the existing loader anyway.
> >> > > > 
> >> > > > No valid statically-linked binaries exist today, so this is not a
> >> > > > consideration at this point.
> >> > > 
> >> > > So from kernel, we look at only PT_GNU_PROPERTY?
> >> > 
> >> > If you don't parse notes/segments in the executable for CET, then yes.
> >> > We can put PT_GNU_PROPERTY into the loader.
> >> 
> >> Thanks!
> >
> > Would this require the kernel and ld.so to be updated in a particular
> > order to avoid breakage?  I don't know enough about RHEL to know how
> > controversial that might be.
> 
> There is no official ld.so that will work with the current userspace
> interface (in this patch submission).  Upstream glibc needs to be
> updated anyway, so yet another change isn't much of an issue.  This is
> not a problem; we knew that something like this might happen.
> 
> Sure, people need a new binutils with backports for PT_GNU_PROPERTY, but
> given that only very few people will build CET binaries with older
> binutils, I think that's not a real issue either.

OK, just wanted to check we weren't missing any requirement for x86.

This approach should satisfy the requirement for arm64 nicely.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 27/27] x86/cet/shstk: Add Shadow Stack instructions to opcode map
  2019-06-06 20:06 ` [PATCH v7 27/27] x86/cet/shstk: Add Shadow Stack instructions to opcode map Yu-cheng Yu
@ 2019-11-01 14:03   ` Adrian Hunter
  2019-11-01 14:17     ` Yu-cheng Yu
  0 siblings, 1 reply; 70+ messages in thread
From: Adrian Hunter @ 2019-11-01 14:03 UTC (permalink / raw)
  To: Yu-cheng Yu, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On 6/06/19 11:06 PM, Yu-cheng Yu wrote:
> Add the following shadow stack management instructions.
> 
> INCSSP:
>     Increment shadow stack pointer by the steps specified.
> 
> RDSSP:
>     Read SSP register into a GPR.
> 
> SAVEPREVSSP:
>     Use "prev ssp" token at top of current shadow stack to
>     create a "restore token" on previous shadow stack.
> 
> RSTORSSP:
>     Restore from a "restore token" pointed by a GPR to SSP.
> 
> WRSS:
>     Write to kernel-mode shadow stack (kernel-mode instruction).
> 
> WRUSS:
>     Write to user-mode shadow stack (kernel-mode instruction).
> 
> SETSSBSY:
>     Verify the "supervisor token" pointed by IA32_PL0_SSP MSR,
>     if valid, set the token to busy, and set SSP to the value
>     of IA32_PL0_SSP MSR.
> 
> CLRSSBSY:
>     Verify the "supervisor token" pointed by a GPR, if valid,
>     clear the busy bit from the token.

Does the notrack prefix also need to be catered for somehow?

> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/lib/x86-opcode-map.txt               | 26 +++++++++++++------
>  tools/objtool/arch/x86/lib/x86-opcode-map.txt | 26 +++++++++++++------
>  2 files changed, 36 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index e0b85930dd77..c5e825d44766 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -366,7 +366,7 @@ AVXcode: 1
>  1b: BNDCN Gv,Ev (F2) | BNDMOV Ev,Gv (66) | BNDMK Gv,Ev (F3) | BNDSTX Ev,Gv
>  1c:
>  1d:
> -1e:
> +1e: RDSSP Rd (F3),REX.W
>  1f: NOP Ev
>  # 0x0f 0x20-0x2f
>  20: MOV Rd,Cd
> @@ -610,7 +610,17 @@ fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
>  ff: UD0
>  EndTable
>  
> -Table: 3-byte opcode 1 (0x0f 0x38)
> +Table: 3-byte opcode 1 (0x0f 0x01)
> +Referrer:
> +AVXcode:
> +# Skip 0x00-0xe7
> +e8: SETSSBSY (f3)
> +e9:
> +ea: SAVEPREVSSP (f3)
> +# Skip 0xeb-0xff
> +EndTable
> +
> +Table: 3-byte opcode 2 (0x0f 0x38)
>  Referrer: 3-byte escape 1
>  AVXcode: 2
>  # 0x0f 0x38 0x00-0x0f
> @@ -789,12 +799,12 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
>  f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
>  f2: ANDN Gy,By,Ey (v)
>  f3: Grp17 (1A)
> -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> -f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
> +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
> +f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSS Pq,Qq (66),REX.W

AFAICT WRSS does not have 66 prefix

>  f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
>  EndTable
>  
> -Table: 3-byte opcode 2 (0x0f 0x3a)
> +Table: 3-byte opcode 3 (0x0f 0x3a)
>  Referrer: 3-byte escape 2
>  AVXcode: 3
>  # 0x0f 0x3a 0x00-0xff
> @@ -948,7 +958,7 @@ GrpTable: Grp7
>  2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B)
>  3: LIDT Ms
>  4: SMSW Mw/Rv
> -5: rdpkru (110),(11B) | wrpkru (111),(11B)
> +5: rdpkru (110),(11B) | wrpkru (111),(11B) | RSTORSSP Mq (F3)
>  6: LMSW Ew
>  7: INVLPG Mb | SWAPGS (o64),(000),(11B) | RDTSCP (001),(11B)
>  EndTable
> @@ -1019,8 +1029,8 @@ GrpTable: Grp15
>  2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
>  3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
>  4: XSAVE | ptwrite Ey (F3),(11B)
> -5: XRSTOR | lfence (11B)
> -6: XSAVEOPT | clwb (66) | mfence (11B)
> +5: XRSTOR | lfence (11B) | INCSSP Rd (F3),REX.W
> +6: XSAVEOPT | clwb (66) | mfence (11B) | CLRSSBSY Mq (F3)
>  7: clflush | clflushopt (66) | sfence (11B)
>  EndTable
>  
> diff --git a/tools/objtool/arch/x86/lib/x86-opcode-map.txt b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
> index e0b85930dd77..c5e825d44766 100644
> --- a/tools/objtool/arch/x86/lib/x86-opcode-map.txt
> +++ b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
> @@ -366,7 +366,7 @@ AVXcode: 1
>  1b: BNDCN Gv,Ev (F2) | BNDMOV Ev,Gv (66) | BNDMK Gv,Ev (F3) | BNDSTX Ev,Gv
>  1c:
>  1d:
> -1e:
> +1e: RDSSP Rd (F3),REX.W
>  1f: NOP Ev
>  # 0x0f 0x20-0x2f
>  20: MOV Rd,Cd
> @@ -610,7 +610,17 @@ fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
>  ff: UD0
>  EndTable
>  
> -Table: 3-byte opcode 1 (0x0f 0x38)
> +Table: 3-byte opcode 1 (0x0f 0x01)
> +Referrer:
> +AVXcode:
> +# Skip 0x00-0xe7
> +e8: SETSSBSY (f3)
> +e9:
> +ea: SAVEPREVSSP (f3)
> +# Skip 0xeb-0xff
> +EndTable
> +
> +Table: 3-byte opcode 2 (0x0f 0x38)
>  Referrer: 3-byte escape 1
>  AVXcode: 2
>  # 0x0f 0x38 0x00-0x0f
> @@ -789,12 +799,12 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
>  f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
>  f2: ANDN Gy,By,Ey (v)
>  f3: Grp17 (1A)
> -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> -f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
> +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
> +f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSS Pq,Qq (66),REX.W
>  f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
>  EndTable
>  
> -Table: 3-byte opcode 2 (0x0f 0x3a)
> +Table: 3-byte opcode 3 (0x0f 0x3a)
>  Referrer: 3-byte escape 2
>  AVXcode: 3
>  # 0x0f 0x3a 0x00-0xff
> @@ -948,7 +958,7 @@ GrpTable: Grp7
>  2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B)
>  3: LIDT Ms
>  4: SMSW Mw/Rv
> -5: rdpkru (110),(11B) | wrpkru (111),(11B)
> +5: rdpkru (110),(11B) | wrpkru (111),(11B) | RSTORSSP Mq (F3)
>  6: LMSW Ew
>  7: INVLPG Mb | SWAPGS (o64),(000),(11B) | RDTSCP (001),(11B)
>  EndTable
> @@ -1019,8 +1029,8 @@ GrpTable: Grp15
>  2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
>  3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
>  4: XSAVE | ptwrite Ey (F3),(11B)
> -5: XRSTOR | lfence (11B)
> -6: XSAVEOPT | clwb (66) | mfence (11B)
> +5: XRSTOR | lfence (11B) | INCSSP Rd (F3),REX.W
> +6: XSAVEOPT | clwb (66) | mfence (11B) | CLRSSBSY Mq (F3)
>  7: clflush | clflushopt (66) | sfence (11B)
>  EndTable
>  
> 



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v7 27/27] x86/cet/shstk: Add Shadow Stack instructions to opcode map
  2019-11-01 14:03   ` Adrian Hunter
@ 2019-11-01 14:17     ` Yu-cheng Yu
  0 siblings, 0 replies; 70+ messages in thread
From: Yu-cheng Yu @ 2019-11-01 14:17 UTC (permalink / raw)
  To: Adrian Hunter, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
	Peter Zijlstra, Randy Dunlap, Ravi V. Shankar,
	Vedvyas Shanbhogue, Dave Martin

On Fri, 2019-11-01 at 16:03 +0200, Adrian Hunter wrote:
> On 6/06/19 11:06 PM, Yu-cheng Yu wrote:
> > Add the following shadow stack management instructions.
> > 
> > INCSSP:
> >     Increment shadow stack pointer by the steps specified.
> > 
> > RDSSP:
> >     Read SSP register into a GPR.
> > 
> > SAVEPREVSSP:
> >     Use "prev ssp" token at top of current shadow stack to
> >     create a "restore token" on previous shadow stack.
> > 
> > RSTORSSP:
> >     Restore from a "restore token" pointed by a GPR to SSP.
> > 
> > WRSS:
> >     Write to kernel-mode shadow stack (kernel-mode instruction).
> > 
> > WRUSS:
> >     Write to user-mode shadow stack (kernel-mode instruction).
> > 
> > SETSSBSY:
> >     Verify the "supervisor token" pointed by IA32_PL0_SSP MSR,
> >     if valid, set the token to busy, and set SSP to the value
> >     of IA32_PL0_SSP MSR.
> > 
> > CLRSSBSY:
> >     Verify the "supervisor token" pointed by a GPR, if valid,
> >     clear the busy bit from the token.
> 
> Does the notrack prefix also need to be catered for somehow?

Yes, I will fix it.

> 
> > 
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > ---
> >  arch/x86/lib/x86-opcode-map.txt               | 26 +++++++++++++------
> >  tools/objtool/arch/x86/lib/x86-opcode-map.txt | 26 +++++++++++++------
> >  2 files changed, 36 insertions(+), 16 deletions(-)
> > 
> > diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> > index e0b85930dd77..c5e825d44766 100644
> > --- a/arch/x86/lib/x86-opcode-map.txt
> > +++ b/arch/x86/lib/x86-opcode-map.txt
> > @@ -366,7 +366,7 @@ AVXcode: 1
> >  1b: BNDCN Gv,Ev (F2) | BNDMOV Ev,Gv (66) | BNDMK Gv,Ev (F3) | BNDSTX Ev,Gv
> >  1c:
> >  1d:
> > -1e:
> > +1e: RDSSP Rd (F3),REX.W
> >  1f: NOP Ev
> >  # 0x0f 0x20-0x2f
> >  20: MOV Rd,Cd
> > @@ -610,7 +610,17 @@ fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
> >  ff: UD0
> >  EndTable
> >  
> > -Table: 3-byte opcode 1 (0x0f 0x38)
> > +Table: 3-byte opcode 1 (0x0f 0x01)
> > +Referrer:
> > +AVXcode:
> > +# Skip 0x00-0xe7
> > +e8: SETSSBSY (f3)
> > +e9:
> > +ea: SAVEPREVSSP (f3)
> > +# Skip 0xeb-0xff
> > +EndTable
> > +
> > +Table: 3-byte opcode 2 (0x0f 0x38)
> >  Referrer: 3-byte escape 1
> >  AVXcode: 2
> >  # 0x0f 0x38 0x00-0x0f
> > @@ -789,12 +799,12 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
> >  f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
> >  f2: ANDN Gy,By,Ey (v)
> >  f3: Grp17 (1A)
> > -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> > -f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
> > +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
> > +f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSS Pq,Qq (66),REX.W
> 
> AFAICT WRSS does not have 66 prefix

I will check.

Thanks,
Yu-cheng


^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2019-11-01 14:28 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-06 20:06 [PATCH v7 00/27] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 01/27] Documentation/x86: Add CET description Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 02/27] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 03/27] x86/fpu/xstate: Change names to separate XSAVES system and user states Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 04/27] x86/fpu/xstate: Introduce XSAVES system states Yu-cheng Yu
2019-06-06 21:18   ` Dave Hansen
2019-06-06 22:04     ` Andy Lutomirski
2019-06-06 22:08       ` Dave Hansen
2019-06-06 22:10         ` Yu-cheng Yu
2019-06-07  1:54         ` Andy Lutomirski
2019-06-06 20:06 ` [PATCH v7 05/27] x86/fpu/xstate: Add XSAVES system states for shadow stack Yu-cheng Yu
2019-06-07  7:07   ` Peter Zijlstra
2019-06-07 16:14     ` Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 06/27] x86/cet: Add control protection exception handler Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 07/27] x86/cet/shstk: Add Kconfig option for user-mode shadow stack Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 08/27] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 09/27] mm/mmap: Prevent Shadow Stack VMA merges Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 10/27] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 11/27] x86/mm: Introduce _PAGE_DIRTY_SW Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 12/27] drm/i915/gvt: Update _PAGE_DIRTY to _PAGE_DIRTY_BITS Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 13/27] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 14/27] x86/mm: Shadow stack page fault error checking Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 15/27] mm: Handle shadow stack page fault Yu-cheng Yu
2019-06-07  7:30   ` Peter Zijlstra
2019-06-06 20:06 ` [PATCH v7 16/27] mm: Handle THP/HugeTLB " Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 17/27] mm: Update can_follow_write_pte/pmd for shadow stack Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 18/27] mm: Introduce do_mmap_locked() Yu-cheng Yu
2019-06-07  7:43   ` Peter Zijlstra
2019-06-07  7:47     ` Peter Zijlstra
2019-06-07 16:16       ` Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 19/27] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 20/27] x86/cet/shstk: Introduce WRUSS instruction Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 21/27] x86/cet/shstk: Handle signals for shadow stack Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file Yu-cheng Yu
2019-06-07  7:58   ` Peter Zijlstra
2019-06-07 16:17     ` Yu-cheng Yu
2019-06-07 18:01   ` Dave Martin
2019-06-10 16:29     ` Yu-cheng Yu
2019-06-10 16:57       ` Dave Martin
2019-06-10 17:24       ` Florian Weimer
2019-06-11 11:41         ` Dave Martin
2019-06-11 19:31           ` Yu-cheng Yu
2019-06-12  9:32             ` Dave Martin
2019-06-12 19:04               ` Yu-cheng Yu
2019-06-13 13:26                 ` Dave Martin
2019-06-17 11:08               ` Florian Weimer
2019-06-17 12:20                 ` Thomas Gleixner
2019-06-18  9:12                   ` Dave Martin
2019-06-18 12:41                     ` Peter Zijlstra
2019-06-18 12:47                       ` Florian Weimer
2019-06-18 12:55                         ` Peter Zijlstra
2019-06-18 13:32                           ` Dave Martin
2019-06-18 14:58                             ` Yu-cheng Yu
2019-06-18 15:49                               ` Florian Weimer
2019-06-18 15:53                                 ` Yu-cheng Yu
2019-06-18 16:05                                   ` Florian Weimer
2019-06-18 16:00                                     ` Yu-cheng Yu
2019-06-18 16:20                                       ` Dave Martin
2019-06-18 16:25                                         ` Florian Weimer
2019-06-18 16:50                                           ` Dave Martin
2019-06-06 20:06 ` [PATCH v7 23/27] x86/cet/shstk: ELF header parsing of Shadow Stack Yu-cheng Yu
2019-06-07  7:54   ` Peter Zijlstra
2019-06-06 20:06 ` [PATCH v7 24/27] x86/cet/shstk: Handle thread shadow stack Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 25/27] mm/mmap: Add Shadow stack pages to memory accounting Yu-cheng Yu
2019-06-11 17:55   ` Dave Hansen
2019-06-11 19:22     ` Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 26/27] x86/cet/shstk: Add arch_prctl functions for Shadow Stack Yu-cheng Yu
2019-06-06 20:06 ` [PATCH v7 27/27] x86/cet/shstk: Add Shadow Stack instructions to opcode map Yu-cheng Yu
2019-11-01 14:03   ` Adrian Hunter
2019-11-01 14:17     ` Yu-cheng Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).