linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
To: Rick Edgecombe <rick.p.edgecombe@intel.com>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann <arnd@arndb.de>,
	Andy Lutomirski <luto@kernel.org>,
	Balbir Singh <bsingharora@gmail.com>,
	Borislav Petkov <bp@alien8.de>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Eugene Syromiatnikov <esyr@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	"H . J . Lu" <hjl.tools@gmail.com>, Jann Horn <jannh@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Kees Cook <keescook@chromium.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>, Pavel Machek <pavel@ucw.cz>,
	Peter Zijlstra <peterz@infradead.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Weijiang Yang <weijiang.yang@intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	John Allen <john.allen@amd.com>,
	kcc@google.com, eranian@google.com, rppt@kernel.org,
	jamorris@linux.microsoft.com, dethoma@microsoft.com,
	akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
	christina.schimpe@intel.com, david@redhat.com,
	debug@rivosinc.com, Mark Brown <broonie@kernel.org>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Subject: Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description
Date: Wed, 1 Mar 2023 14:21:41 +0000	[thread overview]
Message-ID: <Y/9fdYQ8Cd0GI+8C@arm.com> (raw)
In-Reply-To: <20230227222957.24501-2-rick.p.edgecombe@intel.com>

The 02/27/2023 14:29, Rick Edgecombe wrote:
> +Application Enabling
> +====================
> +
> +An application's CET capability is marked in its ELF note and can be verified
> +from readelf/llvm-readelf output::
> +
> +    readelf -n <application> | grep -a SHSTK
> +        properties: x86 feature: SHSTK
> +
> +The kernel does not process these applications markers directly. Applications
> +or loaders must enable CET features using the interface described in section 4.
> +Typically this would be done in dynamic loader or static runtime objects, as is
> +the case in GLIBC.

Note that this has to be an early decision in libc (ld.so or static
exe start code), which will be difficult to hook into system wide
security policy settings. (e.g. to force shstk on marked binaries.)

From userspace POV I'd prefer if a static exe did not have to parse
its own ELF notes (i.e. kernel enabled shstk based on the marking).
But I realize if there is a need for complex shstk enable/disable
decision that is better in userspace and if the kernel decision can
be overridden then it might as well all be in userspace.

> +Enabling arch_prctl()'s
> +=======================
> +
> +Elf features should be enabled by the loader using the below arch_prctl's. They
> +are only supported in 64 bit user applications.
> +
> +arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
> +    Enable a single feature specified in 'feature'. Can only operate on
> +    one feature at a time.
> +
> +arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
> +    Disable a single feature specified in 'feature'. Can only operate on
> +    one feature at a time.
> +
> +arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
> +    Lock in features at their current enabled or disabled status. 'features'
> +    is a mask of all features to lock. All bits set are processed, unset bits
> +    are ignored. The mask is ORed with the existing value. So any feature bits
> +    set here cannot be enabled or disabled afterwards.

The multi-thread behaviour should be documented here: Only the
current thread is affected. So an application can only change the
setting while single-threaded which is only guaranteed before any
user code is executed. Later using the prctl is complicated and
most c runtimes would not want to do that (async signalling all
threads and prctl from the handler).

In particular these interfaces are not suitable to turn shstk off
at dlopen time when an unmarked binary is loaded. Or any other
late shstk policy change will not work, so as far as i can see
the "permissive" mode in glibc does not work.

Does the main thread have shadow stack allocated before shstk is
enabled? is the shadow stack freed when it is disabled? (e.g.
what would the instruction reading the SSP do in disabled state?)

> +Proc Status
> +===========
> +To check if an application is actually running with shadow stack, the
> +user can read the /proc/$PID/status. It will report "wrss" or "shstk"
> +depending on what is enabled. The lines look like this::
> +
> +    x86_Thread_features: shstk wrss
> +    x86_Thread_features_locked: shstk wrss

Presumaly /proc/$TID/status and /proc/$PID/task/$TID/status also
shows the setting and only valid for the specific thread (not the
entire process). So i would note that this for one thread only.

> +Implementation of the Shadow Stack
> +==================================
> +
> +Shadow Stack Size
> +-----------------
> +
> +A task's shadow stack is allocated from memory to a fixed size of
> +MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
> +the maximum size of the normal stack, but capped to 4 GB. However,
> +a compat-mode application's address space is smaller, each of its thread's
> +shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB).

This policy tries to handle all threads with the same shadow stack
size logic, which has limitations. I think it should be improved
(otherwise some applications will have to turn shstk off):

- RLIMIT_STACK is not an upper bound for the main thread stack size
  (rlimit can increase/decrease dynamically).
- RLIMIT_STACK only applies to the main thread, so it is not an upper
  bound for non-main thread stacks.
- i.e. stack size >> startup RLIMIT_STACK is possible and then shadow
  stack can overflow.
- stack size << startup RLIMIT_STACK is also possible and then VA
  space is wasted (can lead to OOM with strict memory overcommit).
- clone3 tells the kernel the thread stack size so that should be
  used instead of RLIMIT_STACK. (clone does not though.)
- I think it's better to have a new limit specifically for shadow
  stack size (which by default can be RLIMIT_STACK) so userspace
  can adjust it if needed (another reason is that stack size is
  not always a good indicator of max call depth).

> +Signal
> +------
> +
> +By default, the main program and its signal handlers use the same shadow
> +stack. Because the shadow stack stores only return addresses, a large
> +shadow stack covers the condition that both the program stack and the
> +signal alternate stack run out.

What does "by default" mean here? Is there a case when the signal handler
is not entered with SSP set to the handling thread'd shadow stack?

> +When a signal happens, the old pre-signal state is pushed on the stack. When
> +shadow stack is enabled, the shadow stack specific state is pushed onto the
> +shadow stack. Today this is only the old SSP (shadow stack pointer), pushed
> +in a special format with bit 63 set. On sigreturn this old SSP token is
> +verified and restored by the kernel. The kernel will also push the normal
> +restorer address to the shadow stack to help userspace avoid a shadow stack
> +violation on the sigreturn path that goes through the restorer.

The kernel pushes on the shadow stack on signal entry so shadow stack
overflow cannot be handled. Please document this as non-recoverable
failure.

I think it can be made recoverable if signals with alternate stack run
on a different shadow stack. And the top of the thread shadow stack is
just corrupted instead of pushed in the overflow case. Then longjmp out
can be made to work (common in stack overflow handling cases), and
reliable crash report from the signal handler works (also common).

Does SSP get stored into the sigcontext struct somewhere?

> +Fork
> +----
> +
> +The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
> +to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
> +shadow access triggers a page fault with the shadow stack access bit set
> +in the page fault error code.
> +
> +When a task forks a child, its shadow stack PTEs are copied and both the
> +parent's and the child's shadow stack PTEs are cleared of the dirty bit.
> +Upon the next shadow stack access, the resulting shadow stack page fault
> +is handled by page copy/re-use.
> +
> +When a pthread child is created, the kernel allocates a new shadow stack
> +for the new thread. New shadow stack's behave like mmap() with respect to
> +ASLR behavior.

Please document the shadow stack lifetimes here:

I think thread exit unmaps shadow stack and vfork shares shadow stack
with parent so exit does not unmap.

I think the map_shadow_stack syscall should be mentioned in this
document too.

ABI for initial shadow stack entries:

If one wants to scan the shadow stack how to detect the end (e.g. fast
backtrace)? Is it useful to put an invalid value (-1) there?
(affects map_shadow_stack syscall too).

thanks.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

  reply	other threads:[~2023-03-01 14:23 UTC|newest]

Thread overview: 159+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-27 22:29 [PATCH v7 00/41] Shadow stacks for userspace Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description Rick Edgecombe
2023-03-01 14:21   ` Szabolcs Nagy [this message]
2023-03-01 14:38     ` Szabolcs Nagy
2023-03-01 18:07     ` Edgecombe, Rick P
2023-03-01 18:32       ` Edgecombe, Rick P
2023-03-02 16:34         ` szabolcs.nagy
2023-03-03 22:35           ` Edgecombe, Rick P
2023-03-06 16:20             ` szabolcs.nagy
2023-03-06 16:31               ` Florian Weimer
2023-03-06 18:08                 ` Edgecombe, Rick P
2023-03-07 13:03                   ` szabolcs.nagy
2023-03-07 14:00                     ` Florian Weimer
2023-03-07 16:14                       ` Szabolcs Nagy
2023-03-06 18:05               ` Edgecombe, Rick P
2023-03-06 20:31                 ` Liang, Kan
2023-03-02 16:14       ` szabolcs.nagy
2023-03-02 21:17         ` Edgecombe, Rick P
2023-03-03 16:30           ` szabolcs.nagy
2023-03-03 16:57             ` H.J. Lu
2023-03-03 17:39               ` szabolcs.nagy
2023-03-03 17:50                 ` H.J. Lu
2023-03-03 17:41             ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 02/41] x86/shstk: Add Kconfig option for shadow stack Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 03/41] x86/cpufeatures: Add CPU feature flags for shadow stacks Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 04/41] x86/cpufeatures: Enable CET CR4 bit for shadow stack Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 05/41] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 06/41] x86/fpu: Add helper for modifying xstate Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 07/41] x86: Move control protection handler to separate file Rick Edgecombe
2023-03-01 15:38   ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 08/41] x86/shstk: Add user control-protection fault handler Rick Edgecombe
2023-03-01 18:06   ` Borislav Petkov
2023-03-01 18:14     ` Edgecombe, Rick P
2023-03-01 18:37       ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 09/41] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 10/41] x86/mm: Move pmd_write(), pud_write() up in the file Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 11/41] mm: Introduce pte_mkwrite_kernel() Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 12/41] s390/mm: Introduce pmd_mkwrite_kernel() Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 13/41] mm: Make pte_mkwrite() take a VMA Rick Edgecombe
2023-03-01  7:03   ` Christophe Leroy
2023-03-01  8:16     ` David Hildenbrand
2023-03-02 12:19   ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 14/41] x86/mm: Introduce _PAGE_SAVED_DIRTY Rick Edgecombe
2023-03-02 12:48   ` Borislav Petkov
2023-03-02 17:01     ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 15/41] x86/mm: Update ptep/pmdp_set_wrprotect() for _PAGE_SAVED_DIRTY Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 16/41] x86/mm: Start actually marking _PAGE_SAVED_DIRTY Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 17/41] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 18/41] mm: Introduce VM_SHADOW_STACK for shadow stack memory Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 19/41] x86/mm: Check shadow stack page fault errors Rick Edgecombe
2023-03-03 14:00   ` Borislav Petkov
2023-03-03 14:39     ` Dave Hansen
2023-02-27 22:29 ` [PATCH v7 20/41] x86/mm: Teach pte_mkwrite() about stack memory Rick Edgecombe
2023-03-03 15:37   ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 21/41] mm: Add guard pages around a shadow stack Rick Edgecombe
2023-03-06  8:08   ` Borislav Petkov
2023-03-07  1:29     ` Edgecombe, Rick P
2023-03-07 10:32       ` Borislav Petkov
2023-03-07 10:44         ` David Hildenbrand
2023-03-08 22:48           ` Edgecombe, Rick P
2023-03-17 17:09   ` Deepak Gupta
2023-02-27 22:29 ` [PATCH v7 22/41] mm/mmap: Add shadow stack pages to memory accounting Rick Edgecombe
2023-03-06 13:01   ` Borislav Petkov
2023-03-06 18:11     ` Edgecombe, Rick P
2023-03-06 18:16       ` Borislav Petkov
2023-03-07 10:42   ` David Hildenbrand
2023-03-17 17:12   ` Deepak Gupta
2023-03-17 17:16     ` Dave Hansen
2023-03-17 17:28       ` Deepak Gupta
2023-03-17 17:42         ` Edgecombe, Rick P
2023-03-17 19:26           ` Deepak Gupta
2023-02-27 22:29 ` [PATCH v7 23/41] mm: Re-introduce vm_flags to do_mmap() Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 24/41] mm: Don't allow write GUPs to shadow stack memory Rick Edgecombe
2023-03-06 13:10   ` Borislav Petkov
2023-03-06 18:15     ` Andy Lutomirski
2023-03-06 18:33       ` Edgecombe, Rick P
2023-03-06 18:57         ` Andy Lutomirski
2023-03-07  1:47           ` Edgecombe, Rick P
2023-03-17 17:05   ` Deepak Gupta
2023-02-27 22:29 ` [PATCH v7 25/41] x86/mm: Introduce MAP_ABOVE4G Rick Edgecombe
2023-03-06 18:09   ` Borislav Petkov
2023-03-07  1:10     ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 26/41] mm: Warn on shadow stack memory in wrong vma Rick Edgecombe
2023-03-08  8:53   ` Borislav Petkov
2023-03-08 23:36     ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 27/41] x86/mm: Warn if create Write=0,Dirty=1 with raw prot Rick Edgecombe
2023-02-27 22:54   ` Kees Cook
2023-03-08  9:23   ` Borislav Petkov
2023-03-08 23:35     ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 28/41] x86: Introduce userspace API for shadow stack Rick Edgecombe
2023-03-08 10:27   ` Borislav Petkov
2023-03-08 23:32     ` Edgecombe, Rick P
2023-03-09 12:57       ` Borislav Petkov
2023-03-09 16:56         ` Edgecombe, Rick P
2023-03-09 23:51           ` Borislav Petkov
2023-03-10  1:13             ` Edgecombe, Rick P
2023-03-10  2:03               ` H.J. Lu
2023-03-10 20:00                 ` H.J. Lu
2023-03-10 20:27                   ` Edgecombe, Rick P
2023-03-10 20:43                     ` H.J. Lu
2023-03-10 21:01                       ` Edgecombe, Rick P
2023-03-10 11:40               ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 29/41] x86/shstk: Add user-mode shadow stack support Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 30/41] x86/shstk: Handle thread shadow stack Rick Edgecombe
2023-03-02 17:34   ` Szabolcs Nagy
2023-03-02 21:48     ` Edgecombe, Rick P
2023-03-08 15:26   ` Borislav Petkov
2023-03-08 20:03     ` Edgecombe, Rick P
2023-03-09 14:12       ` Borislav Petkov
2023-03-09 16:59         ` Edgecombe, Rick P
2023-03-09 17:04           ` Borislav Petkov
2023-03-09 20:29             ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 31/41] x86/shstk: Introduce routines modifying shstk Rick Edgecombe
2023-03-09 16:48   ` Borislav Petkov
2023-03-09 17:03     ` Edgecombe, Rick P
2023-03-09 17:22       ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 32/41] x86/shstk: Handle signals for shadow stack Rick Edgecombe
2023-03-09 17:02   ` Borislav Petkov
2023-03-09 17:16     ` Edgecombe, Rick P
2023-03-09 23:35       ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 33/41] x86/shstk: Introduce map_shadow_stack syscall Rick Edgecombe
2023-03-02 17:22   ` Szabolcs Nagy
2023-03-02 21:21     ` Edgecombe, Rick P
2023-03-09 18:55     ` Deepak Gupta
2023-03-09 19:39       ` Edgecombe, Rick P
2023-03-09 21:08         ` Deepak Gupta
2023-03-10  0:14           ` Edgecombe, Rick P
2023-03-10 21:00             ` Deepak Gupta
2023-03-10 21:43               ` Edgecombe, Rick P
2023-03-16 20:07                 ` Deepak Gupta
2023-03-14  7:19       ` Mike Rapoport
2023-03-16 19:30         ` Deepak Gupta
2023-03-20 11:35           ` Szabolcs Nagy
2023-03-10 16:11   ` Borislav Petkov
2023-03-10 17:12     ` Edgecombe, Rick P
2023-03-10 20:05       ` Borislav Petkov
2023-03-10 20:19         ` Edgecombe, Rick P
2023-03-10 20:26           ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 34/41] x86/shstk: Support WRSS for userspace Rick Edgecombe
2023-03-10 16:44   ` Borislav Petkov
2023-03-10 17:16     ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 35/41] x86: Expose thread features in /proc/$PID/status Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 36/41] x86/shstk: Wire in shadow stack interface Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 37/41] selftests/x86: Add shadow stack test Rick Edgecombe
2023-02-27 22:29 ` [PATCH v7 38/41] x86/fpu: Add helper for initing features Rick Edgecombe
2023-03-11 12:54   ` Borislav Petkov
2023-03-13  2:45     ` Edgecombe, Rick P
2023-03-13 11:03       ` Borislav Petkov
2023-03-13 16:10         ` Edgecombe, Rick P
2023-03-13 17:10           ` Borislav Petkov
2023-03-13 23:31             ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 39/41] x86: Add PTRACE interface for shadow stack Rick Edgecombe
2023-03-11 15:06   ` Borislav Petkov
2023-03-13  2:53     ` Edgecombe, Rick P
2023-02-27 22:29 ` [PATCH v7 40/41] x86/shstk: Add ARCH_SHSTK_UNLOCK Rick Edgecombe
2023-03-11 15:11   ` Borislav Petkov
2023-03-13  3:04     ` Edgecombe, Rick P
2023-03-13 11:05       ` Borislav Petkov
2023-02-27 22:29 ` [PATCH v7 41/41] x86/shstk: Add ARCH_SHSTK_STATUS Rick Edgecombe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y/9fdYQ8Cd0GI+8C@arm.com \
    --to=szabolcs.nagy@arm.com \
    --cc=Andrew.Cooper3@citrix.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=broonie@kernel.org \
    --cc=bsingharora@gmail.com \
    --cc=christina.schimpe@intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=debug@rivosinc.com \
    --cc=dethoma@microsoft.com \
    --cc=eranian@google.com \
    --cc=esyr@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=gorcunov@gmail.com \
    --cc=hjl.tools@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jamorris@linux.microsoft.com \
    --cc=jannh@google.com \
    --cc=john.allen@amd.com \
    --cc=kcc@google.com \
    --cc=keescook@chromium.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=nadav.amit@gmail.com \
    --cc=oleg@redhat.com \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=weijiang.yang@intel.com \
    --cc=x86@kernel.org \
    --cc=yu-cheng.yu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).