From: jeffxu@chromium.org
To: akpm@linux-foundation.org, keescook@chromium.org,
	jannh@google.com, sroettger@google.com, willy@infradead.org,
	gregkh@linuxfoundation.org, torvalds@linux-foundation.org,
	usama.anjum@collabora.com, corbet@lwn.net,
	Liam.Howlett@oracle.com, surenb@google.com, merimus@google.com,
	rdunlap@infradead.org
Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, pedro.falcato@gmail.com,
	dave.hansen@intel.com, linux-hardening@vger.kernel.org,
	deraadt@openbsd.org, Jeff Xu <jeffxu@chromium.org>
Subject: [PATCH v10 4/5] mseal:add documentation
Date: Mon, 15 Apr 2024 16:35:23 +0000
Message-ID: <20240415163527.626541-5-jeffxu@chromium.org>
In-Reply-To: <20240415163527.626541-1-jeffxu@chromium.org>

From: Jeff Xu <jeffxu@chromium.org>

Add documentation for mseal().

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
---
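A minimal userspace sketch, not part of the patch itself, that replays the
mmap()/munmap()/mprotect()/mseal() sequence described under "Blocked
operations after sealing" in mseal.rst below. It assumes that libc provides no
mseal() wrapper and goes through syscall(2) instead. The fallback syscall
number 462 (believed correct for x86-64 and arm64), the helper names
mseal_raw() and replay_doc_sequence(), and struct seq_result are all
assumptions made for illustration. The documentation's 4096/8192 byte sizes
are replaced by the runtime page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 462 is the mseal syscall number when the headers lack it. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* Hypothetical wrapper: returns 0, or -1 with errno set. */
static int mseal_raw(void *addr, size_t len, unsigned long flags)
{
	return (int)syscall(__NR_mseal, addr, len, flags);
}

/* Hypothetical result record: each field is 0 on success, else errno. */
struct seq_result {
	int mmap_err;
	int ret1_err;	/* mprotect() across the gap, before sealing */
	int seal_err;	/* mseal() of the first page */
	int ret2_err;	/* mprotect() across the gap, after sealing */
};

static struct seq_result replay_doc_sequence(void)
{
	struct seq_result r = { 0, 0, 0, 0 };
	long page_l = sysconf(_SC_PAGESIZE);
	size_t page;
	char *ptr;

	assert(page_l > 0);
	page = (size_t)page_l;

	ptr = mmap(NULL, 2 * page, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ptr == MAP_FAILED) {
		r.mmap_err = errno;
		return r;
	}

	/* Unmap the second page so [ptr, ptr + 2 pages) contains a gap. */
	munmap(ptr + page, page);

	if (mprotect(ptr, 2 * page, PROT_READ) != 0)
		r.ret1_err = errno;	/* documented: ENOMEM */
	if (mseal_raw(ptr, page, 0) != 0)
		r.seal_err = errno;	/* e.g. ENOSYS on kernels without mseal */
	if (mprotect(ptr, 2 * page, PROT_NONE) != 0)
		r.ret2_err = errno;	/* documented: EPERM once sealed */

	/* If sealing succeeded, the first page stays mapped until exit/exec. */
	return r;
}
```

On a kernel with mseal support, ret1_err is expected to be ENOMEM and ret2_err
EPERM, as the documentation describes. On a kernel without it, seal_err is
non-zero and the second mprotect() behaves like the first.
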
 Documentation/userspace-api/index.rst |   1 +
 Documentation/userspace-api/mseal.rst | 199 ++++++++++++++++++++++++++
 2 files changed, 200 insertions(+)
 create mode 100644 Documentation/userspace-api/mseal.rst

diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index afecfe3cc4a8..5926115ec0ed 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -20,6 +20,7 @@ System calls
    futex2
    ebpf/index
    ioctl/index
+   mseal
 
 Security-related interfaces
 ===========================
diff --git a/Documentation/userspace-api/mseal.rst b/Documentation/userspace-api/mseal.rst
new file mode 100644
index 000000000000..4132eec995a3
--- /dev/null
+++ b/Documentation/userspace-api/mseal.rst
@@ -0,0 +1,199 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Introduction of mseal
+=====================
+
+:Author: Jeff Xu <jeffxu@chromium.org>
+
+Modern CPUs support memory permissions such as the RW and NX bits. Memory
+permissions harden a process against memory corruption bugs: an attacker
+cannot simply write to arbitrary memory and redirect execution to it, because
+the memory must be marked with the X bit or an exception is raised.
+
+Memory sealing additionally protects the mapping itself against
+modifications. This is useful to mitigate memory corruption issues where a
+corrupted pointer is passed to a memory management system call. For example,
+such an attacker primitive can break control-flow integrity guarantees
+since read-only memory that is supposed to be trusted can become writable
+or .text pages can get remapped. Memory sealing can automatically be
+applied by the runtime loader to seal .text and .rodata pages and
+applications can additionally seal security critical data at runtime.
+
+A similar feature already exists in the XNU kernel with the
+VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2].
+
+User API
+========
+mseal()
+-----------
+The mseal() syscall has the following signature:
+
+``int mseal(void *addr, size_t len, unsigned long flags)``
+
+**addr/len**: virtual memory address range.
+
+The address range set by ``addr``/``len`` must meet:
+   - The start address must be in an allocated VMA.
+   - The start address must be page aligned.
+   - The end address (``addr`` + ``len``) must be in an allocated VMA.
+   - There must be no gap (unallocated memory) between start and end address.
+
+The kernel implicitly page-aligns ``len``.
+
+**flags**: reserved for future use.
+
+**return values**:
+
+- ``0``: Success.
+
+- ``-EINVAL``:
+    - Invalid input ``flags``.
+    - The start address (``addr``) is not page aligned.
+    - Address range (``addr`` + ``len``) overflow.
+
+- ``-ENOMEM``:
+    - The start address (``addr``) is not allocated.
+    - The end address (``addr`` + ``len``) is not allocated.
+    - A gap (unallocated memory) between start and end address.
+
+- ``-EPERM``:
+    - Sealing is supported only on 64-bit CPUs; 32-bit is not supported.
+
+- For the above error cases, users can expect the given memory range to be
+  unmodified, i.e. no partial update.
+
+- There might be other internal errors/cases not listed here, e.g.
+  error during merging/splitting VMAs, or the process reaching the max
+  number of supported VMAs. In those cases, partial updates to the given
+  memory range could happen. However, those cases should be rare.
+
+**Blocked operations after sealing**:
+    Unmapping, moving to another location, and shrinking the size,
+    via munmap() and mremap(): these can leave an empty space that
+    can then be replaced with a VMA with a new set of attributes.
+
+    Moving or expanding a different VMA into the current location,
+    via mremap().
+
+    Modifying a VMA via mmap(MAP_FIXED).
+
+    Size expansion, via mremap(), does not appear to pose any
+    specific risks to sealed VMAs. It is included anyway because
+    the use case is unclear. In any case, users can rely on
+    merging to expand a sealed VMA.
+
+    mprotect() and pkey_mprotect().
+
+    Some destructive madvise() behaviors (e.g. MADV_DONTNEED)
+    for anonymous memory, when users don't have write permission to the
+    memory. Those behaviors can alter region contents by discarding pages,
+    effectively a memset(0) for anonymous memory.
+
+    The kernel returns -EPERM for blocked operations.
+
+    For blocked operations, one can expect the given address range to be
+    unmodified, i.e. no partial update. Note that this differs from existing
+    mm system call behaviors, where partial updates are made until an error
+    is found and returned to userspace. To give an example:
+
+    Assume the following code sequence:
+
+    - ptr = mmap(NULL, 8192, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    - munmap(ptr + 4096, 4096);
+    - ret1 = mprotect(ptr, 8192, PROT_READ);
+    - mseal(ptr, 4096, 0);
+    - ret2 = mprotect(ptr, 8192, PROT_NONE);
+
+    ret1 will be -ENOMEM; the page at ptr is updated to PROT_READ.
+
+    ret2 will be -EPERM; the page remains PROT_READ.
+
+**Note**:
+
+- mseal() only works on 64-bit CPUs, not on 32-bit CPUs.
+
+- Users can call mseal() multiple times; calling mseal() on already sealed
+  memory is a no-op (not an error).
+
+- munseal() is not supported.
+
+Use cases:
+==========
+- glibc:
+  The dynamic linker, while loading ELF executables, can apply sealing to
+  non-writable memory segments.
+
+- Chrome browser: protect some security-sensitive data structures.
+
+Notes on which memory to seal:
+==============================
+
+Note that sealing changes the lifetime of a mapping: a sealed mapping won't
+be unmapped until the process terminates or the exec system call is invoked.
+Applications can apply sealing to any virtual memory region from userspace,
+but it is crucial to thoroughly analyze the mapping's lifetime before
+applying the sealing.
+
+For example:
+
+- aio/shm
+
+  aio/shm can call mmap()/munmap() on behalf of userspace, e.g. ksys_shmdt() in
+  shm.c. The lifetimes of those mappings are not tied to the lifetime of the
+  process. If those mappings are sealed from userspace, then munmap() will fail,
+  causing leaks in VMA address space during the lifetime of the process.
+
+- Brk (heap)
+
+  Currently, userspace applications can seal parts of the heap by calling
+  malloc() and mseal().
+  Assume the following calls from userspace:
+
+  - ptr = malloc(size);
+  - mprotect(ptr, size, RO);
+  - mseal(ptr, size);
+  - free(ptr);
+
+  Technically, before mseal() was added, the user could already change the
+  protection of the heap by calling mprotect(RO). As long as the user changes
+  the protection back to RW before free(), the memory range can be reused.
+
+  With mseal() in the picture, however, the heap is then partially sealed:
+  the user can still free the pointer, but the memory remains RO. If the
+  address is reused by the heap manager for another malloc(), the process
+  might crash soon after. Therefore, it is important not to apply sealing to
+  any memory that might get recycled.
+
+  Furthermore, even if the application never calls free() on ptr, the heap
+  manager may invoke the brk system call to shrink the size of the heap.
+  In the kernel, the brk-shrink will call munmap(). Consequently,
+  depending on the location of the ptr, the outcome of brk-shrink is
+  nondeterministic.
+
+
+Additional notes:
+=================
+As Jann Horn pointed out in [3], there are still a few ways to write
+to RO memory, which is, in a way, by design. Those cases are not covered
+by mseal(). If applications want to block such cases, sandbox tools (such as
+seccomp, LSM, etc.) might be considered.
+
+Those cases are:
+
+- Write to read-only memory through /proc/self/mem interface.
+- Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
+- userfaultfd.
+
+The idea that inspired this patch comes from Stephen Röttger’s work in V8
+CFI [4]. Chrome browser in ChromeOS will be the first user of this API.
+
+Reference:
+==========
+[1] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274
+
+[2] https://man.openbsd.org/mimmutable.2
+
+[3] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com
+
+[4] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc
-- 
2.44.0.683.g7961c838ac-goog

