linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: jeffxu@chromium.org
To: akpm@linux-foundation.org, keescook@chromium.org,
	jannh@google.com, sroettger@google.com, willy@infradead.org,
	gregkh@linuxfoundation.org, torvalds@linux-foundation.org
Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, surenb@google.com, alex.sierra@amd.com,
	apopple@nvidia.com, aneesh.kumar@linux.ibm.com,
	axelrasmussen@google.com, ben@decadent.org.uk,
	catalin.marinas@arm.com, david@redhat.com, dwmw@amazon.co.uk,
	ying.huang@intel.com, hughd@google.com, joey.gouly@arm.com,
	corbet@lwn.net, wangkefeng.wang@huawei.com,
	Liam.Howlett@oracle.com, lstoakes@gmail.com,
	mawupeng1@huawei.com, linmiaohe@huawei.com, namit@vmware.com,
	peterx@redhat.com, peterz@infradead.org, ryan.roberts@arm.com,
	shr@devkernel.io, vbabka@suse.cz, xiujianfeng@huawei.com,
	yu.ma@intel.com, zhangpeng362@huawei.com, dave.hansen@intel.com,
	luto@kernel.org, linux-hardening@vger.kernel.org
Subject: [RFC PATCH v2 0/8] Introduce mseal() syscall
Date: Tue, 17 Oct 2023 09:08:07 +0000	[thread overview]
Message-ID: <20231017090815.1067790-1-jeffxu@chromium.org> (raw)

From: Jeff Xu <jeffxu@google.com>

This patchset proposes a new mseal() syscall for the Linux kernel.

Modern CPUs support memory permissions such as RW and NX bits. Linux has
supported NX since the release of kernel version 2.6.8 in August 2004 [1].
The memory permission feature improves security stance on memory
corruption bugs, i.e. the attacker can’t just write to arbitrary memory
and point the code to it, the memory has to be marked with X bit, or
else an exception will happen. The protection is set by mmap(2),
mprotect(2), mremap(2).

Memory sealing additionally protects the mapping itself against
modifications. This is useful to mitigate memory corruption issues where
a corrupted pointer is passed to a memory management syscall. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped. Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages and
applications can additionally seal security critical data at runtime.
A similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
Also, Chrome wants to adopt this feature for their CFI work [2] and this
patchset has been designed to be compatible with the Chrome use case.

The new mseal() is an architecture independent syscall, and with
following signature:

mseal(void addr, size_t len, unsigned long types, unsigned long flags)

addr/len: memory range.  Must be continuous/allocated memory, or else
mseal() will fail and no VMA is updated. For details on acceptable
arguments, please refer to comments in mseal.c. Those are also fully
covered by the selftest.

types: bit mask to specify which syscall to seal.

Five syscalls can be sealed, as specified by bitmasks:
MM_SEAL_MPROTECT: Deny mprotect(2)/pkey_mprotect(2).
MM_SEAL_MUNMAP: Deny munmap(2).
MM_SEAL_MMAP: Deny mmap(2).
MM_SEAL_MREMAP: Deny mremap(2).
MM_SEAL_MSEAL: Deny adding a new seal type.

Each bit represents sealing for one specific syscall type, e.g.
MM_SEAL_MPROTECT will deny mprotect syscall. The consideration of bitmask
is that the API is extendable, i.e. when needed, the sealing can be
extended to madvise, mlock, etc. Backward compatibility is also easy.

The kernel will remember which seal types are applied, and the application
doesn’t need to repeat all existing seal types in the next mseal().  Once
a seal type is applied, it can’t be unsealed. Call mseal() on an existing
seal type is a no-action, not a failure.

MM_SEAL_MSEAL will deny mseal() calls that try to add a new seal type.

Internally, vm_area_struct adds a new field vm_seals, to store the bit
masks.

For the affected syscalls, such as mprotect, a check(can_modify_mm) for
sealing is added, this usually happens at the early point of the syscall,
before any update is made to VMAs. The effect of that is: if any of the
VMAs in the given address range fails the sealing check, none of the VMA
will be updated.

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5],  Chrome browser in ChromeOS will be the first user of this API.

[1] https://kernelnewbies.org/Linux_2_6_8 
[2] https://v8.dev/blog/control-flow-integrity 
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274 
[4] https://man.openbsd.org/mimmutable.2 
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc 
 
PATCH history:

v1:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add detailed commit message.

v0:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/

Jeff Xu (8):
  mseal: Add mseal(2) syscall.
  mseal: Wire up mseal syscall
  mseal: add can_modify_mm and can_modify_vma
  mseal: Check seal flag for mprotect(2)
  mseal: Check seal flag for munmap(2)
  mseal: Check seal flag for mremap(2)
  mseal:Check seal flag for mmap(2)
  selftest mm/mseal mprotect/munmap/mremap/mmap

 arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
 arch/arm/tools/syscall.tbl                  |    1 +
 arch/arm64/include/asm/unistd.h             |    2 +-
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
 arch/s390/kernel/syscalls/syscall.tbl       |    1 +
 arch/sh/kernel/syscalls/syscall.tbl         |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
 fs/aio.c                                    |    5 +-
 include/linux/mm.h                          |   44 +-
 include/linux/mm_types.h                    |    7 +
 include/linux/syscalls.h                    |    2 +
 include/uapi/asm-generic/unistd.h           |    5 +-
 include/uapi/linux/mman.h                   |    6 +
 ipc/shm.c                                   |    3 +-
 kernel/sys_ni.c                             |    1 +
 mm/Kconfig                                  |    8 +
 mm/Makefile                                 |    1 +
 mm/internal.h                               |    4 +-
 mm/mmap.c                                   |   57 +-
 mm/mprotect.c                               |   15 +
 mm/mremap.c                                 |   30 +-
 mm/mseal.c                                  |  268 ++++
 mm/nommu.c                                  |    6 +-
 mm/util.c                                   |    8 +-
 tools/testing/selftests/mm/Makefile         |    1 +
 tools/testing/selftests/mm/mseal_test.c     | 1428 +++++++++++++++++++
 37 files changed, 1891 insertions(+), 28 deletions(-)
 create mode 100644 mm/mseal.c
 create mode 100644 tools/testing/selftests/mm/mseal_test.c

-- 
2.42.0.655.g421f12c284-goog


             reply	other threads:[~2023-10-17  9:08 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-17  9:08 jeffxu [this message]
2023-10-17  9:08 ` [RFC PATCH v2 1/8] mseal: Add mseal(2) syscall jeffxu
2023-10-17 15:45   ` Randy Dunlap
2023-10-17  9:08 ` [RFC PATCH v2 2/8] mseal: Wire up mseal syscall jeffxu
2023-10-17  9:08 ` [RFC PATCH v2 3/8] mseal: add can_modify_mm and can_modify_vma jeffxu
2023-10-17  9:08 ` [RFC PATCH v2 4/8] mseal: Check seal flag for mprotect(2) jeffxu
2023-10-17  9:08 ` [RFC PATCH v2 5/8] mseal: Check seal flag for munmap(2) jeffxu
2023-10-17 16:54   ` Linus Torvalds
2023-10-18 15:08     ` Jeff Xu
2023-10-18 17:14       ` Jeff Xu
2023-10-18 18:27         ` Linus Torvalds
2023-10-18 19:07           ` Jeff Xu
2023-10-17  9:08 ` [RFC PATCH v2 6/8] mseal: Check seal flag for mremap(2) jeffxu
2023-10-20 13:56   ` Muhammad Usama Anjum
2023-10-17  9:08 ` [RFC PATCH v2 7/8] mseal:Check seal flag for mmap(2) jeffxu
2023-10-17 17:04   ` Linus Torvalds
2023-10-17 17:43     ` Linus Torvalds
2023-10-18  7:01       ` Jeff Xu
2023-10-19  7:27       ` Stephen Röttger
2023-10-17  9:08 ` [RFC PATCH v2 8/8] selftest mm/mseal mprotect/munmap/mremap/mmap jeffxu
2023-10-20 14:24   ` Muhammad Usama Anjum
2023-10-20 15:23     ` Peter Zijlstra
2023-10-20 16:33       ` Muhammad Usama Anjum
2023-10-19  9:19 ` [RFC PATCH v2 0/8] Introduce mseal() syscall David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231017090815.1067790-1-jeffxu@chromium.org \
    --to=jeffxu@chromium.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.sierra@amd.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=ben@decadent.org.uk \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=dwmw@amazon.co.uk \
    --cc=gregkh@linuxfoundation.org \
    --cc=groeck@chromium.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jeffxu@google.com \
    --cc=joey.gouly@arm.com \
    --cc=jorgelo@chromium.org \
    --cc=keescook@chromium.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mawupeng1@huawei.com \
    --cc=namit@vmware.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=shr@devkernel.io \
    --cc=sroettger@google.com \
    --cc=surenb@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=xiujianfeng@huawei.com \
    --cc=ying.huang@intel.com \
    --cc=yu.ma@intel.com \
    --cc=zhangpeng362@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).