linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theo de Raadt" <deraadt@openbsd.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jeff Xu <jeffxu@google.com>,
	jeffxu@chromium.org, akpm@linux-foundation.org,
	keescook@chromium.org, sroettger@google.com,
	jorgelo@chromium.org, groeck@chromium.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, jannh@google.com, surenb@google.com,
	alex.sierra@amd.com, apopple@nvidia.com,
	aneesh.kumar@linux.ibm.com, axelrasmussen@google.com,
	ben@decadent.org.uk, catalin.marinas@arm.com, david@redhat.com,
	dwmw@amazon.co.uk, ying.huang@intel.com, hughd@google.com,
	joey.gouly@arm.com, corbet@lwn.net, wangkefeng.wang@huawei.com,
	Liam.Howlett@oracle.com, lstoakes@gmail.com, willy@infradead.org,
	mawupeng1@huawei.com, linmiaohe@huawei.com, namit@vmware.com,
	peterx@redhat.com, peterz@infradead.org, ryan.roberts@arm.com,
	shr@devkernel.io, vbabka@suse.cz, xiujianfeng@huawei.com,
	yu.ma@intel.com, zhangpeng362@huawei.com, dave.hansen@intel.com,
	luto@kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
Date: Tue, 17 Oct 2023 12:20:04 -0600	[thread overview]
Message-ID: <55960.1697566804@cvs.openbsd.org> (raw)
In-Reply-To: <CAHk-=wj87GMTH=5901ob=SjQqegAm2JYBE7E4J7skJzE64U-wQ@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, 17 Oct 2023 at 02:08, Jeff Xu <jeffxu@google.com> wrote:
> >
> > It is probably worth noting that I choose to check one and only
> > one sealing type per syscall. i.e. munmap(2) checks
> > MM_SEAL_MUNMAP only.
> 
> Yeah, this is wrong.
> 
> It's wrong exactly because other system calls will unmap things too.
> 
> Using mmap() to over-map something will unmap the old one.
> 
> Same goes for mremap() to move over an existing mapping.
> 
> So the whole "do things by the name of the system call" is not workable.
> 
> All that matters is what the system calls *do*, not what their name is.

I agree completely...

mseal() is a clone of mimmutable(2), but with an extremely
over-complicated API based upon dubious arguments.

I designed mimmutable(2) [1] in OpenBSD, it took about a year to get all
the components working correctly.  There were many intermediate API
during development, but in the end the API is simply:

     int mimmutable(void *addr, size_t len);

The kernel code for mimmutable() traverses the specified VA range.  In
that range, it will find unmapped sub-regions (which are are ignored)
and mapped sub-regions. For these mapped regions, it does not care what
the permissions are, it just marks each sub-region as immutable.

Later on, when any VM operation request upon a VA range attempts to
      (1) change the permissions
      (2) to re-map on top
      (3) or dispose of the mapping,
that operation is refused with errno EPERM.  We don't care where the
request comes from (ie. what system call).  It is a behaviour of the
VM system, when asked to act upon a VA sub-range mapping.

Very simple semantics.

The only case where the immutable marker is ignored is during address space
teardown as a result of process termination.


In his submission of this API, Jeff Xu makes three claims I find dubious;

> Also, Chrome wants to adopt this feature for their CFI work [2] and this
> patchset has been designed to be compatible with the Chrome use case.

I specifically designed mimmutable(2) with chrome in mind, and the
chrome binary running on OpenBSD is full of immutable mappings.  All the
library regions automatically become immutable because ld.so can infer
it and do the mimmutable calls for the right subregions.

So this chrome work has already been done by OpenBSD, and it is dead
simple.  During early development I thought mimmutable(2) would be
called by applications or libraries, but I was dead wrong: 99.9% of
calls are from ld.so, and no applications need to call it, these are the
two exceptions:

In OpenBSD, mimmutable() is used in libc malloc() to lock-down some data
structures at initialization time, so they canoot be attacked to create
an invariant for use in ROP return-to-libc style methods.

In Chrome, there is a v8_flags variable rounded out to a full page, and
placed in .data.  Chrome initialized this variable, and wants to mprotect
PROT_READ, but .data has been made immutable by ld.so.  So we force this
page into a new ELF section called "openbsd.mutable" which also behaves RW
like .data.  Where chrome does the mprotect  PROT_READ, it now also performs
mimmutable() on that page.

> Having a seal type per syscall type helps to add the feature incrementally.

Yet, somehow OpenBSD didn't do it per syscall, and we managed to make our
entire base operating system and 10,000+ applications automatically receive
the benefits.  In one year's effort.  The only application which cared about
it was chrome, described in the previous paragraph.

I think Jeff's idea here is super dangerous.  What will actually happen
is people will add a few mseal() sub-operations and think the job is done.
It isn't done.  They need all the mseal() requests, or the mapping are
not safe.

It is very counterproductive to provide developers a complex API that has
insecure suboperations.

> Applications also know exactly what is sealed.

Actually applicatins won't know because there is no tooling to inspect this --
but I will argue further that applications don't need to know.  Immutable
marking is a system decision, not a program decision.


I'll close by asking for a new look at the mimmutable(2) API we settled
on for OpenBSD.  I think there is nothing wrong with it.  I'm willing to
help guide glibc / ld.so / musl teams through the problems they may find
along the way, I know where the skeletons are buried.  Two in
particular: -znow RELRO already today, and xonly-text in the future.


[1] https://man.openbsd.org/mimmutable.2


  reply	other threads:[~2023-10-17 18:26 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16 14:38 [RFC PATCH v1 0/8] Introduce mseal() syscall jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 1/8] Add mseal syscall jeffxu
2023-10-16 15:05   ` Greg KH
2023-10-17  6:50     ` Jeff Xu
2023-10-16 14:38 ` [RFC PATCH v1 2/8] Wire up " jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 3/8] mseal: add can_modify_mm and can_modify_vma jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 4/8] mseal: seal mprotect jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 5/8] mseal munmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 6/8] mseal mremap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 7/8] mseal mmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 8/8] selftest mm/mseal mprotect/munmap/mremap/mmap jeffxu
2023-10-16 15:18 ` [RFC PATCH v1 0/8] Introduce mseal() syscall Matthew Wilcox
2023-10-17  8:34   ` Jeff Xu
2023-10-17 12:56     ` Matthew Wilcox
2023-10-17 15:29   ` Pedro Falcato
2023-10-17 21:33     ` Jeff Xu
2023-10-17 22:35       ` Pedro Falcato
2023-10-18 18:20         ` Jeff Xu
2023-10-19 17:30           ` Jeff Xu
2023-10-19 22:47             ` Pedro Falcato
2023-10-19 23:06               ` Linus Torvalds
2023-10-23 17:44                 ` Jeff Xu
2023-10-23 17:42               ` Jeff Xu
2023-10-16 17:23 ` Linus Torvalds
2023-10-17  9:07   ` Jeff Xu
2023-10-17 17:22     ` Linus Torvalds
2023-10-17 18:20       ` Theo de Raadt [this message]
2023-10-17 18:38         ` Linus Torvalds
2023-10-17 18:55           ` Theo de Raadt
2023-10-19  8:00           ` Stephen Röttger
2023-10-20 16:27             ` Theo de Raadt
2023-10-24 10:42               ` Stephen Röttger
2023-10-17 23:01         ` Jeff Xu
2023-10-17 23:56           ` Theo de Raadt
2023-10-18  3:18             ` Jeff Xu
2023-10-18  3:37               ` Theo de Raadt
2023-10-18 15:17               ` Matthew Wilcox
2023-10-18 18:54                 ` Jeff Xu
2023-10-18 20:36                   ` Theo de Raadt
2023-10-19  8:28                     ` Stephen Röttger
2023-10-20 15:55                       ` Theo de Raadt
2023-10-16 17:34 ` Jann Horn
2023-10-17  8:42   ` Jeff Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55960.1697566804@cvs.openbsd.org \
    --to=deraadt@openbsd.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.sierra@amd.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=ben@decadent.org.uk \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=dwmw@amazon.co.uk \
    --cc=groeck@chromium.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jeffxu@chromium.org \
    --cc=jeffxu@google.com \
    --cc=joey.gouly@arm.com \
    --cc=jorgelo@chromium.org \
    --cc=keescook@chromium.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mawupeng1@huawei.com \
    --cc=namit@vmware.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=shr@devkernel.io \
    --cc=sroettger@google.com \
    --cc=surenb@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=xiujianfeng@huawei.com \
    --cc=ying.huang@intel.com \
    --cc=yu.ma@intel.com \
    --cc=zhangpeng362@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).