All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pedro Falcato <pedro.falcato@gmail.com>
To: Jeff Xu <jeffxu@google.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	jeffxu@chromium.org, akpm@linux-foundation.org,
	keescook@chromium.org, sroettger@google.com,
	jorgelo@chromium.org, groeck@chromium.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, jannh@google.com, surenb@google.com,
	alex.sierra@amd.com, apopple@nvidia.com,
	aneesh.kumar@linux.ibm.com, axelrasmussen@google.com,
	ben@decadent.org.uk, catalin.marinas@arm.com, david@redhat.com,
	dwmw@amazon.co.uk, ying.huang@intel.com, hughd@google.com,
	joey.gouly@arm.com, corbet@lwn.net, wangkefeng.wang@huawei.com,
	Liam.Howlett@oracle.com, torvalds@linux-foundation.org,
	lstoakes@gmail.com, mawupeng1@huawei.com, linmiaohe@huawei.com,
	namit@vmware.com, peterx@redhat.com, peterz@infradead.org,
	ryan.roberts@arm.com, shr@devkernel.io, vbabka@suse.cz,
	xiujianfeng@huawei.com, yu.ma@intel.com, zhangpeng362@huawei.com,
	dave.hansen@intel.com, luto@kernel.org,
	linux-hardening@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
Date: Thu, 19 Oct 2023 23:47:46 +0100	[thread overview]
Message-ID: <CAKbZUD12pEaDCLysOpT3yL3064=P28Pm3c=UBqhOZYeBP026WA@mail.gmail.com> (raw)
In-Reply-To: <CALmYWFt71Vi6ySiZhW+tmE-LZL7Tnu-dQ1uMO10DUkASUTxzKA@mail.gmail.com>

On Thu, Oct 19, 2023 at 6:30 PM Jeff Xu <jeffxu@google.com> wrote:
>
> Hi Pedro
>
> Some followup on mmap() + mprotect():
>
> On Wed, Oct 18, 2023 at 11:20 AM Jeff Xu <jeffxu@google.com> wrote:
> >
> > On Tue, Oct 17, 2023 at 3:35 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > >
> > > > >
> > > > > I think it's worth pointing out that this suggestion (with PROT_*)
> > > > > could easily integrate with mmap() and as such allow for one-shot
> > > > > mmap() + mseal().
> > > > > If we consider the common case as 'addr = mmap(...); mseal(addr);', it
> > > > > definitely sounds like a performance win as we halve the number of
> > > > > syscalls for a sealed mapping. And if we trivially look at e.g OpenBSD
> > > > > ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
> > > > > like common patterns.
> > > > >
> > > > Yes. mmap() can support sealing as well, and memory is allocated as
> > > > immutable from begining.
> > > > This is orthogonal to mseal() though.
> > >
> > > I don't see how this can be orthogonal to mseal().
> > > In the case we opt for adding PROT_ bits, we should more or less only
> > > need to adapt calc_vm_prot_bits(), and the rest should work without
> > > issues.
> > > vma merging won't merge vmas with different prots. The current
> > > interfaces (mmap and mprotect) would work just fine.
> > > In this case, mseal() or mimmutable() would only be needed if you need
> > > to set immutability over a range of VMAs with different permissions.
> > >
> > Agreed. By orthogonal, I meant we can have two APIs:
> > mmap() and mseal()/mprotect()
> > i.e. we can't just rely on mmap() only without mseal()/mprotect()/mimmutable().
> > Sealing can be applied after initial memory creation.
> >
> > > Note: modifications should look kinda like this: https://godbolt.org/z/Tbjjd14Pe
> > > The only annoying wrench in my plans here is that we have effectively
> > > run out of vm_flags bits in 32-bit architectures, so this approach as
> > > I described is not compatible with 32-bit.
> > >
> > > > In case of ld.so, iiuc, memory can be first allocated as W, then later
> > > > changed to RO, for example, during symbol resolution.
> > > > The important point is that the application can decide what type of
> > > > sealing it wants, and when to apply it.  There needs to be an api(),
> > > > that can be mseal() or mprotect2() or mimmutable(), the naming is not
> > > > important to me.
> > > >
> > > > mprotect() in linux have the following signature:
> > > > int mprotect(void addr[.len], size_t len, int prot);
> > > > the prot bitmasks are all taken here.
> > > > I have not checked the prot field in mmap(), there might be bits left,
> > > > even not, we could have mmap2(), so that is not an issue.
> > >
> > > I don't see what you mean. We have plenty of prot bits left (32-bits,
> > > and we seem to have around 8 different bits used).
> > > And even if we didn't, prot is the same in mprotect and mmap and mmap2 :)
> > >
> > > The only issue seems to be that 32-bit ran out of vm_flags, but that
> > > can probably be worked around if need be.
> > >
> > Ah, you are right about this. vm_flags is full, and prot in mprotect() is not.
> > Apology that I was wrong previously and caused confusion.
> >
> > There is a slight difference in the syntax of mprotect and mseal.
> > Each time when mprotect() is called, the kernel takes all of RWX bits
> > and updates vm_flags,
> > In other words, the application sets/unset each RWX, and kernel takes it.
> >
> > In the mseal() case, the kernel will remember which seal types were
> > applied previously, and the application doesn’t need to repeat all
> > existing seal types in the next mseal().  Once a seal type is applied,
> > it can’t be unsealed.
> >
> > So if we want to use mprotect() for sealing, developers need to think
> > of sealing bits differently than the rest of prot bits. It is a
> > different programming model, might or might not be an obvious concept
> > to developers.
> >
> This probably doesn't matter much to developers.
> We can enforce the sealing bit to be the same as the rest of PROT bits.
> If mprotect() tries to unset sealing, it will fail.

Yep. Erroneous or malicious mprotects would all be caught. However, if
we add a PROT_DOWNGRADEABLE (that could let you, lets say, mprotect()
to less permissions or even downright munmap()) you'd want some care
to preserve that bit when setting permissions.

>
> > There is a difference in input check and error handling as well.
> > for mseal(), if a given address range has a gap (unallocated memory),
> > or if one of VMA is sealed with MM_SEAL_SEAL flag, none of VMAs is
> > updated.
> > For mprotect(), some VMAs can be updated, till an error happens to a VMA.
> >
> This difference doesn't matter much.
>
> For mprotect()/mmap(), is Linux implementation limited by POSIX ?

No. POSIX works merely as a baseline that UNIX systems aim towards.
You can (and very frequently do) extend POSIX interfaces (in fact,
it's how most of POSIX was written, through sheer
"design-by-committee" on a bunch of UNIX systems' extensions).

> This can be made backward compatible.
> If there is no objection to adding linux specific values in mmap() and
> mprotect(),
> This works for me.

Linux already has system-specific values for PROT_ (PROT_BTI,
PROT_MTE, PROT_GROWSUP, PROT_GROWSDOWN, etc).
Whether this is the right interface is another question. I do like it
a lot, but there's of course value in being compatible with existing
solutions (like mimmutable()).

-- 
Pedro

  reply	other threads:[~2023-10-19 22:56 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16 14:38 [RFC PATCH v1 0/8] Introduce mseal() syscall jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 1/8] Add mseal syscall jeffxu
2023-10-16 15:05   ` Greg KH
2023-10-17  6:50     ` Jeff Xu
2023-10-16 14:38 ` [RFC PATCH v1 2/8] Wire up " jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 3/8] mseal: add can_modify_mm and can_modify_vma jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 4/8] mseal: seal mprotect jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 5/8] mseal munmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 6/8] mseal mremap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 7/8] mseal mmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 8/8] selftest mm/mseal mprotect/munmap/mremap/mmap jeffxu
2023-10-16 15:18 ` [RFC PATCH v1 0/8] Introduce mseal() syscall Matthew Wilcox
2023-10-17  8:34   ` Jeff Xu
2023-10-17 12:56     ` Matthew Wilcox
2023-10-17 15:29   ` Pedro Falcato
2023-10-17 21:33     ` Jeff Xu
2023-10-17 22:35       ` Pedro Falcato
2023-10-18 18:20         ` Jeff Xu
2023-10-19 17:30           ` Jeff Xu
2023-10-19 22:47             ` Pedro Falcato [this message]
2023-10-19 23:06               ` Linus Torvalds
2023-10-23 17:44                 ` Jeff Xu
2023-10-23 17:42               ` Jeff Xu
2023-10-16 17:23 ` Linus Torvalds
2023-10-17  9:07   ` Jeff Xu
2023-10-17 17:22     ` Linus Torvalds
2023-10-17 18:20       ` Theo de Raadt
2023-10-17 18:38         ` Linus Torvalds
2023-10-17 18:55           ` Theo de Raadt
2023-10-19  8:00           ` Stephen Röttger
2023-10-19  8:00             ` Stephen Röttger
2023-10-20 16:27             ` Theo de Raadt
2023-10-24 10:42               ` Stephen Röttger
2023-10-17 23:01         ` Jeff Xu
2023-10-17 23:56           ` Theo de Raadt
2023-10-18  3:18             ` Jeff Xu
2023-10-18  3:37               ` Theo de Raadt
2023-10-18 15:17               ` Matthew Wilcox
2023-10-18 18:54                 ` Jeff Xu
2023-10-18 20:36                   ` Theo de Raadt
2023-10-19  8:28                     ` Stephen Röttger
2023-10-20 15:55                       ` Theo de Raadt
2023-10-16 17:34 ` Jann Horn
2023-10-17  8:42   ` Jeff Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKbZUD12pEaDCLysOpT3yL3064=P28Pm3c=UBqhOZYeBP026WA@mail.gmail.com' \
    --to=pedro.falcato@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.sierra@amd.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=ben@decadent.org.uk \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=dwmw@amazon.co.uk \
    --cc=groeck@chromium.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jeffxu@chromium.org \
    --cc=jeffxu@google.com \
    --cc=joey.gouly@arm.com \
    --cc=jorgelo@chromium.org \
    --cc=keescook@chromium.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mawupeng1@huawei.com \
    --cc=namit@vmware.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=shr@devkernel.io \
    --cc=sroettger@google.com \
    --cc=surenb@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=xiujianfeng@huawei.com \
    --cc=ying.huang@intel.com \
    --cc=yu.ma@intel.com \
    --cc=zhangpeng362@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.