linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: jeffxu@chromium.org
Cc: akpm@linux-foundation.org, keescook@chromium.org,
	sroettger@google.com, jeffxu@google.com, jorgelo@chromium.org,
	groeck@chromium.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
	jannh@google.com, surenb@google.com, alex.sierra@amd.com,
	apopple@nvidia.com, aneesh.kumar@linux.ibm.com,
	axelrasmussen@google.com, ben@decadent.org.uk,
	catalin.marinas@arm.com, david@redhat.com, dwmw@amazon.co.uk,
	ying.huang@intel.com, hughd@google.com, joey.gouly@arm.com,
	corbet@lwn.net, wangkefeng.wang@huawei.com,
	Liam.Howlett@oracle.com, lstoakes@gmail.com, willy@infradead.org,
	mawupeng1@huawei.com, linmiaohe@huawei.com, namit@vmware.com,
	peterx@redhat.com, peterz@infradead.org, ryan.roberts@arm.com,
	shr@devkernel.io, vbabka@suse.cz, xiujianfeng@huawei.com,
	yu.ma@intel.com, zhangpeng362@huawei.com, dave.hansen@intel.com,
	luto@kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
Date: Mon, 16 Oct 2023 10:23:29 -0700	[thread overview]
Message-ID: <CAHk-=whFZoap+DBTYvJx6ohqPwn11Puzh7q4huFWDX9vBwXHgg@mail.gmail.com> (raw)
In-Reply-To: <20231016143828.647848-1-jeffxu@chromium.org>

On Mon, 16 Oct 2023 at 07:38, <jeffxu@chromium.org> wrote:
>
> This patchset proposes a new mseal() syscall for the Linux kernel.

So I have no objections to adding some kind of "lock down memory
mappings" model, but this isn't it.

First off, the simple stuff: the commit messages are worthless. Having

   check seal for mmap(2)

as the commit message is not even remotely acceptable, to pick one
random example from the series (7/8).

But that doesn't matter much, since I think the more fundamental
problems are much worse:

 - the whole "ON_BEHALF_OF_KERNEL" and "ON_BEHALF_OF_USERSPACE" is
just complete noise and totally illogical. The whole concept needs to
be redone.

Example of complete nonsense (again, picking 7/8, but that's again
just a random example):

> @@ -3017,8 +3022,8 @@ SYSCALL_DEFINE5(remap_file_pages,
>                 flags |= MAP_LOCKED;
>
>         file = get_file(vma->vm_file);
> -       ret = do_mmap(vma->vm_file, start, size,
> -                       prot, flags, pgoff, &populate, NULL);
> +       ret = do_mmap(vma->vm_file, start, size, prot, flags, pgoff,
> +                       &populate, NULL, ON_BEHALF_OF_KERNEL);
>         fput(file);
>  out:
>         mmap_write_unlock(mm);

Christ. That's *literally* the remap_file_pages() system call
definition. No way in hell does "ON_BEHALF_OF_KERNEL" make any sense
in this context.

It's not the only situation. "mremap() as the same thing. vm_munmap()
has the same thing. vm_brk_flags() has the same thing. None of them
make any sense.

But even if those obvious kinds of complete mis-designs were to be
individually fixed, the whole naming and concept is bogus. The
"ON_BEHALF_OF_KERNEL" thing seems to actually just mean "don't check
sealing". Naming should describe what the thing *means*, not some
random policy thing that may or may not be at all relevant.

 - the whole MM_SEAL_xx vs VM_SEAL_xx artificial distinction needs to go away.

Not only is it unacceptable to pointlessly create two different name
spaces for no obvious reason, code like this (from 1/8) should not
exist:

> +       if (types & MM_SEAL_MSEAL)
> +               newtypes |= VM_SEAL_MSEAL;
> +
> +       if (types & MM_SEAL_MPROTECT)
> +               newtypes |= VM_SEAL_MPROTECT;
> +
> +       if (types & MM_SEAL_MUNMAP)
> +               newtypes |= VM_SEAL_MUNMAP;
> +
> +       if (types & MM_SEAL_MMAP)
> +               newtypes |= VM_SEAL_MMAP;
> +
> +       if (types & MM_SEAL_MREMAP)
> +               newtypes |= VM_SEAL_MREMAP;

Sure, we have that in some cases when there was a *reason* for
namespacing imposed on us from an API standpoint (ie the "open()"
flags that get turned into FMODE_xyz flags, or the MS_xyz mount flags
being turned into MNT_xyz flags), but those tend to be signs of
problems in the system call API where some mixup made it impossible to
use the flags directly (most commonly because there is some extended
internal representation of said things).

For example, the MS_xyz namespace is a combination of "these are flags
for the new mount" (like MS_RDONLY) and "this is how you should mount
it" (like MS_REMOUNT), and so the user interface has that odd mixing
of different things, and the MNT_xyz namespace is a distillation of
the former.

But we certainly should not strive to introduce *new* interfaces that
start out with this kind of mixup and pointless "translate from one
bit to another" code.

 - And finally (for now), I hate that MM_ACTION_xyz thing too!

Why do we have MM_ACTION_MREMAP, and pointless like this (from 3/8):

> +       switch (action) {
> +       case MM_ACTION_MPROTECT:
> +               if (vma->vm_seals & VM_SEAL_MPROTECT)
> +                       return false;
> +               break;

when the sane thing to do is to use the *same* MM_SEAL_xyz bitmask and
sealing bitmask and just say

        if (vma->vm_seal & vm_action)
                return -EPERM;

IOW, you have pointlessly introduced not *two* different namespaces,
but *three*. All doing the exact same thing, and all just causing
pointless and ugly code to "translate" the actions of one into the
model of another.

So get rid of the "MM_ACTION_xyz" thing. Get rid of ther "VM_SEAL_xyz"
thing. Use *one* single "these are the sealing bits".

And get rid of "enum caller_origin" entirely. I don't know what the
right model for that thing is, but that isn't it.

*Maybe* the right model is some MM_SEAL_OVERRIDE bit that user space
cannot set, but that the kernel can use internally - and if that is
the right model, then dammit, the *uses* should be very very obvious
why the override is fine, because that remap_file_pages() use sure as
hell was *not* obvious.

In fact, it's not at all obvious why anything should override the
sealing bits - EVER. So I'm not convinced that the right model is
"replace ON_BEHALF_OF_KERNEL with MM_SEAL_OVERRIDE". Why would we
*ever* want to override sealing? It sounds like complete garbage. Most
of the users seem to be things like "execve()", which is nonsensical,
since the VM wouldn't have been sealed at that point _anyway_, so
having a "don't bother checking sealing bits" flag seems entirely
useless.

Anyway, this is all a resounding NAK on this series in this form. My
complaints are not some kind of small "fix this up". These are
fundamental issues.

            Linus

  parent reply	other threads:[~2023-10-16 17:23 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16 14:38 [RFC PATCH v1 0/8] Introduce mseal() syscall jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 1/8] Add mseal syscall jeffxu
2023-10-16 15:05   ` Greg KH
2023-10-17  6:50     ` Jeff Xu
2023-10-16 14:38 ` [RFC PATCH v1 2/8] Wire up " jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 3/8] mseal: add can_modify_mm and can_modify_vma jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 4/8] mseal: seal mprotect jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 5/8] mseal munmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 6/8] mseal mremap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 7/8] mseal mmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 8/8] selftest mm/mseal mprotect/munmap/mremap/mmap jeffxu
2023-10-16 15:18 ` [RFC PATCH v1 0/8] Introduce mseal() syscall Matthew Wilcox
2023-10-17  8:34   ` Jeff Xu
2023-10-17 12:56     ` Matthew Wilcox
2023-10-17 15:29   ` Pedro Falcato
2023-10-17 21:33     ` Jeff Xu
2023-10-17 22:35       ` Pedro Falcato
2023-10-18 18:20         ` Jeff Xu
2023-10-19 17:30           ` Jeff Xu
2023-10-19 22:47             ` Pedro Falcato
2023-10-19 23:06               ` Linus Torvalds
2023-10-23 17:44                 ` Jeff Xu
2023-10-23 17:42               ` Jeff Xu
2023-10-16 17:23 ` Linus Torvalds [this message]
2023-10-17  9:07   ` Jeff Xu
2023-10-17 17:22     ` Linus Torvalds
2023-10-17 18:20       ` Theo de Raadt
2023-10-17 18:38         ` Linus Torvalds
2023-10-17 18:55           ` Theo de Raadt
2023-10-19  8:00           ` Stephen Röttger
2023-10-20 16:27             ` Theo de Raadt
2023-10-24 10:42               ` Stephen Röttger
2023-10-17 23:01         ` Jeff Xu
2023-10-17 23:56           ` Theo de Raadt
2023-10-18  3:18             ` Jeff Xu
2023-10-18  3:37               ` Theo de Raadt
2023-10-18 15:17               ` Matthew Wilcox
2023-10-18 18:54                 ` Jeff Xu
2023-10-18 20:36                   ` Theo de Raadt
2023-10-19  8:28                     ` Stephen Röttger
2023-10-20 15:55                       ` Theo de Raadt
2023-10-16 17:34 ` Jann Horn
2023-10-17  8:42   ` Jeff Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whFZoap+DBTYvJx6ohqPwn11Puzh7q4huFWDX9vBwXHgg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.sierra@amd.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=ben@decadent.org.uk \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=dwmw@amazon.co.uk \
    --cc=groeck@chromium.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jeffxu@chromium.org \
    --cc=jeffxu@google.com \
    --cc=joey.gouly@arm.com \
    --cc=jorgelo@chromium.org \
    --cc=keescook@chromium.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mawupeng1@huawei.com \
    --cc=namit@vmware.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=shr@devkernel.io \
    --cc=sroettger@google.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=xiujianfeng@huawei.com \
    --cc=ying.huang@intel.com \
    --cc=yu.ma@intel.com \
    --cc=zhangpeng362@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).