From: Andrea Arcangeli <aarcange@redhat.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>,
	Daniel Colascione <dancol@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Lokesh Gidra <lokeshgidra@google.com>,
	Nick Kralevich <nnk@google.com>,
	Nosh Minwalla <nosh@google.com>,
	Tim Murray <timmurray@google.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Linux API <linux-api@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
Date: Wed, 23 Oct 2019 17:16:45 -0400
Message-ID: <20191023211645.GC9902@redhat.com>
In-Reply-To: <CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com>

On Wed, Oct 23, 2019 at 12:21:18PM -0700, Andy Lutomirski wrote:
> There are two things going on here.
>
> 1. Daniel wants to add LSM labels to userfaultfd objects. This seems
> reasonable to me. The question, as I understand it, is: who is the
> subject that creates a uffd referring to a forked child? I'm sure
> this is solvable in any number of straightforward ways, but I think
> it's less important than:

The new uffd created during fork would definitely need to be accounted
to the CRIU monitor, not to the parent or the child, so it'd need to be
accounted to the process/context that has the fd in its file
descriptors array. But since this is less important let's ignore this
for a second.

> 2. The existing ABI is busted independently of #1. Suppose you call
> userfaultfd to get a userfaultfd and enable UFFD_FEATURE_EVENT_FORK.
> Then you do:
>
> $ sudo <&[userfaultfd number]
>
> Sudo will read it and get a new fd unexpectedly added to its fd table.
> It's worse if SCM_RIGHTS is involved.

So the problem is just that a new fd is created.
So for this to turn out to be a practical issue, it requires finding a
reckless suid that won't even bother checking the return value of the
open/socket syscalls or some equivalent fd number related side effect.

All right that makes more sense now and of course I agree it needs
fixing.

> So I think we either need to declare that UFFD_FEATURE_EVENT_FORK is
> only usable by global root or we need to remove it and maybe re-add it
> in some other form.

If I had a time machine, I'd prefer to do the below:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fe6d804a38dc..574062051678 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1958,7 +1958,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 		return -ENOMEM;
 
 	refcount_set(&ctx->refcount, 1);
-	ctx->flags = flags;
+	ctx->flags = flags | UFFD_CLOEXEC;
 	ctx->features = 0;
 	ctx->state = UFFD_STATE_WAIT_API;
 	ctx->released = false;

I mean there's no strong requirement to allow any uffd to survive exec
even if UFFD_FEATURE_EVENT_FORK never existed; it's enough if it can be
passed through unix domain sockets.

Until UFFD_FEATURE_EVENT_FORK came around, there was no particular
reason to implicitly enforce O_CLOEXEC on all uffd: it was totally
possible to clone() and exec() to pass the fd to a different process.
So it never rang a bell that this would turn out to be a problem after
UFFD_FEATURE_EVENT_FORK was introduced.

There are various ways to approach this:

1) drop all non cooperative features and mark their feature bitflags
   reserved (no ABI break)

2) enforce UFFD_CLOEXEC with above patch (potential ABI break for all
   userfaultfd users)

3) enforce UFFD_CLOEXEC if UFFD_FEATURE_EVENT_FORK is set (ABI break
   only if UFFD_FEATURE_EVENT_FORK is set). Note all forked uffd are
   opened with the same flags inherited from the original uffd.

4) enforce the global root permission check when creating the uffd
   only if UFFD_FEATURE_EVENT_FORK is set.
5) drop all non cooperative features from API 0xaa and introduce API
   0xab with the features back, but with UFFD_CLOEXEC implicitly
   enforced and with UFFD_CLOEXEC forbidden to be set in the flags

6) stick to API 0xaa and drop only UFFD_FEATURE_EVENT_FORK, but add a
   UFFD_FEATURE_EVENT_FORK2 that requires UFFD_CLOEXEC to be set
   (instead of implicitly enforcing it)

7) stick to API 0xaa and drop only UFFD_FEATURE_EVENT_FORK, but add a
   UFFD_FEATURE_EVENT_FORK2 that does the global root permission check

5 is the non-ABI-break version of 2.
6 is the non-ABI-break version of 3.
7 is the non-ABI-break version of 4.

My favorite is 1) for the reason explained in the previous email.

However if postcopy live migration of bare metal containers already
runs in production anywhere or is at least very close to reaching that
milestone, or if the non-cooperative features are used in production in
any other way, we'd like to know where, and in that case that will
totally change my mind about it. In that case I'd suggest picking any
of the other options except 1).

In short there should be a good reason for going through further
maintenance burden.

Thanks,
Andrea
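
The close-on-exec behaviour that options 2, 3, 5 and 6 would enforce on
the kernel side can also be requested and checked from userspace. What
follows is only a minimal sketch under stated assumptions, not part of
the proposed patch: the helper names fd_is_cloexec(), fd_set_cloexec()
and open_uffd_cloexec() are hypothetical and chosen for illustration,
and it assumes a libc whose <sys/syscall.h> defines SYS_userfaultfd.
The userfaultfd() call may fail (for example when unprivileged use is
disabled), so callers should check for a negative return.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is close-on-exec, 0 if not, -1 on error. */
int fd_is_cloexec(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);

	if (fdflags < 0)
		return -1;
	return (fdflags & FD_CLOEXEC) ? 1 : 0;
}

/* Hypothetical helper: set FD_CLOEXEC, keeping the other fd flags. */
int fd_set_cloexec(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);

	if (fdflags < 0)
		return -1;
	return fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: ask for a uffd that will not survive exec.
 * Returns the new fd, or -1 with errno set if the syscall fails.
 */
int open_uffd_cloexec(void)
{
	return (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}
```

A monitor that received a uffd from elsewhere could call
fd_set_cloexec() on it before any exec(), and fd_is_cloexec() gives a
cheap way to confirm the flag. This only protects well-behaved callers;
it does not replace enforcing the flag in the kernel as discussed above.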