From: Daniel Colascione <firstname.lastname@example.org> To: Andrea Arcangeli <email@example.com> Cc: Andy Lutomirski <firstname.lastname@example.org>, Jann Horn <email@example.com>, Linus Torvalds <firstname.lastname@example.org>, Pavel Emelyanov <email@example.com>, Lokesh Gidra <firstname.lastname@example.org>, Nick Kralevich <email@example.com>, Nosh Minwalla <firstname.lastname@example.org>, Tim Murray <email@example.com>, Mike Rapoport <firstname.lastname@example.org>, Linux API <email@example.com>, LKML <firstname.lastname@example.org> Subject: Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API. Date: Wed, 23 Oct 2019 13:05:47 -0700 Message-ID: <CAKOZuetKkM=PK2QA8LdXwM8cM8qJvFu4u5bjePWai3XRnHe-pA@mail.gmail.com> (raw) In-Reply-To: <20191023190959.GA9902@redhat.com> On Wed, Oct 23, 2019 at 12:10 PM Andrea Arcangeli <email@example.com> wrote: > On Sat, Oct 12, 2019 at 06:14:23PM -0700, Andy Lutomirski wrote: > > [adding more people because this is going to be an ABI break, sigh] > > That wouldn't break the ABI, no more than when if you boot a kernel > built with CONFIG_USERFAULTFD=n. The change Andy is proposing would convert programs that work with CONFIG_USERFAULTFD=y today into programs that don't work with CONFIG_USERFAULTFD. Whether that counts as an ABI break is above my ability to decide, but IMHO, I think it should count. Such a change at least merits more-than-usual scrutiny. I'd love to get Linus's opinion. > All non-cooperative features can be removed any time in a backwards > compatible way, the only precaution is to mark their feature bits as > reserved so they can't be reused for something else later. > > > least severely restricted. A .read implementation MUST NOT ACT ON THE > > CALLING TASK. Ever. Just imagine the effect of passing a userfaultfd > > as stdin to a setuid program. > > With UFFD_EVENT_FORK, the newly created uffd that controls the child, > is not passed to the parent nor to the child. Instead it's passed to > the CRIU monitor only, which has to be already running as root and is > fully trusted and acts a hypervisor (despite there is no hypervisor). The phrase "CRIU monitor" above stands out. :-) Not every process that uses userfaultfd will be CRIU-related, and in particular, there's no requirement right now that limits UFFD_EVENT_FORK to privileged processes. The attack Andy is describing involves a random unprivileged process creating a userfaultfd file object, configuring it to UFFD_EVENT_FORK, somehow (stdin, SCM_RIGHTS, binder, etc.) passing that FD to a more-privileged process, and convincing that privileged process to read(2) that FD and disturb its file descriptor table, which in turn can cause EoP or all kinds of other havoc. This is a serious bug that needs some kind of fix. > On Mon, Oct 14, 2019 at 06:04:22PM +0200, Jann Horn wrote: > > FWIW, <https://codesearch.debian.net/search?q=UFFD_FEATURE_EVENT_FORK&literal=1> > > just shows the kernel, kernel selftests, and strace code for decoding > > syscall arguments. CRIU uses it though (probably for postcopy live > > migration / lazy migration?), I guess that code isn't in debian for > > some reason. > > https://criu.org/Userfaultfd#Limitations > > The CRIU developers did a truly amazing job by making container post > copy live migration work great for a subset of apps, that alone was an > amazing achievement. Is that achievement enough to use post copy live > migration of bare metal containers in production? Unfortunately > probably not and not just in debian. Nobody is claiming that there's anything wrong with UFFD. That UFFD is being used for features that have nothing to do with CRIU or containerization is a signal that UFFD's creators made a good, general-purpose tool. (We're considering it for two completely unrelated purposes in Android in fact.) I don't think we can assume that the UFFD feature has gone unused on the basis of CRIU's slower-than-hoped-for adoption. Who's using it for something? *Probably* nobody, but like I said above, it's worth thinking about and being careful. > In my view there's simply no justification not to use virtual machines > when the alternative requires so much more code to be written and so > much more complexity to be dealt with. This is a debate that won't get resolved here. A ton of work has gone into namespaces, migration, various cgroup things, and so on, and I don't see that work getting torn out. > While at it, as far as userfaultfd is concerned I'd rather see people > spend time to write a malloc library that uses userfaultfd with the > UFFD_FEATURE_SIGBUS features and it replaces mmap with UFFDIO_ZEROPAGE > (plus adding the THP accelleration currently missing) I'd also like to see realloc(3) use mremap(2) in real implementations and for C++ to grow an allocator interface that can use realloc(3). But I think that's a separate matter. > and munmap with > MADV_DONTNEED to do allocations and freeing of memory with full > scalability without ever hitting on the map sem for writing. Some allocators, e.g., jemalloc, already use MADV_DONTNEED. > fork COWs cannot throttle Sure they can. Can't we stick processes in a memcg and set a memory.high threshold beyond which threads in that cgroup will enter direct reclaim on page allocations? I'd call that throttling. > and all apps using fork() risk to hit on x2 > memory usage which can become oom-killer material if the memory size > of the process is huge. fork is one of the reasons people use overcommit all the time. I'd like to see a lot less overcommit in the world. > On my side, instead of trying to fix whatever issue in > UFFD_EVENT_FORK, This issue *has* to get fixed one way or another.
next prev parent reply index Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione 2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione 2019-10-14 4:26 ` kbuild test robot 2019-10-14 15:38 ` Jann Horn 2019-10-14 18:15 ` Daniel Colascione 2019-10-14 18:30 ` Jann Horn 2019-10-15 8:08 ` Christoph Hellwig 2019-10-12 19:15 ` [PATCH 2/7] Add a concept of a "secure" anonymous file Daniel Colascione 2019-10-14 3:01 ` kbuild test robot 2019-10-15 8:08 ` Christoph Hellwig 2019-10-12 19:15 ` [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API Daniel Colascione 2019-10-12 23:10 ` Andy Lutomirski 2019-10-13 0:51 ` Daniel Colascione 2019-10-13 1:14 ` Andy Lutomirski 2019-10-13 1:38 ` Daniel Colascione 2019-10-14 16:04 ` Jann Horn 2019-10-23 19:09 ` Andrea Arcangeli 2019-10-23 19:21 ` Andy Lutomirski 2019-10-23 21:16 ` Andrea Arcangeli 2019-10-23 21:25 ` Andy Lutomirski 2019-10-23 22:41 ` Andrea Arcangeli 2019-10-23 23:01 ` Andy Lutomirski 2019-10-23 23:27 ` Andrea Arcangeli 2019-10-23 20:05 ` Daniel Colascione [this message] 2019-10-24 0:23 ` Andrea Arcangeli 2019-10-23 20:15 ` Linus Torvalds 2019-10-24 9:02 ` Mike Rapoport 2019-10-24 15:10 ` Andrea Arcangeli 2019-10-25 20:12 ` Mike Rapoport 2019-10-22 21:27 ` Daniel Colascione 2019-10-23 4:11 ` Andy Lutomirski 2019-10-23 7:29 ` Cyrill Gorcunov 2019-10-23 12:43 ` Mike Rapoport 2019-10-23 17:13 ` Andy Lutomirski 2019-10-12 19:15 ` [PATCH 4/7] Teach SELinux about a new userfaultfd class Daniel Colascione 2019-10-12 23:08 ` Andy Lutomirski 2019-10-13 0:11 ` Daniel Colascione 2019-10-13 0:46 ` Andy Lutomirski 2019-10-12 19:16 ` [PATCH 5/7] Let userfaultfd opt out of handling kernel-mode faults Daniel Colascione 2019-10-12 19:16 ` [PATCH 6/7] Allow users to require UFFD_SECURE Daniel Colascione 2019-10-12 23:12 ` Andy Lutomirski 2019-10-12 19:16 ` [PATCH 7/7] Add a new sysctl for limiting userfaultfd to user mode faults Daniel Colascione 2019-10-16 0:02 ` [PATCH 0/7] Harden userfaultfd James Morris 2019-11-15 15:09 ` Stephen Smalley
Reply instructions: You may reply publically to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAKOZuetKkM=PK2QA8LdXwM8cM8qJvFu4u5bjePWai3XRnHe-pA@mail.gmail.com' \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
LKML Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \ email@example.com public-inbox-index lkml Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel AGPL code for this site: git clone https://public-inbox.org/public-inbox.git