Linux-api Archive on
 help / color / Atom feed
From: Daniel Colascione <>
To: Andrea Arcangeli <>
Cc: Andy Lutomirski <>, Jann Horn <>,
	Linus Torvalds <>,
	Pavel Emelyanov <>,
	Lokesh Gidra <>,
	Nick Kralevich <>, Nosh Minwalla <>,
	Tim Murray <>,
	Mike Rapoport <>,
	Linux API <>,
	LKML <>
Subject: Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
Date: Wed, 23 Oct 2019 13:05:47 -0700
Message-ID: <> (raw)
In-Reply-To: <>

On Wed, Oct 23, 2019 at 12:10 PM Andrea Arcangeli <> wrote:
> On Sat, Oct 12, 2019 at 06:14:23PM -0700, Andy Lutomirski wrote:
> > [adding more people because this is going to be an ABI break, sigh]
> That wouldn't break the ABI, no more than when if you boot a kernel
> built with CONFIG_USERFAULTFD=n.

The change Andy is proposing would convert programs that work with
CONFIG_USERFAULTFD=y today into programs that don't work with
CONFIG_USERFAULTFD. Whether that counts as an ABI break is above my
ability to decide, but IMHO, I think it should count. Such a change at
least merits more-than-usual scrutiny. I'd love to get Linus's

> All non-cooperative features can be removed any time in a backwards
> compatible way, the only precaution is to mark their feature bits as
> reserved so they can't be reused for something else later.
> > least severely restricted.  A .read implementation MUST NOT ACT ON THE
> > CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> > as stdin to a setuid program.
> With UFFD_EVENT_FORK, the newly created uffd that controls the child,
> is not passed to the parent nor to the child. Instead it's passed to
> the CRIU monitor only, which has to be already running as root and is
> fully trusted and acts a hypervisor (despite there is no hypervisor).

The phrase "CRIU monitor" above stands out.  :-) Not every process
that uses userfaultfd will be CRIU-related, and in particular, there's
no requirement right now that limits UFFD_EVENT_FORK to privileged

The attack Andy is describing involves a random unprivileged process
creating a userfaultfd file object, configuring it to UFFD_EVENT_FORK,
somehow (stdin, SCM_RIGHTS, binder, etc.) passing that FD to a
more-privileged process, and convincing that privileged process to
read(2) that FD and disturb its file descriptor table, which in turn
can cause EoP or all kinds of other havoc. This is a serious bug that
needs some kind of fix.

> On Mon, Oct 14, 2019 at 06:04:22PM +0200, Jann Horn wrote:
> > FWIW, <>
> > just shows the kernel, kernel selftests, and strace code for decoding
> > syscall arguments. CRIU uses it though (probably for postcopy live
> > migration / lazy migration?), I guess that code isn't in debian for
> > some reason.
> The CRIU developers did a truly amazing job by making container post
> copy live migration work great for a subset of apps, that alone was an
> amazing achievement. Is that achievement enough to use post copy live
> migration of bare metal containers in production? Unfortunately
> probably not and not just in debian.

Nobody is claiming that there's anything wrong with UFFD. That UFFD is
being used for features that have nothing to do with CRIU or
containerization is a signal that UFFD's creators made a good,
general-purpose tool. (We're considering it for two completely
unrelated purposes in Android in fact.) I don't think we can assume
that the UFFD feature has gone unused on the basis of CRIU's
slower-than-hoped-for adoption. Who's using it for something?
*Probably* nobody, but like I said above, it's worth thinking about
and being careful.

> In my view there's simply no justification not to use virtual machines
> when the alternative requires so much more code to be written and so
> much more complexity to be dealt with.

This is a debate that won't get resolved here. A ton of work has gone
into namespaces, migration, various cgroup things, and so on, and I
don't see that work getting torn out.

> While at it, as far as userfaultfd is concerned I'd rather see people
> spend time to write a malloc library that uses userfaultfd with the
> UFFD_FEATURE_SIGBUS features and it replaces mmap with UFFDIO_ZEROPAGE
> (plus adding the THP accelleration currently missing)

I'd also like to see realloc(3) use mremap(2) in real implementations
and for C++ to grow an allocator interface that can use realloc(3).
But I think that's a separate matter.

> and munmap with
> MADV_DONTNEED to do allocations and freeing of memory with full
> scalability without ever hitting on the map sem for writing.

Some allocators, e.g., jemalloc, already use MADV_DONTNEED.

> fork COWs cannot throttle

Sure they can. Can't we stick processes in a memcg and set a
memory.high threshold beyond which threads in that cgroup will enter
direct reclaim on page allocations? I'd call that throttling.

> and all apps using fork() risk to hit on x2
> memory usage which can become oom-killer material if the memory size
> of the process is huge.

fork is one of the reasons people use overcommit all the time. I'd
like to see a lot less overcommit in the world.

> On my side, instead of trying to fix whatever issue in

This issue *has* to get fixed one way or another.

  parent reply index

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
2019-10-14  4:26   ` kbuild test robot
2019-10-14 15:38   ` Jann Horn
2019-10-14 18:15     ` Daniel Colascione
2019-10-14 18:30       ` Jann Horn
2019-10-15  8:08   ` Christoph Hellwig
2019-10-12 19:15 ` [PATCH 2/7] Add a concept of a "secure" anonymous file Daniel Colascione
2019-10-14  3:01   ` kbuild test robot
2019-10-15  8:08   ` Christoph Hellwig
2019-10-12 19:15 ` [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API Daniel Colascione
2019-10-12 23:10   ` Andy Lutomirski
2019-10-13  0:51     ` Daniel Colascione
2019-10-13  1:14       ` Andy Lutomirski
2019-10-13  1:38         ` Daniel Colascione
2019-10-14 16:04         ` Jann Horn
2019-10-23 19:09           ` Andrea Arcangeli
2019-10-23 19:21             ` Andy Lutomirski
2019-10-23 21:16               ` Andrea Arcangeli
2019-10-23 21:25                 ` Andy Lutomirski
2019-10-23 22:41                   ` Andrea Arcangeli
2019-10-23 23:01                     ` Andy Lutomirski
2019-10-23 23:27                       ` Andrea Arcangeli
2019-10-23 20:05             ` Daniel Colascione [this message]
2019-10-24  0:23               ` Andrea Arcangeli
2019-10-23 20:15             ` Linus Torvalds
2019-10-24  9:02             ` Mike Rapoport
2019-10-24 15:10               ` Andrea Arcangeli
2019-10-25 20:12                 ` Mike Rapoport
2019-10-22 21:27         ` Daniel Colascione
2019-10-23  4:11         ` Andy Lutomirski
2019-10-23  7:29           ` Cyrill Gorcunov
2019-10-23 12:43             ` Mike Rapoport
2019-10-23 17:13               ` Andy Lutomirski
2019-10-12 19:15 ` [PATCH 4/7] Teach SELinux about a new userfaultfd class Daniel Colascione
2019-10-12 23:08   ` Andy Lutomirski
2019-10-13  0:11     ` Daniel Colascione
2019-10-13  0:46       ` Andy Lutomirski
2019-10-12 19:16 ` [PATCH 5/7] Let userfaultfd opt out of handling kernel-mode faults Daniel Colascione
2019-10-12 19:16 ` [PATCH 6/7] Allow users to require UFFD_SECURE Daniel Colascione
2019-10-12 23:12   ` Andy Lutomirski
2019-10-12 19:16 ` [PATCH 7/7] Add a new sysctl for limiting userfaultfd to user mode faults Daniel Colascione
2019-10-16  0:02 ` [PATCH 0/7] Harden userfaultfd James Morris
2019-11-15 15:09 ` Stephen Smalley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='' \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-api Archive on

Archives are clonable:
	git clone --mirror linux-api/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-api linux-api/ \
	public-inbox-index linux-api

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone