LKML Archive on
 help / color / Atom feed
From: Andrea Arcangeli <>
To: Andy Lutomirski <>
Cc: Jann Horn <>,
	Daniel Colascione <>,
	Linus Torvalds <>,
	Pavel Emelyanov <>,
	Lokesh Gidra <>,
	Nick Kralevich <>, Nosh Minwalla <>,
	Tim Murray <>,
	Mike Rapoport <>,
	Linux API <>,
	LKML <>,
	"Dr. David Alan Gilbert" <>
Subject: Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
Date: Wed, 23 Oct 2019 17:16:45 -0400
Message-ID: <> (raw)
In-Reply-To: <>

On Wed, Oct 23, 2019 at 12:21:18PM -0700, Andy Lutomirski wrote:
> There are two things going on here.
> 1. Daniel wants to add LSM labels to userfaultfd objects.  This seems
> reasonable to me.  The question, as I understand it, is: who is the
> subject that creates a uffd referring to a forked child?  I'm sure
> this is solvable in any number of straightforward ways, but I think
> it's less important than:

The new uffd created during fork would definitely need to be accounted
on the criu monitor, nor to the parent nor the child, so it'd need to
be accounted to the process/context that has the fd in its file
descriptors array. But since this is less important let's ignore this
for a second.

> 2. The existing ABI is busted independently of #1.  Suppose you call
> userfaultfd to get a userfaultfd and enable UFFD_FEATURE_EVENT_FORK.
> Then you do:
> $ sudo <&[userfaultfd number]
> Sudo will read it and get a new fd unexpectedly added to its fd table.
> It's worse if SCM_RIGHTS is involved.

So the problem is just that a new fd is created. So for this to turn
out to a practical issue, it requires finding a reckless suid that
won't even bother checking the return value of the open/socket
syscalls or some equivalent fd number related side effect. All right
that makes more sense now and of course I agree it needs fixing.

> So I think we either need to declare that UFFD_FEATURE_EVENT_FORK is
> only usable by global root or we need to remove it and maybe re-add it
> in some other form.

If I had a time machine, I'd rather prefer to do the below:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fe6d804a38dc..574062051678 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1958,7 +1958,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 		return -ENOMEM;
 	refcount_set(&ctx->refcount, 1);
-	ctx->flags = flags;
+	ctx->flags = flags | UFFD_CLOEXEC;
 	ctx->features = 0;
 	ctx->state = UFFD_STATE_WAIT_API;
 	ctx->released = false;

I mean there's no strong requirement to allow any uffd to survive exec
even if UFFD_FEATURE_EVENT_FORK never existed, it's enough if it can
be passed through unix domain sockets.

Until UFFD_FEATURE_EVENT_FORK come around, there was no particular
reason to implicitly enforce O_CLOEXEC on all uffd, it was totally
possible to clone() and exec() to pass the fd to a different
process. So it never rang a bell that this would turn out to be a
problem after UFFD_FEATURE_EVENT_FORK was introduced.

There are various ways to approach this:

1) drop all non cooperative features and mark their feature bitflags
   reserved (no ABI break)

2) enforce UFFD_CLOEXEC with above patch (potential ABI break all
   userfaultfd users)

3) enforce UFFD_CLOEXEC if UFFD_FEATURE_EVENT_FORK is set (ABI break
   only if UFFD_FEATURE_EVENT_FORK is set). Note all forked uffd
   are opened with the same flags inherited from the original uffd.

4) enforce the global root permission check when creating the uffd only if

5) drop all non cooperative features from API 0xaa and introduce API
   0xab with the features back, but with UFFD_CLOEXEC implicitly
   enforced and with UFFD_CLOEXEC forbidden to be set in the flags

6) stick to API 0xaa and drop only UFFD_FEATURE_EVENT_FORK, but add a
   UFFD_FEATURE_EVENT_FORK2 that requires UFFD_CLOEXEC to be set
   (instead of implicitly enforcing it)

7) stick to API 0xaa and drop only UFFD_FEATURE_EVENT_FORK, but add a
   UFFD_FEATURE_EVENT_FORK2 that does the global root permission check 

5 is the non-ABI-break version of 2.

6 is the non-ABI-break version of 3.

7 is the non-ABI-break version of 4.

My favorite is 1) for the reason explained in the previous email.

However if postcopy live migration of bare metal containers already
runs in production anywhere or is at least very close to reach that
milestone or if the non-cooperative features are used in production in
any other way, we'd like to know where and in such case that will
totally change my mind about it. In such case I'd suggest to pick any
of the other options except 1).

In short there shall be good reason for going through further
maintenance burden.


  reply index

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
2019-10-14  4:26   ` kbuild test robot
2019-10-14 15:38   ` Jann Horn
2019-10-14 18:15     ` Daniel Colascione
2019-10-14 18:30       ` Jann Horn
2019-10-15  8:08   ` Christoph Hellwig
2019-10-12 19:15 ` [PATCH 2/7] Add a concept of a "secure" anonymous file Daniel Colascione
2019-10-14  3:01   ` kbuild test robot
2019-10-15  8:08   ` Christoph Hellwig
2019-10-12 19:15 ` [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API Daniel Colascione
2019-10-12 23:10   ` Andy Lutomirski
2019-10-13  0:51     ` Daniel Colascione
2019-10-13  1:14       ` Andy Lutomirski
2019-10-13  1:38         ` Daniel Colascione
2019-10-14 16:04         ` Jann Horn
2019-10-23 19:09           ` Andrea Arcangeli
2019-10-23 19:21             ` Andy Lutomirski
2019-10-23 21:16               ` Andrea Arcangeli [this message]
2019-10-23 21:25                 ` Andy Lutomirski
2019-10-23 22:41                   ` Andrea Arcangeli
2019-10-23 23:01                     ` Andy Lutomirski
2019-10-23 23:27                       ` Andrea Arcangeli
2019-10-23 20:05             ` Daniel Colascione
2019-10-24  0:23               ` Andrea Arcangeli
2019-10-23 20:15             ` Linus Torvalds
2019-10-24  9:02             ` Mike Rapoport
2019-10-24 15:10               ` Andrea Arcangeli
2019-10-25 20:12                 ` Mike Rapoport
2019-10-22 21:27         ` Daniel Colascione
2019-10-23  4:11         ` Andy Lutomirski
2019-10-23  7:29           ` Cyrill Gorcunov
2019-10-23 12:43             ` Mike Rapoport
2019-10-23 17:13               ` Andy Lutomirski
2019-10-12 19:15 ` [PATCH 4/7] Teach SELinux about a new userfaultfd class Daniel Colascione
2019-10-12 23:08   ` Andy Lutomirski
2019-10-13  0:11     ` Daniel Colascione
2019-10-13  0:46       ` Andy Lutomirski
2019-10-12 19:16 ` [PATCH 5/7] Let userfaultfd opt out of handling kernel-mode faults Daniel Colascione
2019-10-12 19:16 ` [PATCH 6/7] Allow users to require UFFD_SECURE Daniel Colascione
2019-10-12 23:12   ` Andy Lutomirski
2019-10-12 19:16 ` [PATCH 7/7] Add a new sysctl for limiting userfaultfd to user mode faults Daniel Colascione
2019-10-16  0:02 ` [PATCH 0/7] Harden userfaultfd James Morris
2019-11-15 15:09 ` Stephen Smalley

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on

Archives are clonable:
	git clone --mirror lkml/git/0.git
	git clone --mirror lkml/git/1.git
	git clone --mirror lkml/git/2.git
	git clone --mirror lkml/git/3.git
	git clone --mirror lkml/git/4.git
	git clone --mirror lkml/git/5.git
	git clone --mirror lkml/git/6.git
	git clone --mirror lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ \
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone