linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <christian@brauner.io>
To: torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
	jannh@google.com, dhowells@redhat.com, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: serge@hallyn.com, luto@kernel.org, arnd@arndb.de,
	ebiederm@xmission.com, keescook@chromium.org,
	adobriyan@gmail.com, tglx@linutronix.de, mtk.manpages@gmail.com,
	bl0pbl33p@gmail.com, ldv@altlinux.org, akpm@linux-foundation.org,
	oleg@redhat.com, cyphar@cyphar.com, joel@joelfernandes.org,
	dancol@google.com, Christian Brauner <christian@brauner.io>
Subject: [RFC PATCH] fork: add CLONE_PIDFD
Date: Thu, 11 Apr 2019 01:40:40 +0200	[thread overview]
Message-ID: <20190410234045.29846-1-christian@brauner.io> (raw)

Hey Linus,

This is an RFC for adding a new CLONE_PIDFD flag to clone() as
previously discussed.
While implementing this Jann and I ran into additional complexity that
prompted us to send out an initial RFC patchset to make sure we still
think going forward with the current implementation is a good idea and
also provide an alternative approach:

RFC-1:
This is an RFC for the implementation of pidfds as /proc/<pid> file
descriptors.
The tricky part here is that we need to retrieve a file descriptor for
/proc/<pid> before clone's point of no return. Otherwise, we need to fail
the creation of a process that has already passed all barriers and is
visible in userspace. Getting that file descriptor then becomes a rather
intricate dance including allocating a detached dentry that we need to
commit once attach_pid() has been called.
Note that this RFC only includes the logic we think is needed to return
/proc/<pid> file descriptors from clone. It does *not* yet include the even
more complex logic needed to restrict procfs itself. And the additional
logic needed to prevent attacks such as openat(pidfd, "..", ...) and access
to /proc/<pid>/net/ on top of the procfs restriction.

There are a couple of reasons why we stopped short of this and decided to
sent out an RFC first:
- Even the initial part of getting file descriptors from /proc/<pid> out
  of clone() required rather complex code that struck us as very
  inelegant and heavy (which granted, might partially caused by not seeing
  a cleaner way to implement this). Thus, it felt like we needed to see
  whether this is even remotely considered acceptable.
- While discussing further aspects of this approach with Al we received
  rather substantiated opposition to exposing even more codepaths to
  procfs.
- Restricting access to procfs properly requires a lot of invasive work
  even touching core vfs functions such as
  follow_dotdot()/follow_dotdot_rcu() which also caused 2.

RFC-2:
This alternative patchset uses anonymous file descriptors instead of
file descriptors from /proc/<pid>.
A pidfd in this style comes with additional information in fdinfo: the pid
of the process it refers to in the current pid namespace (Pid: %d).

We have chosen to implement this alternative to illustrate how
strikingly simple this patchset is in comparision to the original
approach.
To remove worries about missing metadata access we have written a POC
that illustrates how a combination of CLONE_PIDFD, fdinfo, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/<pid>. The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.

Thanks!
Jann & Christian

RFC-1:
Christian Brauner (1):
  fork: add CLONE_PIDFD via /proc/<pid>

 fs/proc/base.c             | 130 ++++++++++++++++++++++++++++++++++---
 fs/proc/fd.c               |   4 +-
 fs/proc/internal.h         |   2 +-
 fs/proc/namespaces.c       |   2 +-
 include/linux/proc_fs.h    |  19 ++++++
 include/uapi/linux/sched.h |   1 +
 kernel/fork.c              |  40 ++++++++++--
 7 files changed, 180 insertions(+), 18 deletions(-)

RFC-2:
Christian Brauner (3):
  fork: add CLONE_PIDFD via anonymous inode
  signal: support CLONE_PIDFD with pidfd_send_signal
  samples: show race-free pidfd metadata access

David Howells (1):
  Make anon_inodes unconditional

 arch/arm/kvm/Kconfig           |   1 -
 arch/arm64/kvm/Kconfig         |   1 -
 arch/mips/kvm/Kconfig          |   1 -
 arch/powerpc/kvm/Kconfig       |   1 -
 arch/s390/kvm/Kconfig          |   1 -
 arch/x86/Kconfig               |   1 -
 arch/x86/kvm/Kconfig           |   1 -
 drivers/base/Kconfig           |   1 -
 drivers/char/tpm/Kconfig       |   1 -
 drivers/dma-buf/Kconfig        |   1 -
 drivers/gpio/Kconfig           |   1 -
 drivers/iio/Kconfig            |   1 -
 drivers/infiniband/Kconfig     |   1 -
 drivers/vfio/Kconfig           |   1 -
 fs/Makefile                    |   2 +-
 fs/notify/fanotify/Kconfig     |   1 -
 fs/notify/inotify/Kconfig      |   1 -
 include/linux/pid.h            |   2 +
 include/uapi/linux/sched.h     |   1 +
 init/Kconfig                   |  10 --
 kernel/fork.c                  |  94 +++++++++++++++++-
 kernel/signal.c                |  14 ++-
 kernel/sys_ni.c                |   3 -
 samples/Makefile               |   2 +-
 samples/pidfd/Makefile         |   6 ++
 samples/pidfd/pidfd-metadata.c | 169 +++++++++++++++++++++++++++++++++
 26 files changed, 279 insertions(+), 40 deletions(-)
 create mode 100644 samples/pidfd/Makefile
 create mode 100644 samples/pidfd/pidfd-metadata.c

-- 
2.21.0


             reply	other threads:[~2019-04-10 23:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-10 23:40 Christian Brauner [this message]
2019-04-10 23:40 ` [RFC-1 PATCH 1/1] fork: add CLONE_PIDFD via /proc/<pid> Christian Brauner
2019-04-10 23:40 ` [RFC-2 PATCH 1/4] Make anon_inodes unconditional Christian Brauner
2019-04-10 23:40 ` [RFC-2 PATCH 2/4] fork: add CLONE_PIDFD via anonymous inode Christian Brauner
2019-04-10 23:40 ` [RFC-2 PATCH 3/4] signal: support CLONE_PIDFD with pidfd_send_signal Christian Brauner
2019-04-10 23:40 ` [RFC-2 PATCH 4/4] samples: show race-free pidfd metadata access Christian Brauner
2019-04-11  0:08   ` Daniel Colascione
2019-04-11  0:12 ` [RFC PATCH] fork: add CLONE_PIDFD Daniel Colascione
2019-04-11 16:50 ` Linus Torvalds
2019-04-11 18:09   ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190410234045.29846-1-christian@brauner.io \
    --to=christian@brauner.io \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bl0pbl33p@gmail.com \
    --cc=cyphar@cyphar.com \
    --cc=dancol@google.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=ldv@altlinux.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).