linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] clone: add CLONE_PIDFD
@ 2019-04-14 20:14 Christian Brauner
From: Christian Brauner @ 2019-04-14 20:14 UTC
  To: torvalds, viro, jannh, dhowells, linux-api, linux-kernel
  Cc: serge, luto, arnd, ebiederm, keescook, tglx, mtk.manpages, akpm,
	oleg, cyphar, joel, dancol, Christian Brauner

Hey Linus,

This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call as previously discussed.

As decided last week [1], Jann and I have refined the implementation of
pidfds as anonymous inodes. Compared to last week's RFC, we have only
tweaked documentation and naming, and made the sample program, which shows
how to get easy metadata access from a pidfd, a little cleaner and more
paranoid when checking for errors.
The sample program can also serve as a test for the patchset.

When clone() is called with CLONE_PIDFD, a pidfd is returned instead of a
pid. To make it possible for users of CLONE_PIDFD to apply the standard
error checking that is common across userspace, file descriptor
numbering for pidfds starts at 1 and not 0. This has the major advantage
that users can do:

int pidfd = clone(CLONE_PIDFD);
if (pidfd < 0) {
       /* handle error */
       exit(EXIT_FAILURE);
}

if (pidfd == 0) {
       /* child */
       exit(EXIT_SUCCESS);
}

/* parent */
exit(EXIT_SUCCESS);
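
The three-way check above only works because 0 can never be a valid pidfd
in the parent. A minimal sketch of that classification follows. The enum
and the helper classify_clone_ret() are hypothetical names used for
illustration only and are not part of the patchset:

```c
/* Hypothetical illustration; these names are not part of the patchset. */
enum clone_role {
	CLONE_ROLE_ERROR,	/* clone() failed */
	CLONE_ROLE_CHILD,	/* we are running in the new process */
	CLONE_ROLE_PARENT,	/* return value is a pidfd */
};

/*
 * Classify the return value of a CLONE_PIDFD clone() as described in
 * the cover letter: negative means error, 0 means child, and any value
 * >= 1 is a pidfd in the parent. If pidfd numbering started at 0, the
 * child and parent cases could not be told apart.
 */
static enum clone_role classify_clone_ret(int ret)
{
	if (ret < 0)
		return CLONE_ROLE_ERROR;
	if (ret == 0)
		return CLONE_ROLE_CHILD;
	return CLONE_ROLE_PARENT;
}
```

This is the same pattern callers already use for fork(), which is why
existing error-handling code can be reused unchanged.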

We have also taken care that pidfds are created *after* the fd table has
been unshared, so that pidfds do not leak into child processes.
pidfd creation during clone is split into two distinct steps:
1. preparing both an fd and a file referencing struct pid for fd_install()
2. fd_install()ing the pidfd
Step 1. is performed before clone's point of no return and especially
before write_lock_irq(&tasklist_lock) is taken.
Performing 1. before clone's point of no return ensures that we don't
need to fail a process that is already visible to userspace when pidfd
creation fails. Step 2. happens after attach_pid() is performed and the
process is visible to userspace.
Technically, we could have also performed steps 1. and 2. together before
clone's point of no return and then called close() on the file descriptor
on failure. This would slightly increase code locality, but it is
semantically more correct and cleaner to bring the pidfd into existence
once the process is fully attached and not before.
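
The ordering described above can be sketched as a small userspace model.
This is not the code from patch 2: every name below (toy_fdtable,
toy_prepare(), toy_abort(), toy_install(), toy_clone_pidfd()) is
hypothetical, and the "fd table" is just an array. It only illustrates
that everything that can fail happens before the point of no return, and
that publishing the fd afterwards cannot fail:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_FDS 8

/* Hypothetical model of an fd table; not a kernel data structure. */
struct toy_fdtable {
	const void *slots[TOY_NR_FDS];	/* object visible as fd i, or NULL */
	bool reserved[TOY_NR_FDS];	/* fd number i is taken */
};

struct toy_pending {
	int fd;
	const void *file;
};

/* Step 1: reserve an fd number >= 1. May fail; nothing is visible yet. */
static int toy_prepare(struct toy_fdtable *t, const void *file,
		       struct toy_pending *out)
{
	for (int fd = 1; fd < TOY_NR_FDS; fd++) {
		if (!t->reserved[fd]) {
			t->reserved[fd] = true;
			out->fd = fd;
			out->file = file;
			return 0;
		}
	}
	return -1;
}

/* Undo step 1 when something else fails before the point of no return. */
static void toy_abort(struct toy_fdtable *t, struct toy_pending *p)
{
	t->reserved[p->fd] = false;
	p->fd = -1;
	p->file = NULL;
}

/* Step 2: publish the file under the reserved number. Cannot fail. */
static int toy_install(struct toy_fdtable *t, const struct toy_pending *p)
{
	assert(t->reserved[p->fd]);
	t->slots[p->fd] = p->file;
	return p->fd;
}

/* Model of the clone path: prepare, commit, then install. */
static int toy_clone_pidfd(struct toy_fdtable *t, const void *pid_obj,
			   bool fail_before_commit)
{
	struct toy_pending pending;

	if (toy_prepare(t, pid_obj, &pending) < 0)
		return -1;

	/* Stand-in for any other setup that can still fail. */
	if (fail_before_commit) {
		toy_abort(t, &pending);
		return -1;
	}

	/* Point of no return: the process is attached and visible. */
	return toy_install(t, &pending);
}
```

In this model a failure before the commit point leaves the table exactly
as it was, so no visible process ever has to be torn down because pidfd
creation failed.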

The actual code for CLONE_PIDFD in patch 2 is completely confined to
fork.c (apart from the CLONE_PIDFD definition, of course) and is rather
small and hopefully easy to review.

The additional changes listed under David's name in the diffstat below
are here to make anon_inodes available unconditionally. They are needed
for the new mount API and thus for core VFS code in addition to pidfds.
David knows this and he has informed Al that this patch is being sent out
here. The changes themselves are rather mechanical.

As promised I have also contacted Joel who has sent a patchset to make
pidfds pollable. He has been informed and is happy to port his patchset
once we have moved forward [2].
Jann and I currently plan to target this patchset for inclusion in the 5.2
merge window.

Thanks!
Jann & Christian

[1]: https://lore.kernel.org/lkml/CAHk-=wifyY+XGNW=ZC4MyTHD14w81F8JjQNH-GaGAm2RxZ_S8Q@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/20190411200059.GA75190@google.com/

Christian Brauner (3):
  clone: add CLONE_PIDFD
  signal: support CLONE_PIDFD with pidfd_send_signal
  samples: show race-free pidfd metadata access

David Howells (1):
  Make anon_inodes unconditional

 arch/arm/kvm/Kconfig           |   1 -
 arch/arm64/kvm/Kconfig         |   1 -
 arch/mips/kvm/Kconfig          |   1 -
 arch/powerpc/kvm/Kconfig       |   1 -
 arch/s390/kvm/Kconfig          |   1 -
 arch/x86/Kconfig               |   1 -
 arch/x86/kvm/Kconfig           |   1 -
 drivers/base/Kconfig           |   1 -
 drivers/char/tpm/Kconfig       |   1 -
 drivers/dma-buf/Kconfig        |   1 -
 drivers/gpio/Kconfig           |   1 -
 drivers/iio/Kconfig            |   1 -
 drivers/infiniband/Kconfig     |   1 -
 drivers/vfio/Kconfig           |   1 -
 fs/Makefile                    |   2 +-
 fs/notify/fanotify/Kconfig     |   1 -
 fs/notify/inotify/Kconfig      |   1 -
 include/linux/pid.h            |   2 +
 include/uapi/linux/sched.h     |   1 +
 init/Kconfig                   |  10 --
 kernel/fork.c                  | 117 +++++++++++++++++++++-
 kernel/signal.c                |  14 ++-
 kernel/sys_ni.c                |   3 -
 samples/Makefile               |   2 +-
 samples/pidfd/Makefile         |   6 ++
 samples/pidfd/pidfd-metadata.c | 172 +++++++++++++++++++++++++++++++++
 26 files changed, 305 insertions(+), 40 deletions(-)
 create mode 100644 samples/pidfd/Makefile
 create mode 100644 samples/pidfd/pidfd-metadata.c

-- 
2.21.0

