linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <christian@brauner.io>
To: jannh@google.com, khlebnikov@yandex-team.ru, luto@kernel.org,
	dhowells@redhat.com, serge@hallyn.com, ebiederm@xmission.com,
	linux-api@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: arnd@arndb.de, keescook@chromium.org, adobriyan@gmail.com,
	tglx@linutronix.de, mtk.manpages@gmail.com, bl0pbl33p@gmail.com,
	ldv@altlinux.org, akpm@linux-foundation.org, oleg@redhat.com,
	nagarathnam.muthusamy@oracle.com, cyphar@cyphar.com,
	viro@zeniv.linux.org.uk, joel@joelfernandes.org,
	dancol@google.com, Christian Brauner <christian@brauner.io>
Subject: [PATCH v1 0/4] pid: add pidctl()
Date: Tue, 26 Mar 2019 16:55:09 +0100	[thread overview]
Message-ID: <20190326155513.26964-1-christian@brauner.io> (raw)

This is v1 of this patchset with various minor fixes which are listed in
the individual commits. Notably, pidfds are now O_CLOEXEC by default.

The pidctl() syscalls builds on, extends, and improves translate_pid()
[4] and serves as the natural connection between the pid-based and the
pidfd-based api.

I quote Konstantins original patchset first that has already been acked
and picked up by Eric before and whose functionality is preserved in
this syscall. Multiple people have asked when this patchset will be sent
in for merging (cf. [1], [2]). It has recently been revived by
Nagarathnam Muthusamy from Oracle [3].

The intention of the original translate_pid() syscall was twofold:
1. Provide translation of pids between pid namespaces especially for the
   case of deeply nested pid namespaces.
   The most obvious use-case is strace which has been waiting for this
   feature for a while.
2. Provide implicit pid namespace introspection

Both functionalities are preserved. The latter task has been improved
upon though. In the original version of the pachset passing pid as 1
would allow to deterimine the relationship between the pid namespaces.
This is inherhently racy. If pid 1 inside a pid namespace has died it
would report false negatives. For example, if pid 1 inside of the target
pid namespace already died, it would report that the target pid
namespace cannot be reached from the source pid namespace because it
couldn't find the pid inside of the target pid namespace and thus
falsely report to the user that the two pid namespaces are not related.
This problem is simple to avoid. In the new version we simply walk the
list of ancestors and check whether the namespace are related to each
other. By doing it this way we can reliably report what the relationship
between two pid namespace file descriptors looks like.

Additionally, this syscall has been extended to allow the retrieval of
pidfds independent of procfs. These pidfds can e.g. be used with the new
pidfd_send_signal() syscall we recently merged. The ability to retrieve
pidfds independent of procfs had already been requested in the
pidfd_send_signal patchset by e.g. Andrew [4] and later again by Alexey
[5]. A use-case where a kernel is compiled without procfs but where
pidfds are still useful has been outlined by Andy in [6]. Regular
anon-inode based file descriptors are used that stash a reference to
struct pid in file->private_data and drop that reference on close.

With this pidctl() has three closely related functionalities that
provide a natural connection between the pid-based and the pidfd-based
api. To clarify the semantics and to make it easier for userspace to use
the syscall it has a command argument and three commands clearly
reflecting the functionalities (PIDCMD_QUERY_PID, PIDCMD_QUERY_PIDNS,
PIDCMD_GET_PIDFD).

Embedding the retrieval of pidfds into this syscall has two main
advantages:
- pidctl provides a natural and clean connection between the traditional
  pid-based and the newer pidfd-based process API
- allows the retrieval of pidfds for other pid namespaces while
  enforcing that
  - the caller must have been given access to two file descriptors
    referring to target and source pid namespace
  - the source pid namespace must be an ancestor of the target pid
    namespace
  - the pid must be translatable from the source pid namespace into the
    target pid namespace

Note that this patchset also includes Al's and David's commit to make anon
inodes unconditional. The original intention is to make it possible to use
anon inodes in core vfs functions. pidctl() has the same requirement so
David suggested I sent this in alongside this patch. Both are informed of
this.

The syscall comes with extensive testing for all functionalities.

/* References */
[1]: https://lore.kernel.org/lkml/37b17950-b130-7933-99a1-4846c61c8555@oracle.com/
[2]: https://lore.kernel.org/lkml/20181109034919.GA21681@altlinux.org/
[3]: https://lore.kernel.org/lkml/37b17950-b130-7933-99a1-4846c61c8555@oracle.com/
[4]: 3eb39f47934f9d5a3027fe00d906a45fe3a15fad
[5]: https://lore.kernel.org/lkml/20190320203910.GA2842@avx2/
[6]: https://lore.kernel.org/lkml/CALCETrXO=V=+qEdLDVPf8eCgLZiB9bOTrUfe0V-U-tUZoeoRDA@mail.gmail.com/

Thanks!
Christian

Christian Brauner (3):
  pid: add pidctl()
  signal: support pidctl() with pidfd_send_signal()
  tests: add pidctl() tests

David Howells (1):
  Make anon_inodes unconditional

 arch/arm/kvm/Kconfig                        |   1 -
 arch/arm64/kvm/Kconfig                      |   1 -
 arch/mips/kvm/Kconfig                       |   1 -
 arch/powerpc/kvm/Kconfig                    |   1 -
 arch/s390/kvm/Kconfig                       |   1 -
 arch/x86/Kconfig                            |   1 -
 arch/x86/entry/syscalls/syscall_32.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |   1 +
 arch/x86/kvm/Kconfig                        |   1 -
 drivers/base/Kconfig                        |   1 -
 drivers/char/tpm/Kconfig                    |   1 -
 drivers/dma-buf/Kconfig                     |   1 -
 drivers/gpio/Kconfig                        |   1 -
 drivers/iio/Kconfig                         |   1 -
 drivers/infiniband/Kconfig                  |   1 -
 drivers/vfio/Kconfig                        |   1 -
 fs/Makefile                                 |   2 +-
 fs/notify/fanotify/Kconfig                  |   1 -
 fs/notify/inotify/Kconfig                   |   1 -
 include/linux/pid.h                         |   2 +
 include/linux/pid_namespace.h               |   8 +
 include/linux/syscalls.h                    |   2 +
 include/uapi/linux/wait.h                   |  14 +
 init/Kconfig                                |  10 -
 kernel/pid.c                                | 161 ++++++
 kernel/pid_namespace.c                      |  25 +
 kernel/signal.c                             |  29 +-
 kernel/sys_ni.c                             |   3 -
 tools/testing/selftests/pidfd/Makefile      |   2 +-
 tools/testing/selftests/pidfd/pidctl_test.c | 537 ++++++++++++++++++++
 30 files changed, 765 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/pidfd/pidctl_test.c

-- 
2.21.0


             reply	other threads:[~2019-03-26 15:55 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-26 15:55 Christian Brauner [this message]
2019-03-26 15:55 ` [PATCH v1 1/4] Make anon_inodes unconditional Christian Brauner
2019-03-26 15:55 ` [PATCH v1 2/4] pid: add pidctl() Christian Brauner
2019-03-26 16:17   ` Daniel Colascione
2019-03-26 16:23     ` Christian Brauner
2019-03-26 16:31       ` Christian Brauner
2019-03-26 16:34         ` Christian Brauner
2019-03-26 16:38           ` Daniel Colascione
2019-03-26 16:43             ` Christian Brauner
2019-03-26 16:50               ` Daniel Colascione
2019-03-26 17:05                 ` Christian Brauner
2019-03-26 16:42           ` Andy Lutomirski
2019-03-26 16:46             ` Christian Brauner
2019-03-26 17:00               ` Daniel Colascione
2019-03-26 16:33       ` Daniel Colascione
2019-03-26 17:06   ` Joel Fernandes
2019-03-26 17:08     ` Christian Brauner
2019-03-26 17:15       ` Joel Fernandes
2019-03-26 17:17         ` Christian Brauner
2019-03-26 17:18           ` Joel Fernandes
2019-03-26 17:22     ` Christian Brauner
2019-03-26 18:10       ` Joel Fernandes
2019-03-26 18:19         ` Christian Brauner
2019-03-26 18:47           ` Joel Fernandes
2019-03-26 19:19         ` Christian Brauner
2019-03-26 19:51           ` Joel Fernandes
2019-03-26 19:57             ` Christian Brauner
2019-03-26 15:55 ` [PATCH v1 3/4] signal: support pidctl() with pidfd_send_signal() Christian Brauner
2019-03-26 15:55 ` [PATCH v1 4/4] tests: add pidctl() tests Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190326155513.26964-1-christian@brauner.io \
    --to=christian@brauner.io \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bl0pbl33p@gmail.com \
    --cc=cyphar@cyphar.com \
    --cc=dancol@google.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=ldv@altlinux.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=nagarathnam.muthusamy@oracle.com \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).