From: Daniel Colascione <dancol@google.com>
To: Christian Brauner <christian@brauner.io>
Cc: Jann Horn <jannh@google.com>,
	khlebnikov@yandex-team.ru, Andy Lutomirski <luto@kernel.org>,
	David Howells <dhowells@redhat.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Linux API <linux-api@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>, Kees Cook <keescook@chromium.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
	bl0pbl33p@gmail.com, "Dmitry V. Levin" <ldv@altlinux.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	nagarathnam.muthusamy@oracle.com,
	Aleksa Sarai <cyphar@cyphar.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Joel Fernandes <joel@joelfernandes.org>
Subject: Re: [PATCH 0/4] pid: add pidctl()
Date: Mon, 25 Mar 2019 09:48:43 -0700	[thread overview]
Message-ID: <CAKOZueuJSAiKU7fZR2FNDKKCktdyE-sADtWprpsNMAqYQvD9Jw@mail.gmail.com> (raw)
In-Reply-To: <20190325162052.28987-1-christian@brauner.io>

On Mon, Mar 25, 2019 at 9:21 AM Christian Brauner <christian@brauner.io> wrote:
> The pidctl() syscall builds on, extends, and improves translate_pid() [4].
> I quote Konstantin's original patchset first that has already been acked and
> picked up by Eric before and whose functionality is preserved in this
> syscall. Multiple people have asked when this patchset will be sent in
> for merging (cf. [1], [2]). It has recently been revived by Nagarathnam
> Muthusamy from Oracle [3].
>
> The intention of the original translate_pid() syscall was twofold:
> 1. Provide translation of pids between pid namespaces
> 2. Provide implicit pid namespace introspection
>
> Both functionalities are preserved. The latter task has been improved
> upon though. In the original version of the patchset passing pid as 1
> would allow one to determine the relationship between the pid namespaces.
> This is inherently racy. If pid 1 inside a pid namespace has died it
> would report false negatives. For example, if pid 1 inside of the target
> pid namespace already died, it would report that the target pid
> namespace cannot be reached from the source pid namespace because it
> couldn't find the pid inside of the target pid namespace and thus
> falsely report to the user that the two pid namespaces are not related.
> This problem is simple to avoid. In the new version we simply walk the
> list of ancestors and check whether the namespaces are related to each
> other. By doing it this way we can reliably report what the relationship
> between two pid namespace file descriptors looks like.
>
> Additionally, this syscall has been extended to allow the retrieval of
> pidfds independent of procfs. These pidfds can e.g. be used with the new
> pidfd_send_signal() syscall we recently merged. The ability to retrieve
> pidfds independent of procfs had already been requested in the
> pidfd_send_signal patchset by e.g. Andrew [4] and later again by Alexey
> [5]. A use-case where a kernel is compiled without procfs but where
> pidfds are still useful has been outlined by Andy in [6]. Regular
> anon-inode based file descriptors are used that stash a reference to
> struct pid in file->private_data and drop that reference on close.
>
> With this translate_pid() has three closely related but still distinct
> functionalities. To clarify the semantics and to make it easier for
> userspace to use the syscall it has:
> - gained a command argument and three commands clearly reflecting the
>   distinct functionalities (PIDCMD_QUERY_PID, PIDCMD_QUERY_PIDNS,
>   PIDCMD_GET_PIDFD).
> - been renamed to pidctl()
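
For reference, the ancestor walk described in the quoted text can be
modeled in a few lines of user-space C. This is only a sketch under
stated assumptions: struct ns_model and ns_is_ancestor_or_self() are
hypothetical names invented for illustration, not code from the
patchset, and the only property modeled is that each namespace points
at its parent.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for a pid namespace. Only the
 * parent link matters for the relationship check.
 */
struct ns_model {
    const char *name;
    const struct ns_model *parent; /* NULL for the initial namespace */
};

/*
 * Return true if 'ancestor' is 'ns' itself or appears somewhere on the
 * parent chain of 'ns'. No process inside either namespace is
 * consulted, so a dead pid 1 cannot produce a false negative.
 */
bool ns_is_ancestor_or_self(const struct ns_model *ancestor,
                            const struct ns_model *ns)
{
    for (; ns != NULL; ns = ns->parent) {
        if (ns == ancestor)
            return true;
    }
    return false;
}
```

Two namespaces are related if the check succeeds in either direction;
siblings under the same parent fail it both ways.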

Having made these changes, you've built a general-purpose command
multiplexer, not one operation that happens to be flexible.
The general-purpose command multiplexer is a common antipattern:
multiplexers make it hard to talk about different kernel-provided
operations using the common vocabulary we use to distinguish
kernel-related operations, the system call number. socketcall, for
example, turned out to be cumbersome for users like SELinux policy
writers. People had to do work later to split socketcall into
fine-grained system calls. Please split the pidctl system call so that
the design is clean from the start and we avoid work later. System
calls are cheap.
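
To make the vocabulary problem concrete, here is a small user-space
model. It is a sketch, not kernel code: the entry-point numbers, the
command values, and struct policy are all made up for illustration,
and it assumes a policy layer that can key only on the system call
number.

```c
#include <stdbool.h>

/* Hypothetical entry-point numbers, for illustration only. */
enum entry {
    ENTRY_PIDCTL,      /* the single multiplexed entry point */
    ENTRY_QUERY_PID,   /* the three split entry points */
    ENTRY_QUERY_PIDNS,
    ENTRY_GET_PIDFD,
    ENTRY_COUNT
};

/* Command names from the cover letter; the values are assumed. */
enum pidcmd { PIDCMD_QUERY_PID, PIDCMD_QUERY_PIDNS, PIDCMD_GET_PIDFD };

/* A policy that can only distinguish operations by entry point. */
struct policy {
    bool allowed[ENTRY_COUNT];
};

bool policy_permits(const struct policy *p, enum entry e)
{
    return p->allowed[e];
}

/* Multiplexed design: every command arrives under the same number. */
enum entry entry_for_multiplexed(enum pidcmd cmd)
{
    (void)cmd; /* the command is invisible at this level */
    return ENTRY_PIDCTL;
}

/* Split design: each command has its own number. */
enum entry entry_for_split(enum pidcmd cmd)
{
    switch (cmd) {
    case PIDCMD_QUERY_PID:
        return ENTRY_QUERY_PID;
    case PIDCMD_QUERY_PIDNS:
        return ENTRY_QUERY_PIDNS;
    case PIDCMD_GET_PIDFD:
        break;
    }
    return ENTRY_GET_PIDFD;
}
```

With the split mapping, a policy can allow the two query operations
and deny pidfd creation. With the multiplexed mapping, whatever the
policy decides for ENTRY_PIDCTL applies to all three commands at once.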

Also, I'm still confused about how metadata access is supposed to work
for these procfs-less pidfds. You snipped out a portion of a previous
email in which I asked about your thoughts on this question. With the
PIDCMD_GET_PIDFD command in place, we have two different kinds of file
descriptors for processes, one derived from procfs and one that's
independent. The former works with openat(2). The latter does not. To
be very specific: if I'm
writing a function that accepts a pidfd and I get a pidfd that comes
from PIDCMD_GET_PIDFD, how am I supposed to get the equivalent of
smaps or oom_score_adj or statm for the named process in a race-free
manner?
