All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <christian@brauner.io>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Daniel Colascione <dancol@google.com>,
	Andrew Lutomirski <luto@kernel.org>,
	David Howells <dhowells@redhat.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Kees Cook <keescook@chromium.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
	Jonathan Kowalski <bl0pbl33p@gmail.com>,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
Date: Sun, 31 Mar 2019 17:05:08 +0200	[thread overview]
Message-ID: <20190331150507.zpyugdvtmr6rgpda@brauner.io> (raw)
In-Reply-To: <CAHk-=wiZ40LVjnXSi9iHLE_-ZBsWFGCgdmNiYZUXn1-V5YBg2g@mail.gmail.com>

On Sun, Mar 31, 2019 at 07:52:28AM -0700, Linus Torvalds wrote:
> On Sat, Mar 30, 2019 at 9:47 PM Jann Horn <jannh@google.com> wrote:
> >
> > Sure, given a pidfd_clone() syscall, as long as the parent of the
> > process is giving you a pidfd for it and you don't have to deal with
> > grandchildren created by fork() calls outside your control, that
> > works.
> 
> Don't do pidfd_clone() and pidfd_wait().
> 
> Both of those existing system calls already get a "flags" argument.
> Just make a WPIDFD (for waitid) and CLONE_PIDFD (for clone) bit, and
> make the existing system calls just take/return a pidfd.

Yes, that's one of the options I was considering but was afraid of
pitching it because of the very massive opposition I got
regarding"multiplexers". I'm perfectly happy with doing it this way.

> 
> Side note: we could (should?) also make the default maxpid just be
> larger. It needs to fit in an 'int', but MAXINT instead of 65535 would
> likely alreadt make a lot of these attacks harder.

Yes, agreed.

> 
> There was some really old legacy reason why we actually limited it to
> 65535 originally.  It was old and crufty even back when..

So Jann and I have been thinking about going forward with the following
idea:
With the pidfd_open() patchset I have pidfds are simple anone inode file
descriptors stashing a reference to struct pid of a process. I have
mentioned this is in prior mails. This cleanly decouples pidfds from
procfs completely.
The reason why we want to use pidfds with no connection to a specific procfs
instance, even in environments that have procfs, is that we would like to add
the API to clone with CLONE_PIDFD that you just mentioned that creates a
new process or thread and returns a pidfd to it. In the context of such
a syscall, it would be awkward to have the kernel open a file in some
procfs instance, since then userspace would have to specify which procfs
instance the fd should come from.

There is an argument to be made that for consistency's sake we should - although
I don't feel strongly about it - disallow the usage of pidfd_send_signal() with
fds referring to /proc/<pid> then. Unless you want this to always work. If you
want this to work then we would simply submit pidfd_open() for the 5.2
window. If you agree that it makes sense to only have pidfd_open() file
descriptors working with pidfd_send_signal() we would send a revert for
pidfd_send_signal() now and resubmit it together with pidfd_open()
during the 5.2. merge window. This decouples pidfds completely from
procfs not just when it is not compiled in or mounted.

I very much care about this being done right. If this means temporarily
kicking pidfd_send_signal() out until 5.2 I'm happy to do so.

Btw, the /proc/<pid> race issue that is mentioned constantly is simply
avoidable by placing the pid that the pidfd has stashed relative to the
callers' procfs mount's pid namespace in the pidfd's fdinfo. So there's
not even a need to really go through /proc/<pid> in the first place. A
caller wanting to get metadata access and avoid a race with pid
recycling can then simply do:

int pidfd = pidfd_open(pid, 0);
int pid = parse_fdinfo("/proc/self/fdinfo/<pidfd>");
int procpidfd = open("/proc/<pid>", ...);

/* Test if process still exists by sending signal 0 through our pidfd. */
int ret = pidfd_send_signal(pid, 0, NULL, PIDFD_SIGNAL_THREAD);
if (ret < 0 && errno == ESRCH) {
        /* pid has been recycled and procpidfd refers to another process */
}

Christian

WARNING: multiple messages have this Message-ID (diff)
From: Christian Brauner <christian@brauner.io>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Daniel Colascione <dancol@google.com>,
	Andrew Lutomirski <luto@kernel.org>,
	David Howells <dhowells@redhat.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Kees Cook <keescook@chromium.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
	Jonathan Kowalski <bl0pbl33p@gmail.com>,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	Andrew Morton <akpm@linux-foundation.org>, Oleg Nesterov <oleg@r>
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
Date: Sun, 31 Mar 2019 17:05:08 +0200	[thread overview]
Message-ID: <20190331150507.zpyugdvtmr6rgpda@brauner.io> (raw)
In-Reply-To: <CAHk-=wiZ40LVjnXSi9iHLE_-ZBsWFGCgdmNiYZUXn1-V5YBg2g@mail.gmail.com>

On Sun, Mar 31, 2019 at 07:52:28AM -0700, Linus Torvalds wrote:
> On Sat, Mar 30, 2019 at 9:47 PM Jann Horn <jannh@google.com> wrote:
> >
> > Sure, given a pidfd_clone() syscall, as long as the parent of the
> > process is giving you a pidfd for it and you don't have to deal with
> > grandchildren created by fork() calls outside your control, that
> > works.
> 
> Don't do pidfd_clone() and pidfd_wait().
> 
> Both of those existing system calls already get a "flags" argument.
> Just make a WPIDFD (for waitid) and CLONE_PIDFD (for clone) bit, and
> make the existing system calls just take/return a pidfd.

Yes, that's one of the options I was considering but was afraid of
pitching it because of the very massive opposition I got
regarding"multiplexers". I'm perfectly happy with doing it this way.

> 
> Side note: we could (should?) also make the default maxpid just be
> larger. It needs to fit in an 'int', but MAXINT instead of 65535 would
> likely alreadt make a lot of these attacks harder.

Yes, agreed.

> 
> There was some really old legacy reason why we actually limited it to
> 65535 originally.  It was old and crufty even back when..

So Jann and I have been thinking about going forward with the following
idea:
With the pidfd_open() patchset I have pidfds are simple anone inode file
descriptors stashing a reference to struct pid of a process. I have
mentioned this is in prior mails. This cleanly decouples pidfds from
procfs completely.
The reason why we want to use pidfds with no connection to a specific procfs
instance, even in environments that have procfs, is that we would like to add
the API to clone with CLONE_PIDFD that you just mentioned that creates a
new process or thread and returns a pidfd to it. In the context of such
a syscall, it would be awkward to have the kernel open a file in some
procfs instance, since then userspace would have to specify which procfs
instance the fd should come from.

There is an argument to be made that for consistency's sake we should - although
I don't feel strongly about it - disallow the usage of pidfd_send_signal() with
fds referring to /proc/<pid> then. Unless you want this to always work. If you
want this to work then we would simply submit pidfd_open() for the 5.2
window. If you agree that it makes sense to only have pidfd_open() file
descriptors working with pidfd_send_signal() we would send a revert for
pidfd_send_signal() now and resubmit it together with pidfd_open()
during the 5.2. merge window. This decouples pidfds completely from
procfs not just when it is not compiled in or mounted.

I very much care about this being done right. If this means temporarily
kicking pidfd_send_signal() out until 5.2 I'm happy to do so.

Btw, the /proc/<pid> race issue that is mentioned constantly is simply
avoidable by placing the pid that the pidfd has stashed relative to the
callers' procfs mount's pid namespace in the pidfd's fdinfo. So there's
not even a need to really go through /proc/<pid> in the first place. A
caller wanting to get metadata access and avoid a race with pid
recycling can then simply do:

int pidfd = pidfd_open(pid, 0);
int pid = parse_fdinfo("/proc/self/fdinfo/<pidfd>");
int procpidfd = open("/proc/<pid>", ...);

/* Test if process still exists by sending signal 0 through our pidfd. */
int ret = pidfd_send_signal(pid, 0, NULL, PIDFD_SIGNAL_THREAD);
if (ret < 0 && errno == ESRCH) {
        /* pid has been recycled and procpidfd refers to another process */
}

Christian

  reply	other threads:[~2019-03-31 15:05 UTC|newest]

Thread overview: 158+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-29 15:54 [PATCH v2 0/5] pid: add pidfd_open() Christian Brauner
2019-03-29 15:54 ` [PATCH v2 1/5] Make anon_inodes unconditional Christian Brauner
2019-03-29 15:54 ` [PATCH v2 2/5] pid: add pidfd_open() Christian Brauner
2019-03-29 23:45   ` Jann Horn
2019-03-29 23:45     ` Jann Horn
2019-03-29 23:55     ` Christian Brauner
2019-03-29 23:55       ` Christian Brauner
2019-03-30 11:53   ` Jürg Billeter
2019-03-30 14:37     ` Christian Brauner
2019-03-30 14:51       ` Jonathan Kowalski
2019-03-30 14:51         ` Jonathan Kowalski
2019-03-29 15:54 ` [PATCH v2 3/5] signal: support pidfd_open() with pidfd_send_signal() Christian Brauner
2019-03-29 15:54 ` [PATCH v2 4/5] signal: PIDFD_SIGNAL_TID threads via pidfds Christian Brauner
2019-03-30  1:06   ` Jann Horn
2019-03-30  1:06     ` Jann Horn
2019-03-30  1:22     ` Christian Brauner
2019-03-30  1:22       ` Christian Brauner
2019-03-30  1:34       ` Christian Brauner
2019-03-30  1:34         ` Christian Brauner
2019-03-30  1:42         ` Christian Brauner
2019-03-30  1:42           ` Christian Brauner
2019-03-29 15:54 ` [PATCH v2 5/5] tests: add pidfd_open() tests Christian Brauner
2019-03-30 16:09 ` [PATCH v2 0/5] pid: add pidfd_open() Linus Torvalds
2019-03-30 16:09   ` Linus Torvalds
2019-03-30 16:11   ` Daniel Colascione
2019-03-30 16:11     ` Daniel Colascione
2019-03-30 16:16     ` Linus Torvalds
2019-03-30 16:16       ` Linus Torvalds
2019-03-30 16:18       ` Linus Torvalds
2019-03-30 16:18         ` Linus Torvalds
2019-03-31  1:07         ` Joel Fernandes
2019-03-31  1:07           ` Joel Fernandes
2019-03-31  2:34           ` Jann Horn
2019-03-31  2:34             ` Jann Horn
2019-03-31  4:08             ` Joel Fernandes
2019-03-31  4:08               ` Joel Fernandes
2019-03-31  4:46               ` Jann Horn
2019-03-31  4:46                 ` Jann Horn
2019-03-31 14:52                 ` Linus Torvalds
2019-03-31 14:52                   ` Linus Torvalds
2019-03-31 15:05                   ` Christian Brauner [this message]
2019-03-31 15:05                     ` Christian Brauner
2019-03-31 15:21                     ` Daniel Colascione
2019-03-31 15:21                       ` Daniel Colascione
2019-03-31 15:33                   ` Jonathan Kowalski
2019-03-31 15:33                     ` Jonathan Kowalski
2019-03-30 16:19   ` Christian Brauner
2019-03-30 16:19     ` Christian Brauner
2019-03-30 16:24     ` Linus Torvalds
2019-03-30 16:24       ` Linus Torvalds
2019-03-30 16:34       ` Daniel Colascione
2019-03-30 16:34         ` Daniel Colascione
2019-03-30 16:38         ` Christian Brauner
2019-03-30 16:38           ` Christian Brauner
2019-03-30 17:04         ` Linus Torvalds
2019-03-30 17:04           ` Linus Torvalds
2019-03-30 17:12           ` Christian Brauner
2019-03-30 17:12             ` Christian Brauner
2019-03-30 17:24             ` Linus Torvalds
2019-03-30 17:24               ` Linus Torvalds
2019-03-30 17:37               ` Christian Brauner
2019-03-30 17:37                 ` Christian Brauner
2019-03-30 17:50               ` Jonathan Kowalski
2019-03-30 17:50                 ` Jonathan Kowalski
2019-03-30 17:52                 ` Christian Brauner
2019-03-30 17:52                   ` Christian Brauner
2019-03-30 17:59                   ` Jonathan Kowalski
2019-03-30 17:59                     ` Jonathan Kowalski
2019-03-30 18:02                     ` Christian Brauner
2019-03-30 18:02                       ` Christian Brauner
2019-03-30 18:00               ` Jann Horn
2019-03-30 18:00                 ` Jann Horn
2019-03-31 20:09               ` Andy Lutomirski
2019-03-31 20:09                 ` Andy Lutomirski
2019-03-31 21:03                 ` Linus Torvalds
2019-03-31 21:03                   ` Linus Torvalds
2019-03-31 21:10                   ` Christian Brauner
2019-03-31 21:10                     ` Christian Brauner
2019-03-31 21:17                     ` Linus Torvalds
2019-03-31 21:17                       ` Linus Torvalds
2019-03-31 22:03                       ` Christian Brauner
2019-03-31 22:03                         ` Christian Brauner
2019-03-31 22:16                         ` Linus Torvalds
2019-03-31 22:16                           ` Linus Torvalds
2019-03-31 22:33                           ` Christian Brauner
2019-03-31 22:33                             ` Christian Brauner
2019-04-01  0:52                             ` Jann Horn
2019-04-01  0:52                               ` Jann Horn
2019-04-01  8:47                               ` Yann Droneaud
2019-04-01  8:47                                 ` Yann Droneaud
2019-04-01 10:03                               ` Jonathan Kowalski
2019-04-01 10:03                                 ` Jonathan Kowalski
2019-03-31 23:40                           ` Linus Torvalds
2019-03-31 23:40                             ` Linus Torvalds
2019-04-01  0:09                             ` Al Viro
2019-04-01  0:09                               ` Al Viro
2019-04-01  0:18                               ` Linus Torvalds
2019-04-01  0:18                                 ` Linus Torvalds
2019-04-01  0:21                                 ` Christian Brauner
2019-04-01  0:21                                   ` Christian Brauner
2019-04-01  6:37                                 ` Al Viro
2019-04-01  6:37                                   ` Al Viro
2019-04-01  6:41                                   ` Al Viro
2019-04-01  6:41                                     ` Al Viro
2019-03-31 22:03                       ` Jonathan Kowalski
2019-03-31 22:03                         ` Jonathan Kowalski
2019-04-01  2:13                       ` Andy Lutomirski
2019-04-01  2:13                         ` Andy Lutomirski
2019-04-01 11:40                         ` Aleksa Sarai
2019-04-01 11:40                           ` Aleksa Sarai
2019-04-01 15:36                           ` Linus Torvalds
2019-04-01 15:36                             ` Linus Torvalds
2019-04-01 15:47                             ` Christian Brauner
2019-04-01 15:47                               ` Christian Brauner
2019-04-01 15:55                             ` Daniel Colascione
2019-04-01 15:55                               ` Daniel Colascione
2019-04-01 16:01                               ` Linus Torvalds
2019-04-01 16:01                                 ` Linus Torvalds
2019-04-01 16:13                                 ` Daniel Colascione
2019-04-01 16:13                                   ` Daniel Colascione
2019-04-01 19:42                                 ` Christian Brauner
2019-04-01 19:42                                   ` Christian Brauner
2019-04-01 21:30                                   ` Linus Torvalds
2019-04-01 21:30                                     ` Linus Torvalds
2019-04-01 21:58                                     ` Jonathan Kowalski
2019-04-01 21:58                                       ` Jonathan Kowalski
2019-04-01 22:13                                       ` Linus Torvalds
2019-04-01 22:13                                         ` Linus Torvalds
2019-04-01 22:34                                         ` Daniel Colascione
2019-04-01 22:34                                           ` Daniel Colascione
2019-04-01 16:07                               ` Jonathan Kowalski
2019-04-01 16:07                                 ` Jonathan Kowalski
2019-04-01 16:15                                 ` Linus Torvalds
2019-04-01 16:15                                   ` Linus Torvalds
2019-04-01 16:27                                   ` Jonathan Kowalski
2019-04-01 16:27                                     ` Jonathan Kowalski
2019-04-01 16:21                                 ` Daniel Colascione
2019-04-01 16:21                                   ` Daniel Colascione
2019-04-01 16:29                                   ` Linus Torvalds
2019-04-01 16:29                                     ` Linus Torvalds
2019-04-01 16:45                                     ` Daniel Colascione
2019-04-01 16:45                                       ` Daniel Colascione
2019-04-01 17:00                                       ` David Laight
2019-04-01 17:00                                         ` David Laight
2019-04-01 17:32                                       ` Linus Torvalds
2019-04-01 17:32                                         ` Linus Torvalds
2019-04-02 11:03                                       ` Florian Weimer
2019-04-02 11:03                                         ` Florian Weimer
2019-04-01 16:10                             ` Andy Lutomirski
2019-04-01 16:10                               ` Andy Lutomirski
2019-04-01 12:04                         ` Christian Brauner
2019-04-01 12:04                           ` Christian Brauner
2019-04-01 13:43                           ` Jann Horn
2019-04-01 13:43                             ` Jann Horn
2019-03-31 21:19                 ` Christian Brauner
2019-03-31 21:19                   ` Christian Brauner
2019-03-30 16:37       ` Christian Brauner
2019-03-30 16:37         ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190331150507.zpyugdvtmr6rgpda@brauner.io \
    --to=christian@brauner.io \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bl0pbl33p@gmail.com \
    --cc=cyphar@cyphar.com \
    --cc=dancol@google.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=ldv@altlinux.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=nagarathnam.muthusamy@oracle.com \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.