All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Daniel Colascione <dancol@google.com>,
	Christian Brauner <christian@brauner.io>,
	Jann Horn <jannh@google.com>, Andrew Lutomirski <luto@kernel.org>,
	David Howells <dhowells@redhat.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Kees Cook <keescook@chromium.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
	Jonathan Kowalski <bl0pbl33p@gmail.com>,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
Date: Sat, 30 Mar 2019 21:07:16 -0400	[thread overview]
Message-ID: <20190331010716.GA189578@google.com> (raw)
In-Reply-To: <CAHk-=wi1kQDVN22qv7M=bWU+CngZZXcPAMKCsDZMeuY3k2tkyg@mail.gmail.com>

On Sat, Mar 30, 2019 at 09:18:12AM -0700, Linus Torvalds wrote:
> On Sat, Mar 30, 2019 at 9:16 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > And no, we are *NOT* making pidfd_open() into some "let's expose /proc
> > even when it's not mounted" horror. Seriously. That's disgusting,
> > wrong, insecure and stupid.
> 
> And btw, this is not debatable. In fact, this whole discussion is
> making me feel like I should just revert pidfd, not because the code I
> merged is wrong, but because people are clearly intending to do
> completely inappropriate things with this.
> 
> Get your act together.  Stop hacking up garbage.
> 
>                Linus

I am supportive of Linus's view of keeping /proc separate from pidfd. I
didn't really care about "pidfd" solving any racing issues with /proc/<pid>/*
access.  My main interest in pidfd is because we can implement "waiting"
semantics in the future using something like a pidfd_wait call, which solves
one of the issues of Android's low-memory where it needs to be notified as
quickly as possible about a non-child process's death. Android Low memory
killer right now just keeps checking for /proc/<pid> 's existence which is
slow and more CPU intense for this.

As I said I don't really care about "pidfd" solving any racing issues with
/proc/<pid>/* accesses - because I still find it hard to imagine that the pid
number can be reused easily from the time you know which <pid> to deal with,
to the time when you want to read, say, the /proc/<pid>/status file. I am yet
to see any real data to show that such overflow happens - you literally need
32k process deaths and forks in such a short time frame, and on 64-bit, that
number is really high that its not even an issue. And if this is really an
issue, then you can just open a handle to /proc/<pid> at process creation
time and keep it around. If the <pid> is reused, you can still use openat(2)
on that handle without any races.

What I think will be most immediately valuable for Android in my opinion is
the pidfd_open() and pidfd_send_signal() syscalls, along with the future
pidfd_wait() that we're working on to solve the issue with Android's Low
memory killer I mentioned above.

Now, in relation to Daniel's concern with procfs, I feel we need to be clear
what is the meaning of a "pidfd" and it should consistently work across all
APIs no matter how pidfd is obtained. There shouldn't be like "if you obtain
pidfd using this method and it only works with these APIs", that's just plain
wrong IMO.

For example: with pidfd_open() or whatever that ends up being called is
available, the pidfd obtained is not tied to /proc. However, if one obtained
a pidfd using open("/proc/<pid>/", ..), then that pidfd *is* tied to /proc.
pidfd_send_signal will happily work on this "pidfd". Both these methods of
obtainings pidfd(s) make openat(2) work on them inconsistently. In one case
openat(2) fails, and in the other case it succeeds. I feel there should be no
ambiguity between how "pidfd" works with openat(2).  Either make openat(2)
never work on any "pidfd", or always make it work on it.

I would say, don't call a /proc/<pid> file descriptor as a "pidfd". Next, to
remedy all this, I feel pidfd_send_signal *should not* work on fds that are
obtained by opening of /proc/<pid> if we can agree those are not pidfds.
Only the ones obtained using the proposed pidfd_open (or pidfd_clone) should
be allowed to be used with pidfd_send_signal.

So I believe the call to tgid_pidfd_to_pid() from pidfd_to_pid in Christian's
patch 3/5 needs to go away.

+static struct pid *pidfd_to_pid(const struct file *file)
+{
+	if (file->f_op == &pidfd_fops)
+		return file->private_data;
+
+	return tgid_pidfd_to_pid(file); <-------- Should just return error
+}
+

I agree with Christian that lets not put pidfd_send_signal at risk. Let us
make the notion of a pidfd and the APIs that work on it *consistent* and make
it do just what we really need for our usescases. Which for Android, is,
pidfd_open/clone, pidfd_send_signal and pidfd_wait. I hope I added some
clarity around the usecases.

thanks,

 - Joel


WARNING: multiple messages have this Message-ID (diff)
From: Joel Fernandes <joel@joelfernandes.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Daniel Colascione <dancol@google.com>,
	Christian Brauner <christian@brauner.io>,
	Jann Horn <jannh@google.com>, Andrew Lutomirski <luto@kernel.org>,
	David Howells <dhowells@redhat.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Kees Cook <keescook@chromium.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
	Jonathan Kowalski <bl0pbl33p@gmail.com>,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	Andrew Morton <akpm@linux-foundation.org>, Oleg Nesterov <oleg@>
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
Date: Sat, 30 Mar 2019 21:07:16 -0400	[thread overview]
Message-ID: <20190331010716.GA189578@google.com> (raw)
In-Reply-To: <CAHk-=wi1kQDVN22qv7M=bWU+CngZZXcPAMKCsDZMeuY3k2tkyg@mail.gmail.com>

On Sat, Mar 30, 2019 at 09:18:12AM -0700, Linus Torvalds wrote:
> On Sat, Mar 30, 2019 at 9:16 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > And no, we are *NOT* making pidfd_open() into some "let's expose /proc
> > even when it's not mounted" horror. Seriously. That's disgusting,
> > wrong, insecure and stupid.
> 
> And btw, this is not debatable. In fact, this whole discussion is
> making me feel like I should just revert pidfd, not because the code I
> merged is wrong, but because people are clearly intending to do
> completely inappropriate things with this.
> 
> Get your act together.  Stop hacking up garbage.
> 
>                Linus

I am supportive of Linus's view of keeping /proc separate from pidfd. I
didn't really care about "pidfd" solving any racing issues with /proc/<pid>/*
access.  My main interest in pidfd is because we can implement "waiting"
semantics in the future using something like a pidfd_wait call, which solves
one of the issues of Android's low-memory where it needs to be notified as
quickly as possible about a non-child process's death. Android Low memory
killer right now just keeps checking for /proc/<pid> 's existence which is
slow and more CPU intense for this.

As I said I don't really care about "pidfd" solving any racing issues with
/proc/<pid>/* accesses - because I still find it hard to imagine that the pid
number can be reused easily from the time you know which <pid> to deal with,
to the time when you want to read, say, the /proc/<pid>/status file. I am yet
to see any real data to show that such overflow happens - you literally need
32k process deaths and forks in such a short time frame, and on 64-bit, that
number is really high that its not even an issue. And if this is really an
issue, then you can just open a handle to /proc/<pid> at process creation
time and keep it around. If the <pid> is reused, you can still use openat(2)
on that handle without any races.

What I think will be most immediately valuable for Android in my opinion is
the pidfd_open() and pidfd_send_signal() syscalls, along with the future
pidfd_wait() that we're working on to solve the issue with Android's Low
memory killer I mentioned above.

Now, in relation to Daniel's concern with procfs, I feel we need to be clear
what is the meaning of a "pidfd" and it should consistently work across all
APIs no matter how pidfd is obtained. There shouldn't be like "if you obtain
pidfd using this method and it only works with these APIs", that's just plain
wrong IMO.

For example: with pidfd_open() or whatever that ends up being called is
available, the pidfd obtained is not tied to /proc. However, if one obtained
a pidfd using open("/proc/<pid>/", ..), then that pidfd *is* tied to /proc.
pidfd_send_signal will happily work on this "pidfd". Both these methods of
obtainings pidfd(s) make openat(2) work on them inconsistently. In one case
openat(2) fails, and in the other case it succeeds. I feel there should be no
ambiguity between how "pidfd" works with openat(2).  Either make openat(2)
never work on any "pidfd", or always make it work on it.

I would say, don't call a /proc/<pid> file descriptor as a "pidfd". Next, to
remedy all this, I feel pidfd_send_signal *should not* work on fds that are
obtained by opening of /proc/<pid> if we can agree those are not pidfds.
Only the ones obtained using the proposed pidfd_open (or pidfd_clone) should
be allowed to be used with pidfd_send_signal.

So I believe the call to tgid_pidfd_to_pid() from pidfd_to_pid in Christian's
patch 3/5 needs to go away.

+static struct pid *pidfd_to_pid(const struct file *file)
+{
+	if (file->f_op == &pidfd_fops)
+		return file->private_data;
+
+	return tgid_pidfd_to_pid(file); <-------- Should just return error
+}
+

I agree with Christian that lets not put pidfd_send_signal at risk. Let us
make the notion of a pidfd and the APIs that work on it *consistent* and make
it do just what we really need for our usescases. Which for Android, is,
pidfd_open/clone, pidfd_send_signal and pidfd_wait. I hope I added some
clarity around the usecases.

thanks,

 - Joel

  reply	other threads:[~2019-03-31  1:07 UTC|newest]

Thread overview: 158+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-29 15:54 [PATCH v2 0/5] pid: add pidfd_open() Christian Brauner
2019-03-29 15:54 ` [PATCH v2 1/5] Make anon_inodes unconditional Christian Brauner
2019-03-29 15:54 ` [PATCH v2 2/5] pid: add pidfd_open() Christian Brauner
2019-03-29 23:45   ` Jann Horn
2019-03-29 23:45     ` Jann Horn
2019-03-29 23:55     ` Christian Brauner
2019-03-29 23:55       ` Christian Brauner
2019-03-30 11:53   ` Jürg Billeter
2019-03-30 14:37     ` Christian Brauner
2019-03-30 14:51       ` Jonathan Kowalski
2019-03-30 14:51         ` Jonathan Kowalski
2019-03-29 15:54 ` [PATCH v2 3/5] signal: support pidfd_open() with pidfd_send_signal() Christian Brauner
2019-03-29 15:54 ` [PATCH v2 4/5] signal: PIDFD_SIGNAL_TID threads via pidfds Christian Brauner
2019-03-30  1:06   ` Jann Horn
2019-03-30  1:06     ` Jann Horn
2019-03-30  1:22     ` Christian Brauner
2019-03-30  1:22       ` Christian Brauner
2019-03-30  1:34       ` Christian Brauner
2019-03-30  1:34         ` Christian Brauner
2019-03-30  1:42         ` Christian Brauner
2019-03-30  1:42           ` Christian Brauner
2019-03-29 15:54 ` [PATCH v2 5/5] tests: add pidfd_open() tests Christian Brauner
2019-03-30 16:09 ` [PATCH v2 0/5] pid: add pidfd_open() Linus Torvalds
2019-03-30 16:09   ` Linus Torvalds
2019-03-30 16:11   ` Daniel Colascione
2019-03-30 16:11     ` Daniel Colascione
2019-03-30 16:16     ` Linus Torvalds
2019-03-30 16:16       ` Linus Torvalds
2019-03-30 16:18       ` Linus Torvalds
2019-03-30 16:18         ` Linus Torvalds
2019-03-31  1:07         ` Joel Fernandes [this message]
2019-03-31  1:07           ` Joel Fernandes
2019-03-31  2:34           ` Jann Horn
2019-03-31  2:34             ` Jann Horn
2019-03-31  4:08             ` Joel Fernandes
2019-03-31  4:08               ` Joel Fernandes
2019-03-31  4:46               ` Jann Horn
2019-03-31  4:46                 ` Jann Horn
2019-03-31 14:52                 ` Linus Torvalds
2019-03-31 14:52                   ` Linus Torvalds
2019-03-31 15:05                   ` Christian Brauner
2019-03-31 15:05                     ` Christian Brauner
2019-03-31 15:21                     ` Daniel Colascione
2019-03-31 15:21                       ` Daniel Colascione
2019-03-31 15:33                   ` Jonathan Kowalski
2019-03-31 15:33                     ` Jonathan Kowalski
2019-03-30 16:19   ` Christian Brauner
2019-03-30 16:19     ` Christian Brauner
2019-03-30 16:24     ` Linus Torvalds
2019-03-30 16:24       ` Linus Torvalds
2019-03-30 16:34       ` Daniel Colascione
2019-03-30 16:34         ` Daniel Colascione
2019-03-30 16:38         ` Christian Brauner
2019-03-30 16:38           ` Christian Brauner
2019-03-30 17:04         ` Linus Torvalds
2019-03-30 17:04           ` Linus Torvalds
2019-03-30 17:12           ` Christian Brauner
2019-03-30 17:12             ` Christian Brauner
2019-03-30 17:24             ` Linus Torvalds
2019-03-30 17:24               ` Linus Torvalds
2019-03-30 17:37               ` Christian Brauner
2019-03-30 17:37                 ` Christian Brauner
2019-03-30 17:50               ` Jonathan Kowalski
2019-03-30 17:50                 ` Jonathan Kowalski
2019-03-30 17:52                 ` Christian Brauner
2019-03-30 17:52                   ` Christian Brauner
2019-03-30 17:59                   ` Jonathan Kowalski
2019-03-30 17:59                     ` Jonathan Kowalski
2019-03-30 18:02                     ` Christian Brauner
2019-03-30 18:02                       ` Christian Brauner
2019-03-30 18:00               ` Jann Horn
2019-03-30 18:00                 ` Jann Horn
2019-03-31 20:09               ` Andy Lutomirski
2019-03-31 20:09                 ` Andy Lutomirski
2019-03-31 21:03                 ` Linus Torvalds
2019-03-31 21:03                   ` Linus Torvalds
2019-03-31 21:10                   ` Christian Brauner
2019-03-31 21:10                     ` Christian Brauner
2019-03-31 21:17                     ` Linus Torvalds
2019-03-31 21:17                       ` Linus Torvalds
2019-03-31 22:03                       ` Christian Brauner
2019-03-31 22:03                         ` Christian Brauner
2019-03-31 22:16                         ` Linus Torvalds
2019-03-31 22:16                           ` Linus Torvalds
2019-03-31 22:33                           ` Christian Brauner
2019-03-31 22:33                             ` Christian Brauner
2019-04-01  0:52                             ` Jann Horn
2019-04-01  0:52                               ` Jann Horn
2019-04-01  8:47                               ` Yann Droneaud
2019-04-01  8:47                                 ` Yann Droneaud
2019-04-01 10:03                               ` Jonathan Kowalski
2019-04-01 10:03                                 ` Jonathan Kowalski
2019-03-31 23:40                           ` Linus Torvalds
2019-03-31 23:40                             ` Linus Torvalds
2019-04-01  0:09                             ` Al Viro
2019-04-01  0:09                               ` Al Viro
2019-04-01  0:18                               ` Linus Torvalds
2019-04-01  0:18                                 ` Linus Torvalds
2019-04-01  0:21                                 ` Christian Brauner
2019-04-01  0:21                                   ` Christian Brauner
2019-04-01  6:37                                 ` Al Viro
2019-04-01  6:37                                   ` Al Viro
2019-04-01  6:41                                   ` Al Viro
2019-04-01  6:41                                     ` Al Viro
2019-03-31 22:03                       ` Jonathan Kowalski
2019-03-31 22:03                         ` Jonathan Kowalski
2019-04-01  2:13                       ` Andy Lutomirski
2019-04-01  2:13                         ` Andy Lutomirski
2019-04-01 11:40                         ` Aleksa Sarai
2019-04-01 11:40                           ` Aleksa Sarai
2019-04-01 15:36                           ` Linus Torvalds
2019-04-01 15:36                             ` Linus Torvalds
2019-04-01 15:47                             ` Christian Brauner
2019-04-01 15:47                               ` Christian Brauner
2019-04-01 15:55                             ` Daniel Colascione
2019-04-01 15:55                               ` Daniel Colascione
2019-04-01 16:01                               ` Linus Torvalds
2019-04-01 16:01                                 ` Linus Torvalds
2019-04-01 16:13                                 ` Daniel Colascione
2019-04-01 16:13                                   ` Daniel Colascione
2019-04-01 19:42                                 ` Christian Brauner
2019-04-01 19:42                                   ` Christian Brauner
2019-04-01 21:30                                   ` Linus Torvalds
2019-04-01 21:30                                     ` Linus Torvalds
2019-04-01 21:58                                     ` Jonathan Kowalski
2019-04-01 21:58                                       ` Jonathan Kowalski
2019-04-01 22:13                                       ` Linus Torvalds
2019-04-01 22:13                                         ` Linus Torvalds
2019-04-01 22:34                                         ` Daniel Colascione
2019-04-01 22:34                                           ` Daniel Colascione
2019-04-01 16:07                               ` Jonathan Kowalski
2019-04-01 16:07                                 ` Jonathan Kowalski
2019-04-01 16:15                                 ` Linus Torvalds
2019-04-01 16:15                                   ` Linus Torvalds
2019-04-01 16:27                                   ` Jonathan Kowalski
2019-04-01 16:27                                     ` Jonathan Kowalski
2019-04-01 16:21                                 ` Daniel Colascione
2019-04-01 16:21                                   ` Daniel Colascione
2019-04-01 16:29                                   ` Linus Torvalds
2019-04-01 16:29                                     ` Linus Torvalds
2019-04-01 16:45                                     ` Daniel Colascione
2019-04-01 16:45                                       ` Daniel Colascione
2019-04-01 17:00                                       ` David Laight
2019-04-01 17:00                                         ` David Laight
2019-04-01 17:32                                       ` Linus Torvalds
2019-04-01 17:32                                         ` Linus Torvalds
2019-04-02 11:03                                       ` Florian Weimer
2019-04-02 11:03                                         ` Florian Weimer
2019-04-01 16:10                             ` Andy Lutomirski
2019-04-01 16:10                               ` Andy Lutomirski
2019-04-01 12:04                         ` Christian Brauner
2019-04-01 12:04                           ` Christian Brauner
2019-04-01 13:43                           ` Jann Horn
2019-04-01 13:43                             ` Jann Horn
2019-03-31 21:19                 ` Christian Brauner
2019-03-31 21:19                   ` Christian Brauner
2019-03-30 16:37       ` Christian Brauner
2019-03-30 16:37         ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190331010716.GA189578@google.com \
    --to=joel@joelfernandes.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bl0pbl33p@gmail.com \
    --cc=christian@brauner.io \
    --cc=cyphar@cyphar.com \
    --cc=dancol@google.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=ldv@altlinux.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=nagarathnam.muthusamy@oracle.com \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.