From: Jonathan Kowalski <bl0pbl33p@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Daniel Colascione <dancol@google.com>,
    Aleksa Sarai <cyphar@cyphar.com>,
    Andy Lutomirski <luto@amacapital.net>,
    Christian Brauner <christian@brauner.io>,
    Jann Horn <jannh@google.com>,
    Andrew Lutomirski <luto@kernel.org>,
    David Howells <dhowells@redhat.com>,
    "Serge E. Hallyn" <serge@hallyn.com>,
    Linux API <linux-api@vger.kernel.org>,
    Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
    Arnd Bergmann <arnd@arndb.de>,
    "Eric W. Biederman" <ebiederm@xmission.com>,
    Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
    Kees Cook <keescook@chromium.org>,
    Alexey Dobriyan <adobriyan@gmail.com>,
    Thomas Gleixner <tglx@linutronix.de>,
    Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
    "Dmitry V. Levin" <ldv@altlinux.org>,
    Andrew Morton <akpm@linux-foundation.org>,
    Oleg Nesterov <oleg@redhat.com>,
    Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>,
    Al Viro <viro@zeniv.linux.org.uk>,
    Joel Fernandes <joel@joelfernandes.org>
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
Date: Mon, 1 Apr 2019 17:27:39 +0100
Message-ID: <CAGLj2rHdv1rvx++Vf5LaAnZENMoq2+b-OPYmgGMnuOnaCzav3A@mail.gmail.com>
In-Reply-To: <CAHk-=wi1UEsMv5wmBZB5S+5YiCbxd-AtVsoz1TfuD-KrdcLQew@mail.gmail.com>

On Mon, Apr 1, 2019 at 5:15 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Mon, Apr 1, 2019 at 9:07 AM Jonathan Kowalski <bl0pbl33p@gmail.com> wrote:
> >
> > With the POLLHUP model on a simple pidfd, you'd know when the process
> > you were referring to is dead (and one can map POLLPRI to dead and
> > POLLHUP to zombie, etc).
>
> Adding ->poll() to the pidfd should be easy. Again, it would be
> trivially be made to work for the directory fd you get from
> /proc/<pid> too.
>
> Yeah, yeah, pollable directories are odd, but the vfs layer doesn't
> care about things like "is this a directory or not". It will just call
> the f_op->poll() method.

I know, Andy even sent a patch for that long back.

The question is, this sure solves the immediate use case, but it inhibits some very powerful (and natural) things from being realised in the future, and makes some choices harder.

Currently, pidfd_send_signal doesn't work across PID namespaces. It would be possible to make it work, but some things need to be taken care of, specifically that a task is only allowed to open pidfds for tasks *it* can see. Why? Because you essentially isolate the PID namespace, so your open() for this namespace doesn't suddenly start opening things it cannot see through some other namespace (i.e. /proc), precisely how you cannot open sockets in network namespaces from the outside, though if you can setns, you should be able to (same with pidfds).

This makes for a nice delegation model: I can essentially put a task in a namespace with no other tasks, keep pushing pidfds into the damn thing, and, subject to kernel permissions and the capabilities it can wield in the owner userns, it can signal the said task. You can extend this to ptrace and other things by making them accept a pidfd. This means userspace has to explicitly pass such descriptors around to make this work, like it does today (and how I can use an open socket and accept connections whilst living in a totally isolated network namespace).

Besides that, /proc comes with too much stuff. It should be possible to go from a pidfd to /proc/<PID> and do whatever you wish to, but at least the two things, which require varying levels of capability to inspect (and the latter of which can be isolated by mount namespaces even if the process would usually be allowed to peek into it and read the entire thing), do not end up being munged together.

I can choose to pass both, but if /proc dir fds *are* pidfds, you need the entire complexity of masking and whatnot (which would be usable on its own, no doubt), making directory descriptors pollable and readable, etc etc.

>
> Linus
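
To make the delegation idea above a bit more concrete, here is a rough sketch in Python, not anything from the patch series. It assumes a Linux kernel that ended up shipping pidfd_open() and poll support on pidfds, plus Python 3.9 or newer for os.pidfd_open, signal.pidfd_send_signal and socket.send_fds. Note that in those kernels a pidfd reports process exit as readable (POLLIN), which is not the POLLHUP/POLLPRI mapping floated earlier in this thread. The function name, the sleeping child and the socketpair standing in for two separate tasks are all hypothetical, and no namespaces are set up here.

```python
import os
import select
import signal
import socket
import subprocess
import sys


def delegate_and_signal(timeout_ms=5000):
    """Open a pidfd, hand it over a Unix socket, signal through it, poll for exit.

    Returns a dict describing what happened, or None when pidfd support is
    unavailable (Python < 3.9, non-Linux, an old kernel, or a sandbox).
    """
    needed = (hasattr(os, "pidfd_open")
              and hasattr(signal, "pidfd_send_signal")
              and hasattr(socket, "send_fds"))
    if not needed:
        return None

    # Hypothetical target task: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    # AF_UNIX pair standing in for the delegating and the receiving task.
    left, right = socket.socketpair()
    fds = []
    try:
        try:
            pidfd = os.pidfd_open(child.pid)
        except OSError:
            return None  # e.g. ENOSYS on kernels without pidfd_open()
        fds.append(pidfd)

        # "Push" the pidfd to the other end via SCM_RIGHTS.
        socket.send_fds(left, [b"pidfd"], [pidfd])
        _, received, _, _ = socket.recv_fds(right, 16, 1)
        fds.extend(received)
        delegated = received[0]

        # The receiver signals the task through the descriptor, not a PID.
        try:
            signal.pidfd_send_signal(delegated, signal.SIGTERM)
        except OSError:
            return None

        # Assumption: the pidfd turns readable once the task has terminated.
        poller = select.poll()
        poller.register(delegated, select.POLLIN)
        events = poller.poll(timeout_ms)
        exited = any(fd == delegated and (ev & select.POLLIN)
                     for fd, ev in events)
        returncode = child.wait(timeout=5) if exited else None
        return {"exited": exited, "returncode": returncode}
    finally:
        for fd in fds:
            os.close(fd)
        left.close()
        right.close()
        if child.poll() is None:
            child.kill()
            child.wait()


if __name__ == "__main__":
    print(delegate_and_signal())
```

The receiving side never learns or uses a numeric PID; everything it can do to the task is bounded by the descriptor it was handed, which is the property the delegation model relies on.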