All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Kowalski <bl0pbl33p@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <christian@brauner.io>,
	Jann Horn <jannh@google.com>,
	Daniel Colascione <dancol@google.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Andrew Lutomirski <luto@kernel.org>,
	David Howells <dhowells@redhat.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Kees Cook <keescook@chromium.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Joel Fernandes <joel@joelfernandes.org>
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
Date: Mon, 1 Apr 2019 22:58:04 +0100	[thread overview]
Message-ID: <CAGLj2rFvWE6ENFyKSbZMrhqSrLjkm0YrRh2+O0LC04HmJuEY5g@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgjndAqzMBxgXZxbSRXLRqdXtNM3aHc9X-xj+Px1fsG-Q@mail.gmail.com>

On Mon, Apr 1, 2019 at 10:35 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Apr 1, 2019 at 12:42 PM Christian Brauner <christian@brauner.io> wrote:
> >
> > From what I gather from this thread we are still best of with using fds
> > to /proc/<pid> as pidfds. Linus, do you agree or have I misunderstood?

You mention the race about learning the PID, PID being recycled, and
pidfd_open getting the wrong reference.

This exists with the /proc model to way. How do you propose to address this?


>
> That does seem to be the most flexible option.
>
> > Yes, we can have an internal mount option to restrict access to various
> > parts of procfs from such pidfds
>
> I suspect you'd find that other parties might want such a "restrict
> proc" mount option too, so I don't think it needs to be anything
> internal.
>
> But it would be pretty much independent of the pidfd issue, of course.
>
> > One thing is that we also need something to disable access to the
> > "/proc/<pid>/net". One option could be to give the files in "net/" an
> > ->open-handler which checks that our file->f_path.mnt is not one of our
> > special clone() mounts and if it is refuse the open.
>
> I would expect that that would be part of the "restrict proc" mount options, no?

I was thinking if some part of /proc could be split in a procpidfs,
with possibly shared code, which means with the new mount API, you
could configure a superblock with restricted views by virtue of mount
options, per task. This only gives you the view as inside /proc/<PID>,
and you might not be able to lift restrctions depending on privileges
in owning userns of the task.

Now, this would mean we keep the notion of anon inode fds as pidfds,
and on supported systems, you could configure an instance, pass the
pidfd descriptor in the fsconfig stage (David Howells demonstrated
similar support for passing user and net namespace descriptors during
superblock configuration) and also the pidns descriptor. Without
mounting the fs, the mount fd can then be used to do metadata access
passing it to openat and friends, possibly passed to others. This is
more granular than restrcting access over the entire /proc instance,
and can be adjusted per process, and since you need the pidfd's pidns
descriptor, you cannot easily cross namespace boundaries with just a
single pidfd. You can also create many variants of restricted versions
of a single task's proc directory.

The pidfd API and /proc access can be connected while also keeping
them both separate, conceptually.

There are some details but this will wound up being much more powerful
(restricting /proc as a whole is ofcourse also fine, in addition to
this). There are some details on when and how someone should be able
to do this, but is something like this up for discussion?

>
> > Basically, if you have a system without CONFIG_PROC_FS it makes sense
> > that clone gives back an anon inode file descriptor as pidfds because
> > you can still signal threads in a race-free way. But it doesn't make a
> > lot of sense to have pidfd_open() in this scenario because you can't
> > really do anything with that pidfd apart from sending signals.
>
> Well, people might want that.
>
> But realistically, everybody enables /proc support anyway. Even if you
> don't actually fully *mount* it in some restricted area, the support
> is pretty much always there in any kernel config.
>
> But yes, in general I agree that that also most likely means that a
> separate system call for "open_pidfd()" isn't worth it.
>
> Because if the main objection to /proc is that it exposes too much,
> then I think a much better option is to see how to sanely restrict the
> "too much" parts.
>
> Because I think there might be a lot of people who want a restricted
> /proc, in various container models etc.
>
>                    Linus

WARNING: multiple messages have this Message-ID (diff)
From: Jonathan Kowalski <bl0pbl33p@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <christian@brauner.io>,
	Jann Horn <jannh@google.com>,
	Daniel Colascione <dancol@google.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Andrew Lutomirski <luto@kernel.org>,
	David Howells <dhowells@redhat.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Kees Cook <keescook@chromium.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	Andrew Morton <akpm@linux-found>
Subject: Re: [PATCH v2 0/5] pid: add pidfd_open()
Date: Mon, 1 Apr 2019 22:58:04 +0100	[thread overview]
Message-ID: <CAGLj2rFvWE6ENFyKSbZMrhqSrLjkm0YrRh2+O0LC04HmJuEY5g@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgjndAqzMBxgXZxbSRXLRqdXtNM3aHc9X-xj+Px1fsG-Q@mail.gmail.com>

On Mon, Apr 1, 2019 at 10:35 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Apr 1, 2019 at 12:42 PM Christian Brauner <christian@brauner.io> wrote:
> >
> > From what I gather from this thread we are still best of with using fds
> > to /proc/<pid> as pidfds. Linus, do you agree or have I misunderstood?

You mention the race about learning the PID, PID being recycled, and
pidfd_open getting the wrong reference.

This exists with the /proc model to way. How do you propose to address this?


>
> That does seem to be the most flexible option.
>
> > Yes, we can have an internal mount option to restrict access to various
> > parts of procfs from such pidfds
>
> I suspect you'd find that other parties might want such a "restrict
> proc" mount option too, so I don't think it needs to be anything
> internal.
>
> But it would be pretty much independent of the pidfd issue, of course.
>
> > One thing is that we also need something to disable access to the
> > "/proc/<pid>/net". One option could be to give the files in "net/" an
> > ->open-handler which checks that our file->f_path.mnt is not one of our
> > special clone() mounts and if it is refuse the open.
>
> I would expect that that would be part of the "restrict proc" mount options, no?

I was thinking if some part of /proc could be split in a procpidfs,
with possibly shared code, which means with the new mount API, you
could configure a superblock with restricted views by virtue of mount
options, per task. This only gives you the view as inside /proc/<PID>,
and you might not be able to lift restrctions depending on privileges
in owning userns of the task.

Now, this would mean we keep the notion of anon inode fds as pidfds,
and on supported systems, you could configure an instance, pass the
pidfd descriptor in the fsconfig stage (David Howells demonstrated
similar support for passing user and net namespace descriptors during
superblock configuration) and also the pidns descriptor. Without
mounting the fs, the mount fd can then be used to do metadata access
passing it to openat and friends, possibly passed to others. This is
more granular than restrcting access over the entire /proc instance,
and can be adjusted per process, and since you need the pidfd's pidns
descriptor, you cannot easily cross namespace boundaries with just a
single pidfd. You can also create many variants of restricted versions
of a single task's proc directory.

The pidfd API and /proc access can be connected while also keeping
them both separate, conceptually.

There are some details but this will wound up being much more powerful
(restricting /proc as a whole is ofcourse also fine, in addition to
this). There are some details on when and how someone should be able
to do this, but is something like this up for discussion?

>
> > Basically, if you have a system without CONFIG_PROC_FS it makes sense
> > that clone gives back an anon inode file descriptor as pidfds because
> > you can still signal threads in a race-free way. But it doesn't make a
> > lot of sense to have pidfd_open() in this scenario because you can't
> > really do anything with that pidfd apart from sending signals.
>
> Well, people might want that.
>
> But realistically, everybody enables /proc support anyway. Even if you
> don't actually fully *mount* it in some restricted area, the support
> is pretty much always there in any kernel config.
>
> But yes, in general I agree that that also most likely means that a
> separate system call for "open_pidfd()" isn't worth it.
>
> Because if the main objection to /proc is that it exposes too much,
> then I think a much better option is to see how to sanely restrict the
> "too much" parts.
>
> Because I think there might be a lot of people who want a restricted
> /proc, in various container models etc.
>
>                    Linus

  reply	other threads:[~2019-04-01 21:58 UTC|newest]

Thread overview: 158+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-29 15:54 [PATCH v2 0/5] pid: add pidfd_open() Christian Brauner
2019-03-29 15:54 ` [PATCH v2 1/5] Make anon_inodes unconditional Christian Brauner
2019-03-29 15:54 ` [PATCH v2 2/5] pid: add pidfd_open() Christian Brauner
2019-03-29 23:45   ` Jann Horn
2019-03-29 23:45     ` Jann Horn
2019-03-29 23:55     ` Christian Brauner
2019-03-29 23:55       ` Christian Brauner
2019-03-30 11:53   ` Jürg Billeter
2019-03-30 14:37     ` Christian Brauner
2019-03-30 14:51       ` Jonathan Kowalski
2019-03-30 14:51         ` Jonathan Kowalski
2019-03-29 15:54 ` [PATCH v2 3/5] signal: support pidfd_open() with pidfd_send_signal() Christian Brauner
2019-03-29 15:54 ` [PATCH v2 4/5] signal: PIDFD_SIGNAL_TID threads via pidfds Christian Brauner
2019-03-30  1:06   ` Jann Horn
2019-03-30  1:06     ` Jann Horn
2019-03-30  1:22     ` Christian Brauner
2019-03-30  1:22       ` Christian Brauner
2019-03-30  1:34       ` Christian Brauner
2019-03-30  1:34         ` Christian Brauner
2019-03-30  1:42         ` Christian Brauner
2019-03-30  1:42           ` Christian Brauner
2019-03-29 15:54 ` [PATCH v2 5/5] tests: add pidfd_open() tests Christian Brauner
2019-03-30 16:09 ` [PATCH v2 0/5] pid: add pidfd_open() Linus Torvalds
2019-03-30 16:09   ` Linus Torvalds
2019-03-30 16:11   ` Daniel Colascione
2019-03-30 16:11     ` Daniel Colascione
2019-03-30 16:16     ` Linus Torvalds
2019-03-30 16:16       ` Linus Torvalds
2019-03-30 16:18       ` Linus Torvalds
2019-03-30 16:18         ` Linus Torvalds
2019-03-31  1:07         ` Joel Fernandes
2019-03-31  1:07           ` Joel Fernandes
2019-03-31  2:34           ` Jann Horn
2019-03-31  2:34             ` Jann Horn
2019-03-31  4:08             ` Joel Fernandes
2019-03-31  4:08               ` Joel Fernandes
2019-03-31  4:46               ` Jann Horn
2019-03-31  4:46                 ` Jann Horn
2019-03-31 14:52                 ` Linus Torvalds
2019-03-31 14:52                   ` Linus Torvalds
2019-03-31 15:05                   ` Christian Brauner
2019-03-31 15:05                     ` Christian Brauner
2019-03-31 15:21                     ` Daniel Colascione
2019-03-31 15:21                       ` Daniel Colascione
2019-03-31 15:33                   ` Jonathan Kowalski
2019-03-31 15:33                     ` Jonathan Kowalski
2019-03-30 16:19   ` Christian Brauner
2019-03-30 16:19     ` Christian Brauner
2019-03-30 16:24     ` Linus Torvalds
2019-03-30 16:24       ` Linus Torvalds
2019-03-30 16:34       ` Daniel Colascione
2019-03-30 16:34         ` Daniel Colascione
2019-03-30 16:38         ` Christian Brauner
2019-03-30 16:38           ` Christian Brauner
2019-03-30 17:04         ` Linus Torvalds
2019-03-30 17:04           ` Linus Torvalds
2019-03-30 17:12           ` Christian Brauner
2019-03-30 17:12             ` Christian Brauner
2019-03-30 17:24             ` Linus Torvalds
2019-03-30 17:24               ` Linus Torvalds
2019-03-30 17:37               ` Christian Brauner
2019-03-30 17:37                 ` Christian Brauner
2019-03-30 17:50               ` Jonathan Kowalski
2019-03-30 17:50                 ` Jonathan Kowalski
2019-03-30 17:52                 ` Christian Brauner
2019-03-30 17:52                   ` Christian Brauner
2019-03-30 17:59                   ` Jonathan Kowalski
2019-03-30 17:59                     ` Jonathan Kowalski
2019-03-30 18:02                     ` Christian Brauner
2019-03-30 18:02                       ` Christian Brauner
2019-03-30 18:00               ` Jann Horn
2019-03-30 18:00                 ` Jann Horn
2019-03-31 20:09               ` Andy Lutomirski
2019-03-31 20:09                 ` Andy Lutomirski
2019-03-31 21:03                 ` Linus Torvalds
2019-03-31 21:03                   ` Linus Torvalds
2019-03-31 21:10                   ` Christian Brauner
2019-03-31 21:10                     ` Christian Brauner
2019-03-31 21:17                     ` Linus Torvalds
2019-03-31 21:17                       ` Linus Torvalds
2019-03-31 22:03                       ` Christian Brauner
2019-03-31 22:03                         ` Christian Brauner
2019-03-31 22:16                         ` Linus Torvalds
2019-03-31 22:16                           ` Linus Torvalds
2019-03-31 22:33                           ` Christian Brauner
2019-03-31 22:33                             ` Christian Brauner
2019-04-01  0:52                             ` Jann Horn
2019-04-01  0:52                               ` Jann Horn
2019-04-01  8:47                               ` Yann Droneaud
2019-04-01  8:47                                 ` Yann Droneaud
2019-04-01 10:03                               ` Jonathan Kowalski
2019-04-01 10:03                                 ` Jonathan Kowalski
2019-03-31 23:40                           ` Linus Torvalds
2019-03-31 23:40                             ` Linus Torvalds
2019-04-01  0:09                             ` Al Viro
2019-04-01  0:09                               ` Al Viro
2019-04-01  0:18                               ` Linus Torvalds
2019-04-01  0:18                                 ` Linus Torvalds
2019-04-01  0:21                                 ` Christian Brauner
2019-04-01  0:21                                   ` Christian Brauner
2019-04-01  6:37                                 ` Al Viro
2019-04-01  6:37                                   ` Al Viro
2019-04-01  6:41                                   ` Al Viro
2019-04-01  6:41                                     ` Al Viro
2019-03-31 22:03                       ` Jonathan Kowalski
2019-03-31 22:03                         ` Jonathan Kowalski
2019-04-01  2:13                       ` Andy Lutomirski
2019-04-01  2:13                         ` Andy Lutomirski
2019-04-01 11:40                         ` Aleksa Sarai
2019-04-01 11:40                           ` Aleksa Sarai
2019-04-01 15:36                           ` Linus Torvalds
2019-04-01 15:36                             ` Linus Torvalds
2019-04-01 15:47                             ` Christian Brauner
2019-04-01 15:47                               ` Christian Brauner
2019-04-01 15:55                             ` Daniel Colascione
2019-04-01 15:55                               ` Daniel Colascione
2019-04-01 16:01                               ` Linus Torvalds
2019-04-01 16:01                                 ` Linus Torvalds
2019-04-01 16:13                                 ` Daniel Colascione
2019-04-01 16:13                                   ` Daniel Colascione
2019-04-01 19:42                                 ` Christian Brauner
2019-04-01 19:42                                   ` Christian Brauner
2019-04-01 21:30                                   ` Linus Torvalds
2019-04-01 21:30                                     ` Linus Torvalds
2019-04-01 21:58                                     ` Jonathan Kowalski [this message]
2019-04-01 21:58                                       ` Jonathan Kowalski
2019-04-01 22:13                                       ` Linus Torvalds
2019-04-01 22:13                                         ` Linus Torvalds
2019-04-01 22:34                                         ` Daniel Colascione
2019-04-01 22:34                                           ` Daniel Colascione
2019-04-01 16:07                               ` Jonathan Kowalski
2019-04-01 16:07                                 ` Jonathan Kowalski
2019-04-01 16:15                                 ` Linus Torvalds
2019-04-01 16:15                                   ` Linus Torvalds
2019-04-01 16:27                                   ` Jonathan Kowalski
2019-04-01 16:27                                     ` Jonathan Kowalski
2019-04-01 16:21                                 ` Daniel Colascione
2019-04-01 16:21                                   ` Daniel Colascione
2019-04-01 16:29                                   ` Linus Torvalds
2019-04-01 16:29                                     ` Linus Torvalds
2019-04-01 16:45                                     ` Daniel Colascione
2019-04-01 16:45                                       ` Daniel Colascione
2019-04-01 17:00                                       ` David Laight
2019-04-01 17:00                                         ` David Laight
2019-04-01 17:32                                       ` Linus Torvalds
2019-04-01 17:32                                         ` Linus Torvalds
2019-04-02 11:03                                       ` Florian Weimer
2019-04-02 11:03                                         ` Florian Weimer
2019-04-01 16:10                             ` Andy Lutomirski
2019-04-01 16:10                               ` Andy Lutomirski
2019-04-01 12:04                         ` Christian Brauner
2019-04-01 12:04                           ` Christian Brauner
2019-04-01 13:43                           ` Jann Horn
2019-04-01 13:43                             ` Jann Horn
2019-03-31 21:19                 ` Christian Brauner
2019-03-31 21:19                   ` Christian Brauner
2019-03-30 16:37       ` Christian Brauner
2019-03-30 16:37         ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGLj2rFvWE6ENFyKSbZMrhqSrLjkm0YrRh2+O0LC04HmJuEY5g@mail.gmail.com \
    --to=bl0pbl33p@gmail.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=christian@brauner.io \
    --cc=cyphar@cyphar.com \
    --cc=dancol@google.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=ldv@altlinux.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=nagarathnam.muthusamy@oracle.com \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.