From: Christian Brauner <christian@brauner.io>
To: Kevin Easton <kevin@guarana.org>, Andy Lutomirski <luto@kernel.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>,
	"Enrico Weigelt, metux IT consult" <lkml@metux.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Jann Horn <jannh@google.com>,
	David Howells <dhowells@redhat.com>,
	Linux API <linux-api@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Kees Cook <keescook@chromium.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Daniel Colascione <dancol@google.com>
Subject: Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]
Date: Sat, 20 Apr 2019 13:15:11 +0200
Message-ID: <D2514402-C7FA-4465-8B37-6C0579F4CD84@brauner.io>
In-Reply-To: <20190420071406.GA22257@ip-172-31-15-78>

On April 20, 2019 9:14:06 AM GMT+02:00, Kevin Easton <kevin@guarana.org> wrote:
>On Mon, Apr 15, 2019 at 01:29:23PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 15, 2019 at 12:59 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>> >
>> > On 2019-04-15, Enrico Weigelt, metux IT consult <lkml@metux.net> wrote:
>> > > > This patchset makes it possible to retrieve pid file descriptors at
>> > > > process creation time by introducing the new flag CLONE_PIDFD to the
>> > > > clone() system call as previously discussed.
>> > >
>> > > Sorry for hijacking this thread, but I'm curious about what things to
>> > > consider when introducing new CLONE_* flags.
>> > >
>> > > The reason I'm asking is:
>> > >
>> > > I'm working on implementing plan9-like fs namespaces, where unprivileged
>> > > processes can change their own namespace at will. For that, certain
>> > > traditional unix'ish things have to be disabled, most notably suid.
>> > > As forbidding suid can be helpful in other scenarios, too, I thought
>> > > about making this its own feature. Doing that switch on clone() seems
>> > > a nice place for that, IMHO.
>> >
>> > Just spit-balling -- is no_new_privs not sufficient for this use case?
>> > Not granting privileges such as setuid during execve(2) is the main
>> > point of that flag.
>> >
>> 
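
For readers unfamiliar with the flag, here is a minimal sketch of setting
and querying no_new_privs from userspace via prctl(2). The helper names
are made up for illustration; only the prctl() calls and the PR_*
constants are real kernel interfaces.

```c
#include <sys/prctl.h>

/*
 * Set no_new_privs on the calling thread. Once set, the bit cannot be
 * cleared again and is inherited across fork(), clone() and execve().
 * Returns 0 on success, -1 on error (errno is set by prctl()).
 * NOTE: enable_no_new_privs() is a hypothetical helper name.
 */
int enable_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/*
 * Query the current state of the bit for the calling thread.
 * Returns 1 if no_new_privs is set, 0 if it is not, -1 on error.
 * NOTE: no_new_privs_is_set() is a hypothetical helper name.
 */
int no_new_privs_is_set(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A launcher would typically call enable_no_new_privs() right before
execve(), after which set-user-ID and set-group-ID bits on the executed
binary no longer grant additional privileges.
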
>> I would personally *love* it if distros started setting no_new_privs
>> for basically all processes.  And pidfd actually gets us part of the
>> way toward a straightforward way to make sudo and su still work in a
>> no_new_privs world: su could call into a daemon that would spawn the
>> privileged task, and su would get a (read-only!) pidfd back and then
>> wait for the fd and exit.  I suppose that, done naively, this might
>> cause some odd effects with respect to tty handling, but I bet it's
>> solvable.  I suppose it would be nifty if there were a way for a
>> process, by mutual agreement, to reparent itself to an unrelated
>> process.
>> 
>> Anyway, clone(2) is an enormous mess.  Surely the right solution here
>> is to have a whole new process creation API that takes a big,
>> extensible struct as an argument, and supports *at least* the full
>> abilities of posix_spawn() and ideally covers all the use cases for
>> fork() + do stuff + exec().  It would be nifty if this API also had a
>> way to say "add no_new_privs and therefore enable extra functionality
>> that doesn't work without no_new_privs".  This functionality would
>> include things like returning a future extra-privileged pidfd that
>> gives ptrace-like access.
>> 
>> As basic examples, the improved process creation API should take a
>> list of dup2() operations to perform, fds to remove the O_CLOEXEC flag
>> from, fds to close (or, maybe even better, a list of fds to *not*
>> close), a list of rlimit changes to make, a list of signal changes to
>> make, the ability to set sid, pgrp, uid, gid (as in
>> setresuid/setresgid), the ability to do capset() operations, etc.  The
>> posix_spawn() API, for all that it's rather complicated, covers a
>> bunch of the basics pretty well.
>
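
As a point of comparison, this is roughly what the posix_spawn() route
looks like today: a minimal sketch that queues one open, one dup2() and
one signal reset to be applied in the child before the exec. The function
name spawn_with_stdout() and its parameters are hypothetical, and error
handling is deliberately terse.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Spawn argv[0] (looked up via PATH) with stdout and stderr redirected to
 * out_path and SIGINT reset to its default disposition in the child.
 * Returns the child's exit status, or -1 on any failure.
 * NOTE: spawn_with_stdout() is a hypothetical helper, not an existing API.
 */
int spawn_with_stdout(const char *out_path, char *const argv[])
{
    posix_spawn_file_actions_t fa;
    posix_spawnattr_t attr;
    sigset_t dfl;
    pid_t pid;
    int status, err;

    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    if (posix_spawnattr_init(&attr) != 0) {
        posix_spawn_file_actions_destroy(&fa);
        return -1;
    }

    /* File actions run in the child, in the order they were added:
     * open out_path as fd 1, then duplicate fd 1 onto fd 2. */
    err = posix_spawn_file_actions_addopen(&fa, STDOUT_FILENO, out_path,
                                           O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (!err)
        err = posix_spawn_file_actions_adddup2(&fa, STDOUT_FILENO,
                                               STDERR_FILENO);

    /* Signal change: reset SIGINT to SIG_DFL in the child. */
    sigemptyset(&dfl);
    sigaddset(&dfl, SIGINT);
    if (!err)
        err = posix_spawnattr_setsigdefault(&attr, &dfl);
    if (!err)
        err = posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGDEF);

    if (!err)
        err = posix_spawnp(&pid, argv[0], &fa, &attr, argv, environ);

    posix_spawn_file_actions_destroy(&fa);
    posix_spawnattr_destroy(&attr);
    if (err)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each additional kind of setup step needs its own posix_spawn_file_actions_*
or posix_spawnattr_* call, and a failure inside the child is only visible
as an error code or exit status, which is the error-reporting concern
raised in the reply that follows.
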
>The idea of a system call that takes an infinitely-extendable laundry
>list of operations to perform in kernel space seems quite inelegant, if
>only for the error-reporting reason.
>
>Instead, I suggest that what you'd want is a way to create a new
>embryonic process that has no address space and isn't yet schedulable.
>You then just need other-process-directed variants of all the normal
>setup functions - so pr_openat(pidfd, dirfd, pathname, flags, mode),
>pr_sigaction(pidfd, signum, act, oldact), pr_dup2(pidfd, oldfd, newfd)
>etc.
>
>Then when it's all set up you pr_execve() to kick it off.
>
>    - Kevin

I proposed a version of this a while back when we first started talking about this.
