From: "Serge E. Hallyn" <serge@hallyn.com>
To: Christian Brauner <christian@brauner.io>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Daniel Colascione <dancol@google.com>,
Aleksa Sarai <cyphar@cyphar.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
"Serge E. Hallyn" <serge@hallyn.com>,
Jann Horn <jannh@google.com>, Andy Lutomirski <luto@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Linux API <linux-api@vger.kernel.org>,
Tim Murray <timmurray@google.com>,
linux-man <linux-man@vger.kernel.org>,
Kees Cook <keescook@chromium.org>
Subject: Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
Date: Wed, 21 Nov 2018 15:39:46 -0600
Message-ID: <20181121213946.GA10795@mail.hallyn.com>
In-Reply-To: <20181120103111.etlqp7zop34v6nv4@brauner.io>
On Tue, Nov 20, 2018 at 11:31:13AM +0100, Christian Brauner wrote:
> On Mon, Nov 19, 2018 at 10:59:12PM -0600, Eric W. Biederman wrote:
> > Daniel Colascione <dancol@google.com> writes:
> >
> > > On Mon, Nov 19, 2018 at 1:37 PM Christian Brauner <christian@brauner.io> wrote:
> > >>
> > >> On Mon, Nov 19, 2018 at 01:26:22PM -0800, Daniel Colascione wrote:
> > >> > On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner <christian@brauner.io> wrote:
> > >> > > That can be done without a loop by comparing the level counter for the
> > >> > > two pid namespaces.
> > >> > >
> > >> > >>
> > >> > >> And you can rewrite pidns_get_parent to use it. So you would instead be
> > >> > >> doing:
> > >> > >>
> > >> > >> if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
> > >> > >> return -EPERM;
> > >> > >>
> > >> > >> (Or you can just copy the 5-line loop into procfd_signal -- though I
> > >> > >> imagine we'll need this for all of the procfd_* APIs.)
> > >> >
> > >> > Why is any of this even necessary? Why does the child namespace we're
> > >> > considering even have a file descriptor to its ancestor's procfs? If
> > >>
> > >> Because you can send file descriptors between processes and container
> > >> runtimes tend to do that.
> > >
> > > Right. But why *would* a container runtime send one of these procfs
> > > FDs to a container?
> > >
> > >> > it has one of these FDs, it can already *read* all sorts of
> > >> > information it really shouldn't be able to acquire, so the additional
> > >> > ability to send a signal (subject to the usual permission checks)
> > >> > feels like sticking a finger in a dike that's already well-perforated.
> > >> > IMHO, we shouldn't bother with this check. The patch would be simpler
> > >> > without it.
> > >>
> > >> We will definitely not allow signaling processes in an ancestor pid
> > >> namespace! That is a security issue! I can imagine container runtimes
> > >> killing their monitoring process etc. pp. Not happening, unless someone
> > >> with deep expertise in signals can convince me otherwise.
> > >
> > > If parent namespace procfs FDs or mounts really can leak into child
> > > namespaces as easily as Aleksa says, then I don't mind adding the
> > > check. I was under the impression that if you find yourself in this
> > > situation, you already have a big problem.
> >
> > There is one big reason to have the check, and I have not seen it
> > mentioned yet in this thread.
> >
> > When SI_USER is set we report the pid of the sender of the signal in
> > si_pid. When the signal comes from the kernel si_pid == 0. When a signal
> > is sent from an ancestor pid namespace si_pid also equals 0 (which is
> > reasonable).
> >
> > A signal out to a process in a parent pid namespace such as SIGCHLD is
> > reasonable as we can map the pid. I really don't see the point of
> > forbidding that. From the perspective of the process in the parent pid
> > namespace it is just another process in its pid namespace. So it
> > should pose no problem from the perspective of the receiving process.
> >
> > A signal to a process in a pid namespace that is neither a parent nor a
> > descendent pid namespace would be a problem, as there is no well defined
> > notion of what si_pid should be set to. So for that case perhaps we
> > should have something like a noprocess pid that we can set. Perhaps we
> > could set si_pid to 0xffffffff. That would take a small extension to
> > pid_nr_ns.
> >
> > File descriptors are not namespaced. It is completely legitimate to use
> > file descriptors to get around limitations of namespaces.
>
> Frankly, I don't see a good argument for why we would allow that even if
> safe. I have not heard a legitimate use-case or need for this.
> At this point I care about very simple semantics. Being able to signal
> into ancestor pid namespaces and cousin namespaces is interesting but
> makes the syscall more brittle and harder to understand.
Yeah, I'm with you on that. We can always open that door later if a good
use case comes up, but I prefer simple at first.
> Changing pid_nr_ns() might be the solution but this function is called
> all over the place in the kernel and I'm not going to risk breaking
> something by changing it for a feature that no one so far has ever
> asked for.
> If you are ok with this then we should hold off on this. We can always
> add this feature later by removing the check when someone has a use-case
> for it.
> I'll send a v2 of the patch that keeps the restriction for now. If you
> insist on it being removed we can make the change in a follow-up
> iteration.
>
> Christian
>
> >
> > Adding limitations to a file descriptor based api because someone else
> > can't set up their processes in such a way as to get the restrictions
> > they are looking for seems very sad.
> >
> > Frankly I think it is one of the better features of namespaces that we
> > have to carefully handle and define these cases so that when the
> > inevitable leaks happen you are not immediately in a world of hurt. All
> > of the other permission checks etc continue to do their job. Plus you
> > are prepared for the case when someone wants their containers to have an
> > interesting communication primitive.
> >
> > Eric
> >
> >
> >
> >