From: Aleksa Sarai <cyphar@cyphar.com>
To: Daniel Colascione <dancol@google.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Tim Murray <timmurray@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [RFC PATCH] Implement /proc/pid/kill
Date: Wed, 31 Oct 2018 07:45:01 +1100
Message-ID: <20181030204501.jnbe7dyqui47hd2x@yavin>
In-Reply-To: <CAKOZuesKVic-5epRj8XcFNR_63SgvaPcbjDQuSLxNrEfS=O=Jg@mail.gmail.com>

On 2018-10-30, Daniel Colascione <dancol@google.com> wrote:
> >> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> >> write the signal number in base-10 ASCII to the kill file of the
> >> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
> >>
> >> Semantically, /proc/pid/kill works like kill(2), except that the
> >> process ID comes from the proc filesystem context instead of from an
> >> explicit system call parameter. This way, it's possible to avoid races
> >> between inspecting some aspect of a process and that process's PID
> >> being reused for some other process.
> >
> > (Aside from any UX concerns other folks might have.)
> >
> > I think it would be a good idea to (at least temporarily) restrict this
> > so that only processes that are in the same PID namespace as the /proc
> > being resolved through may use this interface. Otherwise you might have
> > cases where partial container breakouts can start sending signals to
> > PIDs they wouldn't normally be able to address.
> 
> That's a good idea.

(Oh, and I wonder how this interacts with SELinux/AppArmor signal
mediation.)

> > (Unfortunately
> > there are lots of things that make it a bit difficult to use /proc/$pid
> > exclusively for introspection of a process -- especially in the context
> > of containers.)
> 
> Tons of things already break without a working /proc. What do you have in mind?

Heh, if only that was the only blocker. :P

The basic problem is that container runtimes currently depend either on
some non-transient on-disk state (which goes stale after a machine
reboot, when processes die, and so on), or on long-running processes
that keep alive the file descriptors required for administration of a
container (think O_PATH to /dev/pts/ptmx to avoid malicious container
filesystem attacks). Usually both.
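
To illustrate the O_PATH trick mentioned above, here is a minimal sketch
in Python. It assumes a Linux system with /proc mounted; the helper names
pin_path and reopen_pinned are made up for this example and are not taken
from any container runtime. The idea is that the O_PATH descriptor pins
the object the path named at open time, so a later swap of the path does
not affect what gets reopened.

```python
import os


def pin_path(path: str) -> int:
    """Return an O_PATH descriptor pinning whatever `path` names right now.

    Hypothetical helper: an O_PATH descriptor cannot be used for I/O, but
    it keeps referring to the same object even if the path is later
    renamed or replaced.
    """
    return os.open(path, os.O_PATH | os.O_CLOEXEC)


def reopen_pinned(pinned_fd: int, flags: int = os.O_RDONLY) -> int:
    """Reopen the pinned object for real I/O via the /proc/self/fd link.

    The lookup goes through the descriptor, not through the original
    path, so it cannot be redirected by changes made to the filesystem
    after pin_path() was called.
    """
    return os.open(f"/proc/self/fd/{pinned_fd}", flags | os.O_CLOEXEC)
```

A long-running administrative process would call pin_path() once while
the path is still trusted and reopen_pinned() whenever it needs a usable
descriptor, which is exactly why such a process has to stay alive.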

What would be really useful would be having some way of "hiding away" a
mount namespace (of the pid1 of the container) that has all of the
information and bind-mounts-to-file-descriptors that are necessary for
administration. If the container's pid1 dies, all of the transient state
disappears automatically -- because the stashed mount namespace has
died. In addition, if this were done the way I'm thinking, with
hierarchical mount namespaces (and this is the contentious bit), you
could make it so that the pid1 could not manipulate its current mount
namespace to confuse the administrative process. You would also then
create an intermediate user namespace to help with several race
conditions we've seen when joining containers (which have caused
security bugs like CVE-2016-9962).

Unfortunately this all depends on hierarchical mount namespaces (and
note that this would just be that NS_GET_PARENT gives you the mount
namespace that it was created in -- I'm not suggesting we redesign peers
or anything like that). This makes it basically a non-starter.

But if, on top of this groundwork, we then referenced containers
entirely via an fd to /proc/$pid, we could also avoid PID reuse races
(as well as being able to find out implicitly whether a container has
died, thanks to the error semantics of /proc/$pid). And that's the way
I would suggest doing it (if we had these other things in place).
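
As a rough sketch of what "an fd to /proc/$pid" buys you, the Python
below holds a directory descriptor on /proc/<pid> and probes it later.
It assumes Linux with /proc mounted; open_proc_handle and still_exists
are hypothetical names chosen for this example. Once the process has
been reaped, lookups through the held descriptor fail (ENOENT or ESRCH)
rather than resolving to a new process that happens to reuse the PID.

```python
import errno
import os


def open_proc_handle(pid: int) -> int:
    """Open /proc/<pid> as a directory fd that stays bound to that process."""
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC
    return os.open(f"/proc/{pid}", flags)


def still_exists(proc_fd: int) -> bool:
    """Report whether the process behind `proc_fd` has not yet been reaped.

    The lookup is relative to the held descriptor, so a recycled PID is
    never consulted. Note that a zombie still counts as existing here.
    """
    try:
        fd = os.open("stat", os.O_RDONLY | os.O_CLOEXEC, dir_fd=proc_fd)
    except OSError as exc:
        # Assumption: a vanished process shows up as ENOENT or ESRCH.
        if exc.errno in (errno.ENOENT, errno.ESRCH):
            return False
        raise
    os.close(fd)
    return True
```

This only demonstrates the error semantics; it does not send signals,
and there is still a window between reading the PID and opening
/proc/<pid> unless the caller is the parent of the process.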

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2018-10-30 20:45 UTC|newest]
