LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Jeff Layton <jlayton@kernel.org>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Arnd Bergmann <arnd@arndb.de>,
	David Howells <dhowells@redhat.com>,
	Eric Biederman <ebiederm@xmission.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Christian Brauner <christian@brauner.io>,
	Tycho Andersen <tycho@tycho.ws>,
	David Drysdale <drysdale@google.com>,
	Chanho Min <chanho.min@lge.com>, Oleg Nesterov <oleg@redhat.com>,
	Aleksa Sarai <asarai@suse.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters
Date: Sat, 11 May 2019 10:00:47 -0700
Message-ID: <C60DC580-854D-478D-AF23-5F29FB7C3E50@amacapital.net> (raw)
In-Reply-To: <20190510225527.GA59914@google.com>



> On May 10, 2019, at 3:55 PM, Jann Horn <jannh@google.com> wrote:
> 
>> On Fri, May 10, 2019 at 02:20:23PM -0700, Andy Lutomirski wrote:
>>> On Fri, May 10, 2019 at 1:41 PM Jann Horn <jannh@google.com> wrote:
>>> 
>>>> On Tue, May 07, 2019 at 05:17:35AM +1000, Aleksa Sarai wrote:
>>>>> On 2019-05-06, Jann Horn <jannh@google.com> wrote:
>>>>> In my opinion, CVE-2019-5736 points out two different problems:
>>>>> 
>>>>> The big problem: The __ptrace_may_access() logic has a special-case
>>>>> short-circuit for "introspection" that you can't opt out of; this
>>>>> makes it possible to open things in procfs that are related to the
>>>>> current process even if the credentials of the process wouldn't permit
>>>>> accessing another process like it. I think the proper fix to deal with
>>>>> this would be to add a prctl() flag for "set whether introspection is
>>>>> allowed for this process", and if userspace has manually un-set that
>>>>> flag, any introspection special-case logic would be skipped.
>>>> 
>>>> We could do PR_SET_DUMPABLE=3 for this, I guess?
>>> 
>>> Hmm... I'd make it a new prctl() command, since introspection is
>>> somewhat orthogonal to dumpability. Also, dumpability is per-mm, and I
>>> think the introspection flag should be per-thread.
>> 
>> I've lost track of the context here, but it seems to me that
>> mitigating attacks involving accidental following of /proc links
>> shouldn't depend on dumpability.  What's the actual problem this is
>> trying to solve again?
> 
> The one actual security problem that I've seen related to this is
> CVE-2019-5736. There is a write-up of it at
> <https://blog.dragonsector.pl/2019/02/cve-2019-5736-escape-from-docker-and.html>
> under "Successful approach", but it goes more or less as follows:
> 
> A container is running that doesn't use user namespaces (because for
> some reason I don't understand, apparently some people still do that).
> An evil process is running inside the container with UID 0 (as in,
> GLOBAL_ROOT_UID); so if the evil process inside the container was able
> to reach root-owned files on the host filesystem, it could write into
> them.
> 
> The container engine wants to spawn a new process inside the container.
> It forks off a child that joins the container's namespaces (including
> PID and mount namespaces), and then the child calls execve() on some
> path in the container.

I think that, at this point, the task should be considered owned by the container.  Maybe we should have a better API than execve() to execute a program in a safer way, but fiddling with dumpability seems like a band-aid.  In fact, the process is arguably pwned even *before* execve.

A better “spawn” API should fix this.  In the mean time, I think it should be assumed that, if you join a container’s namespaces, you are at its mercy.

> The attacker replaces the executable in the container with a symlink
> to /proc/self/exe and replaces a library inside the container with a
> malicious one.

Cute.

> When the container engine calls execve(), intending to run an executable
> inside the container, it instead goes through ptrace_may_access() using
> the introspection short-circuit and re-executes its own executable
> through the jumped symlink /proc/self/exe (which is normally unreachable
> for the container). After the execve(), the process loads an evil
> library from inside the container and is under the control of the
> container.
> Now the container controls a process whose /proc/self/exe is a jumped
> symlink to a host executable, and the container can write into it.
> 
> Some container engines are now using an extremely ugly hack to work
> around this - whenever they want to enter a container, they copy the
> host binary into a new memfd and execute that to avoid exposing the
> original host binary to containers:
> <https://github.com/opencontainers/runc/commit/0a8e4117e7f715d5fbeef398405813ce8e88558b>
> 
> 
> In my opinion, the problems here are:
> 
> - Apparently some people run untrusted containers without user
>   namespaces. It would be really nice if people could not do that.
>   (Probably the biggest problem here.)

> - ptrace_may_access() has a short-circuit that permits a process to
>   unintentionally look at itself even if it has dropped privileges -
>   here, it permits the execve("/proc/self/exe", ...) that would
>   normally be blocked by the check for CAP_SYS_PTRACE if the process
>   is nondumpable.

I don’t see this as a problem.  Dumpable is about protecting a task from others, not about protecting a task against itself.

> - You can use /proc/*/exe to get a writable fd.

This is IMO the real bug.

  parent reply index

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-06 16:54 [PATCH v6 0/6] namei: resolveat(2) path resolution restriction API Aleksa Sarai
2019-05-06 16:54 ` [PATCH v6 1/6] namei: split out nd->dfd handling to dirfd_path_init Aleksa Sarai
2019-05-06 16:54 ` [PATCH v6 2/6] namei: O_BENEATH-style path resolution flags Aleksa Sarai
2019-05-06 16:54 ` [PATCH v6 3/6] namei: LOOKUP_IN_ROOT: chroot-like path resolution Aleksa Sarai
2019-05-06 16:54 ` [PATCH v6 4/6] namei: aggressively check for nd->root escape on ".." resolution Aleksa Sarai
2019-05-06 16:54 ` [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters Aleksa Sarai
2019-05-06 18:37   ` Jann Horn
2019-05-06 19:17     ` Aleksa Sarai
2019-05-06 23:41       ` Andy Lutomirski
2019-05-08  0:54       ` Aleksa Sarai
2019-05-10 20:41       ` Jann Horn
2019-05-10 21:20         ` Andy Lutomirski
2019-05-10 22:55           ` Jann Horn
2019-05-10 23:36             ` Christian Brauner
2019-05-11 15:49               ` Aleksa Sarai
2019-05-11 17:00             ` Andy Lutomirski [this message]
2019-05-11 17:21               ` Linus Torvalds
2019-05-11 17:26                 ` Linus Torvalds
2019-05-11 17:31                   ` Aleksa Sarai
2019-05-11 17:43                     ` Linus Torvalds
2019-05-11 17:48                       ` Christian Brauner
2019-05-11 18:00                       ` Aleksa Sarai
2019-05-11 22:39                 ` Andy Lutomirski
     [not found]                   ` <CAHk-=wg3+3GfHsHdB4o78jNiPh_5ShrzxBuTN-Y8EZfiFMhCvw@mail.gmail.com>
2019-05-12 10:19                     ` Christian Brauner
     [not found]                     ` <9CD2B97D-A6BD-43BE-9040-B410D996A195@amacapital.net>
2019-05-12 10:44                       ` Linus Torvalds
2019-05-12 13:35                         ` Aleksa Sarai
2019-05-12 13:38                           ` Aleksa Sarai
2019-05-12 14:34                           ` Andy Lutomirski
2019-05-11 17:26               ` Aleksa Sarai
2019-05-08  0:38     ` Eric W. Biederman
2019-05-10 20:10       ` Jann Horn
2019-05-06 16:54 ` [PATCH v6 6/6] namei: resolveat(2) syscall Aleksa Sarai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C60DC580-854D-478D-AF23-5F29FB7C3E50@amacapital.net \
    --to=luto@amacapital.net \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=asarai@suse.de \
    --cc=ast@kernel.org \
    --cc=bfields@fieldses.org \
    --cc=chanho.min@lge.com \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=cyphar@cyphar.com \
    --cc=dhowells@redhat.com \
    --cc=drysdale@google.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=jlayton@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=oleg@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git