Potential issues (security and otherwise) with the current cgroup-bpf API

From: Andy Lutomirski <luto@kernel.org>
To: "Daniel Mack" <daniel@zonque.org>,
	"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"Mickaël Salaün" <mic@digikod.net>,
	"Kees Cook" <keescook@chromium.org>, "Jann Horn" <jann@thejh.net>,
	"Tejun Heo" <tj@kernel.org>, "David Ahern" <dsahern@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Thomas Graf" <tgraf@suug.ch>,
	"Michael Kerrisk" <mtk.manpages@gmail.com>,
	"Peter Zijlstra" <peterz@infradead.org>
Cc: Linux API <linux-api@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>
Subject: Potential issues (security and otherwise) with the current cgroup-bpf API
Date: Sat, 17 Dec 2016 10:18:44 -0800	[thread overview]
Message-ID: <CALCETrV81oFwq2AgeRsN54HA1jR=b5cOZfAgve8H8zhx83DTyA@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 4679 bytes --]

Hi all-

I apologize for being rather late with this.  I didn't realize that
cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
it on the linux-api list, so I missed the discussion.

I think that the inet ingress, egress etc filters are a neat feature,
but I think the API has some issues that will bite us down the road
if it becomes stable in its current form.

Most of the problems I see are summarized in this transcript:

# mkdir cg2
# mount -t cgroup2 none cg2
# mkdir cg2/nosockets
# strace cgrp_socket_rule cg2/nosockets/ 0
...
open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3

^^^^ You can modify a cgroup after opening it O_RDONLY?

bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
log_buf=0x6020c0, kern_version=0}, 48) = 4

^^^^ This is fine.  The bpf() syscall manipulates bpf objects.

bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0

^^^^ This is not so good:
^^^^
^^^^ a) The bpf() syscall is supposed to manipulate bpf objects.  This
^^^^    is manipulating a cgroup.  There's no reason that a socket creation
^^^^    filter couldn't be written in a different language (new iptables
^^^^    table?  Simple list of address families?), but if that happened,
^^^^    then using bpf() to install it would be entirely nonsensical.
^^^^
^^^^ b) This is starting to be an excessively ugly multiplexer.  Among
^^^^    other things, it's very unfriendly to seccomp.

# echo $$ >cg2/nosockets/cgroup.procs
# ping 127.0.0.1
ping: socket: Operation not permitted
# ls cg2/nosockets/
cgroup.controllers  cgroup.events  cgroup.procs  cgroup.subtree_control
# cat cg2/nosockets/cgroup.controllers

^^^^ Something in cgroupfs should give an indication that this cgroup
^^^^ filters socket creation, but there's nothing there.  You should also
^^^^ be able to turn the filter off from cgroupfs.

# mkdir cg2/nosockets/sockets
# /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1

^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
^^^^ then we're stuck with its semantics.  If it returned -EINVAL instead,
^^^^ there would be a chance to refine it.

# echo $$ >cg2/nosockets/sockets/cgroup.procs
# ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
^C
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms

^^^^ Bash was inside a cgroup that disallowed socket creation, but socket
^^^^ creation wasn't disallowed.  This means that the obvious use of socket
^^^^ creation filters in nestable constainers fails insecurely.

There's also a subtle but nasty potential security problem here.
In 4.9 and before, cgroups has only one real effect in the kernel:
resource control. A process in a malicious cgroup could be DoSed,
but that was about the extent of the damage that a malicious cgroup
could do.

In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
programs attached that can do things if various events occur. (Right
now, this means socket operations, but there are plans in the works
to do this for LSM hooks too.) These bpf programs can say yes or no,
but they can also read out various data (including socket payloads!)
and save them away where an attacker can find them. This sounds a
lot like seccomp with a narrower scope but a much stronger ability to
exfiltrate private information.

Unfortunately, while seccomp is very, very careful to prevent
injection of a privileged victim into a malicious sandbox, the
CGROUP_BPF mechanism appears to have no real security model. There
is nothing to prevent a program that's in a malicious cgroup from
running a setuid binary, and there is nothing to prevent a program
that has the ability to move itself or another program into a
malicious cgroup from doing so and then, if needed for exploitation,
exec a setuid binary.

This isn't much of a problem yet because you currently need
CAP_NET_ADMIN to create a malicious sandbox in the first place.  I'm
sure that, in the near future, someone will want to make this stuff
work in containers with delegated cgroup hierarchies, and then there
may be a real problem here.

I've included a few security people on this thread.  The current API
looks abusable, and it would be nice to find all the holes before
4.10 comes out.

(The cgrp_socket_rule source is attached.  You can build it by sticking it
 in samples/bpf and doing:

 $ make headers_install
 $ cd samples/bpf
 $ gcc -o cgrp_socket_rule cgrp_socket_rule.c libbpf.c -I../../usr/include
)

--Andy

[-- Attachment #2: cgrp_socket_rule.c --]
[-- Type: text/x-csrc, Size: 1545 bytes --]

/* eBPF example program:
 *
 * - Loads eBPF program
 *
 *   The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
 *   sockets opened by processes in the cgroup.
 *
 * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
 */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <linux/bpf.h>

#include "libbpf.h"

static int prog_load(int value)
{
	struct bpf_insn prog[] = {
		BPF_MOV64_IMM(BPF_REG_0, value), /* r0 = verdict */
		BPF_EXIT_INSN(),
	};

	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
			     "GPL", 0);
}

static int usage(const char *argv0)
{
	printf("Usage: %s cg-path value\n", argv0);
	return EXIT_FAILURE;
}

int main(int argc, char **argv)
{
	int cg_fd, prog_fd, value, ret;

	if (argc < 2)
		return usage(argv[0]);

	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
	if (cg_fd < 0) {
		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
		return EXIT_FAILURE;
	}

	value = atoi(argv[2]);

	prog_fd = prog_load(value);
	/* printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf); */

	if (prog_fd < 0) {
		printf("Failed to load prog: '%s'\n", strerror(errno));
		return EXIT_FAILURE;
	}

	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
	if (ret < 0) {
		printf("Failed to attach prog to cgroup: '%s'\n",
		       strerror(errno));
		return EXIT_FAILURE;
	}

	return EXIT_SUCCESS;
}