linux-kernel.vger.kernel.org archive mirror
* [PATCH net-next 0/2] bpf: unprivileged
@ 2015-10-05 20:48 Alexei Starovoitov
  2015-10-05 20:48 ` [PATCH net-next 1/2] bpf: enable non-root eBPF programs Alexei Starovoitov
  2015-10-05 20:48 ` [PATCH net-next 2/2] bpf: charge user for creation of BPF maps and programs Alexei Starovoitov
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-05 20:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Daniel Borkmann, Kees Cook, linux-api, netdev, linux-kernel

I think it is time to liberate eBPF from CAP_SYS_ADMIN.
As was discussed when eBPF was first introduced two years ago,
the only piece missing in the eBPF verifier is 'pointer leak detection'
to make it available to non-root users.
Patch 1 adds this pointer analysis.
The eBPF programs, obviously, need to see and operate on kernel addresses,
but with these extra checks they won't be able to pass these addresses
to user space.
Patch 2 adds accounting of kernel memory used by programs and maps.
It changes behavior for existing root users, but I think it needs
to be done consistently for both root and non-root, since today
programs and maps are only limited by number of open FDs (RLIMIT_NOFILE).
Patch 2 accounts a program's and a map's kernel memory against RLIMIT_MEMLOCK.

Unprivileged eBPF is only meaningful for 'socket filter'-like programs.
eBPF programs for tracing and TC classifiers/actions will stay root only.

In parallel the bpf fuzzing effort is ongoing, and so far
we've found only one verifier bug, which has already been fixed.
The 'constant blinding' pass is also being worked on.
It will obfuscate constant-like values that are part of the eBPF ISA
to make JIT spraying attacks even harder.

Alexei Starovoitov (2):
  bpf: enable non-root eBPF programs
  bpf: charge user for creation of BPF maps and programs

 include/linux/bpf.h         |    6 ++
 include/linux/sched.h       |    2 +-
 kernel/bpf/arraymap.c       |    2 +-
 kernel/bpf/hashtab.c        |    4 +
 kernel/bpf/syscall.c        |   74 +++++++++++++-
 kernel/bpf/verifier.c       |  114 +++++++++++++++++++--
 kernel/sysctl.c             |   10 ++
 kernel/trace/bpf_trace.c    |    3 +
 samples/bpf/libbpf.h        |    8 ++
 samples/bpf/test_verifier.c |  239 ++++++++++++++++++++++++++++++++++++++++++-
 10 files changed, 443 insertions(+), 19 deletions(-)

-- 
1.7.9.5



* [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 20:48 [PATCH net-next 0/2] bpf: unprivileged Alexei Starovoitov
@ 2015-10-05 20:48 ` Alexei Starovoitov
  2015-10-05 21:00   ` Kees Cook
                     ` (2 more replies)
  2015-10-05 20:48 ` [PATCH net-next 2/2] bpf: charge user for creation of BPF maps and programs Alexei Starovoitov
  1 sibling, 3 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-05 20:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Daniel Borkmann, Kees Cook, linux-api, netdev, linux-kernel

In order to let unprivileged users load and execute eBPF programs,
teach the verifier to prevent pointer leaks.
The verifier will prevent:
- any arithmetic on pointers
  (except R10+Imm which is used to compute stack addresses)
- comparison of pointers
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps

Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.

Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass the skb pointer to LD_ABS insns, etc.,
but programs cannot pass such pointer values to the outside
or obfuscate them.

Only allow BPF_PROG_TYPE_SOCKET_FILTER programs for unprivileged users,
so that socket filters (tcpdump), af_packet (QUIC acceleration)
and the future KCM can use it.
Tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is root only.

For example, the following unprivileged socket filter program is allowed:
int foo(struct __sk_buff *skb)
{
  char fmt[] = "hello %d\n";
  bpf_trace_printk(fmt, sizeof(fmt), skb->len);
  return 0;
}

but the following program is not:
int foo(struct __sk_buff *skb)
{
  char fmt[] = "hello %p\n";
  bpf_trace_printk(fmt, sizeof(fmt), fmt);
  return 0;
}
since it would leak the kernel stack address via bpf_trace_printk().

Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns
- bpf_trace_printk (for debugging)

The feature is controlled by the sysctl kernel.bpf_enable_unprivileged,
which is off by default.

New tests were added to test_verifier:
 unpriv: return pointer OK
 unpriv: add const to pointer OK
 unpriv: add pointer to pointer OK
 unpriv: neg pointer OK
 unpriv: cmp pointer with const OK
 unpriv: cmp pointer with pointer OK
 unpriv: pass pointer to printk OK
 unpriv: pass pointer to helper function OK
 unpriv: indirectly pass pointer on stack to helper function OK
 unpriv: mangle pointer on stack 1 OK
 unpriv: mangle pointer on stack 2 OK
 unpriv: read pointer from stack in small chunks OK
 unpriv: write pointer into ctx OK
 unpriv: write pointer into map elem value OK
 unpriv: partial copy of pointer OK

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/linux/bpf.h         |    3 +
 kernel/bpf/syscall.c        |   11 +-
 kernel/bpf/verifier.c       |  114 +++++++++++++++++++--
 kernel/sysctl.c             |   10 ++
 kernel/trace/bpf_trace.c    |    3 +
 samples/bpf/libbpf.h        |    8 ++
 samples/bpf/test_verifier.c |  239 ++++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 371 insertions(+), 17 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f57d7fed9ec3..acf97d66b681 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -66,6 +66,7 @@ enum bpf_arg_type {
 
 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
+	ARG_VARARG,		/* optional argument */
 };
 
 /* type of values returned from helper functions */
@@ -168,6 +169,8 @@ void bpf_prog_put_rcu(struct bpf_prog *prog);
 struct bpf_map *bpf_map_get(struct fd f);
 void bpf_map_put(struct bpf_map *map);
 
+extern int sysctl_bpf_enable_unprivileged;
+
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
 #else
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5f35f420c12f..9a2098da2da9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -18,6 +18,8 @@
 #include <linux/filter.h>
 #include <linux/version.h>
 
+int sysctl_bpf_enable_unprivileged __read_mostly;
+
 static LIST_HEAD(bpf_map_types);
 
 static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
@@ -542,6 +544,9 @@ static int bpf_prog_load(union bpf_attr *attr)
 	    attr->kern_version != LINUX_VERSION_CODE)
 		return -EINVAL;
 
+	if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* plain bpf_prog allocation */
 	prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
 	if (!prog)
@@ -597,11 +602,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	union bpf_attr attr = {};
 	int err;
 
-	/* the syscall is limited to root temporarily. This restriction will be
-	 * lifted when security audit is clean. Note that eBPF+tracing must have
-	 * this restriction, since it may pass kernel data to user space
-	 */
-	if (!capable(CAP_SYS_ADMIN))
+	if (!capable(CAP_SYS_ADMIN) && !sysctl_bpf_enable_unprivileged)
 		return -EPERM;
 
 	if (!access_ok(VERIFY_READ, uattr, 1))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b074b23000d6..dccaeeeb1128 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -199,6 +199,7 @@ struct verifier_env {
 	struct verifier_state_list **explored_states; /* search pruning optimization */
 	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
 	u32 used_map_cnt;		/* number of used maps */
+	bool allow_ptr_leaks;
 };
 
 /* verbose verifier prints what it's seeing
@@ -538,6 +539,21 @@ static int bpf_size_to_bytes(int bpf_size)
 		return -EINVAL;
 }
 
+static bool is_spillable_regtype(enum bpf_reg_type type)
+{
+	switch (type) {
+	case PTR_TO_MAP_VALUE:
+	case PTR_TO_MAP_VALUE_OR_NULL:
+	case PTR_TO_STACK:
+	case PTR_TO_CTX:
+	case FRAME_PTR:
+	case CONST_PTR_TO_MAP:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /* check_stack_read/write functions track spill/fill of registers,
  * stack boundary and alignment are checked in check_mem_access()
  */
@@ -550,9 +566,7 @@ static int check_stack_write(struct verifier_state *state, int off, int size,
 	 */
 
 	if (value_regno >= 0 &&
-	    (state->regs[value_regno].type == PTR_TO_MAP_VALUE ||
-	     state->regs[value_regno].type == PTR_TO_STACK ||
-	     state->regs[value_regno].type == PTR_TO_CTX)) {
+	    is_spillable_regtype(state->regs[value_regno].type)) {
 
 		/* register containing pointer is being spilled into stack */
 		if (size != BPF_REG_SIZE) {
@@ -643,6 +657,20 @@ static int check_ctx_access(struct verifier_env *env, int off, int size,
 	return -EACCES;
 }
 
+static bool is_pointer_value(struct verifier_env *env, int regno)
+{
+	if (env->allow_ptr_leaks)
+		return false;
+
+	switch (env->cur_state.regs[regno].type) {
+	case UNKNOWN_VALUE:
+	case CONST_IMM:
+		return false;
+	default:
+		return true;
+	}
+}
+
 /* check whether memory at (regno + off) is accessible for t = (read | write)
  * if t==write, value_regno is a register which value is stored into memory
  * if t==read, value_regno is a register which will receive the value from memory
@@ -669,11 +697,21 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 	}
 
 	if (state->regs[regno].type == PTR_TO_MAP_VALUE) {
+		if (t == BPF_WRITE && value_regno >= 0 &&
+		    is_pointer_value(env, value_regno)) {
+			verbose("R%d leaks addr into map\n", value_regno);
+			return -EACCES;
+		}
 		err = check_map_access(env, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown_value(state->regs, value_regno);
 
 	} else if (state->regs[regno].type == PTR_TO_CTX) {
+		if (t == BPF_WRITE && value_regno >= 0 &&
+		    is_pointer_value(env, value_regno)) {
+			verbose("R%d leaks addr into ctx\n", value_regno);
+			return -EACCES;
+		}
 		err = check_ctx_access(env, off, size, t);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown_value(state->regs, value_regno);
@@ -684,10 +722,17 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 			verbose("invalid stack off=%d size=%d\n", off, size);
 			return -EACCES;
 		}
-		if (t == BPF_WRITE)
+		if (t == BPF_WRITE) {
+			if (!env->allow_ptr_leaks &&
+			    state->stack_slot_type[MAX_BPF_STACK + off] == STACK_SPILL &&
+			    size != BPF_REG_SIZE) {
+				verbose("attempt to corrupt spilled pointer on stack\n");
+				return -EACCES;
+			}
 			err = check_stack_write(state, off, size, value_regno);
-		else
+		} else {
 			err = check_stack_read(state, off, size, value_regno);
+		}
 	} else {
 		verbose("R%d invalid mem access '%s'\n",
 			regno, reg_type_str[state->regs[regno].type]);
@@ -770,13 +815,26 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 	if (arg_type == ARG_DONTCARE)
 		return 0;
 
+	if (arg_type == ARG_VARARG) {
+		if (is_pointer_value(env, regno)) {
+			verbose("R%d leaks addr into printk\n", regno);
+			return -EACCES;
+		}
+		return 0;
+	}
+
 	if (reg->type == NOT_INIT) {
 		verbose("R%d !read_ok\n", regno);
 		return -EACCES;
 	}
 
-	if (arg_type == ARG_ANYTHING)
+	if (arg_type == ARG_ANYTHING) {
+		if (is_pointer_value(env, regno)) {
+			verbose("R%d leaks addr into helper function\n", regno);
+			return -EACCES;
+		}
 		return 0;
+	}
 
 	if (arg_type == ARG_PTR_TO_STACK || arg_type == ARG_PTR_TO_MAP_KEY ||
 	    arg_type == ARG_PTR_TO_MAP_VALUE) {
@@ -950,8 +1008,9 @@ static int check_call(struct verifier_env *env, int func_id)
 }
 
 /* check validity of 32-bit and 64-bit arithmetic operations */
-static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
+static int check_alu_op(struct verifier_env *env, struct bpf_insn *insn)
 {
+	struct reg_state *regs = env->cur_state.regs;
 	u8 opcode = BPF_OP(insn->code);
 	int err;
 
@@ -976,6 +1035,12 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
 		if (err)
 			return err;
 
+		if (is_pointer_value(env, insn->dst_reg)) {
+			verbose("R%d pointer arithmetic prohibited\n",
+				insn->dst_reg);
+			return -EACCES;
+		}
+
 		/* check dest operand */
 		err = check_reg_arg(regs, insn->dst_reg, DST_OP);
 		if (err)
@@ -1012,6 +1077,11 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
 				 */
 				regs[insn->dst_reg] = regs[insn->src_reg];
 			} else {
+				if (is_pointer_value(env, insn->src_reg)) {
+					verbose("R%d partial copy of pointer\n",
+						insn->src_reg);
+					return -EACCES;
+				}
 				regs[insn->dst_reg].type = UNKNOWN_VALUE;
 				regs[insn->dst_reg].map_ptr = NULL;
 			}
@@ -1061,8 +1131,18 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
 		/* pattern match 'bpf_add Rx, imm' instruction */
 		if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
 		    regs[insn->dst_reg].type == FRAME_PTR &&
-		    BPF_SRC(insn->code) == BPF_K)
+		    BPF_SRC(insn->code) == BPF_K) {
 			stack_relative = true;
+		} else if (is_pointer_value(env, insn->dst_reg)) {
+			verbose("R%d pointer arithmetic prohibited\n",
+				insn->dst_reg);
+			return -EACCES;
+		} else if (BPF_SRC(insn->code) == BPF_X &&
+			   is_pointer_value(env, insn->src_reg)) {
+			verbose("R%d pointer arithmetic prohibited\n",
+				insn->src_reg);
+			return -EACCES;
+		}
 
 		/* check dest operand */
 		err = check_reg_arg(regs, insn->dst_reg, DST_OP);
@@ -1101,6 +1181,12 @@ static int check_cond_jmp_op(struct verifier_env *env,
 		err = check_reg_arg(regs, insn->src_reg, SRC_OP);
 		if (err)
 			return err;
+
+		if (is_pointer_value(env, insn->src_reg)) {
+			verbose("R%d pointer comparison prohibited\n",
+				insn->src_reg);
+			return -EACCES;
+		}
 	} else {
 		if (insn->src_reg != BPF_REG_0) {
 			verbose("BPF_JMP uses reserved fields\n");
@@ -1155,6 +1241,9 @@ static int check_cond_jmp_op(struct verifier_env *env,
 			regs[insn->dst_reg].type = CONST_IMM;
 			regs[insn->dst_reg].imm = 0;
 		}
+	} else if (is_pointer_value(env, insn->dst_reg)) {
+		verbose("R%d pointer comparison prohibited\n", insn->dst_reg);
+		return -EACCES;
 	} else if (BPF_SRC(insn->code) == BPF_K &&
 		   (opcode == BPF_JEQ || opcode == BPF_JNE)) {
 
@@ -1658,7 +1747,7 @@ static int do_check(struct verifier_env *env)
 		}
 
 		if (class == BPF_ALU || class == BPF_ALU64) {
-			err = check_alu_op(regs, insn);
+			err = check_alu_op(env, insn);
 			if (err)
 				return err;
 
@@ -1816,6 +1905,11 @@ static int do_check(struct verifier_env *env)
 				if (err)
 					return err;
 
+				if (is_pointer_value(env, BPF_REG_0)) {
+					verbose("R0 leaks addr as return value\n");
+					return -EACCES;
+				}
+
 process_bpf_exit:
 				insn_idx = pop_stack(env, &prev_insn_idx);
 				if (insn_idx < 0) {
@@ -2144,6 +2238,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 	if (ret < 0)
 		goto skip_full_check;
 
+	env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+
 	ret = do_check(env);
 
 skip_full_check:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e69201d8094e..a281849278d9 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -64,6 +64,7 @@
 #include <linux/binfmts.h>
 #include <linux/sched/sysctl.h>
 #include <linux/kexec.h>
+#include <linux/bpf.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -1139,6 +1140,15 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= timer_migration_handler,
 	},
 #endif
+#ifdef CONFIG_BPF_SYSCALL
+	{
+		.procname	= "bpf_enable_unprivileged",
+		.data		= &sysctl_bpf_enable_unprivileged,
+		.maxlen		= sizeof(sysctl_bpf_enable_unprivileged),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
 	{ }
 };
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0fe96c7c8803..46bbed24d1f5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -173,6 +173,9 @@ static const struct bpf_func_proto bpf_trace_printk_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_STACK,
 	.arg2_type	= ARG_CONST_STACK_SIZE,
+	.arg3_type	= ARG_VARARG,
+	.arg4_type	= ARG_VARARG,
+	.arg5_type	= ARG_VARARG,
 };
 
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 7235e292a03b..b7f63c70b4a2 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -64,6 +64,14 @@ extern char bpf_log_buf[LOG_BUF_SIZE];
 		.off   = 0,					\
 		.imm   = 0 })
 
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
 /* Short form of mov, dst_reg = imm32 */
 
 #define BPF_MOV64_IMM(DST, IMM)					\
diff --git a/samples/bpf/test_verifier.c b/samples/bpf/test_verifier.c
index ee0f110c9c54..266d014a885e 100644
--- a/samples/bpf/test_verifier.c
+++ b/samples/bpf/test_verifier.c
@@ -15,6 +15,7 @@
 #include <string.h>
 #include <linux/filter.h>
 #include <stddef.h>
+#include <stdbool.h>
 #include "libbpf.h"
 
 #define MAX_INSNS 512
@@ -25,10 +26,12 @@ struct bpf_test {
 	struct bpf_insn	insns[MAX_INSNS];
 	int fixup[32];
 	const char *errstr;
+	const char *errstr_unpriv;
 	enum {
+		UNDEF,
 		ACCEPT,
 		REJECT
-	} result;
+	} result, result_unpriv;
 	enum bpf_prog_type prog_type;
 };
 
@@ -96,6 +99,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.errstr = "invalid BPF_LD_IMM insn",
+		.errstr_unpriv = "R1 pointer comparison",
 		.result = REJECT,
 	},
 	{
@@ -109,6 +113,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.errstr = "invalid BPF_LD_IMM insn",
+		.errstr_unpriv = "R1 pointer comparison",
 		.result = REJECT,
 	},
 	{
@@ -219,6 +224,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.errstr = "R0 !read_ok",
+		.errstr_unpriv = "R1 pointer comparison",
 		.result = REJECT,
 	},
 	{
@@ -294,7 +300,9 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
 			BPF_EXIT_INSN(),
 		},
+		.errstr_unpriv = "R0 leaks addr",
 		.result = ACCEPT,
+		.result_unpriv = REJECT,
 	},
 	{
 		"check corrupted spill/fill",
@@ -310,6 +318,7 @@ static struct bpf_test tests[] = {
 
 			BPF_EXIT_INSN(),
 		},
+		.errstr_unpriv = "attempt to corrupt spilled",
 		.errstr = "corrupted spill",
 		.result = REJECT,
 	},
@@ -473,6 +482,7 @@ static struct bpf_test tests[] = {
 		},
 		.fixup = {3},
 		.errstr = "R0 invalid mem access",
+		.errstr_unpriv = "R0 leaks addr",
 		.result = REJECT,
 	},
 	{
@@ -495,6 +505,8 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
+		.errstr_unpriv = "R1 pointer comparison",
+		.result_unpriv = REJECT,
 		.result = ACCEPT,
 	},
 	{
@@ -521,6 +533,8 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
+		.errstr_unpriv = "R1 pointer comparison",
+		.result_unpriv = REJECT,
 		.result = ACCEPT,
 	},
 	{
@@ -555,6 +569,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup = {24},
+		.errstr_unpriv = "R1 pointer comparison",
+		.result_unpriv = REJECT,
 		.result = ACCEPT,
 	},
 	{
@@ -603,6 +619,8 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
+		.errstr_unpriv = "R1 pointer comparison",
+		.result_unpriv = REJECT,
 		.result = ACCEPT,
 	},
 	{
@@ -642,6 +660,8 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
+		.errstr_unpriv = "R1 pointer comparison",
+		.result_unpriv = REJECT,
 		.result = ACCEPT,
 	},
 	{
@@ -699,6 +719,7 @@ static struct bpf_test tests[] = {
 		},
 		.fixup = {4},
 		.errstr = "different pointers",
+		.errstr_unpriv = "R1 pointer comparison",
 		.result = REJECT,
 	},
 	{
@@ -720,6 +741,7 @@ static struct bpf_test tests[] = {
 		},
 		.fixup = {6},
 		.errstr = "different pointers",
+		.errstr_unpriv = "R1 pointer comparison",
 		.result = REJECT,
 	},
 	{
@@ -742,6 +764,7 @@ static struct bpf_test tests[] = {
 		},
 		.fixup = {7},
 		.errstr = "different pointers",
+		.errstr_unpriv = "R1 pointer comparison",
 		.result = REJECT,
 	},
 	{
@@ -752,6 +775,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.errstr = "invalid bpf_context access",
+		.errstr_unpriv = "R1 leaks addr",
 		.result = REJECT,
 	},
 	{
@@ -762,6 +786,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.errstr = "invalid bpf_context access",
+		.errstr_unpriv = "R1 leaks addr",
 		.result = REJECT,
 	},
 	{
@@ -772,6 +797,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.errstr = "invalid bpf_context access",
+		.errstr_unpriv = "R1 leaks addr",
 		.result = REJECT,
 	},
 	{
@@ -782,6 +808,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.errstr = "invalid bpf_context access",
+		.errstr_unpriv = "",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_SCHED_ACT,
 	},
@@ -803,6 +830,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = ACCEPT,
+		.errstr_unpriv = "R1 leaks addr",
+		.result_unpriv = REJECT,
 	},
 	{
 		"write skb fields from tc_cls_act prog",
@@ -819,6 +848,8 @@ static struct bpf_test tests[] = {
 				    offsetof(struct __sk_buff, cb[3])),
 			BPF_EXIT_INSN(),
 		},
+		.errstr_unpriv = "",
+		.result_unpriv = REJECT,
 		.result = ACCEPT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
@@ -881,6 +912,195 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.errstr = "invalid stack off=0 size=8",
 	},
+	{
+		"unpriv: return pointer",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_10),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.result_unpriv = REJECT,
+		.errstr_unpriv = "R0 leaks addr",
+	},
+	{
+		"unpriv: add const to pointer",
+		.insns = {
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.result_unpriv = REJECT,
+		.errstr_unpriv = "R1 pointer arithmetic",
+	},
+	{
+		"unpriv: add pointer to pointer",
+		.insns = {
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_10),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.result_unpriv = REJECT,
+		.errstr_unpriv = "R1 pointer arithmetic",
+	},
+	{
+		"unpriv: neg pointer",
+		.insns = {
+			BPF_ALU64_IMM(BPF_NEG, BPF_REG_1, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.result_unpriv = REJECT,
+		.errstr_unpriv = "R1 pointer arithmetic",
+	},
+	{
+		"unpriv: cmp pointer with const",
+		.insns = {
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.result_unpriv = REJECT,
+		.errstr_unpriv = "R1 pointer comparison",
+	},
+	{
+		"unpriv: cmp pointer with pointer",
+		.insns = {
+			BPF_JMP_REG(BPF_JEQ, BPF_REG_1, BPF_REG_10, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.result_unpriv = REJECT,
+		.errstr_unpriv = "R10 pointer comparison",
+	},
+	{
+		"unpriv: pass pointer to printk",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+			BPF_MOV64_IMM(BPF_REG_2, 8),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_1),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_trace_printk),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr_unpriv = "R3 leaks addr",
+		.result_unpriv = REJECT,
+		.result = ACCEPT,
+	},
+	{
+		"unpriv: pass pointer to helper function",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_2),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_2),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_update_elem),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup = {3},
+		.errstr_unpriv = "R4 leaks addr",
+		.result_unpriv = REJECT,
+		.result = ACCEPT,
+	},
+	{
+		"unpriv: indirectly pass pointer on stack to helper function",
+		.insns = {
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup = {3},
+		.errstr = "invalid indirect read from stack off -8+0 size 8",
+		.result = REJECT,
+	},
+	{
+		"unpriv: mangle pointer on stack 1",
+		.insns = {
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
+			BPF_ST_MEM(BPF_W, BPF_REG_10, -8, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr_unpriv = "attempt to corrupt spilled",
+		.result_unpriv = REJECT,
+		.result = ACCEPT,
+	},
+	{
+		"unpriv: mangle pointer on stack 2",
+		.insns = {
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
+			BPF_ST_MEM(BPF_B, BPF_REG_10, -1, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr_unpriv = "attempt to corrupt spilled",
+		.result_unpriv = REJECT,
+		.result = ACCEPT,
+	},
+	{
+		"unpriv: read pointer from stack in small chunks",
+		.insns = {
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -8),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid size",
+		.result = REJECT,
+	},
+	{
+		"unpriv: write pointer into ctx",
+		.insns = {
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr_unpriv = "R1 leaks addr",
+		.result_unpriv = REJECT,
+		.errstr = "invalid bpf_context access",
+		.result = REJECT,
+	},
+	{
+		"unpriv: write pointer into map elem value",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+			BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup = {3},
+		.errstr_unpriv = "R0 leaks addr",
+		.result_unpriv = REJECT,
+		.result = ACCEPT,
+	},
+	{
+		"unpriv: partial copy of pointer",
+		.insns = {
+			BPF_MOV32_REG(BPF_REG_1, BPF_REG_10),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr_unpriv = "R10 partial copy",
+		.result_unpriv = REJECT,
+		.result = ACCEPT,
+	},
 };
 
 static int probe_filter_length(struct bpf_insn *fp)
@@ -910,12 +1130,15 @@ static int create_map(void)
 static int test(void)
 {
 	int prog_fd, i, pass_cnt = 0, err_cnt = 0;
+	bool unpriv = geteuid() != 0;
 
 	for (i = 0; i < ARRAY_SIZE(tests); i++) {
 		struct bpf_insn *prog = tests[i].insns;
 		int prog_type = tests[i].prog_type;
 		int prog_len = probe_filter_length(prog);
 		int *fixup = tests[i].fixup;
+		int expected_result;
+		const char *expected_errstr;
 		int map_fd = -1;
 
 		if (*fixup) {
@@ -932,7 +1155,17 @@ static int test(void)
 					prog, prog_len * sizeof(struct bpf_insn),
 					"GPL", 0);
 
-		if (tests[i].result == ACCEPT) {
+		if (unpriv && tests[i].result_unpriv != UNDEF)
+			expected_result = tests[i].result_unpriv;
+		else
+			expected_result = tests[i].result;
+
+		if (unpriv && tests[i].errstr_unpriv)
+			expected_errstr = tests[i].errstr_unpriv;
+		else
+			expected_errstr = tests[i].errstr;
+
+		if (expected_result == ACCEPT) {
 			if (prog_fd < 0) {
 				printf("FAIL\nfailed to load prog '%s'\n",
 				       strerror(errno));
@@ -947,7 +1180,7 @@ static int test(void)
 				err_cnt++;
 				goto fail;
 			}
-			if (strstr(bpf_log_buf, tests[i].errstr) == 0) {
+			if (strstr(bpf_log_buf, expected_errstr) == 0) {
 				printf("FAIL\nunexpected error message: %s",
 				       bpf_log_buf);
 				err_cnt++;
-- 
1.7.9.5



* [PATCH net-next 2/2] bpf: charge user for creation of BPF maps and programs
  2015-10-05 20:48 [PATCH net-next 0/2] bpf: unprivileged Alexei Starovoitov
  2015-10-05 20:48 ` [PATCH net-next 1/2] bpf: enable non-root eBPF programs Alexei Starovoitov
@ 2015-10-05 20:48 ` Alexei Starovoitov
  1 sibling, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-05 20:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Daniel Borkmann, Kees Cook, linux-api, netdev, linux-kernel

Since eBPF programs and maps use kernel memory, consider it 'locked' memory
from the user accounting point of view and charge it against the RLIMIT_MEMLOCK limit.
This limit is typically set to 64 Kbytes by distros, so almost all
bpf+tracing programs would need to increase it, since they use maps,
but the kernel charges the maximum map size upfront.
For example, a hash map of 1024 elements will be charged as 64 Kbytes.
It's inconvenient for current users and changes current behavior for root,
but it is probably worth doing to be consistent between root and non-root.

Similar accounting logic is done by mmap of perf_event.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
The charging is aggressive, and even the basic test_maps and test_verifier are
hitting the memlock limit, so alternatively we can drop charging
for cap_sys_admin.
---
 include/linux/bpf.h   |    3 +++
 include/linux/sched.h |    2 +-
 kernel/bpf/arraymap.c |    2 +-
 kernel/bpf/hashtab.c  |    4 ++++
 kernel/bpf/syscall.c  |   63 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index acf97d66b681..740906f8684a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -37,6 +37,8 @@ struct bpf_map {
 	u32 key_size;
 	u32 value_size;
 	u32 max_entries;
+	u32 pages;
+	struct user_struct *user;
 	const struct bpf_map_ops *ops;
 	struct work_struct work;
 };
@@ -130,6 +132,7 @@ struct bpf_prog_aux {
 	const struct bpf_verifier_ops *ops;
 	struct bpf_map **used_maps;
 	struct bpf_prog *prog;
+	struct user_struct *user;
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b7b9501b41af..4817df5fffae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -840,7 +840,7 @@ struct user_struct {
 	struct hlist_node uidhash_node;
 	kuid_t uid;
 
-#ifdef CONFIG_PERF_EVENTS
+#if defined(CONFIG_PERF_EVENTS) || defined(CONFIG_BPF_SYSCALL)
 	atomic_long_t locked_vm;
 #endif
 };
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 29ace107f236..d9061730a26c 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -48,7 +48,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array->map.key_size = attr->key_size;
 	array->map.value_size = attr->value_size;
 	array->map.max_entries = attr->max_entries;
-
+	array->map.pages = round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;
 	array->elem_size = elem_size;
 
 	return &array->map;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 83c209d9b17a..28592d79502b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -88,6 +88,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	htab->elem_size = sizeof(struct htab_elem) +
 			  round_up(htab->map.key_size, 8) +
 			  htab->map.value_size;
+
+	htab->map.pages = round_up(htab->n_buckets * sizeof(struct hlist_head) +
+				   htab->elem_size * htab->map.max_entries,
+				   PAGE_SIZE) >> PAGE_SHIFT;
 	return &htab->map;
 
 free_htab:
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9a2098da2da9..5d65c40486a1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -46,11 +46,38 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
 	list_add(&tl->list_node, &bpf_map_types);
 }
 
+static int bpf_map_charge_memlock(struct bpf_map *map)
+{
+	struct user_struct *user = get_current_user();
+	unsigned long memlock_limit;
+
+	memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	atomic_long_add(map->pages, &user->locked_vm);
+
+	if (atomic_long_read(&user->locked_vm) > memlock_limit) {
+		atomic_long_sub(map->pages, &user->locked_vm);
+		free_uid(user);
+		return -EPERM;
+	}
+	map->user = user;
+	return 0;
+}
+
+static void bpf_map_uncharge_memlock(struct bpf_map *map)
+{
+	struct user_struct *user = map->user;
+
+	atomic_long_sub(map->pages, &user->locked_vm);
+	free_uid(user);
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
+	bpf_map_uncharge_memlock(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
 }
@@ -110,6 +137,10 @@ static int map_create(union bpf_attr *attr)
 
 	atomic_set(&map->refcnt, 1);
 
+	err = bpf_map_charge_memlock(map);
+	if (err)
+		goto free_map;
+
 	err = anon_inode_getfd("bpf-map", &bpf_map_fops, map, O_RDWR | O_CLOEXEC);
 
 	if (err < 0)
@@ -440,11 +471,37 @@ static void free_used_maps(struct bpf_prog_aux *aux)
 	kfree(aux->used_maps);
 }
 
+static int bpf_prog_charge_memlock(struct bpf_prog *prog)
+{
+	struct user_struct *user = get_current_user();
+	unsigned long memlock_limit;
+
+	memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	atomic_long_add(prog->pages, &user->locked_vm);
+	if (atomic_long_read(&user->locked_vm) > memlock_limit) {
+		atomic_long_sub(prog->pages, &user->locked_vm);
+		free_uid(user);
+		return -EPERM;
+	}
+	prog->aux->user = user;
+	return 0;
+}
+
+static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
+{
+	struct user_struct *user = prog->aux->user;
+
+	atomic_long_sub(prog->pages, &user->locked_vm);
+	free_uid(user);
+}
+
 static void __prog_put_rcu(struct rcu_head *rcu)
 {
 	struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
 
 	free_used_maps(aux);
+	bpf_prog_uncharge_memlock(aux->prog);
 	bpf_prog_free(aux->prog);
 }
 
@@ -552,6 +609,10 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (!prog)
 		return -ENOMEM;
 
+	err = bpf_prog_charge_memlock(prog);
+	if (err)
+		goto free_prog_nouncharge;
+
 	prog->len = attr->insn_cnt;
 
 	err = -EFAULT;
@@ -593,6 +654,8 @@ static int bpf_prog_load(union bpf_attr *attr)
 free_used_maps:
 	free_used_maps(prog->aux);
 free_prog:
+	bpf_prog_uncharge_memlock(prog);
+free_prog_nouncharge:
 	bpf_prog_free(prog);
 	return err;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 20:48 ` [PATCH net-next 1/2] bpf: enable non-root eBPF programs Alexei Starovoitov
@ 2015-10-05 21:00   ` Kees Cook
  2015-10-05 21:12     ` Alexei Starovoitov
  2015-10-05 22:14   ` Daniel Borkmann
  2015-10-08  2:29   ` Alexei Starovoitov
  2 siblings, 1 reply; 30+ messages in thread
From: Kees Cook @ 2015-10-05 21:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Andy Lutomirski, Ingo Molnar,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann, Linux API,
	Network Development, LKML

On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> In order to let unprivileged users load and execute eBPF programs,
> teach the verifier to prevent pointer leaks.
> The verifier will prevent
> - any arithmetic on pointers
>   (except R10+Imm which is used to compute stack addresses)
> - comparison of pointers
> - passing pointers to helper functions
> - indirectly passing pointers in stack to helper functions
> - returning pointer from bpf program
> - storing pointers into ctx or maps

Does the arithmetic restriction include using a pointer as an index to
a maps-based tail call? I'm still worried about pointer-based
side-effects.

-Kees

>
> Spill/fill of pointers into stack is allowed, but mangling
> of pointers stored in the stack or reading them byte by byte is not.
>
> Within bpf programs the pointers do exist, since programs need to
> be able to access maps, pass the skb pointer to LD_ABS insns, etc.,
> but programs cannot pass such pointer values to the outside
> or obfuscate them.
>
> Only allow BPF_PROG_TYPE_SOCKET_FILTER programs for unprivileged users,
> so that socket filters (tcpdump), af_packet (QUIC acceleration)
> and the future kcm can use it.
> Tracing and tc cls/act program types still require root permissions,
> since tracing actually needs to be able to see all kernel pointers
> and tc is root only.
>
> For example, the following unprivileged socket filter program is allowed:
> int foo(struct __sk_buff *skb)
> {
>   char fmt[] = "hello %d\n";
>   bpf_trace_printk(fmt, sizeof(fmt), skb->len);
>   return 0;
> }
>
> but the following program is not:
> int foo(struct __sk_buff *skb)
> {
>   char fmt[] = "hello %p\n";
>   bpf_trace_printk(fmt, sizeof(fmt), fmt);
>   return 0;
> }
> since it would leak the kernel stack address via bpf_trace_printk().
>
> Unprivileged socket filter bpf programs have access to the
> following helper functions:
> - map lookup/update/delete (but they cannot store kernel pointers into them)
> - get_random (it's already exposed to unprivileged user space)
> - get_smp_processor_id
> - tail_call into another socket filter program
> - ktime_get_ns
> - bpf_trace_printk (for debugging)
>
> The feature is controlled by the sysctl kernel.bpf_enable_unprivileged,
> which is off by default.
>
> New tests were added to test_verifier:
>  unpriv: return pointer OK
>  unpriv: add const to pointer OK
>  unpriv: add pointer to pointer OK
>  unpriv: neg pointer OK
>  unpriv: cmp pointer with const OK
>  unpriv: cmp pointer with pointer OK
>  unpriv: pass pointer to printk OK
>  unpriv: pass pointer to helper function OK
>  unpriv: indirectly pass pointer on stack to helper function OK
>  unpriv: mangle pointer on stack 1 OK
>  unpriv: mangle pointer on stack 2 OK
>  unpriv: read pointer from stack in small chunks OK
>  unpriv: write pointer into ctx OK
>  unpriv: write pointer into map elem value OK
>  unpriv: partial copy of pointer OK
>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
>  include/linux/bpf.h         |    3 +
>  kernel/bpf/syscall.c        |   11 +-
>  kernel/bpf/verifier.c       |  114 +++++++++++++++++++--
>  kernel/sysctl.c             |   10 ++
>  kernel/trace/bpf_trace.c    |    3 +
>  samples/bpf/libbpf.h        |    8 ++
>  samples/bpf/test_verifier.c |  239 ++++++++++++++++++++++++++++++++++++++++++-
>  7 files changed, 371 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f57d7fed9ec3..acf97d66b681 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -66,6 +66,7 @@ enum bpf_arg_type {
>
>         ARG_PTR_TO_CTX,         /* pointer to context */
>         ARG_ANYTHING,           /* any (initialized) argument is ok */
> +       ARG_VARARG,             /* optional argument */
>  };
>
>  /* type of values returned from helper functions */
> @@ -168,6 +169,8 @@ void bpf_prog_put_rcu(struct bpf_prog *prog);
>  struct bpf_map *bpf_map_get(struct fd f);
>  void bpf_map_put(struct bpf_map *map);
>
> +extern int sysctl_bpf_enable_unprivileged;
> +
>  /* verify correctness of eBPF program */
>  int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
>  #else
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 5f35f420c12f..9a2098da2da9 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -18,6 +18,8 @@
>  #include <linux/filter.h>
>  #include <linux/version.h>
>
> +int sysctl_bpf_enable_unprivileged __read_mostly;
> +
>  static LIST_HEAD(bpf_map_types);
>
>  static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
> @@ -542,6 +544,9 @@ static int bpf_prog_load(union bpf_attr *attr)
>             attr->kern_version != LINUX_VERSION_CODE)
>                 return -EINVAL;
>
> +       if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN))
> +               return -EPERM;
> +
>         /* plain bpf_prog allocation */
>         prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
>         if (!prog)
> @@ -597,11 +602,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>         union bpf_attr attr = {};
>         int err;
>
> -       /* the syscall is limited to root temporarily. This restriction will be
> -        * lifted when security audit is clean. Note that eBPF+tracing must have
> -        * this restriction, since it may pass kernel data to user space
> -        */
> -       if (!capable(CAP_SYS_ADMIN))
> +       if (!capable(CAP_SYS_ADMIN) && !sysctl_bpf_enable_unprivileged)
>                 return -EPERM;
>
>         if (!access_ok(VERIFY_READ, uattr, 1))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index b074b23000d6..dccaeeeb1128 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -199,6 +199,7 @@ struct verifier_env {
>         struct verifier_state_list **explored_states; /* search pruning optimization */
>         struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
>         u32 used_map_cnt;               /* number of used maps */
> +       bool allow_ptr_leaks;
>  };
>
>  /* verbose verifier prints what it's seeing
> @@ -538,6 +539,21 @@ static int bpf_size_to_bytes(int bpf_size)
>                 return -EINVAL;
>  }
>
> +static bool is_spillable_regtype(enum bpf_reg_type type)
> +{
> +       switch (type) {
> +       case PTR_TO_MAP_VALUE:
> +       case PTR_TO_MAP_VALUE_OR_NULL:
> +       case PTR_TO_STACK:
> +       case PTR_TO_CTX:
> +       case FRAME_PTR:
> +       case CONST_PTR_TO_MAP:
> +               return true;
> +       default:
> +               return false;
> +       }
> +}
> +
>  /* check_stack_read/write functions track spill/fill of registers,
>   * stack boundary and alignment are checked in check_mem_access()
>   */
> @@ -550,9 +566,7 @@ static int check_stack_write(struct verifier_state *state, int off, int size,
>          */
>
>         if (value_regno >= 0 &&
> -           (state->regs[value_regno].type == PTR_TO_MAP_VALUE ||
> -            state->regs[value_regno].type == PTR_TO_STACK ||
> -            state->regs[value_regno].type == PTR_TO_CTX)) {
> +           is_spillable_regtype(state->regs[value_regno].type)) {
>
>                 /* register containing pointer is being spilled into stack */
>                 if (size != BPF_REG_SIZE) {
> @@ -643,6 +657,20 @@ static int check_ctx_access(struct verifier_env *env, int off, int size,
>         return -EACCES;
>  }
>
> +static bool is_pointer_value(struct verifier_env *env, int regno)
> +{
> +       if (env->allow_ptr_leaks)
> +               return false;
> +
> +       switch (env->cur_state.regs[regno].type) {
> +       case UNKNOWN_VALUE:
> +       case CONST_IMM:
> +               return false;
> +       default:
> +               return true;
> +       }
> +}
> +
>  /* check whether memory at (regno + off) is accessible for t = (read | write)
>   * if t==write, value_regno is a register which value is stored into memory
>   * if t==read, value_regno is a register which will receive the value from memory
> @@ -669,11 +697,21 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
>         }
>
>         if (state->regs[regno].type == PTR_TO_MAP_VALUE) {
> +               if (t == BPF_WRITE && value_regno >= 0 &&
> +                   is_pointer_value(env, value_regno)) {
> +                       verbose("R%d leaks addr into map\n", value_regno);
> +                       return -EACCES;
> +               }
>                 err = check_map_access(env, regno, off, size);
>                 if (!err && t == BPF_READ && value_regno >= 0)
>                         mark_reg_unknown_value(state->regs, value_regno);
>
>         } else if (state->regs[regno].type == PTR_TO_CTX) {
> +               if (t == BPF_WRITE && value_regno >= 0 &&
> +                   is_pointer_value(env, value_regno)) {
> +                       verbose("R%d leaks addr into ctx\n", value_regno);
> +                       return -EACCES;
> +               }
>                 err = check_ctx_access(env, off, size, t);
>                 if (!err && t == BPF_READ && value_regno >= 0)
>                         mark_reg_unknown_value(state->regs, value_regno);
> @@ -684,10 +722,17 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
>                         verbose("invalid stack off=%d size=%d\n", off, size);
>                         return -EACCES;
>                 }
> -               if (t == BPF_WRITE)
> +               if (t == BPF_WRITE) {
> +                       if (!env->allow_ptr_leaks &&
> +                           state->stack_slot_type[MAX_BPF_STACK + off] == STACK_SPILL &&
> +                           size != BPF_REG_SIZE) {
> +                               verbose("attempt to corrupt spilled pointer on stack\n");
> +                               return -EACCES;
> +                       }
>                         err = check_stack_write(state, off, size, value_regno);
> -               else
> +               } else {
>                         err = check_stack_read(state, off, size, value_regno);
> +               }
>         } else {
>                 verbose("R%d invalid mem access '%s'\n",
>                         regno, reg_type_str[state->regs[regno].type]);
> @@ -770,13 +815,26 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
>         if (arg_type == ARG_DONTCARE)
>                 return 0;
>
> +       if (arg_type == ARG_VARARG) {
> +               if (is_pointer_value(env, regno)) {
> +                       verbose("R%d leaks addr into printk\n", regno);
> +                       return -EACCES;
> +               }
> +               return 0;
> +       }
> +
>         if (reg->type == NOT_INIT) {
>                 verbose("R%d !read_ok\n", regno);
>                 return -EACCES;
>         }
>
> -       if (arg_type == ARG_ANYTHING)
> +       if (arg_type == ARG_ANYTHING) {
> +               if (is_pointer_value(env, regno)) {
> +                       verbose("R%d leaks addr into helper function\n", regno);
> +                       return -EACCES;
> +               }
>                 return 0;
> +       }
>
>         if (arg_type == ARG_PTR_TO_STACK || arg_type == ARG_PTR_TO_MAP_KEY ||
>             arg_type == ARG_PTR_TO_MAP_VALUE) {
> @@ -950,8 +1008,9 @@ static int check_call(struct verifier_env *env, int func_id)
>  }
>
>  /* check validity of 32-bit and 64-bit arithmetic operations */
> -static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
> +static int check_alu_op(struct verifier_env *env, struct bpf_insn *insn)
>  {
> +       struct reg_state *regs = env->cur_state.regs;
>         u8 opcode = BPF_OP(insn->code);
>         int err;
>
> @@ -976,6 +1035,12 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
>                 if (err)
>                         return err;
>
> +               if (is_pointer_value(env, insn->dst_reg)) {
> +                       verbose("R%d pointer arithmetic prohibited\n",
> +                               insn->dst_reg);
> +                       return -EACCES;
> +               }
> +
>                 /* check dest operand */
>                 err = check_reg_arg(regs, insn->dst_reg, DST_OP);
>                 if (err)
> @@ -1012,6 +1077,11 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
>                                  */
>                                 regs[insn->dst_reg] = regs[insn->src_reg];
>                         } else {
> +                               if (is_pointer_value(env, insn->src_reg)) {
> +                                       verbose("R%d partial copy of pointer\n",
> +                                               insn->src_reg);
> +                                       return -EACCES;
> +                               }
>                                 regs[insn->dst_reg].type = UNKNOWN_VALUE;
>                                 regs[insn->dst_reg].map_ptr = NULL;
>                         }
> @@ -1061,8 +1131,18 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
>                 /* pattern match 'bpf_add Rx, imm' instruction */
>                 if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
>                     regs[insn->dst_reg].type == FRAME_PTR &&
> -                   BPF_SRC(insn->code) == BPF_K)
> +                   BPF_SRC(insn->code) == BPF_K) {
>                         stack_relative = true;
> +               } else if (is_pointer_value(env, insn->dst_reg)) {
> +                       verbose("R%d pointer arithmetic prohibited\n",
> +                               insn->dst_reg);
> +                       return -EACCES;
> +               } else if (BPF_SRC(insn->code) == BPF_X &&
> +                          is_pointer_value(env, insn->src_reg)) {
> +                       verbose("R%d pointer arithmetic prohibited\n",
> +                               insn->src_reg);
> +                       return -EACCES;
> +               }
>
>                 /* check dest operand */
>                 err = check_reg_arg(regs, insn->dst_reg, DST_OP);
> @@ -1101,6 +1181,12 @@ static int check_cond_jmp_op(struct verifier_env *env,
>                 err = check_reg_arg(regs, insn->src_reg, SRC_OP);
>                 if (err)
>                         return err;
> +
> +               if (is_pointer_value(env, insn->src_reg)) {
> +                       verbose("R%d pointer comparison prohibited\n",
> +                               insn->src_reg);
> +                       return -EACCES;
> +               }
>         } else {
>                 if (insn->src_reg != BPF_REG_0) {
>                         verbose("BPF_JMP uses reserved fields\n");
> @@ -1155,6 +1241,9 @@ static int check_cond_jmp_op(struct verifier_env *env,
>                         regs[insn->dst_reg].type = CONST_IMM;
>                         regs[insn->dst_reg].imm = 0;
>                 }
> +       } else if (is_pointer_value(env, insn->dst_reg)) {
> +               verbose("R%d pointer comparison prohibited\n", insn->dst_reg);
> +               return -EACCES;
>         } else if (BPF_SRC(insn->code) == BPF_K &&
>                    (opcode == BPF_JEQ || opcode == BPF_JNE)) {
>
> @@ -1658,7 +1747,7 @@ static int do_check(struct verifier_env *env)
>                 }
>
>                 if (class == BPF_ALU || class == BPF_ALU64) {
> -                       err = check_alu_op(regs, insn);
> +                       err = check_alu_op(env, insn);
>                         if (err)
>                                 return err;
>
> @@ -1816,6 +1905,11 @@ static int do_check(struct verifier_env *env)
>                                 if (err)
>                                         return err;
>
> +                               if (is_pointer_value(env, BPF_REG_0)) {
> +                                       verbose("R0 leaks addr as return value\n");
> +                                       return -EACCES;
> +                               }
> +
>  process_bpf_exit:
>                                 insn_idx = pop_stack(env, &prev_insn_idx);
>                                 if (insn_idx < 0) {
> @@ -2144,6 +2238,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
>         if (ret < 0)
>                 goto skip_full_check;
>
> +       env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
> +
>         ret = do_check(env);
>
>  skip_full_check:
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index e69201d8094e..a281849278d9 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -64,6 +64,7 @@
>  #include <linux/binfmts.h>
>  #include <linux/sched/sysctl.h>
>  #include <linux/kexec.h>
> +#include <linux/bpf.h>
>
>  #include <asm/uaccess.h>
>  #include <asm/processor.h>
> @@ -1139,6 +1140,15 @@ static struct ctl_table kern_table[] = {
>                 .proc_handler   = timer_migration_handler,
>         },
>  #endif
> +#ifdef CONFIG_BPF_SYSCALL
> +       {
> +               .procname       = "bpf_enable_unprivileged",
> +               .data           = &sysctl_bpf_enable_unprivileged,
> +               .maxlen         = sizeof(sysctl_bpf_enable_unprivileged),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec,
> +       },
> +#endif
>         { }
>  };
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 0fe96c7c8803..46bbed24d1f5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -173,6 +173,9 @@ static const struct bpf_func_proto bpf_trace_printk_proto = {
>         .ret_type       = RET_INTEGER,
>         .arg1_type      = ARG_PTR_TO_STACK,
>         .arg2_type      = ARG_CONST_STACK_SIZE,
> +       .arg3_type      = ARG_VARARG,
> +       .arg4_type      = ARG_VARARG,
> +       .arg5_type      = ARG_VARARG,
>  };
>
>  const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
> diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
> index 7235e292a03b..b7f63c70b4a2 100644
> --- a/samples/bpf/libbpf.h
> +++ b/samples/bpf/libbpf.h
> @@ -64,6 +64,14 @@ extern char bpf_log_buf[LOG_BUF_SIZE];
>                 .off   = 0,                                     \
>                 .imm   = 0 })
>
> +#define BPF_MOV32_REG(DST, SRC)                                        \
> +       ((struct bpf_insn) {                                    \
> +               .code  = BPF_ALU | BPF_MOV | BPF_X,             \
> +               .dst_reg = DST,                                 \
> +               .src_reg = SRC,                                 \
> +               .off   = 0,                                     \
> +               .imm   = 0 })
> +
>  /* Short form of mov, dst_reg = imm32 */
>
>  #define BPF_MOV64_IMM(DST, IMM)                                        \
> diff --git a/samples/bpf/test_verifier.c b/samples/bpf/test_verifier.c
> index ee0f110c9c54..266d014a885e 100644
> --- a/samples/bpf/test_verifier.c
> +++ b/samples/bpf/test_verifier.c
> @@ -15,6 +15,7 @@
>  #include <string.h>
>  #include <linux/filter.h>
>  #include <stddef.h>
> +#include <stdbool.h>
>  #include "libbpf.h"
>
>  #define MAX_INSNS 512
> @@ -25,10 +26,12 @@ struct bpf_test {
>         struct bpf_insn insns[MAX_INSNS];
>         int fixup[32];
>         const char *errstr;
> +       const char *errstr_unpriv;
>         enum {
> +               UNDEF,
>                 ACCEPT,
>                 REJECT
> -       } result;
> +       } result, result_unpriv;
>         enum bpf_prog_type prog_type;
>  };
>
> @@ -96,6 +99,7 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .errstr = "invalid BPF_LD_IMM insn",
> +               .errstr_unpriv = "R1 pointer comparison",
>                 .result = REJECT,
>         },
>         {
> @@ -109,6 +113,7 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .errstr = "invalid BPF_LD_IMM insn",
> +               .errstr_unpriv = "R1 pointer comparison",
>                 .result = REJECT,
>         },
>         {
> @@ -219,6 +224,7 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .errstr = "R0 !read_ok",
> +               .errstr_unpriv = "R1 pointer comparison",
>                 .result = REJECT,
>         },
>         {
> @@ -294,7 +300,9 @@ static struct bpf_test tests[] = {
>                         BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
>                         BPF_EXIT_INSN(),
>                 },
> +               .errstr_unpriv = "R0 leaks addr",
>                 .result = ACCEPT,
> +               .result_unpriv = REJECT,
>         },
>         {
>                 "check corrupted spill/fill",
> @@ -310,6 +318,7 @@ static struct bpf_test tests[] = {
>
>                         BPF_EXIT_INSN(),
>                 },
> +               .errstr_unpriv = "attempt to corrupt spilled",
>                 .errstr = "corrupted spill",
>                 .result = REJECT,
>         },
> @@ -473,6 +482,7 @@ static struct bpf_test tests[] = {
>                 },
>                 .fixup = {3},
>                 .errstr = "R0 invalid mem access",
> +               .errstr_unpriv = "R0 leaks addr",
>                 .result = REJECT,
>         },
>         {
> @@ -495,6 +505,8 @@ static struct bpf_test tests[] = {
>                         BPF_MOV64_IMM(BPF_REG_0, 0),
>                         BPF_EXIT_INSN(),
>                 },
> +               .errstr_unpriv = "R1 pointer comparison",
> +               .result_unpriv = REJECT,
>                 .result = ACCEPT,
>         },
>         {
> @@ -521,6 +533,8 @@ static struct bpf_test tests[] = {
>                         BPF_MOV64_IMM(BPF_REG_0, 0),
>                         BPF_EXIT_INSN(),
>                 },
> +               .errstr_unpriv = "R1 pointer comparison",
> +               .result_unpriv = REJECT,
>                 .result = ACCEPT,
>         },
>         {
> @@ -555,6 +569,8 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .fixup = {24},
> +               .errstr_unpriv = "R1 pointer comparison",
> +               .result_unpriv = REJECT,
>                 .result = ACCEPT,
>         },
>         {
> @@ -603,6 +619,8 @@ static struct bpf_test tests[] = {
>                         BPF_MOV64_IMM(BPF_REG_0, 0),
>                         BPF_EXIT_INSN(),
>                 },
> +               .errstr_unpriv = "R1 pointer comparison",
> +               .result_unpriv = REJECT,
>                 .result = ACCEPT,
>         },
>         {
> @@ -642,6 +660,8 @@ static struct bpf_test tests[] = {
>                         BPF_MOV64_IMM(BPF_REG_0, 0),
>                         BPF_EXIT_INSN(),
>                 },
> +               .errstr_unpriv = "R1 pointer comparison",
> +               .result_unpriv = REJECT,
>                 .result = ACCEPT,
>         },
>         {
> @@ -699,6 +719,7 @@ static struct bpf_test tests[] = {
>                 },
>                 .fixup = {4},
>                 .errstr = "different pointers",
> +               .errstr_unpriv = "R1 pointer comparison",
>                 .result = REJECT,
>         },
>         {
> @@ -720,6 +741,7 @@ static struct bpf_test tests[] = {
>                 },
>                 .fixup = {6},
>                 .errstr = "different pointers",
> +               .errstr_unpriv = "R1 pointer comparison",
>                 .result = REJECT,
>         },
>         {
> @@ -742,6 +764,7 @@ static struct bpf_test tests[] = {
>                 },
>                 .fixup = {7},
>                 .errstr = "different pointers",
> +               .errstr_unpriv = "R1 pointer comparison",
>                 .result = REJECT,
>         },
>         {
> @@ -752,6 +775,7 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .errstr = "invalid bpf_context access",
> +               .errstr_unpriv = "R1 leaks addr",
>                 .result = REJECT,
>         },
>         {
> @@ -762,6 +786,7 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .errstr = "invalid bpf_context access",
> +               .errstr_unpriv = "R1 leaks addr",
>                 .result = REJECT,
>         },
>         {
> @@ -772,6 +797,7 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .errstr = "invalid bpf_context access",
> +               .errstr_unpriv = "R1 leaks addr",
>                 .result = REJECT,
>         },
>         {
> @@ -782,6 +808,7 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .errstr = "invalid bpf_context access",
> +               .errstr_unpriv = "",
>                 .result = REJECT,
>                 .prog_type = BPF_PROG_TYPE_SCHED_ACT,
>         },
> @@ -803,6 +830,8 @@ static struct bpf_test tests[] = {
>                         BPF_EXIT_INSN(),
>                 },
>                 .result = ACCEPT,
> +               .errstr_unpriv = "R1 leaks addr",
> +               .result_unpriv = REJECT,
>         },
>         {
>                 "write skb fields from tc_cls_act prog",
> @@ -819,6 +848,8 @@ static struct bpf_test tests[] = {
>                                     offsetof(struct __sk_buff, cb[3])),
>                         BPF_EXIT_INSN(),
>                 },
> +               .errstr_unpriv = "",
> +               .result_unpriv = REJECT,
>                 .result = ACCEPT,
>                 .prog_type = BPF_PROG_TYPE_SCHED_CLS,
>         },
> @@ -881,6 +912,195 @@ static struct bpf_test tests[] = {
>                 .result = REJECT,
>                 .errstr = "invalid stack off=0 size=8",
>         },
> +       {
> +               "unpriv: return pointer",
> +               .insns = {
> +                       BPF_MOV64_REG(BPF_REG_0, BPF_REG_10),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .result = ACCEPT,
> +               .result_unpriv = REJECT,
> +               .errstr_unpriv = "R0 leaks addr",
> +       },
> +       {
> +               "unpriv: add const to pointer",
> +               .insns = {
> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .result = ACCEPT,
> +               .result_unpriv = REJECT,
> +               .errstr_unpriv = "R1 pointer arithmetic",
> +       },
> +       {
> +               "unpriv: add pointer to pointer",
> +               .insns = {
> +                       BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_10),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .result = ACCEPT,
> +               .result_unpriv = REJECT,
> +               .errstr_unpriv = "R1 pointer arithmetic",
> +       },
> +       {
> +               "unpriv: neg pointer",
> +               .insns = {
> +                       BPF_ALU64_IMM(BPF_NEG, BPF_REG_1, 0),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .result = ACCEPT,
> +               .result_unpriv = REJECT,
> +               .errstr_unpriv = "R1 pointer arithmetic",
> +       },
> +       {
> +               "unpriv: cmp pointer with const",
> +               .insns = {
> +                       BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 0),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .result = ACCEPT,
> +               .result_unpriv = REJECT,
> +               .errstr_unpriv = "R1 pointer comparison",
> +       },
> +       {
> +               "unpriv: cmp pointer with pointer",
> +               .insns = {
> +                       BPF_JMP_REG(BPF_JEQ, BPF_REG_1, BPF_REG_10, 0),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .result = ACCEPT,
> +               .result_unpriv = REJECT,
> +               .errstr_unpriv = "R10 pointer comparison",
> +       },
> +       {
> +               "unpriv: pass pointer to printk",
> +               .insns = {
> +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +                       BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
> +                       BPF_MOV64_IMM(BPF_REG_2, 8),
> +                       BPF_MOV64_REG(BPF_REG_3, BPF_REG_1),
> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_trace_printk),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .errstr_unpriv = "R3 leaks addr",
> +               .result_unpriv = REJECT,
> +               .result = ACCEPT,
> +       },
> +       {
> +               "unpriv: pass pointer to helper function",
> +               .insns = {
> +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +                       BPF_MOV64_REG(BPF_REG_3, BPF_REG_2),
> +                       BPF_MOV64_REG(BPF_REG_4, BPF_REG_2),
> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_update_elem),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .fixup = {3},
> +               .errstr_unpriv = "R4 leaks addr",
> +               .result_unpriv = REJECT,
> +               .result = ACCEPT,
> +       },
> +       {
> +               "unpriv: indirectly pass pointer on stack to helper function",
> +               .insns = {
> +                       BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
> +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .fixup = {3},
> +               .errstr = "invalid indirect read from stack off -8+0 size 8",
> +               .result = REJECT,
> +       },
> +       {
> +               "unpriv: mangle pointer on stack 1",
> +               .insns = {
> +                       BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
> +                       BPF_ST_MEM(BPF_W, BPF_REG_10, -8, 0),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .errstr_unpriv = "attempt to corrupt spilled",
> +               .result_unpriv = REJECT,
> +               .result = ACCEPT,
> +       },
> +       {
> +               "unpriv: mangle pointer on stack 2",
> +               .insns = {
> +                       BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
> +                       BPF_ST_MEM(BPF_B, BPF_REG_10, -1, 0),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .errstr_unpriv = "attempt to corrupt spilled",
> +               .result_unpriv = REJECT,
> +               .result = ACCEPT,
> +       },
> +       {
> +               "unpriv: read pointer from stack in small chunks",
> +               .insns = {
> +                       BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
> +                       BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -8),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .errstr = "invalid size",
> +               .result = REJECT,
> +       },
> +       {
> +               "unpriv: write pointer into ctx",
> +               .insns = {
> +                       BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .errstr_unpriv = "R1 leaks addr",
> +               .result_unpriv = REJECT,
> +               .errstr = "invalid bpf_context access",
> +               .result = REJECT,
> +       },
> +       {
> +               "unpriv: write pointer into map elem value",
> +               .insns = {
> +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +                       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
> +                       BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .fixup = {3},
> +               .errstr_unpriv = "R0 leaks addr",
> +               .result_unpriv = REJECT,
> +               .result = ACCEPT,
> +       },
> +       {
> +               "unpriv: partial copy of pointer",
> +               .insns = {
> +                       BPF_MOV32_REG(BPF_REG_1, BPF_REG_10),
> +                       BPF_MOV64_IMM(BPF_REG_0, 0),
> +                       BPF_EXIT_INSN(),
> +               },
> +               .errstr_unpriv = "R10 partial copy",
> +               .result_unpriv = REJECT,
> +               .result = ACCEPT,
> +       },
>  };
>
>  static int probe_filter_length(struct bpf_insn *fp)
> @@ -910,12 +1130,15 @@ static int create_map(void)
>  static int test(void)
>  {
>         int prog_fd, i, pass_cnt = 0, err_cnt = 0;
> +       bool unpriv = geteuid() != 0;
>
>         for (i = 0; i < ARRAY_SIZE(tests); i++) {
>                 struct bpf_insn *prog = tests[i].insns;
>                 int prog_type = tests[i].prog_type;
>                 int prog_len = probe_filter_length(prog);
>                 int *fixup = tests[i].fixup;
> +               int expected_result;
> +               const char *expected_errstr;
>                 int map_fd = -1;
>
>                 if (*fixup) {
> @@ -932,7 +1155,17 @@ static int test(void)
>                                         prog, prog_len * sizeof(struct bpf_insn),
>                                         "GPL", 0);
>
> -               if (tests[i].result == ACCEPT) {
> +               if (unpriv && tests[i].result_unpriv != UNDEF)
> +                       expected_result = tests[i].result_unpriv;
> +               else
> +                       expected_result = tests[i].result;
> +
> +               if (unpriv && tests[i].errstr_unpriv)
> +                       expected_errstr = tests[i].errstr_unpriv;
> +               else
> +                       expected_errstr = tests[i].errstr;
> +
> +               if (expected_result == ACCEPT) {
>                         if (prog_fd < 0) {
>                                 printf("FAIL\nfailed to load prog '%s'\n",
>                                        strerror(errno));
> @@ -947,7 +1180,7 @@ static int test(void)
>                                 err_cnt++;
>                                 goto fail;
>                         }
> -                       if (strstr(bpf_log_buf, tests[i].errstr) == 0) {
> +                       if (strstr(bpf_log_buf, expected_errstr) == 0) {
>                                 printf("FAIL\nunexpected error message: %s",
>                                        bpf_log_buf);
>                                 err_cnt++;
> --
> 1.7.9.5
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 21:00   ` Kees Cook
@ 2015-10-05 21:12     ` Alexei Starovoitov
  2015-10-05 21:16       ` Andy Lutomirski
  2015-10-05 22:02       ` Kees Cook
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-05 21:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Andy Lutomirski, Ingo Molnar,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann, Linux API,
	Network Development, LKML

On 10/5/15 2:00 PM, Kees Cook wrote:
> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> >In order to let unprivileged users load and execute eBPF programs
>> >teach verifier to prevent pointer leaks.
>> >Verifier will prevent
>> >- any arithmetic on pointers
>> >   (except R10+Imm which is used to compute stack addresses)
>> >- comparison of pointers
>> >- passing pointers to helper functions
>> >- indirectly passing pointers in stack to helper functions
>> >- returning pointer from bpf program
>> >- storing pointers into ctx or maps
> Does the arithmetic restriction include using a pointer as an index to
> a maps-based tail call? I'm still worried about pointer-based
> side-effects.

the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
from the program side, so programs cannot see or manipulate
those pointers.
For the former only bpf_tail_call() is allowed that takes integer
index and jumps to it. And the latter map accessed with
bpf_perf_event_read() that also takes index only (this helper
is not available to socket filters anyway).
Also bpf_tail_call() can only jump to the program of the same type.
So I'm quite certain it's safe.

Yes, please ask questions and try to poke holes. Now it is time.



* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 21:12     ` Alexei Starovoitov
@ 2015-10-05 21:16       ` Andy Lutomirski
  2015-10-05 21:32         ` Alexei Starovoitov
  2015-10-05 22:02       ` Kees Cook
  1 sibling, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2015-10-05 21:16 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, David S. Miller, Ingo Molnar, Hannes Frederic Sowa,
	Eric Dumazet, Daniel Borkmann, Linux API, Network Development,
	LKML

On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 10/5/15 2:00 PM, Kees Cook wrote:
>>
>> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov <ast@plumgrid.com>
>> wrote:
>>>
>>> >In order to let unprivileged users load and execute eBPF programs
>>> >teach verifier to prevent pointer leaks.
>>> >Verifier will prevent
>>> >- any arithmetic on pointers
>>> >   (except R10+Imm which is used to compute stack addresses)
>>> >- comparison of pointers
>>> >- passing pointers to helper functions
>>> >- indirectly passing pointers in stack to helper functions
>>> >- returning pointer from bpf program
>>> >- storing pointers into ctx or maps
>>
>> Does the arithmetic restriction include using a pointer as an index to
>> a maps-based tail call? I'm still worried about pointer-based
>> side-effects.
>
>
> the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
> BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
> from the program side, so programs cannot see or manipulate
> those pointers.
> For the former only bpf_tail_call() is allowed that takes integer
> index and jumps to it. And the latter map accessed with
> bpf_perf_event_read() that also takes index only (this helper
> is not available to socket filters anyway).
> Also bpf_tail_call() can only jump to the program of the same type.
> So I'm quite certain it's safe.

At some point there will be an unprivileged way to create a map,
though, and we don't want to let pointers get poked into the map.

Or am I misunderstanding?

--Andy


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 21:16       ` Andy Lutomirski
@ 2015-10-05 21:32         ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-05 21:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, David S. Miller, Ingo Molnar, Hannes Frederic Sowa,
	Eric Dumazet, Daniel Borkmann, Linux API, Network Development,
	LKML

On 10/5/15 2:16 PM, Andy Lutomirski wrote:
> On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On 10/5/15 2:00 PM, Kees Cook wrote:
>>>
>>> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov <ast@plumgrid.com>
>>> wrote:
>>>>
>>>>> In order to let unprivileged users load and execute eBPF programs
>>>>> teach verifier to prevent pointer leaks.
>>>>> Verifier will prevent
>>>>> - any arithmetic on pointers
>>>>>    (except R10+Imm which is used to compute stack addresses)
>>>>> - comparison of pointers
>>>>> - passing pointers to helper functions
>>>>> - indirectly passing pointers in stack to helper functions
>>>>> - returning pointer from bpf program
>>>>> - storing pointers into ctx or maps
>>>
>>> Does the arithmetic restriction include using a pointer as an index to
>>> a maps-based tail call? I'm still worried about pointer-based
>>> side-effects.
>>
>>
>> the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
>> BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
>> from the program side, so programs cannot see or manipulate
>> those pointers.
>> For the former only bpf_tail_call() is allowed that takes integer
>> index and jumps to it. And the latter map accessed with
>> bpf_perf_event_read() that also takes index only (this helper
>> is not available to socket filters anyway).
>> Also bpf_tail_call() can only jump to the program of the same type.
>> So I'm quite certain it's safe.
>
> At some point there will be an unprivileged way to create a map,
> though, and we don't want to let pointers get poked into the map.

yes. exactly. With these two patches non-root can create a map
(charged against the memlock user limit) and have a program store bytes
into it (like data from a network packet), but it cannot store pointers
into it.
That's covered by the test "unpriv: write pointer into map elem value".
I've added new tests for all cases that can leak a pointer:
  unpriv: return pointer OK
  unpriv: add const to pointer OK
  unpriv: add pointer to pointer OK
  unpriv: neg pointer OK
  unpriv: cmp pointer with const OK
  unpriv: cmp pointer with pointer OK
  unpriv: pass pointer to printk OK
  unpriv: pass pointer to helper function OK
  unpriv: indirectly pass pointer on stack to helper function OK
  unpriv: mangle pointer on stack 1 OK
  unpriv: mangle pointer on stack 2 OK
  unpriv: read pointer from stack in small chunks OK
  unpriv: write pointer into ctx OK
  unpriv: write pointer into map elem value OK
  unpriv: partial copy of pointer OK

the most interesting one is 'indirectly pass pointer'.
It checks the case where a user stores a pointer onto the stack
and then uses that stack region either as a key for a map lookup or
as part of a format string for printk.



* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 21:12     ` Alexei Starovoitov
  2015-10-05 21:16       ` Andy Lutomirski
@ 2015-10-05 22:02       ` Kees Cook
  2015-10-06  0:28         ` Alexei Starovoitov
  1 sibling, 1 reply; 30+ messages in thread
From: Kees Cook @ 2015-10-05 22:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Andy Lutomirski, Ingo Molnar,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann, Linux API,
	Network Development, LKML

On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 10/5/15 2:00 PM, Kees Cook wrote:
>>
>>> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov <ast@plumgrid.com>
>> wrote:
>>>
>>> >In order to let unprivileged users load and execute eBPF programs
>>> >teach verifier to prevent pointer leaks.
>>> >Verifier will prevent
>>> >- any arithmetic on pointers
>>> >   (except R10+Imm which is used to compute stack addresses)
>>> >- comparison of pointers
>>> >- passing pointers to helper functions
>>> >- indirectly passing pointers in stack to helper functions
>>> >- returning pointer from bpf program
>>> >- storing pointers into ctx or maps
>>
>> Does the arithmetic restriction include using a pointer as an index to
>> a maps-based tail call? I'm still worried about pointer-based
>> side-effects.
>
>
> the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
> BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
> from the program side, so programs cannot see or manipulate
> those pointers.
> For the former only bpf_tail_call() is allowed that takes integer
> index and jumps to it. And the latter map accessed with

Okay, so I can't take a pointer, put it on the stack, take it back (or
any part of it) as an integer, and use it for a tail call?

Sounds like this is shaping up nicely! Thanks for adding all these checks.

-Kees

> bpf_perf_event_read() that also takes index only (this helper
> is not available to socket filters anyway).
> Also bpf_tail_call() can only jump to the program of the same type.
> So I'm quite certain it's safe.
>
> Yes, please ask questions and try to poke holes. Now it is time.
>



-- 
Kees Cook
Chrome OS Security


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 20:48 ` [PATCH net-next 1/2] bpf: enable non-root eBPF programs Alexei Starovoitov
  2015-10-05 21:00   ` Kees Cook
@ 2015-10-05 22:14   ` Daniel Borkmann
  2015-10-06  0:51     ` Alexei Starovoitov
  2015-10-08  2:29   ` Alexei Starovoitov
  2 siblings, 1 reply; 30+ messages in thread
From: Daniel Borkmann @ 2015-10-05 22:14 UTC (permalink / raw)
  To: Alexei Starovoitov, David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Kees Cook, linux-api, netdev, linux-kernel

On 10/05/2015 10:48 PM, Alexei Starovoitov wrote:
> In order to let unprivileged users load and execute eBPF programs
> teach verifier to prevent pointer leaks.
> Verifier will prevent
> - any arithmetic on pointers
>    (except R10+Imm which is used to compute stack addresses)
> - comparison of pointers
> - passing pointers to helper functions
> - indirectly passing pointers in stack to helper functions
> - returning pointer from bpf program
> - storing pointers into ctx or maps
>
> Spill/fill of pointers into stack is allowed, but mangling
> of pointers stored in the stack or reading them byte by byte is not.
>
> Within bpf programs the pointers do exist, since programs need to
> be able to access maps, pass skb pointer to LD_ABS insns, etc
> but programs cannot pass such pointer values to the outside
> or obfuscate them.
>
> Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
> so that socket filters (tcpdump), af_packet (quic acceleration)
> and future kcm can use it.
> tracing and tc cls/act program types still require root permissions,
> since tracing actually needs to be able to see all kernel pointers
> and tc is for root only.
>
> For example, the following unprivileged socket filter program is allowed:
> int foo(struct __sk_buff *skb)
> {
>    char fmt[] = "hello %d\n";
>    bpf_trace_printk(fmt, sizeof(fmt), skb->len);
>    return 0;
> }
>
> but the following program is not:
> int foo(struct __sk_buff *skb)
> {
>    char fmt[] = "hello %p\n";
>    bpf_trace_printk(fmt, sizeof(fmt), fmt);
>    return 0;
> }
> since it would leak the kernel stack address via bpf_trace_printk().
>
> Unprivileged socket filter bpf programs have access to the
> following helper functions:
> - map lookup/update/delete (but they cannot store kernel pointers into them)
> - get_random (it's already exposed to unprivileged user space)
> - get_smp_processor_id
> - tail_call into another socket filter program
> - ktime_get_ns
> - bpf_trace_printk (for debugging)
>
> The feature is controlled by sysctl kernel.bpf_enable_unprivileged
> which is off by default.
>
> New tests were added to test_verifier:
>   unpriv: return pointer OK
>   unpriv: add const to pointer OK
>   unpriv: add pointer to pointer OK
>   unpriv: neg pointer OK
>   unpriv: cmp pointer with const OK
>   unpriv: cmp pointer with pointer OK
>   unpriv: pass pointer to printk OK
>   unpriv: pass pointer to helper function OK
>   unpriv: indirectly pass pointer on stack to helper function OK
>   unpriv: mangle pointer on stack 1 OK
>   unpriv: mangle pointer on stack 2 OK
>   unpriv: read pointer from stack in small chunks OK
>   unpriv: write pointer into ctx OK
>   unpriv: write pointer into map elem value OK
>   unpriv: partial copy of pointer OK
>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>

One scenario that comes to mind ... what happens when there are kernel
pointers stored in skb->cb[] (either from the current layer or an old
one from a different layer that the skb went through previously, but
which did not get overwritten)?

Socket filters could read a portion of skb->cb[] also when unprived and
leak that out through maps. I think the verifier doesn't catch that,
right?

Thanks,
Daniel


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 22:02       ` Kees Cook
@ 2015-10-06  0:28         ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-06  0:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Andy Lutomirski, Ingo Molnar,
	Hannes Frederic Sowa, Eric Dumazet, Daniel Borkmann, Linux API,
	Network Development, LKML

On 10/5/15 3:02 PM, Kees Cook wrote:
>> the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
>> >BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
>> >from the program side, so programs cannot see or manipulate
>> >those pointers.
>> >For the former only bpf_tail_call() is allowed that takes integer
>> >index and jumps to it. And the latter map accessed with
> Okay, so I can't take a pointer, put it on the stack, take it back (or
> any part of it) as an integer, and use it for a tail call?

not quite.
you can store a pointer to the stack and read it back into another
register as a full 8-byte load, but reading fewer than 8 bytes of it
will be rejected.
That's the test:
unpriv: read pointer from stack in small chunks
we obviously want to avoid hiding pointers in integers.
After reading it back from the stack as a pointer you cannot use
this register to pass as an index into bpf_tail_call().
That's the test:
unpriv: pass pointer to helper function

please keep shooting everything that comes to mind.




* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 22:14   ` Daniel Borkmann
@ 2015-10-06  0:51     ` Alexei Starovoitov
  2015-10-06  7:13       ` Ingo Molnar
  2015-10-06 12:45       ` Daniel Borkmann
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-06  0:51 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Kees Cook, linux-api, netdev, linux-kernel

On 10/5/15 3:14 PM, Daniel Borkmann wrote:
> One scenario that comes to mind ... what happens when there are kernel
> pointers stored in skb->cb[] (either from the current layer or an old
> one from a different layer that the skb went through previously, but
> which did not get overwritten)?
>
> Socket filters could read a portion of skb->cb[] also when unprived and
> leak that out through maps. I think the verifier doesn't catch that,
> right?

grrr. indeed. previous layer before sk_filter() can leave junk in there.
Would need to disable cb[0-5] for unpriv, but that will make tail_call
much harder to use, since cb[0-5] is a way to pass arguments from
one prog to another, and clearing them is not an option, since it's
too expensive. The samples/bpf/sockex3_kern.c usage of cb[0], for
example, won't work anymore. I guess that's the price of unpriv.
Will fix this, add a few tail_call-specific tests, and respin.
Please keep poking.



* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06  0:51     ` Alexei Starovoitov
@ 2015-10-06  7:13       ` Ingo Molnar
  2015-10-06  8:05         ` Daniel Borkmann
  2015-10-06 12:45       ` Daniel Borkmann
  1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2015-10-06  7:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, David S. Miller, Andy Lutomirski,
	Hannes Frederic Sowa, Eric Dumazet, Kees Cook, linux-api, netdev,
	linux-kernel


* Alexei Starovoitov <ast@plumgrid.com> wrote:

> On 10/5/15 3:14 PM, Daniel Borkmann wrote:
> >One scenario that comes to mind ... what happens when there are kernel
> >pointers stored in skb->cb[] (either from the current layer or an old
> >one from a different layer that the skb went through previously, but
> >which did not get overwritten)?
> >
> >Socket filters could read a portion of skb->cb[] also when unprived and
> >leak that out through maps. I think the verifier doesn't catch that,
> >right?
> 
> grrr. indeed. previous layer before sk_filter() can leave junk in there.

Could this be solved by activating zeroing/sanitizing of this data if there's an 
active BPF function around that can access that socket?

Thanks,

	Ingo


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06  7:13       ` Ingo Molnar
@ 2015-10-06  8:05         ` Daniel Borkmann
  2015-10-06  8:20           ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Borkmann @ 2015-10-06  8:05 UTC (permalink / raw)
  To: Ingo Molnar, Alexei Starovoitov
  Cc: David S. Miller, Andy Lutomirski, Hannes Frederic Sowa,
	Eric Dumazet, Kees Cook, linux-api, netdev, linux-kernel

On 10/06/2015 09:13 AM, Ingo Molnar wrote:
>
> * Alexei Starovoitov <ast@plumgrid.com> wrote:
>
>> On 10/5/15 3:14 PM, Daniel Borkmann wrote:
>>> One scenario that comes to mind ... what happens when there are kernel
>>> pointers stored in skb->cb[] (either from the current layer or an old
>>> one from a different layer that the skb went through previously, but
>>> which did not get overwritten)?
>>>
>>> Socket filters could read a portion of skb->cb[] also when unprived and
>>> leak that out through maps. I think the verifier doesn't catch that,
>>> right?
>>
>> grrr. indeed. previous layer before sk_filter() can leave junk in there.
>
> Could this be solved by activating zeroing/sanitizing of this data if there's an
> active BPF function around that can access that socket?

I think this check could only be done in sk_filter() for testing these
conditions (unprivileged user + access to cb area), so it would need to
happen from outside a native eBPF program. :/ Also classic BPF would
then need to test for it, since a socket filter doesn't really know
whether native eBPF is loaded there or a classic-to-eBPF transformed one,
and classic never makes use of this. Anyway, it could be done by adding
a bit flag cb_access:1 to the bpf_prog, set it during eBPF verification
phase, and test it inside sk_filter() if I see it correctly.

The reason is that this sanitizing must only be done in the 'top-level'
program that is run from sk_filter() _directly_, because a user at any
time could decide to put an already loaded eBPF fd into a tail call map.
And cb[] is then used to pass args/state around between two programs,
thus it cannot be unconditionally cleared from within the program. The
association to a socket filter (SO_ATTACH_BPF) happens at a later time
after a native eBPF program has already been loaded via bpf(2).

Thanks,
Daniel


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06  8:05         ` Daniel Borkmann
@ 2015-10-06  8:20           ` Ingo Molnar
  2015-10-06  8:39             ` Daniel Borkmann
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2015-10-06  8:20 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, David S. Miller, Andy Lutomirski,
	Hannes Frederic Sowa, Eric Dumazet, Kees Cook, linux-api, netdev,
	linux-kernel


* Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 10/06/2015 09:13 AM, Ingo Molnar wrote:
> >
> >* Alexei Starovoitov <ast@plumgrid.com> wrote:
> >
> >>On 10/5/15 3:14 PM, Daniel Borkmann wrote:
> >>>One scenario that comes to mind ... what happens when there are kernel
> >>>pointers stored in skb->cb[] (either from the current layer or an old
> >>>one from a different layer that the skb went through previously, but
> >>>which did not get overwritten)?
> >>>
> >>>Socket filters could read a portion of skb->cb[] also when unprived and
> >>>leak that out through maps. I think the verifier doesn't catch that,
> >>>right?
> >>
> >>grrr. indeed. previous layer before sk_filter() can leave junk in there.
> >
> >Could this be solved by activating zeroing/sanitizing of this data if there's an
> >active BPF function around that can access that socket?
> 
> I think this check could only be done in sk_filter() for testing these
> conditions (unprivileged user + access to cb area), so it would need to
> happen from outside a native eBPF program. :/

Yes, the kernel (with code running outside of any eBPF program) would guarantee 
that those data fields are zeroed/sanitized, if there's an eBPF program that is 
attached to that socket.

> [...] Also classic BPF would then need to test for it, since a socket filter 
> doesn't really know whether native eBPF is loaded there or a classic-to-eBPF 
> transformed one, and classic never makes use of this. Anyway, it could be done 
> by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF 
> verification phase, and test it inside sk_filter() if I see it correctly.

That could also be done in an unlikely() branch, to keep the cost to the non-eBPF 
case near zero.

> The reason is that this sanitizing must only be done in the 'top-level' program 
> that is run from sk_filter() _directly_, because a user at any time could decide 
> to put an already loaded eBPF fd into a tail call map. And cb[] is then used to 
> pass args/state around between two programs, thus it cannot be unconditionally 
> cleared from within the program. The association to a socket filter 
> (SO_ATTACH_BPF) happens at a later time after a native eBPF program has already 
> been loaded via bpf(2).

So zeroing tends to be very cheap and it could also be beneficial to performance 
in terms of bringing the cacheline into the CPU cache. But I really don't know the 
filter code so I'm just handwaving.

Thanks,

	Ingo


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06  8:20           ` Ingo Molnar
@ 2015-10-06  8:39             ` Daniel Borkmann
  2015-10-06 17:50               ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Borkmann @ 2015-10-06  8:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexei Starovoitov, David S. Miller, Andy Lutomirski,
	Hannes Frederic Sowa, Eric Dumazet, Kees Cook, linux-api, netdev,
	linux-kernel

On 10/06/2015 10:20 AM, Ingo Molnar wrote:
>
> * Daniel Borkmann <daniel@iogearbox.net> wrote:
>
>> On 10/06/2015 09:13 AM, Ingo Molnar wrote:
>>>
>>> * Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>
>>>> On 10/5/15 3:14 PM, Daniel Borkmann wrote:
>>>>> One scenario that comes to mind ... what happens when there are kernel
>>>>> pointers stored in skb->cb[] (either from the current layer or an old
>>>>> one from a different layer that the skb went through previously, but
>>>>> which did not get overwritten)?
>>>>>
>>>>> Socket filters could read a portion of skb->cb[] also when unprived and
>>>>> leak that out through maps. I think the verifier doesn't catch that,
>>>>> right?
>>>>
>>>> grrr. indeed. previous layer before sk_filter() can leave junk in there.
>>>
>>> Could this be solved by activating zeroing/sanitizing of this data if there's an
>>> active BPF function around that can access that socket?
>>
>> I think this check could only be done in sk_filter() for testing these
>> conditions (unprivileged user + access to cb area), so it would need to
>> happen from outside a native eBPF program. :/
>
> Yes, the kernel (with code running outside of any eBPF program) would guarantee
> that those data fields are zeroed/sanitized, if there's an eBPF program that is
> attached to that socket.
>
>> [...] Also classic BPF would then need to test for it, since a socket filter
>> doesn't really know whether native eBPF is loaded there or a classic-to-eBPF
>> transformed one, and classic never makes use of this. Anyway, it could be done
>> by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF
>> verification phase, and test it inside sk_filter() if I see it correctly.
>
> That could also be done in an unlikely() branch, to keep the cost to the non-eBPF
> case near zero.

Yes, agreed. For the time being, the majority of users are coming from the
classic BPF side anyway, and the unlikely() could still be changed later on
if that stops being the case. The flag and bpf_func would share the
same cacheline as well.

>> The reason is that this sanitizing must only be done in the 'top-level' program
>> that is run from sk_filter() _directly_, because a user at any time could decide
>> to put an already loaded eBPF fd into a tail call map. And cb[] is then used to
>> pass args/state around between two programs, thus it cannot be unconditionally
>> cleared from within the program. The association to a socket filter
>> (SO_ATTACH_BPF) happens at a later time after a native eBPF program has already
>> been loaded via bpf(2).
>
> So zeroing tends to be very cheap and it could also be beneficial to performance
> in terms of bringing the cacheline into the CPU cache. But I really don't know the
> filter code so I'm just handwaving.
>
> Thanks,
>
> 	Ingo


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06  0:51     ` Alexei Starovoitov
  2015-10-06  7:13       ` Ingo Molnar
@ 2015-10-06 12:45       ` Daniel Borkmann
  2015-10-07 21:20         ` Alexei Starovoitov
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel Borkmann @ 2015-10-06 12:45 UTC (permalink / raw)
  To: Alexei Starovoitov, David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Kees Cook, linux-api, netdev, linux-kernel

On 10/06/2015 02:51 AM, Alexei Starovoitov wrote:
> On 10/5/15 3:14 PM, Daniel Borkmann wrote:
>> One scenario that comes to mind ... what happens when there are kernel
>> pointers stored in skb->cb[] (either from the current layer or an old
>> one from a different layer that the skb went through previously, but
>> which did not get overwritten)?
>>
>> Socket filters could read a portion of skb->cb[] also when unprived and
>> leak that out through maps. I think the verifier doesn't catch that,
>> right?
...
> Please keep poking.

;)

I'm still wondering whether sysctl_bpf_enable_unprivileged is a good
way to go with regards to controlling the capabilities of bpf(2), hmm, but
I don't really have a better idea at the moment.

So, the rationale of this is to give it some soaking time before flipping
the switch to default-on, and to still leave an admin the possibility to
turn it off later (if it's not silently overwritten by some system
application later on ;)).

I think only having a Kconfig doesn't really make sense, as distros
will blindly turn lots of stuff on anyway. A hidden Kconfig entry
that is not exposed in menuconfig might allow for sorting everything
out first, but with the downside of getting only little testing exposure.

If I see this correctly, perf_event_open(2) has a number of paranoia
levels with some helpers wrapped around it, f.e.:

/*
  * perf event paranoia level:
  *  -1 - not paranoid at all
  *   0 - disallow raw tracepoint access for unpriv
  *   1 - disallow cpu events for unpriv
  *   2 - disallow kernel profiling for unpriv
  */
int sysctl_perf_event_paranoid __read_mostly = 1;

Should instead something similar be adapted on bpf(2) as well? Or, would
that even be more painful for application developers shipping their stuff
through distros in the end (where they might then decide to just setup
everything BPF-related and then drop privs)?

I'm also wondering with regards to seccomp, which could adapt to eBPF at
some point and be used by unprivileged programs. Perhaps then, a single
paranoia alike setting might not suit to all eBPF subsystem users. Any
ideas?

Thanks,
Daniel


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06  8:39             ` Daniel Borkmann
@ 2015-10-06 17:50               ` Alexei Starovoitov
  2015-10-06 17:56                 ` Eric Dumazet
  2015-10-06 18:03                 ` Daniel Borkmann
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-06 17:50 UTC (permalink / raw)
  To: Daniel Borkmann, Ingo Molnar
  Cc: David S. Miller, Andy Lutomirski, Hannes Frederic Sowa,
	Eric Dumazet, Kees Cook, linux-api, netdev, linux-kernel

On 10/6/15 1:39 AM, Daniel Borkmann wrote:
>>> [...] Also classic BPF would then need to test for it, since a socket
>>> filter
>>> doesn't really know whether native eBPF is loaded there or a
>>> classic-to-eBPF
>>> transformed one, and classic never makes use of this. Anyway, it
>>> could be done
>>> by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF
>>> verification phase, and test it inside sk_filter() if I see it
>>> correctly.
>>
>> That could also be done in an unlikely() branch, to keep the cost to
>> the non-eBPF
>> case near zero.
>
> Yes, agreed. For the time being, the majority of users are coming from the
> classic BPF side anyway and the unlikely() could still be changed later on
> if it should not be the case anymore. The flag and bpf_func would share the
> same cacheline as well.

I was also thinking that we could do it only in paths that actually
have multiple protocol layers. Today bpf is mainly used with
tcpdump (raw_socket) and the new af_packet fanout; both have cb cleared
on RX, because the skb just came out of alloc_skb and no layers were
called, and on TX we can clear 20 bytes in dev_queue_xmit_nit().
af_unix/netlink also have a clean skb. Need to analyze tun and sctp...
but it feels overly fragile to save a branch in sk_filter, so planning
to go with
if (unlikely(prog->cb_access)) memset in sk_filter().



* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06 17:50               ` Alexei Starovoitov
@ 2015-10-06 17:56                 ` Eric Dumazet
  2015-10-06 18:05                   ` Andy Lutomirski
  2015-10-06 19:26                   ` Alexei Starovoitov
  2015-10-06 18:03                 ` Daniel Borkmann
  1 sibling, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2015-10-06 17:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Ingo Molnar, David S. Miller, Andy Lutomirski,
	Hannes Frederic Sowa, Eric Dumazet, Kees Cook, linux-api, netdev,
	linux-kernel

On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:

> was also thinking that we can do it only in paths that actually
> have multiple protocol layers, since today bpf is mainly used with
> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
> on RX, because it just came out of alloc_skb and no layers were called,
> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
> but it feels overly fragile to save a branch in sk_filter,
> so planning to go with
> if(unlikely(prog->cb_access)) memset in sk_filter().
> 

This will break TCP use of sk_filter().
skb->cb[] contains useful data in TCP layer.




* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06 17:50               ` Alexei Starovoitov
  2015-10-06 17:56                 ` Eric Dumazet
@ 2015-10-06 18:03                 ` Daniel Borkmann
  1 sibling, 0 replies; 30+ messages in thread
From: Daniel Borkmann @ 2015-10-06 18:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Ingo Molnar
  Cc: David S. Miller, Andy Lutomirski, Hannes Frederic Sowa,
	Eric Dumazet, Kees Cook, linux-api, netdev, linux-kernel

On 10/06/2015 07:50 PM, Alexei Starovoitov wrote:
> On 10/6/15 1:39 AM, Daniel Borkmann wrote:
>>>> [...] Also classic BPF would then need to test for it, since a socket
>>>> filter
>>>> doesn't really know whether native eBPF is loaded there or a
>>>> classic-to-eBPF
>>>> transformed one, and classic never makes use of this. Anyway, it
>>>> could be done
>>>> by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF
>>>> verification phase, and test it inside sk_filter() if I see it
>>>> correctly.
>>>
>>> That could also be done in an unlikely() branch, to keep the cost to
>>> the non-eBPF
>>> case near zero.
>>
>> Yes, agreed. For the time being, the majority of users are coming from the
>> classic BPF side anyway and the unlikely() could still be changed later on
>> if it should not be the case anymore. The flag and bpf_func would share the
>> same cacheline as well.
>
> was also thinking that we can do it only in paths that actually
> have multiple protocol layers, since today bpf is mainly used with
> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
> on RX, because it just came out of alloc_skb and no layers were called,
> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
> but it feels overly fragile to save a branch in sk_filter,
> so planning to go with
> if(unlikely(prog->cb_access)) memset in sk_filter().

I was also thinking that for dev_queue_xmit_nit(), since we do the skb_clone()
there, we could have a clone variant (w/o affecting performance of the current
one) that, instead of copying cb[] over, would just memset() it. But that would
be limited to AF_PACKET, and doesn't catch all sk_filter() users.

Thanks,
Daniel


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06 17:56                 ` Eric Dumazet
@ 2015-10-06 18:05                   ` Andy Lutomirski
  2015-10-07  6:05                     ` Ingo Molnar
  2015-10-06 19:26                   ` Alexei Starovoitov
  1 sibling, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2015-10-06 18:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Daniel Borkmann, Ingo Molnar,
	David S. Miller, Hannes Frederic Sowa, Eric Dumazet, Kees Cook,
	Linux API, Network Development, linux-kernel

On Tue, Oct 6, 2015 at 10:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:
>
>> was also thinking that we can do it only in paths that actually
>> have multiple protocol layers, since today bpf is mainly used with
>> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
>> on RX, because it just came out of alloc_skb and no layers were called,
>> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
>> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
>> but it feels overly fragile to save a branch in sk_filter,
>> so planning to go with
>> if(unlikely(prog->cb_access)) memset in sk_filter().
>>
>
> This will break TCP use of sk_filter().
> skb->cb[] contains useful data in TCP layer.
>
>

Since I don't know too much about the networking details:

1. Does "skb->cb" *ever* contain anything useful for an unprivileged user?

2. Does skb->cb form a stable ABI?

Unless both answers are solid yesses, then maybe the right solution is
to just deny access entirely to unprivileged users.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06 17:56                 ` Eric Dumazet
  2015-10-06 18:05                   ` Andy Lutomirski
@ 2015-10-06 19:26                   ` Alexei Starovoitov
  1 sibling, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-06 19:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, Ingo Molnar, David S. Miller, Andy Lutomirski,
	Hannes Frederic Sowa, Eric Dumazet, Kees Cook, linux-api, netdev,
	linux-kernel

On 10/6/15 10:56 AM, Eric Dumazet wrote:
> On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:
>
>> was also thinking that we can do it only in paths that actually
>> have multiple protocol layers, since today bpf is mainly used with
>> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
>> on RX, because it just came out of alloc_skb and no layers were called,
>> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
>> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
>> but it feels overly fragile to save a branch in sk_filter,
>> so planning to go with
>> if(unlikely(prog->cb_access)) memset in sk_filter().
>>
>
> This will break TCP use of sk_filter().
> skb->cb[] contains useful data in TCP layer.

oops, thanks for catching that. In case of sk_filter() on top of a tcp
sock, it shouldn't be looking at cb at all.
I'm thinking to send a patch to get rid of cb access for socket filters
altogether until a better solution is found.



* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06 18:05                   ` Andy Lutomirski
@ 2015-10-07  6:05                     ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-10-07  6:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Hannes Frederic Sowa, Eric Dumazet, Kees Cook,
	Linux API, Network Development, linux-kernel


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Tue, Oct 6, 2015 at 10:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:
> >
> >> was also thinking that we can do it only in paths that actually
> >> have multiple protocol layers, since today bpf is mainly used with
> >> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
> >> on RX, because it just came out of alloc_skb and no layers were called,
> >> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
> >> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
> >> but it feels overly fragile to save a branch in sk_filter,
> >> so planning to go with
> >> if(unlikely(prog->cb_access)) memset in sk_filter().
> >>
> >
> > This will break TCP use of sk_filter().
> > skb->cb[] contains useful data in TCP layer.
> >
> >
> 
> Since I don't know too much about the networking details:
> 
> 1. Does "skb->cb" *ever* contain anything useful for an unprivileged user?
> 
> 2. Does skb->cb form a stable ABI?
> 
> Unless both answers are solid yesses, then maybe the right solution is
> to just deny access entirely to unprivileged users.

So this kind of instrumentation data is not an ABI in a similar fashion as tracing 
information is not an ABI either.

I.e. tracepoints can (and sometimes do) change 'semantics' - in that the 
interpretation of the implementational details behind that data changes as the 
implementation changes. That's not something that can ever be an ABI, just like 
the contents of /proc/kcore or /proc/slabinfo can not be an ABI.

Thanks,

	Ingo


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-06 12:45       ` Daniel Borkmann
@ 2015-10-07 21:20         ` Alexei Starovoitov
  2015-10-07 22:07           ` Daniel Borkmann
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-07 21:20 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Kees Cook, linux-api, netdev, linux-kernel

On 10/6/15 5:45 AM, Daniel Borkmann wrote:
> Should instead something similar be adapted on bpf(2) as well? Or, would
> that even be more painful for application developers shipping their stuff
> through distros in the end (where they might then decide to just setup
> everything BPF-related and then drop privs)?

I think loading as root and then dropping privs won't work in many
cases, since apps still need to access maps even after dropping privs
and today it's not possible, since cap_sys_admin is tested for every
bpf syscall.

> I'm also wondering with regards to seccomp, which could adapt to eBPF at
> some point and be used by unprivileged programs. Perhaps then, a single
> paranoia alike setting might not suit to all eBPF subsystem users. Any
> ideas?

There is no such paranoia sysctl for cBPF, so there is no reason to
add one for eBPF other than fear.
Adding multiple sysctl knobs for seccomp, sockets and tracing is only a
reflection of even higher fear.
What are sysadmins supposed to do with such a sysctl when the kernel is
effectively saying 'maybe something unsafe here, you're on your own'?
Also the presence of sysctl_bpf_enable_unprivileged or any other
knob doesn't help with CVEs. Any bug with security implications will
be a CVE regardless, so I think the better course of action is to
avoid introducing this sysctl.

We've discussed adding something like CAP_BPF to control it,
but then again, do we want this because of fear of bugs or because
it's actually needed? I think the design of all CAP_* is to give
unprivileged users permission to do something beyond normal that
can potentially be harmful to other users or the whole system.
That's not the case here: one user can load eBPF programs
and maps up to its MEMLOCK limit, and they cannot interfere with
other users or affect the host, so CAP_BPF is not necessary either.

Thoughts?



* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-07 21:20         ` Alexei Starovoitov
@ 2015-10-07 22:07           ` Daniel Borkmann
  2015-10-07 22:22             ` Kees Cook
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Borkmann @ 2015-10-07 22:07 UTC (permalink / raw)
  To: Alexei Starovoitov, David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Kees Cook, linux-api, netdev, linux-kernel

On 10/07/2015 11:20 PM, Alexei Starovoitov wrote:
> On 10/6/15 5:45 AM, Daniel Borkmann wrote:
>> Should instead something similar be adapted on bpf(2) as well? Or, would
>> that even be more painful for application developers shipping their stuff
>> through distros in the end (where they might then decide to just setup
>> everything BPF-related and then drop privs)?
>
> I think loading as root and then dropping privs won't work in many
> cases, since apps still need to access maps even after dropping privs
> and today it's not possible, since cap_sys_admin is tested for every
> bpf syscall.

Yep, maps-only would then need to be made accessible in some way.

>> I'm also wondering with regards to seccomp, which could adapt to eBPF at
>> some point and be used by unprivileged programs. Perhaps then, a single
>> paranoia alike setting might not suit to all eBPF subsystem users. Any
>> ideas?
>
> There is no such paranoid sysctl for cBPF, so there is no reason to
> add one for eBPF other than fear.
> Adding multiple sysctl knobs for seccomp, socket, tracing is only
> reflection of even higher fear.
> What sysadmins suppose to do with such sysctl when kernel is kinda
> saying 'may be something unsafe here you're on your own' ?
> Also the presence of this sysctl_bpf_enable_unprivileged or any other
> one doesn't help with CVEs. Any bug with security implications will
> be a CVE regardless, so I think the better course of action is to
> avoid introducing this sysctl.

Yes, I agree with you that there would be a CVE regardless. I still
like the option of configurable access, though I'm not a big fan of the
sysctl either. Thinking out loud, what about a Kconfig option? We
started out like this on bpf(2) itself (initially under expert
settings, now afaik not anymore). Depending on the usage scenario, one
requirement could be immutable cap_sys_admin-only access; for other
use-cases the requirement on the kernel might instead be to allow
unprivileged users as well.

> We've discussed adding something like CAP_BPF to control it,
> but then again, do we want this because of fear of bugs or because
> it's actually needed. I think the design of all CAP_* is to give
> unprivileged users permissions to do something beyond normal that
> can potentially be harmful for other users or the whole system.
> In this case it's not the case. One user can load eBPF programs
> and maps up to its MEMLOCK limit and they cannot interfere with
> other users or affect the host, so CAP_BPF is not necessary either.

Thanks,
Daniel


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-07 22:07           ` Daniel Borkmann
@ 2015-10-07 22:22             ` Kees Cook
  2015-10-07 23:49               ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kees Cook @ 2015-10-07 22:22 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, David S. Miller, Andy Lutomirski,
	Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet, Linux API,
	Network Development, LKML

On Wed, Oct 7, 2015 at 3:07 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 10/07/2015 11:20 PM, Alexei Starovoitov wrote:
>>
>> On 10/6/15 5:45 AM, Daniel Borkmann wrote:
>>>
>>> Should instead something similar be adapted on bpf(2) as well? Or, would
>>> that even be more painful for application developers shipping their stuff
>>> through distros in the end (where they might then decide to just setup
>>> everything BPF-related and then drop privs)?
>>
>>
>> I think loading as root and then dropping privs won't work in many
>> cases, since apps still need to access maps even after dropping privs
>> and today it's not possible, since cap_sys_admin is tested for every
>> bpf syscall.
>
>
> Yep, maps-only would then need to be made accessible in some way.
>
>>> I'm also wondering with regards to seccomp, which could adapt to eBPF at
>>> some point and be used by unprivileged programs. Perhaps then, a single
>>> paranoia alike setting might not suit to all eBPF subsystem users. Any
>>> ideas?
>>
>>
>> There is no such paranoid sysctl for cBPF, so there is no reason to
>> add one for eBPF other than fear.
>> Adding multiple sysctl knobs for seccomp, socket, tracing is only
>> reflection of even higher fear.
>> What sysadmins suppose to do with such sysctl when kernel is kinda
>> saying 'may be something unsafe here you're on your own' ?
>> Also the presence of this sysctl_bpf_enable_unprivileged or any other
>> one doesn't help with CVEs. Any bug with security implications will
>> be a CVE regardless, so I think the better course of action is to
>> avoid introducing this sysctl.
>
>
> Yes, I agree with you that there would be a CVE regardless. I still
> like the option of configurable access, not a big fan of the sysctl
> either. Thinking out loudly, what about a Kconfig option? We started
> out like this on bpf(2) itself (initially under expert settings, now
> afaik not anymore), and depending on usage scenarios, a requirement
> could be to have immutable cap_sys_admin-only, for other use-cases a
> requirement on the kernel might instead be to have unprivileged users
> as well.

It'd be nice to have it just be a Kconfig, but this shoots
distro-users in the foot if a distro decides to include unpriv bpf and
the user doesn't want it. I think it's probably a good idea to keep
the sysctl.

-Kees

>
>> We've discussed adding something like CAP_BPF to control it,
>> but then again, do we want this because of fear of bugs or because
>> it's actually needed. I think the design of all CAP_* is to give
>> unprivileged users permissions to do something beyond normal that
>> can potentially be harmful for other users or the whole system.
>> In this case it's not the case. One user can load eBPF programs
>> and maps up to its MEMLOCK limit and they cannot interfere with
>> other users or affect the host, so CAP_BPF is not necessary either.
>
>
> Thanks,
> Daniel



-- 
Kees Cook
Chrome OS Security


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-07 22:22             ` Kees Cook
@ 2015-10-07 23:49               ` Alexei Starovoitov
  2015-10-08  6:21                 ` Ingo Molnar
  2015-10-08 17:42                 ` Kees Cook
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-07 23:49 UTC (permalink / raw)
  To: Kees Cook, Daniel Borkmann
  Cc: David S. Miller, Andy Lutomirski, Ingo Molnar,
	Hannes Frederic Sowa, Eric Dumazet, Linux API,
	Network Development, LKML

On 10/7/15 3:22 PM, Kees Cook wrote:
>> Yes, I agree with you that there would be a CVE regardless. I still
>> >like the option of configurable access, not a big fan of the sysctl
>> >either. Thinking out loudly, what about a Kconfig option? We started
>> >out like this on bpf(2) itself (initially under expert settings, now
>> >afaik not anymore), and depending on usage scenarios, a requirement
>> >could be to have immutable cap_sys_admin-only, for other use-cases a
>> >requirement on the kernel might instead be to have unprivileged users
>> >as well.
> It'd be nice to have it just be a Kconfig, but this shoots
> distro-users in the foot if a distro decides to include unpriv bpf and
> the user doesn't want it. I think it's probably a good idea to keep
> the sysctl.

I don't like introducing a Kconfig option for no clear reason. It only adds
to the testing matrix and makes it harder to hack around.
Paranoid distros can disable bpf via a single config already;
there is no reason to go more fine-grained here.
The unpriv checks add a minimal amount of code, so even for tinification
purposes there is no need to chop off a few bytes; tiny kernels would
disable bpf altogether.

As far as sysctls go, we can look at two with a similar purpose:
sysctl_perf_event_paranoid and modules_disabled.
The first one is indeed multi-level, but not because of fear of bugs;
it's because of real security implications: raw events on a
hyperthreaded cpu or uncore events can extract data from other
user processes, so it controls those extra privileges.
For bpf there are no hw implications to deal with.
If we do seccomp+eBPF in the future, it shouldn't need another knob
or extra bit; there are no extra privileges to grant.

modules_disabled is off by default and can be toggled on once.
I think for paranoid distro users that "don't want bpf" that is
the better model.
So I'm thinking to add sysctl_unprivileged_bpf_disabled, 0=off by
default (meaning that users can load unprivileged socket filter
programs, and seccomp ones in the future), which can be switched
to 1=on once and stays that way until reboot.
I think that's the best balance: it avoids adding checks to all
apps that want to use bpf, while admins can still act on it.
From an app's point of view it's no different than if the bpf syscall
were not compiled in, so a single feature test for the bpf syscall will
be enough.



* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-05 20:48 ` [PATCH net-next 1/2] bpf: enable non-root eBPF programs Alexei Starovoitov
  2015-10-05 21:00   ` Kees Cook
  2015-10-05 22:14   ` Daniel Borkmann
@ 2015-10-08  2:29   ` Alexei Starovoitov
  2 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-08  2:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Andy Lutomirski, Ingo Molnar, Hannes Frederic Sowa, Eric Dumazet,
	Daniel Borkmann, Kees Cook, linux-api, netdev, linux-kernel

On 10/5/15 1:48 PM, Alexei Starovoitov wrote:
> Unprivileged socket filter bpf programs have access to the
> following helper functions:
> - map lookup/update/delete (but they cannot store kernel pointers into them)
> - get_random (it's already exposed to unprivileged user space)
> - get_smp_processor_id
> - tail_call into another socket filter program
> - ktime_get_ns
> - bpf_trace_printk (for debugging)

While reviewing everything for the Nth time, I realized that
bpf_trace_printk is useless for unprivileged users, since
trace_pipe is root-only.
So I'm going to drop it in v2.



* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-07 23:49               ` Alexei Starovoitov
@ 2015-10-08  6:21                 ` Ingo Molnar
  2015-10-08  6:30                   ` Alexei Starovoitov
  2015-10-08 17:42                 ` Kees Cook
  1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2015-10-08  6:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Daniel Borkmann, David S. Miller, Andy Lutomirski,
	Hannes Frederic Sowa, Eric Dumazet, Linux API,
	Network Development, LKML


* Alexei Starovoitov <ast@plumgrid.com> wrote:

> As far as sysctl we can look at two with similar purpose:
> sysctl_perf_event_paranoid and modules_disabled.
> First one is indeed multi level, but not because of the fear of bugs,
> but because of real security implications.

It serves both purposes flexibly, and note that most people and distros will use 
the default value.

> [...] Like raw events on hyperthreaded cpu or uncore events can extract data 
> from other user processes. So it controls these extra privileges.

It also controls the generally increased risk caused by a larger attack surface, 
which some users may not want to carry and which they can thus shrink.

With a static keys approach there would be no runtime overhead worth speaking of, 
so I see no reason why unprivileged eBPF couldn't have a sysctl too - with the 
default value set to permissive.

Thanks,

	Ingo


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-08  6:21                 ` Ingo Molnar
@ 2015-10-08  6:30                   ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-10-08  6:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Daniel Borkmann, David S. Miller, Andy Lutomirski,
	Hannes Frederic Sowa, Eric Dumazet, Linux API,
	Network Development, LKML

On 10/7/15 11:21 PM, Ingo Molnar wrote:
> so I see no reason why unprivileged eBPF couldn't have a sysctl too - with the
> default value set to permissive.

agreed. sent out v2 follows 'modules_disabled' style.


* Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
  2015-10-07 23:49               ` Alexei Starovoitov
  2015-10-08  6:21                 ` Ingo Molnar
@ 2015-10-08 17:42                 ` Kees Cook
  1 sibling, 0 replies; 30+ messages in thread
From: Kees Cook @ 2015-10-08 17:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, David S. Miller, Andy Lutomirski, Ingo Molnar,
	Hannes Frederic Sowa, Eric Dumazet, Linux API,
	Network Development, LKML

On Wed, Oct 7, 2015 at 4:49 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 10/7/15 3:22 PM, Kees Cook wrote:
>>> Yes, I agree with you that there would be a CVE regardless. I still
>>> like the option of configurable access, not a big fan of the sysctl
>>> either. Thinking out loudly, what about a Kconfig option? We started
>>> out like this on bpf(2) itself (initially under expert settings, now
>>> afaik not anymore), and depending on usage scenarios, a requirement
>>> could be to have immutable cap_sys_admin-only, for other use-cases a
>>> requirement on the kernel might instead be to have unprivileged users
>>> as well.
>>
>> It'd be nice to have it just be a Kconfig, but this shoots
>> distro-users in the foot if a distro decides to include unpriv bpf and
>> the user doesn't want it. I think it's probably a good idea to keep
>> the sysctl.
>
>
> I don't like introducing a Kconfig option for no clear reason. It only adds
> to the testing matrix and makes it harder to hack around.
> Paranoid distros can disable bpf via a single config already;
> there is no reason to go more fine-grained here.
> Unpriv checks add a minimal amount of code, so even for tinification
> purposes there is no need to chop off a few bytes. tiny kernels would
> disable bpf altogether.
>
> As far as sysctls go, we can look at two with a similar purpose:
> sysctl_perf_event_paranoid and modules_disabled.
> The first one is indeed multi-level, but not because of the fear of bugs;
> rather because of real security implications. Raw events on a
> hyperthreaded cpu, or uncore events, can extract data from other
> user processes, for example. So it controls these extra privileges.
> For bpf there are no hw implications to deal with.
> If we add seccomp+bpf in the future it shouldn't need another knob
> or extra bit. There are no extra privileges to grant, so it's not needed.
>
> modules_disabled is off by default and can be toggled on once.
> I think for paranoid distro users that "don't want bpf" that is
> the better model.
> So I'm thinking of doing sysctl_unprivileged_bpf_disabled, which will be
> 0=off by default (meaning that users can load unpriv socket filter
> programs, and seccomp in the future) and which can be switched
> to 1=on once and stay that way until reboot.
> I think that's the best balance: it avoids adding checks to all
> apps that want to use bpf, and admins can still act on it.
> From the app's point of view it's no different than if the bpf syscall
> was not compiled in, so a single feature test for the bpf syscall will
> be enough.
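The modules_disabled-style write-once semantics described above can be modeled as follows (a hypothetical sketch of the behavior, not the actual kernel sysctl handler):

```c
#include <assert.h>

/* One-way sysctl model: the value can go from 0 to 1 once, and any
 * attempt to clear it back to 0 is rejected until reboot. */
static int set_one_way_knob(int *knob, int new_val)
{
    if (new_val != 0 && new_val != 1)
        return -1; /* think -EINVAL: only 0 and 1 are valid */
    if (*knob == 1 && new_val == 0)
        return -1; /* think -EPERM: cannot be cleared until reboot */
    *knob = new_val;
    return 0;
}
```

So an admin can turn unprivileged bpf off for the lifetime of the boot, while the default stays permissive and apps need only a single feature test for the bpf syscall.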

I think this would be great. :)

-Kees

-- 
Kees Cook
Chrome OS Security


end of thread, other threads:[~2015-10-08 17:42 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-05 20:48 [PATCH net-next 0/2] bpf: unprivileged Alexei Starovoitov
2015-10-05 20:48 ` [PATCH net-next 1/2] bpf: enable non-root eBPF programs Alexei Starovoitov
2015-10-05 21:00   ` Kees Cook
2015-10-05 21:12     ` Alexei Starovoitov
2015-10-05 21:16       ` Andy Lutomirski
2015-10-05 21:32         ` Alexei Starovoitov
2015-10-05 22:02       ` Kees Cook
2015-10-06  0:28         ` Alexei Starovoitov
2015-10-05 22:14   ` Daniel Borkmann
2015-10-06  0:51     ` Alexei Starovoitov
2015-10-06  7:13       ` Ingo Molnar
2015-10-06  8:05         ` Daniel Borkmann
2015-10-06  8:20           ` Ingo Molnar
2015-10-06  8:39             ` Daniel Borkmann
2015-10-06 17:50               ` Alexei Starovoitov
2015-10-06 17:56                 ` Eric Dumazet
2015-10-06 18:05                   ` Andy Lutomirski
2015-10-07  6:05                     ` Ingo Molnar
2015-10-06 19:26                   ` Alexei Starovoitov
2015-10-06 18:03                 ` Daniel Borkmann
2015-10-06 12:45       ` Daniel Borkmann
2015-10-07 21:20         ` Alexei Starovoitov
2015-10-07 22:07           ` Daniel Borkmann
2015-10-07 22:22             ` Kees Cook
2015-10-07 23:49               ` Alexei Starovoitov
2015-10-08  6:21                 ` Ingo Molnar
2015-10-08  6:30                   ` Alexei Starovoitov
2015-10-08 17:42                 ` Kees Cook
2015-10-08  2:29   ` Alexei Starovoitov
2015-10-05 20:48 ` [PATCH net-next 2/2] bpf: charge user for creation of BPF maps and programs Alexei Starovoitov
