* [RFC bpf-next 00/11] Add socket lookup support
@ 2018-05-09 21:06 Joe Stringer
  2018-05-09 21:06 ` [RFC bpf-next 01/11] bpf: Add iterator for spilled registers Joe Stringer
                   ` (11 more replies)
  0 siblings, 12 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:06 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, tgraf, kafai

This series proposes a new helper for the BPF API which allows BPF programs to
perform lookups for sockets in a network namespace. This would allow programs
to determine early on in processing whether the stack is expecting to receive
the packet, and perform some action (e.g. drop, forward somewhere) based on
this information.

The series is structured roughly into:
* Misc refactor
* Add the socket pointer type
* Add reference tracking to ensure that socket references are freed
* Extend the BPF API to add sk_lookup() / sk_release() functions
* Add tests/documentation

The helper proposed in this series includes a parameter for a tuple which must
be filled in by the caller to determine the socket to look up. The simplest
case would be filling it with the contents of the packet, i.e. mapping the
packet's 5-tuple into the parameter. In common cases, it may alternatively be
useful to reverse the direction of the tuple and perform a lookup, to find the
socket that initiated the connection; and if the BPF program ever performs a
form of IP address translation, it may further be useful to look up arbitrary
tuples that are not based upon the packet, but instead on state held in BPF
maps or hardcoded in the BPF program.
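
For illustration only (this sketch is not part of the patches), a tuple for
either the packet's direction or the reverse direction of a TCP/IPv4 flow
might be filled in along the following lines, using the struct bpf_sock_tuple
layout proposed later in the series and assuming iph/tcph were already
bounds-checked against data_end:

  #include <stdbool.h>
  #include <linux/in.h>          /* IPPROTO_TCP */
  #include <linux/ip.h>
  #include <linux/tcp.h>

  #ifndef AF_INET
  #define AF_INET 2
  #endif

  /* Sketch only: struct bpf_sock_tuple comes from the uapi header as
   * modified by this series; 'reverse' swaps source/destination so that
   * the lookup finds the socket that initiated the connection.
   */
  static inline void fill_tuple(struct bpf_sock_tuple *tuple,
                                const struct iphdr *iph,
                                const struct tcphdr *tcph, bool reverse)
  {
          tuple->family = AF_INET;
          tuple->proto = IPPROTO_TCP;
          tuple->dst_if = 0;      /* placeholder */
          tuple->saddr.ipv4 = reverse ? iph->daddr : iph->saddr;
          tuple->daddr.ipv4 = reverse ? iph->saddr : iph->daddr;
          tuple->sport = reverse ? tcph->dest : tcph->source;
          tuple->dport = reverse ? tcph->source : tcph->dest;
  }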

Currently, access to the socket's fields is limited to those which are
otherwise already accessible, and is restricted to read-only access.

A few open points:
* Currently, the lookup interface only returns either a valid socket or a NULL
  pointer. This means that if there is any kind of issue with the tuple, such
  as an unsupported protocol number, or if the socket simply can't be found,
  then we are unable to differentiate these cases from one another. One
  natural approach to improve this could be to return an ERR_PTR from the
  bpf_sk_lookup() helper. This would be more complicated, but maybe it's
  worthwhile.
* No ordering is defined between sockets. If the tuple matches multiple
  sockets, the helper will arbitrarily return one of them. It is up to the
  caller to handle this. If we wish to handle this more reliably in the
  future, we could encode an ordering preference in the flags field.
* Currently this helper is only defined for the TC hook point, but it should
  also be valid at XDP and perhaps some other hooks.

Joe Stringer (11):
  bpf: Add iterator for spilled registers
  bpf: Simplify ptr_min_max_vals adjustment
  bpf: Generalize ptr_or_null regs check
  bpf: Add PTR_TO_SOCKET verifier type
  bpf: Macrofy stack state copy
  bpf: Add reference tracking to verifier
  bpf: Add helper to retrieve socket in BPF
  selftests/bpf: Add tests for reference tracking
  libbpf: Support loading individual progs
  selftests/bpf: Add C tests for reference tracking
  Documentation: Describe bpf reference tracking

 Documentation/networking/filter.txt               |  64 +++
 include/linux/bpf.h                               |  19 +-
 include/linux/bpf_verifier.h                      |  31 +-
 include/uapi/linux/bpf.h                          |  39 +-
 kernel/bpf/verifier.c                             | 548 ++++++++++++++++++----
 net/core/filter.c                                 | 132 +++++-
 tools/include/uapi/linux/bpf.h                    |  40 +-
 tools/lib/bpf/libbpf.c                            |   4 +-
 tools/lib/bpf/libbpf.h                            |   3 +
 tools/testing/selftests/bpf/Makefile              |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h         |   7 +
 tools/testing/selftests/bpf/test_progs.c          |  38 ++
 tools/testing/selftests/bpf/test_sk_lookup_kern.c | 127 +++++
 tools/testing/selftests/bpf/test_verifier.c       | 373 ++++++++++++++-
 14 files changed, 1299 insertions(+), 128 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_sk_lookup_kern.c

-- 
2.14.1


* [RFC bpf-next 01/11] bpf: Add iterator for spilled registers
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
@ 2018-05-09 21:06 ` Joe Stringer
  2018-05-09 21:07 ` [RFC bpf-next 02/11] bpf: Simplify ptr_min_max_vals adjustment Joe Stringer
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:06 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

Add an iterator for spilled registers. It concentrates the details of
how to get the current frame's spilled registers into a single macro,
while clarifying the intention of the code which calls the macro.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 include/linux/bpf_verifier.h | 11 +++++++++++
 kernel/bpf/verifier.c        | 16 +++++++---------
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 8f70dc181e23..a613b52ce939 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -133,6 +133,17 @@ struct bpf_verifier_state {
 	u32 curframe;
 };
 
+#define __get_spilled_reg(slot, frame)					\
+	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&		\
+	  (frame->stack[slot].slot_type[0] == STACK_SPILL))		\
+	 ? &frame->stack[slot].spilled_ptr : NULL)
+
+/* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
+#define for_each_spilled_reg(iter, frame, reg)				\
+	for (iter = 0, reg = __get_spilled_reg(iter, frame);		\
+	     iter < frame->allocated_stack / BPF_REG_SIZE;		\
+	     iter++, reg = __get_spilled_reg(iter, frame))
+
 /* linked list of verifier states used to prune search */
 struct bpf_verifier_state_list {
 	struct bpf_verifier_state state;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d92d9c37affd..f40e089c3893 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2216,10 +2216,9 @@ static void __clear_all_pkt_pointers(struct bpf_verifier_env *env,
 		if (reg_is_pkt_pointer_any(&regs[i]))
 			mark_reg_unknown(env, regs, i);
 
-	for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) {
-		if (state->stack[i].slot_type[0] != STACK_SPILL)
+	for_each_spilled_reg(i, state, reg) {
+		if (!reg)
 			continue;
-		reg = &state->stack[i].spilled_ptr;
 		if (reg_is_pkt_pointer_any(reg))
 			__mark_reg_unknown(reg);
 	}
@@ -3326,10 +3325,9 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *vstate,
 
 	for (j = 0; j <= vstate->curframe; j++) {
 		state = vstate->frame[j];
-		for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) {
-			if (state->stack[i].slot_type[0] != STACK_SPILL)
+		for_each_spilled_reg(i, state, reg) {
+			if (!reg)
 				continue;
-			reg = &state->stack[i].spilled_ptr;
 			if (reg->type == type && reg->id == dst_reg->id)
 				reg->range = max(reg->range, new_range);
 		}
@@ -3574,7 +3572,7 @@ static void mark_map_regs(struct bpf_verifier_state *vstate, u32 regno,
 			  bool is_null)
 {
 	struct bpf_func_state *state = vstate->frame[vstate->curframe];
-	struct bpf_reg_state *regs = state->regs;
+	struct bpf_reg_state *reg, *regs = state->regs;
 	u32 id = regs[regno].id;
 	int i, j;
 
@@ -3583,8 +3581,8 @@ static void mark_map_regs(struct bpf_verifier_state *vstate, u32 regno,
 
 	for (j = 0; j <= vstate->curframe; j++) {
 		state = vstate->frame[j];
-		for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) {
-			if (state->stack[i].slot_type[0] != STACK_SPILL)
+		for_each_spilled_reg(i, state, reg) {
+			if (!reg)
 				continue;
 			mark_map_reg(&state->stack[i].spilled_ptr, 0, id, is_null);
 		}
-- 
2.14.1


* [RFC bpf-next 02/11] bpf: Simplify ptr_min_max_vals adjustment
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
  2018-05-09 21:06 ` [RFC bpf-next 01/11] bpf: Add iterator for spilled registers Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-09 21:07 ` [RFC bpf-next 03/11] bpf: Generalize ptr_or_null regs check Joe Stringer
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

An upcoming commit will add another two pointer types that need very
similar behaviour, so generalise the pointer type checks in
adjust_ptr_min_max_vals() now.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 kernel/bpf/verifier.c                       | 22 ++++++++++------------
 tools/testing/selftests/bpf/test_verifier.c | 14 +++++++-------
 2 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f40e089c3893..a32b560072d7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2602,20 +2602,18 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
-		verbose(env, "R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
-			dst);
-		return -EACCES;
-	}
-	if (ptr_reg->type == CONST_PTR_TO_MAP) {
-		verbose(env, "R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n",
-			dst);
+	switch (ptr_reg->type) {
+	case PTR_TO_MAP_VALUE_OR_NULL:
+		verbose(env, "R%d pointer arithmetic on %s prohibited, null-check it first\n",
+			dst, reg_type_str[ptr_reg->type]);
 		return -EACCES;
-	}
-	if (ptr_reg->type == PTR_TO_PACKET_END) {
-		verbose(env, "R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n",
-			dst);
+	case CONST_PTR_TO_MAP:
+	case PTR_TO_PACKET_END:
+		verbose(env, "R%d pointer arithmetic on %s prohibited\n",
+			dst, reg_type_str[ptr_reg->type]);
 		return -EACCES;
+	default:
+		break;
 	}
 
 	/* In case of 'scalar += pointer', dst_reg inherits pointer type and id.
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 275b4570b5b8..53439f40e1de 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3497,7 +3497,7 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END",
+		.errstr = "R3 pointer arithmetic on pkt_end",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
@@ -4525,7 +4525,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map1 = { 4 },
-		.errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL",
+		.errstr = "R4 pointer arithmetic on map_value_or_null",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS
 	},
@@ -4546,7 +4546,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map1 = { 4 },
-		.errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL",
+		.errstr = "R4 pointer arithmetic on map_value_or_null",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS
 	},
@@ -4567,7 +4567,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map1 = { 4 },
-		.errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL",
+		.errstr = "R4 pointer arithmetic on map_value_or_null",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS
 	},
@@ -6864,7 +6864,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map_in_map = { 3 },
-		.errstr = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited",
+		.errstr = "R1 pointer arithmetic on map_ptr prohibited",
 		.result = REJECT,
 	},
 	{
@@ -8538,7 +8538,7 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END",
+		.errstr = "R3 pointer arithmetic on pkt_end",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_XDP,
 	},
@@ -8557,7 +8557,7 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END",
+		.errstr = "R3 pointer arithmetic on pkt_end",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_XDP,
 	},
-- 
2.14.1


* [RFC bpf-next 03/11] bpf: Generalize ptr_or_null regs check
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
  2018-05-09 21:06 ` [RFC bpf-next 01/11] bpf: Add iterator for spilled registers Joe Stringer
  2018-05-09 21:07 ` [RFC bpf-next 02/11] bpf: Simplify ptr_min_max_vals adjustment Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-09 21:07 ` [RFC bpf-next 04/11] bpf: Add PTR_TO_SOCKET verifier type Joe Stringer
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

This check will be reused by an upcoming commit for conditional jump
checks for sockets. Refactor it a bit to simplify the later commit.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 kernel/bpf/verifier.c | 43 +++++++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a32b560072d7..1b31b805dea4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -227,6 +227,11 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type)
 	       type == PTR_TO_PACKET_META;
 }
 
+static bool reg_type_may_be_null(enum bpf_reg_type type)
+{
+	return type == PTR_TO_MAP_VALUE_OR_NULL;
+}
+
 /* string representation of 'enum bpf_reg_type' */
 static const char * const reg_type_str[] = {
 	[NOT_INIT]		= "?",
@@ -3531,12 +3536,10 @@ static void reg_combine_min_max(struct bpf_reg_state *true_src,
 	}
 }
 
-static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,
-			 bool is_null)
+static void mark_ptr_or_null_reg(struct bpf_reg_state *reg, u32 id,
+				 bool is_null)
 {
-	struct bpf_reg_state *reg = &regs[regno];
-
-	if (reg->type == PTR_TO_MAP_VALUE_OR_NULL && reg->id == id) {
+	if (reg_type_may_be_null(reg->type) && reg->id == id) {
 		/* Old offset (both fixed and variable parts) should
 		 * have been known-zero, because we don't allow pointer
 		 * arithmetic on pointers that might be NULL.
@@ -3549,11 +3552,13 @@ static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,
 		}
 		if (is_null) {
 			reg->type = SCALAR_VALUE;
-		} else if (reg->map_ptr->inner_map_meta) {
-			reg->type = CONST_PTR_TO_MAP;
-			reg->map_ptr = reg->map_ptr->inner_map_meta;
-		} else {
-			reg->type = PTR_TO_MAP_VALUE;
+		} else if (reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
+			if (reg->map_ptr->inner_map_meta) {
+				reg->type = CONST_PTR_TO_MAP;
+				reg->map_ptr = reg->map_ptr->inner_map_meta;
+			} else {
+				reg->type = PTR_TO_MAP_VALUE;
+			}
 		}
 		/* We don't need id from this point onwards anymore, thus we
 		 * should better reset it, so that state pruning has chances
@@ -3566,8 +3571,8 @@ static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,
 /* The logic is similar to find_good_pkt_pointers(), both could eventually
  * be folded together at some point.
  */
-static void mark_map_regs(struct bpf_verifier_state *vstate, u32 regno,
-			  bool is_null)
+static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
+				  bool is_null)
 {
 	struct bpf_func_state *state = vstate->frame[vstate->curframe];
 	struct bpf_reg_state *reg, *regs = state->regs;
@@ -3575,14 +3580,14 @@ static void mark_map_regs(struct bpf_verifier_state *vstate, u32 regno,
 	int i, j;
 
 	for (i = 0; i < MAX_BPF_REG; i++)
-		mark_map_reg(regs, i, id, is_null);
+		mark_ptr_or_null_reg(&regs[i], id, is_null);
 
 	for (j = 0; j <= vstate->curframe; j++) {
 		state = vstate->frame[j];
 		for_each_spilled_reg(i, state, reg) {
 			if (!reg)
 				continue;
-			mark_map_reg(&state->stack[i].spilled_ptr, 0, id, is_null);
+			mark_ptr_or_null_reg(reg, id, is_null);
 		}
 	}
 }
@@ -3784,12 +3789,14 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 	/* detect if R == 0 where R is returned from bpf_map_lookup_elem() */
 	if (BPF_SRC(insn->code) == BPF_K &&
 	    insn->imm == 0 && (opcode == BPF_JEQ || opcode == BPF_JNE) &&
-	    dst_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
-		/* Mark all identical map registers in each branch as either
+	    reg_type_may_be_null(dst_reg->type)) {
+		/* Mark all identical registers in each branch as either
 		 * safe or unknown depending R == 0 or R != 0 conditional.
 		 */
-		mark_map_regs(this_branch, insn->dst_reg, opcode == BPF_JNE);
-		mark_map_regs(other_branch, insn->dst_reg, opcode == BPF_JEQ);
+		mark_ptr_or_null_regs(this_branch, insn->dst_reg,
+				      opcode == BPF_JNE);
+		mark_ptr_or_null_regs(other_branch, insn->dst_reg,
+				      opcode == BPF_JEQ);
 	} else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
 					   this_branch, other_branch) &&
 		   is_pointer_value(env, insn->dst_reg)) {
-- 
2.14.1


* [RFC bpf-next 04/11] bpf: Add PTR_TO_SOCKET verifier type
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (2 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 03/11] bpf: Generalize ptr_or_null regs check Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-15  2:37   ` Alexei Starovoitov
  2018-05-09 21:07 ` [RFC bpf-next 05/11] bpf: Macrofy stack state copy Joe Stringer
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

Teach the verifier a little bit about a new type of pointer, a
PTR_TO_SOCKET. This pointer type is accessed from BPF through the
'struct bpf_sock' structure.
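
As a rough sketch of what this provides (illustration only, not taken from
the patch; it assumes the socket pointer was already obtained and
NULL-checked, e.g. via the lookup helper added later in this series), loads
from the struct bpf_sock layout are permitted, while any store is rejected by
check_mem_access() with "cannot write into socket":

  /* Hedged sketch; AF_INET and IPPROTO_TCP are assumed to be available
   * from the usual uapi headers, and 'sk' has verifier type PTR_TO_SOCKET.
   */
  static inline int sk_is_tcp4(struct bpf_sock *sk)
  {
          if (sk->family == AF_INET && sk->protocol == IPPROTO_TCP)
                  return 1;
          /* A write such as "sk->mark = 0;" would be rejected here with
           * "cannot write into socket".
           */
          return 0;
  }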

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 include/linux/bpf.h          | 19 +++++++++-
 include/linux/bpf_verifier.h |  2 ++
 kernel/bpf/verifier.c        | 86 ++++++++++++++++++++++++++++++++++++++------
 net/core/filter.c            | 30 +++++++++-------
 4 files changed, 114 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a38e474bf7ee..a03b4b0edcb6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -136,7 +136,7 @@ enum bpf_arg_type {
 	/* the following constraints used to prototype bpf_memcmp() and other
 	 * functions that access data on eBPF program stack
 	 */
-	ARG_PTR_TO_MEM,		/* pointer to valid memory (stack, packet, map value) */
+	ARG_PTR_TO_MEM,		/* pointer to valid memory (stack, packet, map value, socket) */
 	ARG_PTR_TO_MEM_OR_NULL, /* pointer to valid memory or NULL */
 	ARG_PTR_TO_UNINIT_MEM,	/* pointer to memory does not need to be initialized,
 				 * helper function must fill all bytes or clear
@@ -148,6 +148,7 @@ enum bpf_arg_type {
 
 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
+	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock */
 };
 
 /* type of values returned from helper functions */
@@ -155,6 +156,7 @@ enum bpf_return_type {
 	RET_INTEGER,			/* function returns integer */
 	RET_VOID,			/* function doesn't return anything */
 	RET_PTR_TO_MAP_VALUE_OR_NULL,	/* returns a pointer to map elem value or NULL */
+	RET_PTR_TO_SOCKET_OR_NULL,	/* returns a pointer to a socket or NULL */
 };
 
 /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
@@ -205,6 +207,8 @@ enum bpf_reg_type {
 	PTR_TO_PACKET_META,	 /* skb->data - meta_len */
 	PTR_TO_PACKET,		 /* reg points to skb->data */
 	PTR_TO_PACKET_END,	 /* skb->data + headlen */
+	PTR_TO_SOCKET,		 /* reg points to struct bpf_sock */
+	PTR_TO_SOCKET_OR_NULL,	 /* reg points to struct bpf_sock or NULL */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -326,6 +330,11 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
 
 typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
 					unsigned long off, unsigned long len);
+typedef u32 (*bpf_convert_ctx_access_t)(enum bpf_access_type type,
+					const struct bpf_insn *src,
+					struct bpf_insn *dst,
+					struct bpf_prog *prog,
+					u32 *target_size);
 
 u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 		     void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
@@ -729,4 +738,12 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
+bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
+			      struct bpf_insn_access_aux *info);
+u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
+			        const struct bpf_insn *si,
+			        struct bpf_insn *insn_buf,
+			        struct bpf_prog *prog,
+			        u32 *target_size);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a613b52ce939..9dcd87f1d322 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -57,6 +57,8 @@ struct bpf_reg_state {
 	 * offset, so they can share range knowledge.
 	 * For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
 	 * came from, when one is tested for != NULL.
+	 * For PTR_TO_SOCKET this is used to share which pointers retain the
+	 * same reference to the socket, to determine proper reference freeing.
 	 */
 	u32 id;
 	/* Ordering of fields matters.  See states_equal() */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1b31b805dea4..d38c7c1e9da6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -80,8 +80,8 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
  * (like pointer plus pointer becomes SCALAR_VALUE type)
  *
  * When verifier sees load or store instructions the type of base register
- * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK. These are three pointer
- * types recognized by check_mem_access() function.
+ * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK, PTR_TO_SOCKET. These are
+ * four pointer types recognized by check_mem_access() function.
  *
  * PTR_TO_MAP_VALUE means that this register is pointing to 'map element value'
  * and the range of [ptr, ptr + map's value_size) is accessible.
@@ -244,6 +244,8 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_PACKET]		= "pkt",
 	[PTR_TO_PACKET_META]	= "pkt_meta",
 	[PTR_TO_PACKET_END]	= "pkt_end",
+	[PTR_TO_SOCKET]		= "sock",
+	[PTR_TO_SOCKET_OR_NULL] = "sock_or_null",
 };
 
 static void print_liveness(struct bpf_verifier_env *env,
@@ -977,6 +979,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_PACKET_META:
 	case PTR_TO_PACKET_END:
 	case CONST_PTR_TO_MAP:
+	case PTR_TO_SOCKET:
+	case PTR_TO_SOCKET_OR_NULL:
 		return true;
 	default:
 		return false;
@@ -1360,6 +1364,28 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 	return -EACCES;
 }
 
+static int check_sock_access(struct bpf_verifier_env *env, u32 regno, int off,
+			     int size, enum bpf_access_type t)
+{
+	struct bpf_reg_state *regs = cur_regs(env);
+	struct bpf_reg_state *reg = &regs[regno];
+	struct bpf_insn_access_aux info;
+
+	if (reg->smin_value < 0) {
+		verbose(env, "R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
+			regno);
+		return -EACCES;
+	}
+
+	if (!bpf_sock_is_valid_access(off, size, t, &info)) {
+		verbose(env, "invalid bpf_sock_ops access off=%d size=%d\n",
+			off, size);
+		return -EACCES;
+	}
+
+	return 0;
+}
+
 static bool __is_pointer_value(bool allow_ptr_leaks,
 			       const struct bpf_reg_state *reg)
 {
@@ -1475,6 +1501,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
 		 */
 		strict = true;
 		break;
+	case PTR_TO_SOCKET:
+		pointer_desc = "sock ";
+		break;
 	default:
 		break;
 	}
@@ -1723,6 +1752,16 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_packet_access(env, regno, off, size, false);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
+
+	} else if (reg->type == PTR_TO_SOCKET) {
+		if (t == BPF_WRITE) {
+			verbose(env, "cannot write into socket\n");
+			return -EACCES;
+		}
+		err = check_sock_access(env, regno, off, size, t);
+		if (!err && t == BPF_READ && value_regno >= 0)
+			mark_reg_unknown(env, regs, value_regno);
+
 	} else {
 		verbose(env, "R%d invalid mem access '%s'\n", regno,
 			reg_type_str[reg->type]);
@@ -1941,6 +1980,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		expected_type = PTR_TO_CTX;
 		if (type != expected_type)
 			goto err_type;
+	} else if (arg_type == ARG_PTR_TO_SOCKET) {
+		expected_type = PTR_TO_SOCKET;
+		if (type != expected_type)
+			goto err_type;
 	} else if (arg_type_is_mem_ptr(arg_type)) {
 		expected_type = PTR_TO_STACK;
 		/* One exception here. In case function allows for NULL to be
@@ -2477,6 +2520,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 			insn_aux->map_ptr = meta.map_ptr;
 		else if (insn_aux->map_ptr != meta.map_ptr)
 			insn_aux->map_ptr = BPF_MAP_PTR_POISON;
+	} else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) {
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+		regs[BPF_REG_0].type = PTR_TO_SOCKET_OR_NULL;
+		regs[BPF_REG_0].id = ++env->id_gen;
 	} else {
 		verbose(env, "unknown return type %d of func %s#%d\n",
 			fn->ret_type, func_id_name(func_id), func_id);
@@ -2614,6 +2661,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 		return -EACCES;
 	case CONST_PTR_TO_MAP:
 	case PTR_TO_PACKET_END:
+	case PTR_TO_SOCKET:
+	case PTR_TO_SOCKET_OR_NULL:
 		verbose(env, "R%d pointer arithmetic on %s prohibited\n",
 			dst, reg_type_str[ptr_reg->type]);
 		return -EACCES;
@@ -3559,6 +3608,8 @@ static void mark_ptr_or_null_reg(struct bpf_reg_state *reg, u32 id,
 			} else {
 				reg->type = PTR_TO_MAP_VALUE;
 			}
+		} else if (reg->type == PTR_TO_SOCKET_OR_NULL) {
+			reg->type = PTR_TO_SOCKET;
 		}
 		/* We don't need id from this point onwards anymore, thus we
 		 * should better reset it, so that state pruning has chances
@@ -4333,6 +4384,8 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
 	case PTR_TO_CTX:
 	case CONST_PTR_TO_MAP:
 	case PTR_TO_PACKET_END:
+	case PTR_TO_SOCKET:
+	case PTR_TO_SOCKET_OR_NULL:
 		/* Only valid matches are exact, which memcmp() above
 		 * would have accepted
 		 */
@@ -5188,10 +5241,14 @@ static void sanitize_dead_code(struct bpf_verifier_env *env)
 	}
 }
 
-/* convert load instructions that access fields of 'struct __sk_buff'
- * into sequence of instructions that access fields of 'struct sk_buff'
+/* convert load instructions that access fields of a context type into a
+ * sequence of instructions that access fields of the underlying structure:
+ *     struct __sk_buff    -> struct sk_buff
+ *     struct bpf_sock_ops -> struct sock
  */
-static int convert_ctx_accesses(struct bpf_verifier_env *env)
+static int convert_ctx_accesses(struct bpf_verifier_env *env,
+				bpf_convert_ctx_access_t convert_ctx_access,
+				enum bpf_reg_type ctx_type)
 {
 	const struct bpf_verifier_ops *ops = env->ops;
 	int i, cnt, size, ctx_field_size, delta = 0;
@@ -5218,12 +5275,14 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 	}
 
-	if (!ops->convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux))
+	if (!convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux))
 		return 0;
 
 	insn = env->prog->insnsi + delta;
 
 	for (i = 0; i < insn_cnt; i++, insn++) {
+		enum bpf_reg_type ptr_type;
+
 		if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) ||
 		    insn->code == (BPF_LDX | BPF_MEM | BPF_H) ||
 		    insn->code == (BPF_LDX | BPF_MEM | BPF_W) ||
@@ -5237,7 +5296,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		else
 			continue;
 
-		if (env->insn_aux_data[i + delta].ptr_type != PTR_TO_CTX)
+		ptr_type = env->insn_aux_data[i + delta].ptr_type;
+		if (ptr_type != ctx_type)
 			continue;
 
 		ctx_field_size = env->insn_aux_data[i + delta].ctx_field_size;
@@ -5269,8 +5329,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 
 		target_size = 0;
-		cnt = ops->convert_ctx_access(type, insn, insn_buf, env->prog,
-					      &target_size);
+		cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
+					 &target_size);
 		if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
 		    (ctx_field_size && !target_size)) {
 			verbose(env, "bpf verifier is misconfigured\n");
@@ -5785,7 +5845,13 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 
 	if (ret == 0)
 		/* program is valid, convert *(u32*)(ctx + off) accesses */
-		ret = convert_ctx_accesses(env);
+		ret = convert_ctx_accesses(env, env->ops->convert_ctx_access,
+					   PTR_TO_CTX);
+
+	if (ret == 0)
+		/* Convert *(u32*)(sock_ops + off) accesses */
+		ret = convert_ctx_accesses(env, bpf_sock_convert_ctx_access,
+					   PTR_TO_SOCKET);
 
 	if (ret == 0)
 		ret = fixup_bpf_calls(env);
diff --git a/net/core/filter.c b/net/core/filter.c
index 0baa715e4699..4c35152fb3a8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4496,23 +4496,29 @@ static bool __sock_filter_check_size(int off, int size,
 	return size == size_default;
 }
 
-static bool sock_filter_is_valid_access(int off, int size,
-					enum bpf_access_type type,
-					const struct bpf_prog *prog,
-					struct bpf_insn_access_aux *info)
+bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
+			      struct bpf_insn_access_aux *info)
 {
 	if (off < 0 || off >= sizeof(struct bpf_sock))
 		return false;
 	if (off % size != 0)
 		return false;
-	if (!__sock_filter_check_attach_type(off, type,
-					     prog->expected_attach_type))
-		return false;
 	if (!__sock_filter_check_size(off, size, info))
 		return false;
 	return true;
 }
 
+static bool sock_filter_is_valid_access(int off, int size,
+					enum bpf_access_type type,
+					const struct bpf_prog *prog,
+					struct bpf_insn_access_aux *info)
+{
+	if (!__sock_filter_check_attach_type(off, type,
+					     prog->expected_attach_type))
+		return false;
+	return bpf_sock_is_valid_access(off, size, type, info);
+}
+
 static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write,
 				const struct bpf_prog *prog, int drop_verdict)
 {
@@ -5153,10 +5159,10 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 	return insn - insn_buf;
 }
 
-static u32 sock_filter_convert_ctx_access(enum bpf_access_type type,
-					  const struct bpf_insn *si,
-					  struct bpf_insn *insn_buf,
-					  struct bpf_prog *prog, u32 *target_size)
+u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
+				const struct bpf_insn *si,
+				struct bpf_insn *insn_buf,
+				struct bpf_prog *prog, u32 *target_size)
 {
 	struct bpf_insn *insn = insn_buf;
 	int off;
@@ -5926,7 +5932,7 @@ const struct bpf_prog_ops lwt_xmit_prog_ops = {
 const struct bpf_verifier_ops cg_sock_verifier_ops = {
 	.get_func_proto		= sock_filter_func_proto,
 	.is_valid_access	= sock_filter_is_valid_access,
-	.convert_ctx_access	= sock_filter_convert_ctx_access,
+	.convert_ctx_access	= bpf_sock_convert_ctx_access,
 };
 
 const struct bpf_prog_ops cg_sock_prog_ops = {
-- 
2.14.1


* [RFC bpf-next 05/11] bpf: Macrofy stack state copy
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (3 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 04/11] bpf: Add PTR_TO_SOCKET verifier type Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-09 21:07 ` [RFC bpf-next 06/11] bpf: Add reference tracking to verifier Joe Stringer
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

An upcoming commit will need very similar copy/realloc boilerplate, so
refactor the existing stack copy/realloc functions into macros that can
be instantiated for other kinds of state as well.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 kernel/bpf/verifier.c | 104 ++++++++++++++++++++++++++++----------------------
 1 file changed, 59 insertions(+), 45 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d38c7c1e9da6..f426ebf2b6bf 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -347,60 +347,74 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 	verbose(env, "\n");
 }
 
-static int copy_stack_state(struct bpf_func_state *dst,
-			    const struct bpf_func_state *src)
-{
-	if (!src->stack)
-		return 0;
-	if (WARN_ON_ONCE(dst->allocated_stack < src->allocated_stack)) {
-		/* internal bug, make state invalid to reject the program */
-		memset(dst, 0, sizeof(*dst));
-		return -EFAULT;
-	}
-	memcpy(dst->stack, src->stack,
-	       sizeof(*src->stack) * (src->allocated_stack / BPF_REG_SIZE));
-	return 0;
-}
+#define COPY_STATE_FN(NAME, COUNT, FIELD, SIZE)				\
+static int copy_##NAME##_state(struct bpf_func_state *dst,		\
+			       const struct bpf_func_state *src)	\
+{									\
+	if (!src->FIELD)						\
+		return 0;						\
+	if (WARN_ON_ONCE(dst->COUNT < src->COUNT)) {			\
+		/* internal bug, make state invalid to reject the program */ \
+		memset(dst, 0, sizeof(*dst));				\
+		return -EFAULT;						\
+	}								\
+	memcpy(dst->FIELD, src->FIELD,					\
+	       sizeof(*src->FIELD) * (src->COUNT / SIZE));		\
+	return 0;							\
+}
+/* copy_stack_state() */
+COPY_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
+#undef COPY_STATE_FN
+
+#define REALLOC_STATE_FN(NAME, COUNT, FIELD, SIZE)			\
+static int realloc_##NAME##_state(struct bpf_func_state *state, int size, \
+				  bool copy_old)			\
+{									\
+	u32 old_size = state->COUNT;					\
+	struct bpf_##NAME##_state *new_##FIELD;				\
+	int slot = size / SIZE;						\
+									\
+	if (size <= old_size || !size) {				\
+		if (copy_old)						\
+			return 0;					\
+		state->COUNT = slot * SIZE;				\
+		if (!size && old_size) {				\
+			kfree(state->FIELD);				\
+			state->FIELD = NULL;				\
+		}							\
+		return 0;						\
+	}								\
+	new_##FIELD = kmalloc_array(slot, sizeof(struct bpf_##NAME##_state), \
+				    GFP_KERNEL);			\
+	if (!new_##FIELD)						\
+		return -ENOMEM;						\
+	if (copy_old) {							\
+		if (state->FIELD)					\
+			memcpy(new_##FIELD, state->FIELD,		\
+			       sizeof(*new_##FIELD) * (old_size / SIZE)); \
+		memset(new_##FIELD + old_size / SIZE, 0,		\
+		       sizeof(*new_##FIELD) * (size - old_size) / SIZE); \
+	}								\
+	state->COUNT = slot * SIZE;					\
+	kfree(state->FIELD);						\
+	state->FIELD = new_##FIELD;					\
+	return 0;							\
+}
+/* realloc_stack_state() */
+REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
+#undef REALLOC_STATE_FN
 
 /* do_check() starts with zero-sized stack in struct bpf_verifier_state to
  * make it consume minimal amount of memory. check_stack_write() access from
  * the program calls into realloc_func_state() to grow the stack size.
  * Note there is a non-zero 'parent' pointer inside bpf_verifier_state
- * which this function copies over. It points to previous bpf_verifier_state
- * which is never reallocated
+ * which realloc_stack_state() copies over. It points to previous
+ * bpf_verifier_state which is never reallocated.
  */
 static int realloc_func_state(struct bpf_func_state *state, int size,
 			      bool copy_old)
 {
-	u32 old_size = state->allocated_stack;
-	struct bpf_stack_state *new_stack;
-	int slot = size / BPF_REG_SIZE;
-
-	if (size <= old_size || !size) {
-		if (copy_old)
-			return 0;
-		state->allocated_stack = slot * BPF_REG_SIZE;
-		if (!size && old_size) {
-			kfree(state->stack);
-			state->stack = NULL;
-		}
-		return 0;
-	}
-	new_stack = kmalloc_array(slot, sizeof(struct bpf_stack_state),
-				  GFP_KERNEL);
-	if (!new_stack)
-		return -ENOMEM;
-	if (copy_old) {
-		if (state->stack)
-			memcpy(new_stack, state->stack,
-			       sizeof(*new_stack) * (old_size / BPF_REG_SIZE));
-		memset(new_stack + old_size / BPF_REG_SIZE, 0,
-		       sizeof(*new_stack) * (size - old_size) / BPF_REG_SIZE);
-	}
-	state->allocated_stack = slot * BPF_REG_SIZE;
-	kfree(state->stack);
-	state->stack = new_stack;
-	return 0;
+	return realloc_stack_state(state, size, copy_old);
 }
 
 static void free_func_state(struct bpf_func_state *state)
-- 
2.14.1


* [RFC bpf-next 06/11] bpf: Add reference tracking to verifier
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (4 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 05/11] bpf: Macrofy stack state copy Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-15  3:04   ` Alexei Starovoitov
  2018-05-09 21:07 ` [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF Joe Stringer
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

Allow helper functions to acquire a reference and return it into a
register. Specific pointer types such as the PTR_TO_SOCKET will
implicitly represent such a reference. The verifier must ensure that
these references are released exactly once in each path through the
program.

To achieve this, the verifier assigns an id to the pointer and tracks it
in the 'bpf_func_state', then verifies, when the function or program
exits, that all of the acquired references have been freed. When the
pointer is passed to a function that frees the reference, it is removed
from the 'bpf_func_state' and all existing copies of the pointer in
registers are marked invalid.
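
For illustration only (this fragment is not part of the patch; it assumes the
bpf_sk_lookup()/bpf_sk_release() helpers and the bpf_helpers.h declarations
added later in this series), the verifier would reject a program like the
following, because one path exits while the acquired reference is still held:

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include "bpf_helpers.h"        /* SEC() and the helper declarations
                                   * added by a later patch */

  SEC("classifier")
  int leak_example(struct __sk_buff *skb)
  {
          struct bpf_sock_tuple tuple = {};
          struct bpf_sock_ops *sk;

          /* Populating 'tuple' from the packet is omitted; the netns and
           * flags arguments are simplified here (see patch 7).
           */
          sk = bpf_sk_lookup(skb, &tuple, sizeof(tuple), 0, 0);
          if (!sk)
                  return TC_ACT_OK;
          if (skb->mark == 42)
                  /* BUG: exits with the reference held; the verifier
                   * reports "Unreleased reference id=N alloc_insn=M".
                   */
                  return TC_ACT_SHOT;
          bpf_sk_release(sk, 0);
          return TC_ACT_OK;
  }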

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 include/linux/bpf_verifier.h |  18 ++-
 kernel/bpf/verifier.c        | 295 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 292 insertions(+), 21 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 9dcd87f1d322..8dbee360b3ec 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -104,6 +104,11 @@ struct bpf_stack_state {
 	u8 slot_type[BPF_REG_SIZE];
 };
 
+struct bpf_reference_state {
+	int id;
+	int insn_idx; /* allocation insn */
+};
+
 /* state of the program:
  * type of all registers and stack info
  */
@@ -122,7 +127,9 @@ struct bpf_func_state {
 	 */
 	u32 subprogno;
 
-	/* should be second to last. See copy_func_state() */
+	/* The following fields should be last. See copy_func_state() */
+	int acquired_refs;
+	struct bpf_reference_state *refs;
 	int allocated_stack;
 	struct bpf_stack_state *stack;
 };
@@ -218,11 +225,16 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
 __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
 					   const char *fmt, ...);
 
-static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
+static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
 {
 	struct bpf_verifier_state *cur = env->cur_state;
 
-	return cur->frame[cur->curframe]->regs;
+	return cur->frame[cur->curframe];
+}
+
+static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
+{
+	return cur_func(env)->regs;
 }
 
 int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f426ebf2b6bf..92b9a5dc465a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1,5 +1,6 @@
 /* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
  * Copyright (c) 2016 Facebook
+ * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -140,6 +141,18 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
  *
  * After the call R0 is set to return type of the function and registers R1-R5
  * are set to NOT_INIT to indicate that they are no longer readable.
+ *
+ * The following reference types represent a potential reference to a kernel
+ * resource which, after first being allocated, must be checked and freed by
+ * the BPF program:
+ * - PTR_TO_SOCKET_OR_NULL, PTR_TO_SOCKET
+ *
+ * When the verifier sees a helper call return a reference type, it allocates a
+ * pointer id for the reference and stores it in the current function state.
+ * Similar to the way that PTR_TO_MAP_VALUE_OR_NULL is converted into
+ * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the type
+ * passes through a NULL-check conditional. For the branch wherein the state is
+ * changed to CONST_IMM, the verifier releases the reference.
  */
 
 /* verifier_state + insn_idx are pushed to stack when branch is encountered */
@@ -229,7 +242,42 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type)
 
 static bool reg_type_may_be_null(enum bpf_reg_type type)
 {
-	return type == PTR_TO_MAP_VALUE_OR_NULL;
+	return type == PTR_TO_MAP_VALUE_OR_NULL ||
+	       type == PTR_TO_SOCKET_OR_NULL;
+}
+
+static bool type_is_refcounted(enum bpf_reg_type type)
+{
+	return type == PTR_TO_SOCKET;
+}
+
+static bool type_is_refcounted_or_null(enum bpf_reg_type type)
+{
+	return type == PTR_TO_SOCKET || type == PTR_TO_SOCKET_OR_NULL;
+}
+
+static bool reg_is_refcounted(const struct bpf_reg_state *reg)
+{
+	return type_is_refcounted(reg->type);
+}
+
+static bool reg_is_refcounted_or_null(const struct bpf_reg_state *reg)
+{
+	return type_is_refcounted_or_null(reg->type);
+}
+
+static bool arg_type_is_refcounted(enum bpf_arg_type type)
+{
+	return type == ARG_PTR_TO_SOCKET;
+}
+
+/* Determine whether the function releases some resources allocated by another
+ * function call. The first reference type argument will be assumed to be
+ * released by release_reference().
+ */
+static bool is_release_function(enum bpf_func_id func_id)
+{
+	return false;
 }
 
 /* string representation of 'enum bpf_reg_type' */
@@ -344,6 +392,12 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 		if (state->stack[i].slot_type[0] == STACK_ZERO)
 			verbose(env, " fp%d=0", (-i - 1) * BPF_REG_SIZE);
 	}
+	if (state->acquired_refs && state->refs[0].id) {
+		verbose(env, " refs=%d", state->refs[0].id);
+		for (i = 1; i < state->acquired_refs; i++)
+			if (state->refs[i].id)
+				verbose(env, ",%d", state->refs[i].id);
+	}
 	verbose(env, "\n");
 }
 
@@ -362,6 +416,8 @@ static int copy_##NAME##_state(struct bpf_func_state *dst,		\
 	       sizeof(*src->FIELD) * (src->COUNT / SIZE));		\
 	return 0;							\
 }
+/* copy_reference_state() */
+COPY_STATE_FN(reference, acquired_refs, refs, 1)
 /* copy_stack_state() */
 COPY_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
 #undef COPY_STATE_FN
@@ -400,6 +456,8 @@ static int realloc_##NAME##_state(struct bpf_func_state *state, int size, \
 	state->FIELD = new_##FIELD;					\
 	return 0;							\
 }
+/* realloc_reference_state() */
+REALLOC_STATE_FN(reference, acquired_refs, refs, 1)
 /* realloc_stack_state() */
 REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
 #undef REALLOC_STATE_FN
@@ -411,16 +469,89 @@ REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
  * which realloc_stack_state() copies over. It points to previous
  * bpf_verifier_state which is never reallocated.
  */
-static int realloc_func_state(struct bpf_func_state *state, int size,
-			      bool copy_old)
+static int realloc_func_state(struct bpf_func_state *state, int stack_size,
+			      int refs_size, bool copy_old)
 {
-	return realloc_stack_state(state, size, copy_old);
+	int err = realloc_reference_state(state, refs_size, copy_old);
+	if (err)
+		return err;
+	return realloc_stack_state(state, stack_size, copy_old);
+}
+
+/* Acquire a pointer id from the env and update the state->refs to include
+ * this new pointer reference.
+ * On success, returns a valid pointer id to associate with the register
+ * On failure, returns a negative errno.
+ */
+static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
+{
+	struct bpf_func_state *state = cur_func(env);
+	int new_ofs = state->acquired_refs;
+	int id, err;
+
+	err = realloc_reference_state(state, state->acquired_refs + 1, true);
+	if (err)
+		return err;
+	id = ++env->id_gen;
+	state->refs[new_ofs].id = id;
+	state->refs[new_ofs].insn_idx = insn_idx;
+
+	return id;
+}
+
+/* release function corresponding to acquire_reference_state(). Idempotent. */
+static int __release_reference_state(struct bpf_func_state *state, int ptr_id)
+{
+	int i, last_idx;
+
+	if (!ptr_id)
+		return 0;
+
+	last_idx = state->acquired_refs - 1;
+	for (i = 0; i < state->acquired_refs; i++) {
+		if (state->refs[i].id == ptr_id) {
+			if (last_idx && i != last_idx)
+				memcpy(&state->refs[i], &state->refs[last_idx],
+				       sizeof(*state->refs));
+			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
+			state->acquired_refs--;
+			return 0;
+		}
+	}
+	return -EFAULT;
+}
+
+/* variation on the above for cases where we expect that there must be an
+ * outstanding reference for the specified ptr_id.
+ */
+static int release_reference_state(struct bpf_verifier_env *env, int ptr_id)
+{
+	struct bpf_func_state *state = cur_func(env);
+	int err;
+
+	err = __release_reference_state(state, ptr_id);
+	if (WARN_ON_ONCE(err != 0))
+		verbose(env, "verifier internal error: can't release reference\n");
+	return err;
+}
+
+static int transfer_reference_state(struct bpf_func_state *dst,
+				    struct bpf_func_state *src)
+{
+	int err = realloc_reference_state(dst, src->acquired_refs, false);
+	if (err)
+		return err;
+	err = copy_reference_state(dst, src);
+	if (err)
+		return err;
+	return 0;
 }
 
 static void free_func_state(struct bpf_func_state *state)
 {
 	if (!state)
 		return;
+	kfree(state->refs);
 	kfree(state->stack);
 	kfree(state);
 }
@@ -446,10 +577,14 @@ static int copy_func_state(struct bpf_func_state *dst,
 {
 	int err;
 
-	err = realloc_func_state(dst, src->allocated_stack, false);
+	err = realloc_func_state(dst, src->allocated_stack, src->acquired_refs,
+				 false);
+	if (err)
+		return err;
+	memcpy(dst, src, offsetof(struct bpf_func_state, acquired_refs));
+	err = copy_reference_state(dst, src);
 	if (err)
 		return err;
-	memcpy(dst, src, offsetof(struct bpf_func_state, allocated_stack));
 	return copy_stack_state(dst, src);
 }
 
@@ -1019,7 +1154,7 @@ static int check_stack_write(struct bpf_verifier_env *env,
 	enum bpf_reg_type type;
 
 	err = realloc_func_state(state, round_up(slot + 1, BPF_REG_SIZE),
-				 true);
+				 state->acquired_refs, true);
 	if (err)
 		return err;
 	/* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0,
@@ -2259,10 +2394,32 @@ static bool check_arg_pair_ok(const struct bpf_func_proto *fn)
 	return true;
 }
 
+static bool check_refcount_ok(const struct bpf_func_proto *fn)
+{
+	int count = 0;
+
+	if (arg_type_is_refcounted(fn->arg1_type))
+		count++;
+	if (arg_type_is_refcounted(fn->arg2_type))
+		count++;
+	if (arg_type_is_refcounted(fn->arg3_type))
+		count++;
+	if (arg_type_is_refcounted(fn->arg4_type))
+		count++;
+	if (arg_type_is_refcounted(fn->arg5_type))
+		count++;
+
+	/* We only support one arg being unreferenced at the moment,
+	 * which is sufficient for the helper functions we have right now.
+	 */
+	return count <= 1;
+}
+
 static int check_func_proto(const struct bpf_func_proto *fn)
 {
 	return check_raw_mode_ok(fn) &&
-	       check_arg_pair_ok(fn) ? 0 : -EINVAL;
+	       check_arg_pair_ok(fn) &&
+	       check_refcount_ok(fn) ? 0 : -EINVAL;
 }
 
 /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END]
@@ -2295,12 +2452,57 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 		__clear_all_pkt_pointers(env, vstate->frame[i]);
 }
 
+static void release_reg_references(struct bpf_verifier_env *env,
+				   struct bpf_func_state *state, int id)
+{
+	struct bpf_reg_state *regs = state->regs, *reg;
+	int i;
+
+	for (i = 0; i < MAX_BPF_REG; i++)
+		if (regs[i].id == id)
+			mark_reg_unknown(env, regs, i);
+
+	for_each_spilled_reg(i, state, reg) {
+		if (!reg)
+			continue;
+		if (reg_is_refcounted(reg) && reg->id == id)
+			__mark_reg_unknown(reg);
+	}
+}
+
+/* The pointer with the specified id has released its reference to kernel
+ * resources. Identify all copies of the same pointer and clear the reference.
+ */
+static int release_reference(struct bpf_verifier_env *env)
+{
+	struct bpf_verifier_state *vstate = env->cur_state;
+	struct bpf_reg_state *regs = cur_regs(env);
+	int i, ptr_id = 0;
+
+	for (i = BPF_REG_1; i < BPF_REG_6; i++) {
+		if (reg_is_refcounted(&regs[i])) {
+			ptr_id = regs[i].id;
+			break;
+		}
+	}
+	if (WARN_ON_ONCE(!ptr_id)) {
+		/* references must be special pointer types that are checked
+		 * against argument requirements for the release function. */
+		verbose(env, "verifier internal error: can't locate refcounted arg\n");
+		return -EFAULT;
+	}
+	for (i = 0; i <= vstate->curframe; i++)
+		release_reg_references(env, vstate->frame[i], ptr_id);
+
+	return release_reference_state(env, ptr_id);
+}
+
 static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			   int *insn_idx)
 {
 	struct bpf_verifier_state *state = env->cur_state;
 	struct bpf_func_state *caller, *callee;
-	int i, subprog, target_insn;
+	int i, err, subprog, target_insn;
 
 	if (state->curframe + 1 >= MAX_CALL_FRAMES) {
 		verbose(env, "the call stack of %d frames is too deep\n",
@@ -2338,6 +2540,11 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			state->curframe + 1 /* frameno within this callchain */,
 			subprog /* subprog number within this prog */);
 
+	/* Transfer references to the callee */
+	err = transfer_reference_state(callee, caller);
+	if (err)
+		return err;
+
 	/* copy r1 - r5 args that callee can access */
 	for (i = BPF_REG_1; i <= BPF_REG_5; i++)
 		callee->regs[i] = caller->regs[i];
@@ -2368,6 +2575,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 	struct bpf_verifier_state *state = env->cur_state;
 	struct bpf_func_state *caller, *callee;
 	struct bpf_reg_state *r0;
+	int err;
 
 	callee = state->frame[state->curframe];
 	r0 = &callee->regs[BPF_REG_0];
@@ -2387,6 +2595,11 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 	/* return to the caller whatever r0 had in the callee */
 	caller->regs[BPF_REG_0] = *r0;
 
+	/* Transfer references to the caller */
+	err = transfer_reference_state(caller, callee);
+	if (err)
+		return err;
+
 	*insn_idx = callee->callsite + 1;
 	if (env->log.level) {
 		verbose(env, "returning from callee:\n");
@@ -2498,6 +2711,15 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 			return err;
 	}
 
+	/* If the function is a release() function, mark all copies of the same
+	 * pointer as "freed" in all registers and in the stack.
+	 */
+	if (is_release_function(func_id)) {
+		err = release_reference(env);
+		if (err)
+			return err;
+	}
+
 	regs = cur_regs(env);
 	/* reset caller saved regs */
 	for (i = 0; i < CALLER_SAVED_REGS; i++) {
@@ -2535,9 +2757,12 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		else if (insn_aux->map_ptr != meta.map_ptr)
 			insn_aux->map_ptr = BPF_MAP_PTR_POISON;
 	} else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) {
+		int id = acquire_reference_state(env, insn_idx);
+		if (id < 0)
+			return id;
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 		regs[BPF_REG_0].type = PTR_TO_SOCKET_OR_NULL;
-		regs[BPF_REG_0].id = ++env->id_gen;
+		regs[BPF_REG_0].id = id;
 	} else {
 		verbose(env, "unknown return type %d of func %s#%d\n",
 			fn->ret_type, func_id_name(func_id), func_id);
@@ -3599,7 +3824,8 @@ static void reg_combine_min_max(struct bpf_reg_state *true_src,
 	}
 }
 
-static void mark_ptr_or_null_reg(struct bpf_reg_state *reg, u32 id,
+static void mark_ptr_or_null_reg(struct bpf_func_state *state,
+				 struct bpf_reg_state *reg, u32 id,
 				 bool is_null)
 {
 	if (reg_type_may_be_null(reg->type) && reg->id == id) {
@@ -3625,11 +3851,13 @@ static void mark_ptr_or_null_reg(struct bpf_reg_state *reg, u32 id,
 		} else if (reg->type == PTR_TO_SOCKET_OR_NULL) {
 			reg->type = PTR_TO_SOCKET;
 		}
-		/* We don't need id from this point onwards anymore, thus we
-		 * should better reset it, so that state pruning has chances
-		 * to take effect.
-		 */
-		reg->id = 0;
+		if (is_null || !reg_is_refcounted(reg)) {
+			/* We don't need id from this point onwards anymore,
+			 * thus we should better reset it, so that state
+			 * pruning has chances to take effect.
+			 */
+			reg->id = 0;
+		}
 	}
 }
 
@@ -3644,15 +3872,18 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
 	u32 id = regs[regno].id;
 	int i, j;
 
+	if (reg_is_refcounted_or_null(&regs[regno]) && is_null)
+		__release_reference_state(state, id);
+
 	for (i = 0; i < MAX_BPF_REG; i++)
-		mark_ptr_or_null_reg(&regs[i], id, is_null);
+		mark_ptr_or_null_reg(state, &regs[i], id, is_null);
 
 	for (j = 0; j <= vstate->curframe; j++) {
 		state = vstate->frame[j];
 		for_each_spilled_reg(i, state, reg) {
 			if (!reg)
 				continue;
-			mark_ptr_or_null_reg(reg, id, is_null);
+			mark_ptr_or_null_reg(state, reg, id, is_null);
 		}
 	}
 }
@@ -4475,6 +4706,14 @@ static bool stacksafe(struct bpf_func_state *old,
 	return true;
 }
 
+static bool refsafe(struct bpf_func_state *old, struct bpf_func_state *cur)
+{
+	if (old->acquired_refs != cur->acquired_refs)
+		return false;
+	return !memcmp(old->refs, cur->refs,
+		       sizeof(*old->refs) * old->acquired_refs);
+}
+
 /* compare two verifier states
  *
  * all states stored in state_list are known to be valid, since
@@ -4520,6 +4759,9 @@ static bool func_states_equal(struct bpf_func_state *old,
 
 	if (!stacksafe(old, cur, idmap))
 		goto out_free;
+
+	if (!refsafe(old, cur))
+		goto out_free;
 	ret = true;
 out_free:
 	kfree(idmap);
@@ -4669,6 +4911,18 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 	return 0;
 }
 
+static int check_reference_leak(struct bpf_verifier_env *env)
+{
+	struct bpf_func_state *state = cur_func(env);
+	int i;
+
+	for (i = 0; i < state->acquired_refs; i++) {
+		verbose(env, "Unreleased reference id=%d alloc_insn=%d\n",
+			state->refs[i].id, state->refs[i].insn_idx);
+	}
+	return state->acquired_refs ? -EINVAL : 0;
+}
+
 static int do_check(struct bpf_verifier_env *env)
 {
 	struct bpf_verifier_state *state;
@@ -4763,6 +5017,7 @@ static int do_check(struct bpf_verifier_env *env)
 
 		regs = cur_regs(env);
 		env->insn_aux_data[insn_idx].seen = true;
+
 		if (class == BPF_ALU || class == BPF_ALU64) {
 			err = check_alu_op(env, insn);
 			if (err)
@@ -4931,6 +5186,10 @@ static int do_check(struct bpf_verifier_env *env)
 					continue;
 				}
 
+				err = check_reference_leak(env);
+				if (err)
+					return err;
+
 				/* eBPF calling convetion is such that R0 is used
 				 * to return the value from eBPF program.
 				 * Make sure that it's readable at this time
-- 
2.14.1


* [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (5 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 06/11] bpf: Add reference tracking to verifier Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-11  5:00   ` Martin KaFai Lau
  2018-05-09 21:07 ` [RFC bpf-next 08/11] selftests/bpf: Add tests for reference tracking Joe Stringer
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

This patch adds a new BPF helper function, sk_lookup() which allows BPF
programs to find out if there is a socket listening on this host, and
returns a socket pointer which the BPF program can then access to
determine, for instance, whether to forward or drop traffic. sk_lookup()
takes a reference on the socket, so when a BPF program makes use of this
function, it must subsequently pass the returned pointer into the newly
added sk_release() to return the reference.

By way of example, the following pseudocode would filter inbound
connections at the TC hook if there is no corresponding service listening
for the traffic:

  struct bpf_sock_tuple tuple;
  struct bpf_sock_ops *sk;

  populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
  sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
  if (!sk) {
    // Couldn't find a socket listening for this traffic. Drop.
    return TC_ACT_SHOT;
  }
  bpf_sk_release(sk, 0);
  return TC_ACT_OK;
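
populate_tuple() above is not a real helper; the following is a rough
sketch of what it might do for the IPv4/TCP case (the C selftest later
in this series does much the same thing in its fill_ip() function, and
the bounds checks assume direct packet access is available at this
hook):

  static bool populate_tuple(struct __sk_buff *skb, struct bpf_sock_tuple *tuple)
  {
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;
    struct ethhdr *eth = data;
    struct iphdr *iph;
    struct tcphdr *tcp;

    /* Each packet pointer must be bounds-checked before it is read. */
    if (eth + 1 > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
      return false;
    iph = (void *)(eth + 1);
    if (iph + 1 > data_end || iph->protocol != IPPROTO_TCP)
      return false;
    tcp = (void *)iph + iph->ihl * 4;
    if (tcp + 1 > data_end)
      return false;

    tuple->family = AF_INET;
    tuple->proto = IPPROTO_TCP;
    tuple->saddr.ipv4 = iph->saddr;
    tuple->daddr.ipv4 = iph->daddr;
    tuple->sport = tcp->source;
    tuple->dport = tcp->dest;
    return true;
  }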

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 include/uapi/linux/bpf.h                  |  39 +++++++++++-
 kernel/bpf/verifier.c                     |   8 ++-
 net/core/filter.c                         | 102 ++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h            |  40 +++++++++++-
 tools/testing/selftests/bpf/bpf_helpers.h |   7 ++
 5 files changed, 193 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d615c777b573..29f38838dbca 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1828,6 +1828,25 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ * struct bpf_sock_ops *bpf_sk_lookup(ctx, tuple, tuple_size, netns, flags)
+ * 	Description
+ * 		Look for socket matching 'tuple'. The return value must be checked,
+ * 		and if non-NULL, released via bpf_sk_release().
+ * 		@ctx: pointer to ctx
+ * 		@tuple: pointer to struct bpf_sock_tuple
+ * 		@tuple_size: size of the tuple
+ * 		@flags: flags value
+ * 	Return
+ * 		pointer to socket ops on success, or
+ * 		NULL in case of failure
+ *
+ *  int bpf_sk_release(sock, flags)
+ * 	Description
+ * 		Release the reference held by 'sock'.
+ * 		@sock: Pointer reference to release. Must be found via bpf_sk_lookup().
+ * 		@flags: flags value
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -1898,7 +1917,9 @@ union bpf_attr {
 	FN(xdp_adjust_tail),		\
 	FN(skb_get_xfrm_state),		\
 	FN(get_stack),			\
-	FN(skb_load_bytes_relative),
+	FN(skb_load_bytes_relative),	\
+	FN(sk_lookup),			\
+	FN(sk_release),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2060,6 +2081,22 @@ struct bpf_sock {
 				 */
 };
 
+struct bpf_sock_tuple {
+	union {
+		__be32 ipv6[4];
+		__be32 ipv4;
+	} saddr;
+	union {
+		__be32 ipv6[4];
+		__be32 ipv4;
+	} daddr;
+	__be16 sport;
+	__be16 dport;
+	__u32 dst_if;
+	__u8 family;
+	__u8 proto;
+};
+
 #define XDP_PACKET_HEADROOM 256
 
 /* User return codes for XDP prog type.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 92b9a5dc465a..579012c483e4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -153,6 +153,12 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
  * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the type
  * passes through a NULL-check conditional. For the branch wherein the state is
  * changed to CONST_IMM, the verifier releases the reference.
+ *
+ * For each helper function that allocates a reference, such as bpf_sk_lookup(),
+ * there is a corresponding release function, such as bpf_sk_release(). When
+ * a reference type passes into the release function, the verifier also releases
+ * the reference. If any unchecked or unreleased reference remains at the end of
+ * the program, the verifier rejects it.
  */
 
 /* verifier_state + insn_idx are pushed to stack when branch is encountered */
@@ -277,7 +283,7 @@ static bool arg_type_is_refcounted(enum bpf_arg_type type)
  */
 static bool is_release_function(enum bpf_func_id func_id)
 {
-	return false;
+	return func_id == BPF_FUNC_sk_release;
 }
 
 /* string representation of 'enum bpf_reg_type' */
diff --git a/net/core/filter.c b/net/core/filter.c
index 4c35152fb3a8..751c255d17d3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -58,8 +58,12 @@
 #include <net/busy_poll.h>
 #include <net/tcp.h>
 #include <net/xfrm.h>
+#include <net/udp.h>
 #include <linux/bpf_trace.h>
 #include <net/xdp_sock.h>
+#include <net/inet_hashtables.h>
+#include <net/inet6_hashtables.h>
+#include <net/net_namespace.h>
 
 /**
  *	sk_filter_trim_cap - run a packet through a socket filter
@@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
 };
 #endif
 
+struct sock *
+sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
+	int dst_if = (int)tuple->dst_if;
+	struct in6_addr *src6;
+	struct in6_addr *dst6;
+
+	if (tuple->family == AF_INET6) {
+		src6 = (struct in6_addr *)&tuple->saddr.ipv6;
+		dst6 = (struct in6_addr *)&tuple->daddr.ipv6;
+	} else if (tuple->family != AF_INET) {
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
+	if (tuple->proto == IPPROTO_TCP) {
+		if (tuple->family == AF_INET)
+			return inet_lookup(net, &tcp_hashinfo, NULL, 0,
+					   tuple->saddr.ipv4, tuple->sport,
+					   tuple->daddr.ipv4, tuple->dport,
+					   dst_if);
+		else
+			return inet6_lookup(net, &tcp_hashinfo, NULL, 0,
+					    src6, tuple->sport,
+					    dst6, tuple->dport, dst_if);
+	} else if (tuple->proto == IPPROTO_UDP) {
+		if (tuple->family == AF_INET)
+			return udp4_lib_lookup(net, tuple->saddr.ipv4,
+					       tuple->sport, tuple->daddr.ipv4,
+					       tuple->dport, dst_if);
+		else
+			return udp6_lib_lookup(net, src6, tuple->sport,
+					       dst6, tuple->dport, dst_if);
+	} else {
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
+	return NULL;
+}
+
+BPF_CALL_5(bpf_sk_lookup, struct sk_buff *, skb,
+	   struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
+{
+	struct net *caller_net = dev_net(skb->dev);
+	struct sock *sk = NULL;
+	struct net *net;
+
+	/* XXX: Perform verification-time checking of tuple size? */
+	if (unlikely(len != sizeof(struct bpf_sock_tuple) || flags))
+		goto out;
+
+	net = get_net_ns_by_id(caller_net, netns_id);
+	if (unlikely(!net))
+		goto out;
+
+	sk = sk_lookup(net, tuple);
+	put_net(net);
+	if (IS_ERR_OR_NULL(sk))
+		sk = NULL;
+	else
+		sk = sk_to_full_sk(sk);
+out:
+	return (unsigned long) sk;
+}
+
+static const struct bpf_func_proto bpf_sk_lookup_proto = {
+	.func		= bpf_sk_lookup,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_SOCKET_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_ANYTHING,
+	.arg5_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_2(bpf_sk_release, struct sock *, sk, u64, flags)
+{
+	sock_gen_put(sk);
+	if (unlikely(flags))
+		return -EINVAL;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sk_release_proto = {
+	.func		= bpf_sk_release,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_SOCKET,
+	.arg2_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
 {
@@ -4181,6 +4275,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_get_xfrm_state:
 		return &bpf_skb_get_xfrm_state_proto;
 #endif
+	case BPF_FUNC_sk_lookup:
+		return &bpf_sk_lookup_proto;
+	case BPF_FUNC_sk_release:
+		return &bpf_sk_release_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -4292,6 +4390,10 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_uid_proto;
 	case BPF_FUNC_sk_redirect_map:
 		return &bpf_sk_redirect_map_proto;
+	case BPF_FUNC_sk_lookup:
+		return &bpf_sk_lookup_proto;
+	case BPF_FUNC_sk_release:
+		return &bpf_sk_release_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index fff51c187d1e..29f38838dbca 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -117,6 +117,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_DEVMAP,
 	BPF_MAP_TYPE_SOCKMAP,
 	BPF_MAP_TYPE_CPUMAP,
+	BPF_MAP_TYPE_XSKMAP,
 };
 
 enum bpf_prog_type {
@@ -1827,6 +1828,25 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ * struct bpf_sock_ops *bpf_sk_lookup(ctx, tuple, tuple_size, netns, flags)
+ * 	Description
+ * 		Look for socket matching 'tuple'. The return value must be checked,
+ * 		and if non-NULL, released via bpf_sk_release().
+ * 		@ctx: pointer to ctx
+ * 		@tuple: pointer to struct bpf_sock_tuple
+ * 		@tuple_size: size of the tuple
+ * 		@flags: flags value
+ * 	Return
+ * 		pointer to socket ops on success, or
+ * 		NULL in case of failure
+ *
+ *  int bpf_sk_release(sock, flags)
+ * 	Description
+ * 		Release the reference held by 'sock'.
+ * 		@sock: Pointer reference to release. Must be found via bpf_sk_lookup().
+ * 		@flags: flags value
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -1897,7 +1917,9 @@ union bpf_attr {
 	FN(xdp_adjust_tail),		\
 	FN(skb_get_xfrm_state),		\
 	FN(get_stack),			\
-	FN(skb_load_bytes_relative),
+	FN(skb_load_bytes_relative),	\
+	FN(sk_lookup),			\
+	FN(sk_release),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2059,6 +2081,22 @@ struct bpf_sock {
 				 */
 };
 
+struct bpf_sock_tuple {
+	union {
+		__be32 ipv6[4];
+		__be32 ipv4;
+	} saddr;
+	union {
+		__be32 ipv6[4];
+		__be32 ipv4;
+	} daddr;
+	__be16 sport;
+	__be16 dport;
+	__u32 dst_if;
+	__u8 family;
+	__u8 proto;
+};
+
 #define XDP_PACKET_HEADROOM 256
 
 /* User return codes for XDP prog type.
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 265f8e0e8ada..4dc311ea0c16 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -103,6 +103,13 @@ static int (*bpf_skb_get_xfrm_state)(void *ctx, int index, void *state,
 	(void *) BPF_FUNC_skb_get_xfrm_state;
 static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
 	(void *) BPF_FUNC_get_stack;
+static struct bpf_sock *(*bpf_sk_lookup)(void *ctx,
+					 struct bpf_sock_tuple *tuple,
+					 int size, unsigned int netns_id,
+					 unsigned long long flags) =
+	(void *) BPF_FUNC_sk_lookup;
+static int (*bpf_sk_release)(struct bpf_sock *sk, unsigned long long flags) =
+	(void *) BPF_FUNC_sk_release;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC bpf-next 08/11] selftests/bpf: Add tests for reference tracking
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (6 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-09 21:07 ` [RFC bpf-next 09/11] libbpf: Support loading individual progs Joe Stringer
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

This patch adds the following test cases to test_verifier, exercising
the verifier's new reference tracking logic (a C-level sketch of the
first case is included after the list):

reference tracking: leak potential reference
reference tracking: leak potential reference on stack
reference tracking: leak potential reference on stack 2
reference tracking: zero potential reference
reference tracking: copy and zero potential references
reference tracking: release reference without check
reference tracking: release reference
reference tracking: release reference twice
reference tracking: release reference twice inside branch
reference tracking: alloc, check, free in one subbranch
reference tracking: alloc, check, free in both subbranches
reference tracking in call: free reference in subprog
reference tracking in call: free reference in subprog and outside
reference tracking in call: alloc & leak reference in subprog
reference tracking in call: alloc in subprog, release outside
reference tracking in call: sk_ptr leak into caller stack
reference tracking in call: sk_ptr spill into caller stack
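
As a rough C-level sketch, the first case above corresponds to a program
like the one below, which the verifier must reject; the actual C
equivalents are added by a later patch in this series ('fail_no_release'
in test_sk_lookup_kern.c):

  SEC("fail_no_release")
  int leak_potential_reference(struct __sk_buff *skb)
  {
    struct bpf_sock_tuple tuple = {};

    /* Any reference returned by the lookup is never released. */
    bpf_sk_lookup(skb, &tuple, sizeof(tuple), 0, 0);
    return 0;
  }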

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 tools/testing/selftests/bpf/test_verifier.c | 359 ++++++++++++++++++++++++++++
 1 file changed, 359 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 53439f40e1de..150c7c19eb51 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3,6 +3,7 @@
  *
  * Copyright (c) 2014 PLUMgrid, http://plumgrid.com
  * Copyright (c) 2017 Facebook
+ * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -152,6 +153,23 @@ static void bpf_fill_jump_around_ld_abs(struct bpf_test *self)
 	insn[i] = BPF_EXIT_INSN();
 }
 
+#define BPF_SK_LOOKUP						\
+	/* struct bpf_sock_tuple tuple = {} */			\
+	BPF_MOV64_IMM(BPF_REG_2, 0),				\
+	BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),		\
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -16),	\
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -24),	\
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -32),	\
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -40),	\
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -48),	\
+	/* sk = sk_lookup(ctx, &tuple, sizeof tuple, 0, 0) */	\
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),			\
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -48),			\
+	BPF_MOV64_IMM(BPF_REG_3, 44),				\
+	BPF_MOV64_IMM(BPF_REG_4, 0),				\
+	BPF_MOV64_IMM(BPF_REG_5, 0),				\
+	BPF_EMIT_CALL(BPF_FUNC_sk_lookup)
+
 static struct bpf_test tests[] = {
 	{
 		"add+sub+mul",
@@ -11974,6 +11992,347 @@ static struct bpf_test tests[] = {
 		.result = ACCEPT,
 		.retval = 10,
 	},
+	{
+		"reference tracking: leak potential reference",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), /* leak reference */
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "Unreleased reference",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: leak potential reference on stack",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "Unreleased reference",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: leak potential reference on stack 2",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_ST_MEM(BPF_DW, BPF_REG_4, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "Unreleased reference",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: zero potential reference",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_IMM(BPF_REG_0, 0), /* leak reference */
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "Unreleased reference",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: copy and zero potential references",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_MOV64_IMM(BPF_REG_7, 0), /* leak reference */
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "Unreleased reference",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: release reference without check",
+		.insns = {
+			BPF_SK_LOOKUP,
+			/* reference in r0 may be NULL */
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "type=sock_or_null expected=sock",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: release reference",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"reference tracking: release reference 2",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"reference tracking: release reference twice",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "type=inv expected=sock",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: release reference twice inside branch",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), /* goto end */
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "type=inv expected=sock",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: alloc, check, free in one subbranch",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 16),
+			/* if (offsetof(skb, mark) > data_len) exit; */
+			BPF_JMP_REG(BPF_JLE, BPF_REG_0, BPF_REG_3, 1),
+			BPF_EXIT_INSN(),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_2,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_SK_LOOKUP,
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 1), /* mark == 0? */
+			/* Leak reference in R0 */
+			BPF_EXIT_INSN(),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), /* sk NULL? */
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "Unreleased reference",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: alloc, check, free in both subbranches",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 16),
+			/* if (offsetof(skb, mark) > data_len) exit; */
+			BPF_JMP_REG(BPF_JLE, BPF_REG_0, BPF_REG_3, 1),
+			BPF_EXIT_INSN(),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_2,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_SK_LOOKUP,
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 5), /* mark == 0? */
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), /* sk NULL? */
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), /* sk NULL? */
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"reference tracking in call: free reference in subprog",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), /* unchecked reference */
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_2, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"reference tracking in call: free reference in subprog and outside",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), /* unchecked reference */
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_2, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "type=inv expected=sock",
+		.result = REJECT,
+	},
+	{
+		"reference tracking in call: alloc & leak reference in subprog",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_4),
+			BPF_SK_LOOKUP,
+			/* spill unchecked sk_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "Unreleased reference",
+		.result = REJECT,
+	},
+	{
+		"reference tracking in call: alloc in subprog, release outside",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 5),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_SK_LOOKUP,
+			BPF_EXIT_INSN(), /* return sk */
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.retval = POINTER_VALUE,
+		.result = ACCEPT,
+	},
+	{
+		"reference tracking in call: sk_ptr leak into caller stack",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_MOV64_REG(BPF_REG_5, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_5, BPF_REG_4, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 5),
+			/* spill unchecked sk_ptr into stack of caller */
+			BPF_MOV64_REG(BPF_REG_5, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_5, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 2 */
+			BPF_SK_LOOKUP,
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "Unreleased reference",
+		.result = REJECT,
+	},
+	{
+		"reference tracking in call: sk_ptr spill into caller stack",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_MOV64_REG(BPF_REG_5, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_5, BPF_REG_4, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 9),
+			/* spill unchecked sk_ptr into stack of caller */
+			BPF_MOV64_REG(BPF_REG_5, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_5, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			/* now the sk_ptr is verified, free the reference */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_4, 0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+
+			/* subprog 2 */
+			BPF_SK_LOOKUP,
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC bpf-next 09/11] libbpf: Support loading individual progs
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (7 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 08/11] selftests/bpf: Add tests for reference tracking Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-09 21:07 ` [RFC bpf-next 10/11] selftests/bpf: Add C tests for reference tracking Joe Stringer
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

Allow individual programs to be loaded (and unloaded) by library users.
This will help with testing, where a single ELF may contain several
sections: some denote programs that are expected to fail verification,
while others are expected to pass. By iterating over the programs and
loading each one individually, each program can be independently
checked against its expected verification result.
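
As a rough sketch of the intended usage (mirroring the selftest added
later in this series; the 'fail'-in-the-section-name convention and the
report_failure() call are specific to this sketch, not part of libbpf):

  struct bpf_object *obj = bpf_object__open("test_progs.o");
  struct bpf_program *prog;

  if (IS_ERR(obj))
    return;

  bpf_object__for_each_program(prog, obj) {
    const char *title = bpf_program__title(prog, false);
    bool expect_fail = strstr(title, "fail") != NULL;
    int err;

    bpf_program__set_type(prog, BPF_PROG_TYPE_SCHED_CLS);
    err = bpf_program__load(prog, "GPL", 0);
    /* Loading should fail exactly when the section name says so. */
    if (!!err != expect_fail)
      report_failure(title);
    bpf_program__unload(prog);
  }
  bpf_object__close(obj);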

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 tools/lib/bpf/libbpf.c | 4 ++--
 tools/lib/bpf/libbpf.h | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7bcdca13083a..04e3754bcf30 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -268,7 +268,7 @@ struct bpf_object {
 };
 #define obj_elf_valid(o)	((o)->efile.elf)
 
-static void bpf_program__unload(struct bpf_program *prog)
+void bpf_program__unload(struct bpf_program *prog)
 {
 	int i;
 
@@ -1338,7 +1338,7 @@ load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type,
 	return ret;
 }
 
-static int
+int
 bpf_program__load(struct bpf_program *prog,
 		  char *license, u32 kern_version)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 197f9ce2248c..c07e9969e4ed 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -112,10 +112,13 @@ void *bpf_program__priv(struct bpf_program *prog);
 
 const char *bpf_program__title(struct bpf_program *prog, bool needs_copy);
 
+int bpf_program__load(struct bpf_program *prog, char *license,
+		      u32 kern_version);
 int bpf_program__fd(struct bpf_program *prog);
 int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
 			      int instance);
 int bpf_program__pin(struct bpf_program *prog, const char *path);
+void bpf_program__unload(struct bpf_program *prog);
 
 struct bpf_insn;
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC bpf-next 10/11] selftests/bpf: Add C tests for reference tracking
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (8 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 09/11] libbpf: Support loading individual progs Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-09 21:07 ` [RFC bpf-next 11/11] Documentation: Describe bpf " Joe Stringer
  2018-05-16 19:05 ` [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
  11 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 tools/testing/selftests/bpf/Makefile              |   2 +-
 tools/testing/selftests/bpf/test_progs.c          |  38 +++++++
 tools/testing/selftests/bpf/test_sk_lookup_kern.c | 127 ++++++++++++++++++++++
 3 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_sk_lookup_kern.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 9d762184b805..cf71baa9d51d 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -33,7 +33,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
 	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
-	test_get_stack_rawtp.o
+	test_get_stack_rawtp.o test_sk_lookup_kern.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index ed197eef1cfc..6d868a031b00 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -1409,6 +1409,43 @@ static void test_get_stack_raw_tp(void)
 	bpf_object__close(obj);
 }
 
+static void test_reference_tracking(void)
+{
+	const char *file = "./test_sk_lookup_kern.o";
+	struct bpf_object *obj;
+	struct bpf_program *prog;
+	__u32 duration;
+	int err = 0;
+
+	obj = bpf_object__open(file);
+	if (IS_ERR(obj)) {
+		error_cnt++;
+		return;
+	}
+
+	bpf_object__for_each_program(prog, obj) {
+		const char *title;
+
+		/* Ignore .text sections */
+		title = bpf_program__title(prog, false);
+		if (strstr(title, ".text") != NULL)
+			continue;
+
+		bpf_program__set_type(prog, BPF_PROG_TYPE_SCHED_CLS);
+
+		/* Expect verifier failure if test name has 'fail' */
+		if (strstr(title, "fail") != NULL) {
+			libbpf_set_print(NULL, NULL, NULL);
+			err = !bpf_program__load(prog, "GPL", 0);
+			libbpf_set_print(printf, printf, NULL);
+		} else {
+			err = bpf_program__load(prog, "GPL", 0);
+		}
+		CHECK(err, title, "\n");
+	}
+	bpf_object__close(obj);
+}
+
 int main(void)
 {
 	jit_enabled = is_jit_enabled();
@@ -1427,6 +1464,7 @@ int main(void)
 	test_stacktrace_build_id();
 	test_stacktrace_map_raw_tp();
 	test_get_stack_raw_tp();
+	test_reference_tracking();
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
diff --git a/tools/testing/selftests/bpf/test_sk_lookup_kern.c b/tools/testing/selftests/bpf/test_sk_lookup_kern.c
new file mode 100644
index 000000000000..4f7383a31916
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sk_lookup_kern.c
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
+
+#include <stddef.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/tcp.h>
+#include <sys/socket.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
+
+/* Fill 'tuple' with L3 info, and attempt to find L4. On fail, return NULL. */
+static void *fill_ip(struct bpf_sock_tuple *tuple, void *data, __u64 nh_off,
+		     void *data_end, __u16 eth_proto)
+{
+	__u64 ihl_len;
+
+	if (eth_proto == bpf_htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)(data + nh_off);
+
+		if (iph + 1 > data_end)
+			return NULL;
+		ihl_len = iph->ihl * 4;
+
+		tuple->family = AF_INET;
+		tuple->proto = iph->protocol;
+		tuple->saddr.ipv4 = iph->saddr;
+		tuple->daddr.ipv4 = iph->daddr;
+	} else if (eth_proto == bpf_htons(ETH_P_IPV6)) {
+		struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + nh_off);
+
+		if (ip6h + 1 > data_end)
+			return NULL;
+		ihl_len = sizeof(*ip6h);
+
+		tuple->family = AF_INET6;
+		tuple->proto = ip6h->nexthdr;
+		*((struct in6_addr *)&tuple->saddr.ipv6) = ip6h->saddr;
+		*((struct in6_addr *)&tuple->daddr.ipv6) = ip6h->daddr;
+	}
+
+	if (tuple->proto != IPPROTO_TCP)
+		return NULL;
+
+	return data + nh_off + ihl_len;
+}
+
+SEC("sk_lookup_success")
+int bpf_sk_lookup_test0(struct __sk_buff *skb)
+{
+	void *data_end = (void *)(long)skb->data_end;
+	void *data = (void *)(long)skb->data;
+	struct ethhdr *eth = (struct ethhdr *)(data);
+	struct bpf_sock_tuple tuple = {};
+	struct tcphdr *tcp;
+	struct bpf_sock *sk;
+	void *l4;
+
+	if (eth + 1 > data_end)
+		return TC_ACT_SHOT;
+
+	l4 = fill_ip(&tuple, data, sizeof(*eth), data_end, eth->h_proto);
+	if (!l4 || l4 + sizeof *tcp > data_end)
+		return TC_ACT_SHOT;
+
+	tcp = l4;
+	tuple.sport = tcp->source;
+	tuple.dport = tcp->dest;
+
+	sk = bpf_sk_lookup(skb, &tuple, sizeof(tuple), 0, 0);
+	if (sk)
+		bpf_sk_release(sk, 0);
+	return sk ? TC_ACT_OK : TC_ACT_UNSPEC;
+}
+
+SEC("fail_no_release")
+int bpf_sk_lookup_test1(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+
+	bpf_sk_lookup(skb, &tuple, sizeof(tuple), 0, 0);
+	return 0;
+}
+
+SEC("fail_release_twice")
+int bpf_sk_lookup_test2(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct bpf_sock *sk;
+
+	sk = bpf_sk_lookup(skb, &tuple, sizeof(tuple), 0, 0);
+	bpf_sk_release(sk, 0);
+	bpf_sk_release(sk, 0);
+	return 0;
+}
+
+SEC("fail_release_unchecked")
+int bpf_sk_lookup_test3(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct bpf_sock *sk;
+
+	sk = bpf_sk_lookup(skb, &tuple, sizeof(tuple), 0, 0);
+	bpf_sk_release(sk, 0);
+	return 0;
+}
+
+void lookup_no_release(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	bpf_sk_lookup(skb, &tuple, sizeof(tuple), 0, 0);
+}
+
+SEC("fail_no_release_subcall")
+int bpf_sk_lookup_test4(struct __sk_buff *skb)
+{
+	lookup_no_release(skb);
+	return 0;
+}
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC bpf-next 11/11] Documentation: Describe bpf reference tracking
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (9 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 10/11] selftests/bpf: Add C tests for reference tracking Joe Stringer
@ 2018-05-09 21:07 ` Joe Stringer
  2018-05-15  3:19   ` Alexei Starovoitov
  2018-05-16 19:05 ` [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
  11 siblings, 1 reply; 26+ messages in thread
From: Joe Stringer @ 2018-05-09 21:07 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, john.fastabend, kafai

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 Documentation/networking/filter.txt | 64 +++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 5032e1263bc9..77be17977bc5 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -1125,6 +1125,14 @@ pointer type.  The types of pointers describe their base, as follows:
     PTR_TO_STACK        Frame pointer.
     PTR_TO_PACKET       skb->data.
     PTR_TO_PACKET_END   skb->data + headlen; arithmetic forbidden.
+    PTR_TO_SOCKET       Pointer to struct bpf_sock_ops, implicitly refcounted.
+    PTR_TO_SOCKET_OR_NULL
+                        Either a pointer to a socket, or NULL; socket lookup
+                        returns this type, which becomes a PTR_TO_SOCKET when
+                        checked != NULL. PTR_TO_SOCKET is reference-counted,
+                        so programs must release the reference through the
+                        socket release function before the end of the program.
+                        Arithmetic on these pointers is forbidden.
 However, a pointer may be offset from this base (as a result of pointer
 arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
 offset'.  The former is used when an exactly-known value (e.g. an immediate
@@ -1168,6 +1176,13 @@ over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting
 pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
 bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
 that pointer are safe.
+The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
+to all copies of the pointer returned from a socket lookup. This has similar
+behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
+it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
+represents a reference to the corresponding 'struct sock'. To ensure that the
+reference is not leaked, it is imperative to NULL-check the reference and, in
+the non-NULL case, pass the valid reference to the socket release function.
 
 Direct packet access
 --------------------
@@ -1441,6 +1456,55 @@ Error:
   8: (7a) *(u64 *)(r0 +0) = 1
   R0 invalid mem access 'imm'
 
+Program that performs a socket lookup, then sets the pointer to NULL
+without checking it, leaking any reference that the lookup may have
+taken:
+  BPF_MOV64_IMM(BPF_REG_2, 0),
+  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_MOV64_IMM(BPF_REG_3, 4),
+  BPF_MOV64_IMM(BPF_REG_4, 0),
+  BPF_MOV64_IMM(BPF_REG_5, 0),
+  BPF_EMIT_CALL(BPF_FUNC_sk_lookup),
+  BPF_MOV64_IMM(BPF_REG_0, 0),
+  BPF_EXIT_INSN(),
+Error:
+  0: (b7) r2 = 0
+  1: (63) *(u32 *)(r10 -8) = r2
+  2: (bf) r2 = r10
+  3: (07) r2 += -8
+  4: (b7) r3 = 4
+  5: (b7) r4 = 0
+  6: (b7) r5 = 0
+  7: (85) call bpf_sk_lookup#65
+  8: (b7) r0 = 0
+  9: (95) exit
+  Unreleased reference id=1, alloc_insn=7
+
+Program that performs a socket lookup but does not NULL-check the returned
+value:
+  BPF_MOV64_IMM(BPF_REG_2, 0),
+  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_MOV64_IMM(BPF_REG_3, 4),
+  BPF_MOV64_IMM(BPF_REG_4, 0),
+  BPF_MOV64_IMM(BPF_REG_5, 0),
+  BPF_EMIT_CALL(BPF_FUNC_sk_lookup),
+  BPF_EXIT_INSN(),
+Error:
+  0: (b7) r2 = 0
+  1: (63) *(u32 *)(r10 -8) = r2
+  2: (bf) r2 = r10
+  3: (07) r2 += -8
+  4: (b7) r3 = 4
+  5: (b7) r4 = 0
+  6: (b7) r5 = 0
+  7: (85) call bpf_sk_lookup#65
+  8: (95) exit
+  Unreleased reference id=1, alloc_insn=7
+
 Testing
 -------
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
  2018-05-09 21:07 ` [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF Joe Stringer
@ 2018-05-11  5:00   ` Martin KaFai Lau
  2018-05-11 21:08     ` Joe Stringer
  0 siblings, 1 reply; 26+ messages in thread
From: Martin KaFai Lau @ 2018-05-11  5:00 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john.fastabend

On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> programs to find out if there is a socket listening on this host, and
> returns a socket pointer which the BPF program can then access to
> determine, for instance, whether to forward or drop traffic. sk_lookup()
> takes a reference on the socket, so when a BPF program makes use of this
> function, it must subsequently pass the returned pointer into the newly
> added sk_release() to return the reference.
> 
> By way of example, the following pseudocode would filter inbound
> connections at XDP if there is no corresponding service listening for
> the traffic:
> 
>   struct bpf_sock_tuple tuple;
>   struct bpf_sock_ops *sk;
> 
>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
>   if (!sk) {
>     // Couldn't find a socket listening for this traffic. Drop.
>     return TC_ACT_SHOT;
>   }
>   bpf_sk_release(sk, 0);
>   return TC_ACT_OK;
> 
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>  include/uapi/linux/bpf.h                  |  39 +++++++++++-
>  kernel/bpf/verifier.c                     |   8 ++-
>  net/core/filter.c                         | 102 ++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h            |  40 +++++++++++-
>  tools/testing/selftests/bpf/bpf_helpers.h |   7 ++
>  5 files changed, 193 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d615c777b573..29f38838dbca 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1828,6 +1828,25 @@ union bpf_attr {
>   * 	Return
>   * 		0 on success, or a negative error in case of failure.
>   *
> + * struct bpf_sock_ops *bpf_sk_lookup(ctx, tuple, tuple_size, netns, flags)
> + * 	Decription
> + * 		Look for socket matching 'tuple'. The return value must be checked,
> + * 		and if non-NULL, released via bpf_sk_release().
> + * 		@ctx: pointer to ctx
> + * 		@tuple: pointer to struct bpf_sock_tuple
> + * 		@tuple_size: size of the tuple
> + * 		@flags: flags value
> + * 	Return
> + * 		pointer to socket ops on success, or
> + * 		NULL in case of failure
> + *
> + *  int bpf_sk_release(sock, flags)
> + * 	Description
> + * 		Release the reference held by 'sock'.
> + * 		@sock: Pointer reference to release. Must be found via bpf_sk_lookup().
> + * 		@flags: flags value
> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -1898,7 +1917,9 @@ union bpf_attr {
>  	FN(xdp_adjust_tail),		\
>  	FN(skb_get_xfrm_state),		\
>  	FN(get_stack),			\
> -	FN(skb_load_bytes_relative),
> +	FN(skb_load_bytes_relative),	\
> +	FN(sk_lookup),			\
> +	FN(sk_release),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -2060,6 +2081,22 @@ struct bpf_sock {
>  				 */
>  };
>  
> +struct bpf_sock_tuple {
> +	union {
> +		__be32 ipv6[4];
> +		__be32 ipv4;
> +	} saddr;
> +	union {
> +		__be32 ipv6[4];
> +		__be32 ipv4;
> +	} daddr;
> +	__be16 sport;
> +	__be16 dport;
> +	__u32 dst_if;
> +	__u8 family;
> +	__u8 proto;
> +};
> +
>  #define XDP_PACKET_HEADROOM 256
>  
>  /* User return codes for XDP prog type.
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 92b9a5dc465a..579012c483e4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -153,6 +153,12 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
>   * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the type
>   * passes through a NULL-check conditional. For the branch wherein the state is
>   * changed to CONST_IMM, the verifier releases the reference.
> + *
> + * For each helper function that allocates a reference, such as bpf_sk_lookup(),
> + * there is a corresponding release function, such as bpf_sk_release(). When
> + * a reference type passes into the release function, the verifier also releases
> + * the reference. If any unchecked or unreleased reference remains at the end of
> + * the program, the verifier rejects it.
>   */
>  
>  /* verifier_state + insn_idx are pushed to stack when branch is encountered */
> @@ -277,7 +283,7 @@ static bool arg_type_is_refcounted(enum bpf_arg_type type)
>   */
>  static bool is_release_function(enum bpf_func_id func_id)
>  {
> -	return false;
> +	return func_id == BPF_FUNC_sk_release;
>  }
>  
>  /* string representation of 'enum bpf_reg_type' */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4c35152fb3a8..751c255d17d3 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -58,8 +58,12 @@
>  #include <net/busy_poll.h>
>  #include <net/tcp.h>
>  #include <net/xfrm.h>
> +#include <net/udp.h>
>  #include <linux/bpf_trace.h>
>  #include <net/xdp_sock.h>
> +#include <net/inet_hashtables.h>
> +#include <net/inet6_hashtables.h>
> +#include <net/net_namespace.h>
>  
>  /**
>   *	sk_filter_trim_cap - run a packet through a socket filter
> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
>  };
>  #endif
>  
> +struct sock *
> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
Would it be possible to have another version that
returns a sk without taking its refcnt?
It may have performance benefit.

> +	int dst_if = (int)tuple->dst_if;
> +	struct in6_addr *src6;
> +	struct in6_addr *dst6;
> +
> +	if (tuple->family == AF_INET6) {
> +		src6 = (struct in6_addr *)&tuple->saddr.ipv6;
> +		dst6 = (struct in6_addr *)&tuple->daddr.ipv6;
> +	} else if (tuple->family != AF_INET) {
> +		return ERR_PTR(-EOPNOTSUPP);
> +	}
> +
> +	if (tuple->proto == IPPROTO_TCP) {
> +		if (tuple->family == AF_INET)
> +			return inet_lookup(net, &tcp_hashinfo, NULL, 0,
> +					   tuple->saddr.ipv4, tuple->sport,
> +					   tuple->daddr.ipv4, tuple->dport,
> +					   dst_if);
> +		else
> +			return inet6_lookup(net, &tcp_hashinfo, NULL, 0,
> +					    src6, tuple->sport,
> +					    dst6, tuple->dport, dst_if);
> +	} else if (tuple->proto == IPPROTO_UDP) {
> +		if (tuple->family == AF_INET)
> +			return udp4_lib_lookup(net, tuple->saddr.ipv4,
> +					       tuple->sport, tuple->daddr.ipv4,
> +					       tuple->dport, dst_if);
> +		else
> +			return udp6_lib_lookup(net, src6, tuple->sport,
> +					       dst6, tuple->dport, dst_if);
> +	} else {
> +		return ERR_PTR(-EOPNOTSUPP);
> +	}
> +
> +	return NULL;
> +}
> +
> +BPF_CALL_5(bpf_sk_lookup, struct sk_buff *, skb,
> +	   struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
> +{
> +	struct net *caller_net = dev_net(skb->dev);
> +	struct sock *sk = NULL;
> +	struct net *net;
> +
> +	/* XXX: Perform verification-time checking of tuple size? */
> +	if (unlikely(len != sizeof(struct bpf_sock_tuple) || flags))
> +		goto out;
> +
> +	net = get_net_ns_by_id(caller_net, netns_id);
> +	if (unlikely(!net))
> +		goto out;
> +
> +	sk = sk_lookup(net, tuple);
> +	put_net(net);
> +	if (IS_ERR_OR_NULL(sk))
> +		sk = NULL;
> +	else
> +		sk = sk_to_full_sk(sk);
> +out:
> +	return (unsigned long) sk;
> +}
> +
> +static const struct bpf_func_proto bpf_sk_lookup_proto = {
> +	.func		= bpf_sk_lookup,
> +	.gpl_only	= false,
> +	.ret_type	= RET_PTR_TO_SOCKET_OR_NULL,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_PTR_TO_MEM,
> +	.arg3_type	= ARG_CONST_SIZE,
> +	.arg4_type	= ARG_ANYTHING,
> +	.arg5_type	= ARG_ANYTHING,
> +};
> +
> +BPF_CALL_2(bpf_sk_release, struct sock *, sk, u64, flags)
> +{
> +	sock_gen_put(sk);
> +	if (unlikely(flags))
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static const struct bpf_func_proto bpf_sk_release_proto = {
> +	.func		= bpf_sk_release,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_SOCKET,
> +	.arg2_type	= ARG_ANYTHING,
> +};
> +
>  static const struct bpf_func_proto *
>  bpf_base_func_proto(enum bpf_func_id func_id)
>  {
> @@ -4181,6 +4275,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  	case BPF_FUNC_skb_get_xfrm_state:
>  		return &bpf_skb_get_xfrm_state_proto;
>  #endif
> +	case BPF_FUNC_sk_lookup:
> +		return &bpf_sk_lookup_proto;
> +	case BPF_FUNC_sk_release:
> +		return &bpf_sk_release_proto;
>  	default:
>  		return bpf_base_func_proto(func_id);
>  	}
> @@ -4292,6 +4390,10 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_get_socket_uid_proto;
>  	case BPF_FUNC_sk_redirect_map:
>  		return &bpf_sk_redirect_map_proto;
> +	case BPF_FUNC_sk_lookup:
> +		return &bpf_sk_lookup_proto;
> +	case BPF_FUNC_sk_release:
> +		return &bpf_sk_release_proto;
>  	default:
>  		return bpf_base_func_proto(func_id);
>  	}
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index fff51c187d1e..29f38838dbca 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -117,6 +117,7 @@ enum bpf_map_type {
>  	BPF_MAP_TYPE_DEVMAP,
>  	BPF_MAP_TYPE_SOCKMAP,
>  	BPF_MAP_TYPE_CPUMAP,
> +	BPF_MAP_TYPE_XSKMAP,
>  };
>  
>  enum bpf_prog_type {
> @@ -1827,6 +1828,25 @@ union bpf_attr {
>   * 	Return
>   * 		0 on success, or a negative error in case of failure.
>   *
> + * struct bpf_sock_ops *bpf_sk_lookup(ctx, tuple, tuple_size, netns, flags)
> + * 	Decription
> + * 		Look for socket matching 'tuple'. The return value must be checked,
> + * 		and if non-NULL, released via bpf_sk_release().
> + * 		@ctx: pointer to ctx
> + * 		@tuple: pointer to struct bpf_sock_tuple
> + * 		@tuple_size: size of the tuple
> + * 		@flags: flags value
> + * 	Return
> + * 		pointer to socket ops on success, or
> + * 		NULL in case of failure
> + *
> + *  int bpf_sk_release(sock, flags)
> + * 	Description
> + * 		Release the reference held by 'sock'.
> + * 		@sock: Pointer reference to release. Must be found via bpf_sk_lookup().
> + * 		@flags: flags value
> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -1897,7 +1917,9 @@ union bpf_attr {
>  	FN(xdp_adjust_tail),		\
>  	FN(skb_get_xfrm_state),		\
>  	FN(get_stack),			\
> -	FN(skb_load_bytes_relative),
> +	FN(skb_load_bytes_relative),	\
> +	FN(sk_lookup),			\
> +	FN(sk_release),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -2059,6 +2081,22 @@ struct bpf_sock {
>  				 */
>  };
>  
> +struct bpf_sock_tuple {
> +	union {
> +		__be32 ipv6[4];
> +		__be32 ipv4;
> +	} saddr;
> +	union {
> +		__be32 ipv6[4];
> +		__be32 ipv4;
> +	} daddr;
> +	__be16 sport;
> +	__be16 dport;
> +	__u32 dst_if;
> +	__u8 family;
> +	__u8 proto;
> +};
> +
>  #define XDP_PACKET_HEADROOM 256
>  
>  /* User return codes for XDP prog type.
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 265f8e0e8ada..4dc311ea0c16 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -103,6 +103,13 @@ static int (*bpf_skb_get_xfrm_state)(void *ctx, int index, void *state,
>  	(void *) BPF_FUNC_skb_get_xfrm_state;
>  static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
>  	(void *) BPF_FUNC_get_stack;
> +static struct bpf_sock *(*bpf_sk_lookup)(void *ctx,
> +					 struct bpf_sock_tuple *tuple,
> +					 int size, unsigned int netns_id,
> +					 unsigned long long flags) =
> +	(void *) BPF_FUNC_sk_lookup;
> +static int (*bpf_sk_release)(struct bpf_sock *sk, unsigned long long flags) =
> +	(void *) BPF_FUNC_sk_release;
>  
>  /* llvm builtin functions that eBPF C program may use to
>   * emit BPF_LD_ABS and BPF_LD_IND instructions
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
  2018-05-11  5:00   ` Martin KaFai Lau
@ 2018-05-11 21:08     ` Joe Stringer
  2018-05-11 21:41       ` Martin KaFai Lau
  0 siblings, 1 reply; 26+ messages in thread
From: Joe Stringer @ 2018-05-11 21:08 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: Joe Stringer, daniel, netdev, ast, john fastabend

On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
> On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
>> This patch adds a new BPF helper function, sk_lookup() which allows BPF
>> programs to find out if there is a socket listening on this host, and
>> returns a socket pointer which the BPF program can then access to
>> determine, for instance, whether to forward or drop traffic. sk_lookup()
>> takes a reference on the socket, so when a BPF program makes use of this
>> function, it must subsequently pass the returned pointer into the newly
>> added sk_release() to return the reference.
>>
>> By way of example, the following pseudocode would filter inbound
>> connections at XDP if there is no corresponding service listening for
>> the traffic:
>>
>>   struct bpf_sock_tuple tuple;
>>   struct bpf_sock_ops *sk;
>>
>>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
>>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
>>   if (!sk) {
>>     // Couldn't find a socket listening for this traffic. Drop.
>>     return TC_ACT_SHOT;
>>   }
>>   bpf_sk_release(sk, 0);
>>   return TC_ACT_OK;
>>
>> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>> ---

...

>> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
>>  };
>>  #endif
>>
>> +struct sock *
>> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
> Would it be possible to have another version that
> returns a sk without taking its refcnt?
> It may have performance benefit.

Not really. The sockets are not RCU-protected, and established sockets
may be torn down without notice. If we don't take a reference, there's
no guarantee that the socket will continue to exist for the duration
of running the BPF program.

From what I follow, the comment below has a hidden implication which
is that sockets without SOCK_RCU_FREE, eg established sockets, may be
directly freed regardless of RCU.

/* Sockets having SOCK_RCU_FREE will call this function after one RCU
 * grace period. This is the case for UDP sockets and TCP listeners.
 */
static void __sk_destruct(struct rcu_head *head)
...

Therefore without the refcount, it won't be safe.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
  2018-05-11 21:08     ` Joe Stringer
@ 2018-05-11 21:41       ` Martin KaFai Lau
  2018-05-12  0:54         ` Joe Stringer
  0 siblings, 1 reply; 26+ messages in thread
From: Martin KaFai Lau @ 2018-05-11 21:41 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john fastabend

On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
> On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> >> programs to find out if there is a socket listening on this host, and
> >> returns a socket pointer which the BPF program can then access to
> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
> >> takes a reference on the socket, so when a BPF program makes use of this
> >> function, it must subsequently pass the returned pointer into the newly
> >> added sk_release() to return the reference.
> >>
> >> By way of example, the following pseudocode would filter inbound
> >> connections at XDP if there is no corresponding service listening for
> >> the traffic:
> >>
> >>   struct bpf_sock_tuple tuple;
> >>   struct bpf_sock_ops *sk;
> >>
> >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> >>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
> >>   if (!sk) {
> >>     // Couldn't find a socket listening for this traffic. Drop.
> >>     return TC_ACT_SHOT;
> >>   }
> >>   bpf_sk_release(sk, 0);
> >>   return TC_ACT_OK;
> >>
> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> >> ---
> 
> ...
> 
> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
> >>  };
> >>  #endif
> >>
> >> +struct sock *
> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
> > Would it be possible to have another version that
> > returns a sk without taking its refcnt?
> > It may have performance benefit.
> 
> Not really. The sockets are not RCU-protected, and established sockets
> may be torn down without notice. If we don't take a reference, there's
> no guarantee that the socket will continue to exist for the duration
> of running the BPF program.
> 
> From what I follow, the comment below has a hidden implication which
> is that sockets without SOCK_RCU_FREE, eg established sockets, may be
> directly freed regardless of RCU.
Right, a SOCK_RCU_FREE sk is the one I am concerned about.
For example, a TCP_LISTEN socket does not require taking a refcnt
now.  Doing a bpf_sk_lookup() may have a rather big
impact on handling a TCP syn flood.  Or is the usual intention
to redirect instead of passing it up to the stack?


> 
> /* Sockets having SOCK_RCU_FREE will call this function after one RCU
>  * grace period. This is the case for UDP sockets and TCP listeners.
>  */
> static void __sk_destruct(struct rcu_head *head)
> ...
> 
> Therefore without the refcount, it won't be safe.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
  2018-05-11 21:41       ` Martin KaFai Lau
@ 2018-05-12  0:54         ` Joe Stringer
  2018-05-15  3:16           ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Joe Stringer @ 2018-05-12  0:54 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: Joe Stringer, daniel, netdev, ast, john fastabend

On 11 May 2018 at 14:41, Martin KaFai Lau <kafai@fb.com> wrote:
> On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
>> On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
>> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
>> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
>> >> programs to find out if there is a socket listening on this host, and
>> >> returns a socket pointer which the BPF program can then access to
>> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
>> >> takes a reference on the socket, so when a BPF program makes use of this
>> >> function, it must subsequently pass the returned pointer into the newly
>> >> added sk_release() to return the reference.
>> >>
>> >> By way of example, the following pseudocode would filter inbound
>> >> connections at XDP if there is no corresponding service listening for
>> >> the traffic:
>> >>
>> >>   struct bpf_sock_tuple tuple;
>> >>   struct bpf_sock_ops *sk;
>> >>
>> >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
>> >>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
>> >>   if (!sk) {
>> >>     // Couldn't find a socket listening for this traffic. Drop.
>> >>     return TC_ACT_SHOT;
>> >>   }
>> >>   bpf_sk_release(sk, 0);
>> >>   return TC_ACT_OK;
>> >>
>> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>> >> ---
>>
>> ...
>>
>> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
>> >>  };
>> >>  #endif
>> >>
>> >> +struct sock *
>> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
>> > Would it be possible to have another version that
>> > returns a sk without taking its refcnt?
>> > It may have performance benefit.
>>
>> Not really. The sockets are not RCU-protected, and established sockets
>> may be torn down without notice. If we don't take a reference, there's
>> no guarantee that the socket will continue to exist for the duration
>> of running the BPF program.
>>
>> From what I follow, the comment below has a hidden implication which
>> is that sockets without SOCK_RCU_FREE, eg established sockets, may be
>> directly freed regardless of RCU.
> Right, SOCK_RCU_FREE sk is the one I am concern about.
> For example, TCP_LISTEN socket does not require taking a refcnt
> now.  Doing a bpf_sk_lookup() may have a rather big
> impact on handling TCP syn flood.  or the usual intention
> is to redirect instead of passing it up to the stack?

I see. If you're only interested in listen sockets then probably this
series could be extended with a new flag, eg something like
BPF_F_SK_FIND_LISTENERS, which restricts the set of possible sockets
found to only listen sockets; the implementation would then call into
__inet_lookup_listener() instead of inet_lookup(). The presence of
that flag in the relevant register at the CALL instruction would tell
the verifier not to reference-track the result, and there would then
need to be a check on the release side to ensure that such an
unreferenced socket is never released. Just a thought, completely
untested and I could still be missing some detail...
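
To make the hypothetical above concrete, usage might look roughly like
this (BPF_F_SK_FIND_LISTENERS is only a suggested name here and does
not exist in this series):

  struct bpf_sock_tuple tuple;
  struct bpf_sock_ops *sk;

  populate_tuple(ctx, &tuple);
  // Hypothetical flag: restrict the lookup to listening sockets; the
  // result would not be reference-tracked, so no bpf_sk_release().
  sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns,
                     BPF_F_SK_FIND_LISTENERS);
  if (!sk) {
    // No listener for this traffic. Drop.
    return TC_ACT_SHOT;
  }
  return TC_ACT_OK;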

That said, I don't completely follow how you would expect to handle
traffic for sockets that are already established - the helper would
no longer find those sockets, so you wouldn't know whether or not to
pass established traffic up the stack.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 04/11] bpf: Add PTR_TO_SOCKET verifier type
  2018-05-09 21:07 ` [RFC bpf-next 04/11] bpf: Add PTR_TO_SOCKET verifier type Joe Stringer
@ 2018-05-15  2:37   ` Alexei Starovoitov
  2018-05-16 23:56     ` Joe Stringer
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2018-05-15  2:37 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john.fastabend, kafai

On Wed, May 09, 2018 at 02:07:02PM -0700, Joe Stringer wrote:
> Teach the verifier a little bit about a new type of pointer, a
> PTR_TO_SOCKET. This pointer type is accessed from BPF through the
> 'struct bpf_sock' structure.
> 
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>  include/linux/bpf.h          | 19 +++++++++-
>  include/linux/bpf_verifier.h |  2 ++
>  kernel/bpf/verifier.c        | 86 ++++++++++++++++++++++++++++++++++++++------
>  net/core/filter.c            | 30 +++++++++-------
>  4 files changed, 114 insertions(+), 23 deletions(-)

Ack for patches 1-3. A few nits in this one:

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a38e474bf7ee..a03b4b0edcb6 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -136,7 +136,7 @@ enum bpf_arg_type {
>  	/* the following constraints used to prototype bpf_memcmp() and other
>  	 * functions that access data on eBPF program stack
>  	 */
> -	ARG_PTR_TO_MEM,		/* pointer to valid memory (stack, packet, map value) */
> +	ARG_PTR_TO_MEM,		/* pointer to valid memory (stack, packet, map value, socket) */

I don't see where in this patch this change happens...

>  	ARG_PTR_TO_MEM_OR_NULL, /* pointer to valid memory or NULL */
>  	ARG_PTR_TO_UNINIT_MEM,	/* pointer to memory does not need to be initialized,
>  				 * helper function must fill all bytes or clear
> @@ -148,6 +148,7 @@ enum bpf_arg_type {
>  
>  	ARG_PTR_TO_CTX,		/* pointer to context */
>  	ARG_ANYTHING,		/* any (initialized) argument is ok */
> +	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock */
>  };
>  
>  /* type of values returned from helper functions */
> @@ -155,6 +156,7 @@ enum bpf_return_type {
>  	RET_INTEGER,			/* function returns integer */
>  	RET_VOID,			/* function doesn't return anything */
>  	RET_PTR_TO_MAP_VALUE_OR_NULL,	/* returns a pointer to map elem value or NULL */
> +	RET_PTR_TO_SOCKET_OR_NULL,	/* returns a pointer to a socket or NULL */
>  };
>  
>  /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
> @@ -205,6 +207,8 @@ enum bpf_reg_type {
>  	PTR_TO_PACKET_META,	 /* skb->data - meta_len */
>  	PTR_TO_PACKET,		 /* reg points to skb->data */
>  	PTR_TO_PACKET_END,	 /* skb->data + headlen */
> +	PTR_TO_SOCKET,		 /* reg points to struct bpf_sock */
> +	PTR_TO_SOCKET_OR_NULL,	 /* reg points to struct bpf_sock or NULL */
>  };
>  
>  /* The information passed from prog-specific *_is_valid_access
> @@ -326,6 +330,11 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
>  
>  typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
>  					unsigned long off, unsigned long len);
> +typedef u32 (*bpf_convert_ctx_access_t)(enum bpf_access_type type,
> +					const struct bpf_insn *src,
> +					struct bpf_insn *dst,
> +					struct bpf_prog *prog,
> +					u32 *target_size);
>  
>  u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
>  		     void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
> @@ -729,4 +738,12 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
>  void bpf_user_rnd_init_once(void);
>  u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
>  
> +bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
> +			      struct bpf_insn_access_aux *info);
> +u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
> +			        const struct bpf_insn *si,
> +			        struct bpf_insn *insn_buf,
> +			        struct bpf_prog *prog,
> +			        u32 *target_size);
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index a613b52ce939..9dcd87f1d322 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -57,6 +57,8 @@ struct bpf_reg_state {
>  	 * offset, so they can share range knowledge.
>  	 * For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
>  	 * came from, when one is tested for != NULL.
> +	 * For PTR_TO_SOCKET this is used to share which pointers retain the
> +	 * same reference to the socket, to determine proper reference freeing.
>  	 */
>  	u32 id;
>  	/* Ordering of fields matters.  See states_equal() */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 1b31b805dea4..d38c7c1e9da6 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -80,8 +80,8 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
>   * (like pointer plus pointer becomes SCALAR_VALUE type)
>   *
>   * When verifier sees load or store instructions the type of base register
> - * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK. These are three pointer
> - * types recognized by check_mem_access() function.
> + * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK, PTR_TO_SOCKET. These are
> + * four pointer types recognized by check_mem_access() function.
>   *
>   * PTR_TO_MAP_VALUE means that this register is pointing to 'map element value'
>   * and the range of [ptr, ptr + map's value_size) is accessible.
> @@ -244,6 +244,8 @@ static const char * const reg_type_str[] = {
>  	[PTR_TO_PACKET]		= "pkt",
>  	[PTR_TO_PACKET_META]	= "pkt_meta",
>  	[PTR_TO_PACKET_END]	= "pkt_end",
> +	[PTR_TO_SOCKET]		= "sock",
> +	[PTR_TO_SOCKET_OR_NULL] = "sock_or_null",
>  };
>  
>  static void print_liveness(struct bpf_verifier_env *env,
> @@ -977,6 +979,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
>  	case PTR_TO_PACKET_META:
>  	case PTR_TO_PACKET_END:
>  	case CONST_PTR_TO_MAP:
> +	case PTR_TO_SOCKET:
> +	case PTR_TO_SOCKET_OR_NULL:
>  		return true;
>  	default:
>  		return false;
> @@ -1360,6 +1364,28 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
>  	return -EACCES;
>  }
>  
> +static int check_sock_access(struct bpf_verifier_env *env, u32 regno, int off,
> +			     int size, enum bpf_access_type t)
> +{
> +	struct bpf_reg_state *regs = cur_regs(env);
> +	struct bpf_reg_state *reg = &regs[regno];
> +	struct bpf_insn_access_aux info;
> +
> +	if (reg->smin_value < 0) {
> +		verbose(env, "R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
> +			regno);
> +		return -EACCES;
> +	}
> +
> +	if (!bpf_sock_is_valid_access(off, size, t, &info)) {
> +		verbose(env, "invalid bpf_sock_ops access off=%d size=%d\n",
> +			off, size);
> +		return -EACCES;
> +	}
> +
> +	return 0;
> +}
> +
>  static bool __is_pointer_value(bool allow_ptr_leaks,
>  			       const struct bpf_reg_state *reg)
>  {
> @@ -1475,6 +1501,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
>  		 */
>  		strict = true;
>  		break;
> +	case PTR_TO_SOCKET:
> +		pointer_desc = "sock ";
> +		break;
>  	default:
>  		break;
>  	}
> @@ -1723,6 +1752,16 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>  		err = check_packet_access(env, regno, off, size, false);
>  		if (!err && t == BPF_READ && value_regno >= 0)
>  			mark_reg_unknown(env, regs, value_regno);
> +
> +	} else if (reg->type == PTR_TO_SOCKET) {
> +		if (t == BPF_WRITE) {
> +			verbose(env, "cannot write into socket\n");
> +			return -EACCES;
> +		}
> +		err = check_sock_access(env, regno, off, size, t);
> +		if (!err && t == BPF_READ && value_regno >= 0)

t == BPF_READ check is unnecessary.

> +			mark_reg_unknown(env, regs, value_regno);
> +
>  	} else {
>  		verbose(env, "R%d invalid mem access '%s'\n", regno,
>  			reg_type_str[reg->type]);
> @@ -1941,6 +1980,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
>  		expected_type = PTR_TO_CTX;
>  		if (type != expected_type)
>  			goto err_type;
> +	} else if (arg_type == ARG_PTR_TO_SOCKET) {
> +		expected_type = PTR_TO_SOCKET;
> +		if (type != expected_type)
> +			goto err_type;
>  	} else if (arg_type_is_mem_ptr(arg_type)) {
>  		expected_type = PTR_TO_STACK;
>  		/* One exception here. In case function allows for NULL to be
> @@ -2477,6 +2520,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>  			insn_aux->map_ptr = meta.map_ptr;
>  		else if (insn_aux->map_ptr != meta.map_ptr)
>  			insn_aux->map_ptr = BPF_MAP_PTR_POISON;
> +	} else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) {
> +		mark_reg_known_zero(env, regs, BPF_REG_0);
> +		regs[BPF_REG_0].type = PTR_TO_SOCKET_OR_NULL;
> +		regs[BPF_REG_0].id = ++env->id_gen;
>  	} else {
>  		verbose(env, "unknown return type %d of func %s#%d\n",
>  			fn->ret_type, func_id_name(func_id), func_id);
> @@ -2614,6 +2661,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
>  		return -EACCES;
>  	case CONST_PTR_TO_MAP:
>  	case PTR_TO_PACKET_END:
> +	case PTR_TO_SOCKET:
> +	case PTR_TO_SOCKET_OR_NULL:
>  		verbose(env, "R%d pointer arithmetic on %s prohibited\n",
>  			dst, reg_type_str[ptr_reg->type]);
>  		return -EACCES;
> @@ -3559,6 +3608,8 @@ static void mark_ptr_or_null_reg(struct bpf_reg_state *reg, u32 id,
>  			} else {
>  				reg->type = PTR_TO_MAP_VALUE;
>  			}
> +		} else if (reg->type == PTR_TO_SOCKET_OR_NULL) {
> +			reg->type = PTR_TO_SOCKET;
>  		}
>  		/* We don't need id from this point onwards anymore, thus we
>  		 * should better reset it, so that state pruning has chances
> @@ -4333,6 +4384,8 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
>  	case PTR_TO_CTX:
>  	case CONST_PTR_TO_MAP:
>  	case PTR_TO_PACKET_END:
> +	case PTR_TO_SOCKET:
> +	case PTR_TO_SOCKET_OR_NULL:
>  		/* Only valid matches are exact, which memcmp() above
>  		 * would have accepted
>  		 */
> @@ -5188,10 +5241,14 @@ static void sanitize_dead_code(struct bpf_verifier_env *env)
>  	}
>  }
>  
> -/* convert load instructions that access fields of 'struct __sk_buff'
> - * into sequence of instructions that access fields of 'struct sk_buff'
> +/* convert load instructions that access fields of a context type into a
> + * sequence of instructions that access fields of the underlying structure:
> + *     struct __sk_buff    -> struct sk_buff
> + *     struct bpf_sock_ops -> struct sock
>   */
> -static int convert_ctx_accesses(struct bpf_verifier_env *env)
> +static int convert_ctx_accesses(struct bpf_verifier_env *env,
> +				bpf_convert_ctx_access_t convert_ctx_access,
> +				enum bpf_reg_type ctx_type)
>  {
>  	const struct bpf_verifier_ops *ops = env->ops;
>  	int i, cnt, size, ctx_field_size, delta = 0;
> @@ -5218,12 +5275,14 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  		}
>  	}
>  
> -	if (!ops->convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux))
> +	if (!convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux))
>  		return 0;
>  
>  	insn = env->prog->insnsi + delta;
>  
>  	for (i = 0; i < insn_cnt; i++, insn++) {
> +		enum bpf_reg_type ptr_type;
> +
>  		if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) ||
>  		    insn->code == (BPF_LDX | BPF_MEM | BPF_H) ||
>  		    insn->code == (BPF_LDX | BPF_MEM | BPF_W) ||
> @@ -5237,7 +5296,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  		else
>  			continue;
>  
> -		if (env->insn_aux_data[i + delta].ptr_type != PTR_TO_CTX)
> +		ptr_type = env->insn_aux_data[i + delta].ptr_type;
> +		if (ptr_type != ctx_type)
>  			continue;
>  
>  		ctx_field_size = env->insn_aux_data[i + delta].ctx_field_size;
> @@ -5269,8 +5329,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  		}
>  
>  		target_size = 0;
> -		cnt = ops->convert_ctx_access(type, insn, insn_buf, env->prog,
> -					      &target_size);
> +		cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
> +					 &target_size);
>  		if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
>  		    (ctx_field_size && !target_size)) {
>  			verbose(env, "bpf verifier is misconfigured\n");
> @@ -5785,7 +5845,13 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
>  
>  	if (ret == 0)
>  		/* program is valid, convert *(u32*)(ctx + off) accesses */
> -		ret = convert_ctx_accesses(env);
> +		ret = convert_ctx_accesses(env, env->ops->convert_ctx_access,
> +					   PTR_TO_CTX);
> +
> +	if (ret == 0)
> +		/* Convert *(u32*)(sock_ops + off) accesses */
> +		ret = convert_ctx_accesses(env, bpf_sock_convert_ctx_access,
> +					   PTR_TO_SOCKET);

Overall looks great.
Only this part is missing for PTR_TO_SOCKET:
     } else if (dst_reg_type != *prev_dst_type &&
                (dst_reg_type == PTR_TO_CTX ||
                 *prev_dst_type == PTR_TO_CTX)) {
             verbose(env, "same insn cannot be used with different pointers\n");
             return -EINVAL;
similar logic has to be added.
Otherwise the following will be accepted:

R1 = sock_ptr
goto X;
...
R1 = some_other_valid_ptr;
goto X;
...

R2 = *(u32 *)(R1 + 0);
this will be rewritten for the first branch,
but it's wrong for the second.
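
For what it's worth, a minimal sketch of one way to extend that check
(untested, and the helper name below is invented here) could be:

	/* Sketch: pointer types whose accesses get rewritten by the
	 * verifier and therefore must not share a load/store insn with
	 * a differently-typed pointer.
	 */
	static bool reg_type_mismatch_ok(enum bpf_reg_type type)
	{
		return type != PTR_TO_CTX && type != PTR_TO_SOCKET;
	}

	...
	} else if (dst_reg_type != *prev_dst_type &&
		   (!reg_type_mismatch_ok(dst_reg_type) ||
		    !reg_type_mismatch_ok(*prev_dst_type))) {
		verbose(env, "same insn cannot be used with different pointers\n");
		return -EINVAL;
	}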

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 06/11] bpf: Add reference tracking to verifier
  2018-05-09 21:07 ` [RFC bpf-next 06/11] bpf: Add reference tracking to verifier Joe Stringer
@ 2018-05-15  3:04   ` Alexei Starovoitov
  2018-05-17  1:05     ` Joe Stringer
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2018-05-15  3:04 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john.fastabend, kafai

On Wed, May 09, 2018 at 02:07:04PM -0700, Joe Stringer wrote:
> Allow helper functions to acquire a reference and return it into a
> register. Specific pointer types such as the PTR_TO_SOCKET will
> implicitly represent such a reference. The verifier must ensure that
> these references are released exactly once in each path through the
> program.
> 
> To achieve this, this commit assigns an id to the pointer and tracks it
> in the 'bpf_func_state', then when the function or program exits,
> verifies that all of the acquired references have been freed. When the
> pointer is passed to a function that frees the reference, it is removed
> from the 'bpf_func_state` and all existing copies of the pointer in
> registers are marked invalid.
> 
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>  include/linux/bpf_verifier.h |  18 ++-
>  kernel/bpf/verifier.c        | 295 ++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 292 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 9dcd87f1d322..8dbee360b3ec 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -104,6 +104,11 @@ struct bpf_stack_state {
>  	u8 slot_type[BPF_REG_SIZE];
>  };
>  
> +struct bpf_reference_state {
> +	int id;
> +	int insn_idx; /* allocation insn */

the insn_idx is for more verbose messages, right?
It doesn't seem to affect the safety of the algorithm.
Please add a comment to clarify that.

> +};
> +
>  /* state of the program:
>   * type of all registers and stack info
>   */
> @@ -122,7 +127,9 @@ struct bpf_func_state {
>  	 */
>  	u32 subprogno;
>  
> -	/* should be second to last. See copy_func_state() */
> +	/* The following fields should be last. See copy_func_state() */
> +	int acquired_refs;
> +	struct bpf_reference_state *refs;
>  	int allocated_stack;
>  	struct bpf_stack_state *stack;
>  };
> @@ -218,11 +225,16 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
>  __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
>  					   const char *fmt, ...);
>  
> -static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
> +static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
>  {
>  	struct bpf_verifier_state *cur = env->cur_state;
>  
> -	return cur->frame[cur->curframe]->regs;
> +	return cur->frame[cur->curframe];
> +}
> +
> +static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
> +{
> +	return cur_func(env)->regs;
>  }
>  
>  int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f426ebf2b6bf..92b9a5dc465a 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1,5 +1,6 @@
>  /* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
>   * Copyright (c) 2016 Facebook
> + * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of version 2 of the GNU General Public
> @@ -140,6 +141,18 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
>   *
>   * After the call R0 is set to return type of the function and registers R1-R5
>   * are set to NOT_INIT to indicate that they are no longer readable.
> + *
> + * The following reference types represent a potential reference to a kernel
> + * resource which, after first being allocated, must be checked and freed by
> + * the BPF program:
> + * - PTR_TO_SOCKET_OR_NULL, PTR_TO_SOCKET
> + *
> + * When the verifier sees a helper call return a reference type, it allocates a
> + * pointer id for the reference and stores it in the current function state.
> + * Similar to the way that PTR_TO_MAP_VALUE_OR_NULL is converted into
> + * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the type
> + * passes through a NULL-check conditional. For the branch wherein the state is
> + * changed to CONST_IMM, the verifier releases the reference.
>   */
>  
>  /* verifier_state + insn_idx are pushed to stack when branch is encountered */
> @@ -229,7 +242,42 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type)
>  
>  static bool reg_type_may_be_null(enum bpf_reg_type type)
>  {
> -	return type == PTR_TO_MAP_VALUE_OR_NULL;
> +	return type == PTR_TO_MAP_VALUE_OR_NULL ||
> +	       type == PTR_TO_SOCKET_OR_NULL;
> +}
> +
> +static bool type_is_refcounted(enum bpf_reg_type type)
> +{
> +	return type == PTR_TO_SOCKET;
> +}
> +
> +static bool type_is_refcounted_or_null(enum bpf_reg_type type)
> +{
> +	return type == PTR_TO_SOCKET || type == PTR_TO_SOCKET_OR_NULL;
> +}
> +
> +static bool reg_is_refcounted(const struct bpf_reg_state *reg)
> +{
> +	return type_is_refcounted(reg->type);
> +}
> +
> +static bool reg_is_refcounted_or_null(const struct bpf_reg_state *reg)
> +{
> +	return type_is_refcounted_or_null(reg->type);
> +}
> +
> +static bool arg_type_is_refcounted(enum bpf_arg_type type)
> +{
> +	return type == ARG_PTR_TO_SOCKET;
> +}
> +
> +/* Determine whether the function releases some resources allocated by another
> + * function call. The first reference type argument will be assumed to be
> + * released by release_reference().
> + */
> +static bool is_release_function(enum bpf_func_id func_id)
> +{
> +	return false;
>  }
>  
>  /* string representation of 'enum bpf_reg_type' */
> @@ -344,6 +392,12 @@ static void print_verifier_state(struct bpf_verifier_env *env,
>  		if (state->stack[i].slot_type[0] == STACK_ZERO)
>  			verbose(env, " fp%d=0", (-i - 1) * BPF_REG_SIZE);
>  	}
> +	if (state->acquired_refs && state->refs[0].id) {
> +		verbose(env, " refs=%d", state->refs[0].id);
> +		for (i = 1; i < state->acquired_refs; i++)
> +			if (state->refs[i].id)
> +				verbose(env, ",%d", state->refs[i].id);
> +	}
>  	verbose(env, "\n");
>  }
>  
> @@ -362,6 +416,8 @@ static int copy_##NAME##_state(struct bpf_func_state *dst,		\
>  	       sizeof(*src->FIELD) * (src->COUNT / SIZE));		\
>  	return 0;							\
>  }
> +/* copy_reference_state() */
> +COPY_STATE_FN(reference, acquired_refs, refs, 1)
>  /* copy_stack_state() */
>  COPY_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
>  #undef COPY_STATE_FN
> @@ -400,6 +456,8 @@ static int realloc_##NAME##_state(struct bpf_func_state *state, int size, \
>  	state->FIELD = new_##FIELD;					\
>  	return 0;							\
>  }
> +/* realloc_reference_state() */
> +REALLOC_STATE_FN(reference, acquired_refs, refs, 1)
>  /* realloc_stack_state() */
>  REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
>  #undef REALLOC_STATE_FN
> @@ -411,16 +469,89 @@ REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
>   * which realloc_stack_state() copies over. It points to previous
>   * bpf_verifier_state which is never reallocated.
>   */
> -static int realloc_func_state(struct bpf_func_state *state, int size,
> -			      bool copy_old)
> +static int realloc_func_state(struct bpf_func_state *state, int stack_size,
> +			      int refs_size, bool copy_old)
>  {
> -	return realloc_stack_state(state, size, copy_old);
> +	int err = realloc_reference_state(state, refs_size, copy_old);
> +	if (err)
> +		return err;
> +	return realloc_stack_state(state, stack_size, copy_old);
> +}
> +
> +/* Acquire a pointer id from the env and update the state->refs to include
> + * this new pointer reference.
> + * On success, returns a valid pointer id to associate with the register
> + * On failure, returns a negative errno.
> + */
> +static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	int new_ofs = state->acquired_refs;
> +	int id, err;
> +
> +	err = realloc_reference_state(state, state->acquired_refs + 1, true);
> +	if (err)
> +		return err;
> +	id = ++env->id_gen;
> +	state->refs[new_ofs].id = id;
> +	state->refs[new_ofs].insn_idx = insn_idx;

I thought that we might avoid this extra 'ref_state' array (and these
expensive reallocs) by storing the 'id' in the 'aux' array, which is
one-to-one with the array of instructions, but then I realized we can
go through the same instruction that returns a pointer to a socket
multiple times, and every time it needs a different 'id' tracked
independently, so yeah. All that infra is necessary.
Would be good to document the algorithm a bit more.
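
For the record, the case above is something like the following
(hypothetical pseudocode; assume do_lookup() is a BPF-to-BPF
subprogram containing a single bpf_sk_lookup() call instruction):

  // The one lookup insn inside do_lookup() is verified once per call
  // site and each acquisition needs its own reference id, so a
  // per-instruction 'aux' slot would not be enough.
  sk_a = do_lookup(ctx, &tuple_a);  // lookup insn -> id = 1
  sk_b = do_lookup(ctx, &tuple_b);  // same insn   -> id = 2
  if (sk_a)
    bpf_sk_release(sk_a, 0);        // releases id 1
  if (sk_b)
    bpf_sk_release(sk_b, 0);        // releases id 2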

> +
> +	return id;
> +}
> +
> +/* release function corresponding to acquire_reference_state(). Idempotent. */
> +static int __release_reference_state(struct bpf_func_state *state, int ptr_id)
> +{
> +	int i, last_idx;
> +
> +	if (!ptr_id)
> +		return 0;
> +
> +	last_idx = state->acquired_refs - 1;
> +	for (i = 0; i < state->acquired_refs; i++) {
> +		if (state->refs[i].id == ptr_id) {
> +			if (last_idx && i != last_idx)
> +				memcpy(&state->refs[i], &state->refs[last_idx],
> +				       sizeof(*state->refs));
> +			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
> +			state->acquired_refs--;
> +			return 0;
> +		}
> +	}
> +	return -EFAULT;
> +}
> +
> +/* variation on the above for cases where we expect that there must be an
> + * outstanding reference for the specified ptr_id.
> + */
> +static int release_reference_state(struct bpf_verifier_env *env, int ptr_id)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	int err;
> +
> +	err = __release_reference_state(state, ptr_id);
> +	if (WARN_ON_ONCE(err != 0))
> +		verbose(env, "verifier internal error: can't release reference\n");
> +	return err;
> +}
> +
> +static int transfer_reference_state(struct bpf_func_state *dst,
> +				    struct bpf_func_state *src)
> +{
> +	int err = realloc_reference_state(dst, src->acquired_refs, false);
> +	if (err)
> +		return err;
> +	err = copy_reference_state(dst, src);
> +	if (err)
> +		return err;
> +	return 0;
>  }
>  
>  static void free_func_state(struct bpf_func_state *state)
>  {
>  	if (!state)
>  		return;
> +	kfree(state->refs);
>  	kfree(state->stack);
>  	kfree(state);
>  }
> @@ -446,10 +577,14 @@ static int copy_func_state(struct bpf_func_state *dst,
>  {
>  	int err;
>  
> -	err = realloc_func_state(dst, src->allocated_stack, false);
> +	err = realloc_func_state(dst, src->allocated_stack, src->acquired_refs,
> +				 false);
> +	if (err)
> +		return err;
> +	memcpy(dst, src, offsetof(struct bpf_func_state, acquired_refs));
> +	err = copy_reference_state(dst, src);
>  	if (err)
>  		return err;
> -	memcpy(dst, src, offsetof(struct bpf_func_state, allocated_stack));
>  	return copy_stack_state(dst, src);
>  }
>  
> @@ -1019,7 +1154,7 @@ static int check_stack_write(struct bpf_verifier_env *env,
>  	enum bpf_reg_type type;
>  
>  	err = realloc_func_state(state, round_up(slot + 1, BPF_REG_SIZE),
> -				 true);
> +				 state->acquired_refs, true);
>  	if (err)
>  		return err;
>  	/* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0,
> @@ -2259,10 +2394,32 @@ static bool check_arg_pair_ok(const struct bpf_func_proto *fn)
>  	return true;
>  }
>  
> +static bool check_refcount_ok(const struct bpf_func_proto *fn)
> +{
> +	int count = 0;
> +
> +	if (arg_type_is_refcounted(fn->arg1_type))
> +		count++;
> +	if (arg_type_is_refcounted(fn->arg2_type))
> +		count++;
> +	if (arg_type_is_refcounted(fn->arg3_type))
> +		count++;
> +	if (arg_type_is_refcounted(fn->arg4_type))
> +		count++;
> +	if (arg_type_is_refcounted(fn->arg5_type))
> +		count++;
> +
> +	/* We only support one arg being unreferenced at the moment,
> +	 * which is sufficient for the helper functions we have right now.
> +	 */
> +	return count <= 1;
> +}
> +
>  static int check_func_proto(const struct bpf_func_proto *fn)
>  {
>  	return check_raw_mode_ok(fn) &&
> -	       check_arg_pair_ok(fn) ? 0 : -EINVAL;
> +	       check_arg_pair_ok(fn) &&
> +	       check_refcount_ok(fn) ? 0 : -EINVAL;
>  }
>  
>  /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END]
> @@ -2295,12 +2452,57 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
>  		__clear_all_pkt_pointers(env, vstate->frame[i]);
>  }
>  
> +static void release_reg_references(struct bpf_verifier_env *env,
> +				   struct bpf_func_state *state, int id)
> +{
> +	struct bpf_reg_state *regs = state->regs, *reg;
> +	int i;
> +
> +	for (i = 0; i < MAX_BPF_REG; i++)
> +		if (regs[i].id == id)
> +			mark_reg_unknown(env, regs, i);
> +
> +	for_each_spilled_reg(i, state, reg) {
> +		if (!reg)
> +			continue;
> +		if (reg_is_refcounted(reg) && reg->id == id)
> +			__mark_reg_unknown(reg);
> +	}
> +}
> +
> +/* The pointer with the specified id has released its reference to kernel
> + * resources. Identify all copies of the same pointer and clear the reference.
> + */
> +static int release_reference(struct bpf_verifier_env *env)
> +{
> +	struct bpf_verifier_state *vstate = env->cur_state;
> +	struct bpf_reg_state *regs = cur_regs(env);
> +	int i, ptr_id = 0;
> +
> +	for (i = BPF_REG_1; i < BPF_REG_6; i++) {
> +		if (reg_is_refcounted(&regs[i])) {
> +			ptr_id = regs[i].id;
> +			break;
> +		}
> +	}
> +	if (WARN_ON_ONCE(!ptr_id)) {
> +		/* references must be special pointer types that are checked
> +		 * against argument requirements for the release function. */
> +		verbose(env, "verifier internal error: can't locate refcounted arg\n");
> +		return -EFAULT;
> +	}
> +	for (i = 0; i <= vstate->curframe; i++)
> +		release_reg_references(env, vstate->frame[i], ptr_id);
> +
> +	return release_reference_state(env, ptr_id);
> +}
> +
>  static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			   int *insn_idx)
>  {
>  	struct bpf_verifier_state *state = env->cur_state;
>  	struct bpf_func_state *caller, *callee;
> -	int i, subprog, target_insn;
> +	int i, err, subprog, target_insn;
>  
>  	if (state->curframe + 1 >= MAX_CALL_FRAMES) {
>  		verbose(env, "the call stack of %d frames is too deep\n",
> @@ -2338,6 +2540,11 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			state->curframe + 1 /* frameno within this callchain */,
>  			subprog /* subprog number within this prog */);
>  
> +	/* Transfer references to the callee */
> +	err = transfer_reference_state(callee, caller);
> +	if (err)
> +		return err;
> +
>  	/* copy r1 - r5 args that callee can access */
>  	for (i = BPF_REG_1; i <= BPF_REG_5; i++)
>  		callee->regs[i] = caller->regs[i];
> @@ -2368,6 +2575,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
>  	struct bpf_verifier_state *state = env->cur_state;
>  	struct bpf_func_state *caller, *callee;
>  	struct bpf_reg_state *r0;
> +	int err;
>  
>  	callee = state->frame[state->curframe];
>  	r0 = &callee->regs[BPF_REG_0];
> @@ -2387,6 +2595,11 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
>  	/* return to the caller whatever r0 had in the callee */
>  	caller->regs[BPF_REG_0] = *r0;
>  
> +	/* Transfer references to the caller */
> +	err = transfer_reference_state(caller, callee);
> +	if (err)
> +		return err;
> +
>  	*insn_idx = callee->callsite + 1;
>  	if (env->log.level) {
>  		verbose(env, "returning from callee:\n");
> @@ -2498,6 +2711,15 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>  			return err;
>  	}
>  
> +	/* If the function is a release() function, mark all copies of the same
> +	 * pointer as "freed" in all registers and in the stack.
> +	 */
> +	if (is_release_function(func_id)) {
> +		err = release_reference(env);

I think this can be improved if check_func_arg() stores ptr_id into meta.
Then this loop
 for (i = BPF_REG_1; i < BPF_REG_6; i++) {
       if (reg_is_refcounted(&regs[i])) {
in release_reference() won't be needed.
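
Something along these lines might capture the suggestion (rough
sketch only; the meta field is invented here):

	struct bpf_call_arg_meta {
		/* ... existing fields ... */
		int ptr_id;	/* hypothetical: id of the refcounted arg */
	};

	/* in check_func_arg(), for the ARG_PTR_TO_SOCKET case: */
	if (reg_is_refcounted(reg))
		meta->ptr_id = reg->id;

	/* in check_helper_call(), release_reference() could then take
	 * the id directly instead of scanning R1-R5 for a refcounted
	 * register:
	 */
	if (is_release_function(func_id)) {
		err = release_reference(env, meta.ptr_id);
		if (err)
			return err;
	}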

Also the macros from the previous patch look ugly, but considering this patch
I guess it's justified. At least I don't see a better way of doing it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
  2018-05-12  0:54         ` Joe Stringer
@ 2018-05-15  3:16           ` Alexei Starovoitov
  2018-05-15 16:48             ` Martin KaFai Lau
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2018-05-15  3:16 UTC (permalink / raw)
  To: Joe Stringer; +Cc: Martin KaFai Lau, daniel, netdev, ast, john fastabend

On Fri, May 11, 2018 at 05:54:33PM -0700, Joe Stringer wrote:
> On 11 May 2018 at 14:41, Martin KaFai Lau <kafai@fb.com> wrote:
> > On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
> >> On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
> >> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> >> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> >> >> programs to find out if there is a socket listening on this host, and
> >> >> returns a socket pointer which the BPF program can then access to
> >> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
> >> >> takes a reference on the socket, so when a BPF program makes use of this
> >> >> function, it must subsequently pass the returned pointer into the newly
> >> >> added sk_release() to return the reference.
> >> >>
> >> >> By way of example, the following pseudocode would filter inbound
> >> >> connections at XDP if there is no corresponding service listening for
> >> >> the traffic:
> >> >>
> >> >>   struct bpf_sock_tuple tuple;
> >> >>   struct bpf_sock_ops *sk;
> >> >>
> >> >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> >> >>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
> >> >>   if (!sk) {
> >> >>     // Couldn't find a socket listening for this traffic. Drop.
> >> >>     return TC_ACT_SHOT;
> >> >>   }
> >> >>   bpf_sk_release(sk, 0);
> >> >>   return TC_ACT_OK;
> >> >>
> >> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> >> >> ---
> >>
> >> ...
> >>
> >> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
> >> >>  };
> >> >>  #endif
> >> >>
> >> >> +struct sock *
> >> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
> >> > Would it be possible to have another version that
> >> > returns a sk without taking its refcnt?
> >> > It may have performance benefit.
> >>
> >> Not really. The sockets are not RCU-protected, and established sockets
> >> may be torn down without notice. If we don't take a reference, there's
> >> no guarantee that the socket will continue to exist for the duration
> >> of running the BPF program.
> >>
> >> From what I follow, the comment below has a hidden implication which
> >> is that sockets without SOCK_RCU_FREE, eg established sockets, may be
> >> directly freed regardless of RCU.
> > Right, SOCK_RCU_FREE sk is the one I am concern about.
> > For example, TCP_LISTEN socket does not require taking a refcnt
> > now.  Doing a bpf_sk_lookup() may have a rather big
> > impact on handling TCP syn flood.  or the usual intention
> > is to redirect instead of passing it up to the stack?
> 
> I see, if you're only interested in listen sockets then probably this
> series could be extended with a new flag, eg something like
> BPF_F_SK_FIND_LISTENERS which restricts the set of possible sockets
> found to only listen sockets, then the implementation would call into
> __inet_lookup_listener() instead of inet_lookup(). The presence of
> that flag in the relevant register during CALL instruction would show
> that the verifier should not reference-track the result, then there'd
> need to be a check on the release to ensure that this unreferenced
> socket is never released. Just a thought, completely untested and I
> could still be missing some detail..
> 
> That said, I don't completely follow how you would expect to handle
> the traffic for sockets that are already established - the helper
> would no longer find those sockets, so you wouldn't know whether to
> pass the traffic up the stack for established traffic or not.

I think Martin has a valid concern here: if somebody starts using
this helper on rx traffic, the bpf program (via these two new
helpers) will be doing refcnt++ and refcnt-- even for listener
sockets, which will cause synflood handling to suffer.
One can argue that this is a bpf author mistake, but without fixes
(and api changes) to the helper the programmer doesn't really have a way
of avoiding this situation.
Also udp sockets don't need refcnt at all.
How about we split this single helper into three:
- bpf_sk_lookup_tcp_established() that will return refcnt-ed socket
and has to be bpf_sk_release()d by the program.
- bpf_sk_lookup_tcp_listener() that doesn't refcnt, since progs
run in rcu.
- bpf_sk_lookup_udp() that also doesn't refcnt.
The logic you want to put into this helper can be easily
replicated with these three helpers and the whole thing will
be much faster.
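
For illustration, program-side usage under that split might look
roughly like this (helper names as proposed above; none of them exist
yet):

  sk = bpf_sk_lookup_tcp_listener(ctx, &tuple, sizeof tuple, netns, 0);
  if (sk)
    return TC_ACT_OK;        // not refcounted, no release needed
  sk = bpf_sk_lookup_tcp_established(ctx, &tuple, sizeof tuple, netns, 0);
  if (sk) {
    bpf_sk_release(sk, 0);   // refcounted, must be released
    return TC_ACT_OK;
  }
  sk = bpf_sk_lookup_udp(ctx, &tuple, sizeof tuple, netns, 0);
  if (sk)
    return TC_ACT_OK;        // not refcounted either
  return TC_ACT_SHOT;
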
Thoughts?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 11/11] Documentation: Describe bpf reference tracking
  2018-05-09 21:07 ` [RFC bpf-next 11/11] Documentation: Describe bpf " Joe Stringer
@ 2018-05-15  3:19   ` Alexei Starovoitov
  0 siblings, 0 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2018-05-15  3:19 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john.fastabend, kafai

On Wed, May 09, 2018 at 02:07:09PM -0700, Joe Stringer wrote:
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>  Documentation/networking/filter.txt | 64 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> index 5032e1263bc9..77be17977bc5 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -1125,6 +1125,14 @@ pointer type.  The types of pointers describe their base, as follows:
>      PTR_TO_STACK        Frame pointer.
>      PTR_TO_PACKET       skb->data.
>      PTR_TO_PACKET_END   skb->data + headlen; arithmetic forbidden.
> +    PTR_TO_SOCKET       Pointer to struct bpf_sock_ops, implicitly refcounted.
> +    PTR_TO_SOCKET_OR_NULL
> +                        Either a pointer to a socket, or NULL; socket lookup
> +                        returns this type, which becomes a PTR_TO_SOCKET when
> +                        checked != NULL. PTR_TO_SOCKET is reference-counted,
> +                        so programs must release the reference through the
> +                        socket release function before the end of the program.
> +                        Arithmetic on these pointers is forbidden.
>  However, a pointer may be offset from this base (as a result of pointer
>  arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
>  offset'.  The former is used when an exactly-known value (e.g. an immediate
> over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
>  pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
>  bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
>  that pointer are safe.
> +The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
> +to all copies of the pointer returned from a socket lookup. This has similar
> +behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
> +it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
> +represents a reference to the corresponding 'struct sock'. To ensure that the
> +reference is not leaked, it is imperative to NULL-check the reference and,
> +in the non-NULL case, pass the valid reference to the socket release function.
>  
>  Direct packet access
>  --------------------
> @@ -1441,6 +1456,55 @@ Error:
>    8: (7a) *(u64 *)(r0 +0) = 1
>    R0 invalid mem access 'imm'
>  
> +Program that performs a socket lookup then sets the pointer to NULL without
> +checking it:
> +value:
> +  BPF_MOV64_IMM(BPF_REG_2, 0),
> +  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
> +  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +  BPF_MOV64_IMM(BPF_REG_3, 4),
> +  BPF_MOV64_IMM(BPF_REG_4, 0),
> +  BPF_MOV64_IMM(BPF_REG_5, 0),
> +  BPF_EMIT_CALL(BPF_FUNC_sk_lookup),
> +  BPF_MOV64_IMM(BPF_REG_0, 0),
> +  BPF_EXIT_INSN(),
> +Error:
> +  0: (b7) r2 = 0
> +  1: (63) *(u32 *)(r10 -8) = r2
> +  2: (bf) r2 = r10
> +  3: (07) r2 += -8
> +  4: (b7) r3 = 4
> +  5: (b7) r4 = 0
> +  6: (b7) r5 = 0
> +  7: (85) call bpf_sk_lookup#65
> +  8: (b7) r0 = 0
> +  9: (95) exit
> +  Unreleased reference id=1, alloc_insn=7
> +
> +Program that performs a socket lookup but does not NULL-check the returned
> +value:
> +  BPF_MOV64_IMM(BPF_REG_2, 0),
> +  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
> +  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +  BPF_MOV64_IMM(BPF_REG_3, 4),
> +  BPF_MOV64_IMM(BPF_REG_4, 0),
> +  BPF_MOV64_IMM(BPF_REG_5, 0),
> +  BPF_EMIT_CALL(BPF_FUNC_sk_lookup),
> +  BPF_EXIT_INSN(),
> +Error:
> +  0: (b7) r2 = 0
> +  1: (63) *(u32 *)(r10 -8) = r2
> +  2: (bf) r2 = r10
> +  3: (07) r2 += -8
> +  4: (b7) r3 = 4
> +  5: (b7) r4 = 0
> +  6: (b7) r5 = 0
> +  7: (85) call bpf_sk_lookup#65
> +  8: (95) exit
> +  Unreleased reference id=1, alloc_insn=7

Nice. Thank you for updating this doc. We haven't touched it in a long time.
It's probably long overdue for a complete overhaul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
  2018-05-15  3:16           ` Alexei Starovoitov
@ 2018-05-15 16:48             ` Martin KaFai Lau
  2018-05-16 18:55               ` Joe Stringer
  0 siblings, 1 reply; 26+ messages in thread
From: Martin KaFai Lau @ 2018-05-15 16:48 UTC (permalink / raw)
  To: Alexei Starovoitov, Joe Stringer; +Cc: daniel, netdev, ast, john fastabend

On Mon, May 14, 2018 at 08:16:59PM -0700, Alexei Starovoitov wrote:
> On Fri, May 11, 2018 at 05:54:33PM -0700, Joe Stringer wrote:
> > On 11 May 2018 at 14:41, Martin KaFai Lau <kafai@fb.com> wrote:
> > > On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
> > >> On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
> > >> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> > >> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> > >> >> programs to find out if there is a socket listening on this host, and
> > >> >> returns a socket pointer which the BPF program can then access to
> > >> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
> > >> >> takes a reference on the socket, so when a BPF program makes use of this
> > >> >> function, it must subsequently pass the returned pointer into the newly
> > >> >> added sk_release() to return the reference.
> > >> >>
> > >> >> By way of example, the following pseudocode would filter inbound
> > >> >> connections at XDP if there is no corresponding service listening for
> > >> >> the traffic:
> > >> >>
> > >> >>   struct bpf_sock_tuple tuple;
> > >> >>   struct bpf_sock_ops *sk;
> > >> >>
> > >> >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> > >> >>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
> > >> >>   if (!sk) {
> > >> >>     // Couldn't find a socket listening for this traffic. Drop.
> > >> >>     return TC_ACT_SHOT;
> > >> >>   }
> > >> >>   bpf_sk_release(sk, 0);
> > >> >>   return TC_ACT_OK;
> > >> >>
> > >> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> > >> >> ---
> > >>
> > >> ...
> > >>
> > >> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
> > >> >>  };
> > >> >>  #endif
> > >> >>
> > >> >> +struct sock *
> > >> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
> > >> > Would it be possible to have another version that
> > >> > returns a sk without taking its refcnt?
> > >> > It may have performance benefit.
> > >>
> > >> Not really. The sockets are not RCU-protected, and established sockets
> > >> may be torn down without notice. If we don't take a reference, there's
> > >> no guarantee that the socket will continue to exist for the duration
> > >> of running the BPF program.
> > >>
> > >> From what I follow, the comment below has a hidden implication which
> > >> is that sockets without SOCK_RCU_FREE, eg established sockets, may be
> > >> directly freed regardless of RCU.
> > > Right, SOCK_RCU_FREE sk is the one I am concern about.
> > > For example, TCP_LISTEN socket does not require taking a refcnt
> > > now.  Doing a bpf_sk_lookup() may have a rather big
> > > impact on handling TCP syn flood.  or the usual intention
> > > is to redirect instead of passing it up to the stack?
> > 
> > I see, if you're only interested in listen sockets then probably this
> > series could be extended with a new flag, eg something like
> > BPF_F_SK_FIND_LISTENERS which restricts the set of possible sockets
> > found to only listen sockets, then the implementation would call into
> > __inet_lookup_listener() instead of inet_lookup(). The presence of
> > that flag in the relevant register during CALL instruction would show
> > that the verifier should not reference-track the result, then there'd
> > need to be a check on the release to ensure that this unreferenced
> > socket is never released. Just a thought, completely untested and I
> > could still be missing some detail..
> > 
> > That said, I don't completely follow how you would expect to handle
> > the traffic for sockets that are already established - the helper
> > would no longer find those sockets, so you wouldn't know whether to
> > pass the traffic up the stack for established traffic or not.
> 
> I think Martin has a valid concern here that if somebody starts using
> this helper on the rx traffic the bpf program (via these two new
> helpers) will be doing refcnt++ and refcnt-- even for listener
> sockets which will cause synflood to suffer.
> One can argue that this is bpf author mistake, but without fixes
> (and api changes) to the helper the programmer doesn't really have a way
> of avoiding this situation.
> Also udp sockets don't need refcnt at all.
> How about we split this single helper into three:
> - bpf_sk_lookup_tcp_established() that will return refcnt-ed socket
> and has to be bpf_sk_release()d by the program.
> - bpf_sk_lookup_tcp_listener() that doesn't refcnt, since progs
> run in rcu.
> - bpf_sk_lookup_udp() that also doesn't refcnt.
> The logic you want to put into this helper can be easily
> replicated with these three helpers and the whole thing will
> be much faster.
> Thoughts?
Just came to my mind.

or can we explore something like:

On the bpf_sk_lookup() side, use __inet[6]_lookup()
and __udp[46]_lib_lookup() instead.  That should
only take refcnt if it has to.

On the bpf_sk_release() side, it skips refcnt--
if sk is SOCK_RCU_FREE.
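
On the release side that could look roughly like this (a sketch only;
the flags argument from the RFC is omitted):

	BPF_CALL_1(bpf_sk_release, struct sock *, sk)
	{
		/* Only drop a reference for sockets where the lookup
		 * actually took one.  SOCK_RCU_FREE sockets (TCP
		 * listeners, UDP) stay valid under RCU for the
		 * duration of the program, so the lookup side would
		 * not have taken a refcnt on them.
		 */
		if (!sock_flag(sk, SOCK_RCU_FREE))
			sock_gen_put(sk);
		return 0;
	}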

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
  2018-05-15 16:48             ` Martin KaFai Lau
@ 2018-05-16 18:55               ` Joe Stringer
  0 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-16 18:55 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Alexei Starovoitov, Joe Stringer, daniel, netdev, ast, john fastabend

On 15 May 2018 at 09:48, Martin KaFai Lau <kafai@fb.com> wrote:
> On Mon, May 14, 2018 at 08:16:59PM -0700, Alexei Starovoitov wrote:
>> On Fri, May 11, 2018 at 05:54:33PM -0700, Joe Stringer wrote:
>> > On 11 May 2018 at 14:41, Martin KaFai Lau <kafai@fb.com> wrote:
>> > > On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
>> > >> On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
>> > >> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
>> > >> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
>> > >> >> programs to find out if there is a socket listening on this host, and
>> > >> >> returns a socket pointer which the BPF program can then access to
>> > >> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
>> > >> >> takes a reference on the socket, so when a BPF program makes use of this
>> > >> >> function, it must subsequently pass the returned pointer into the newly
>> > >> >> added sk_release() to return the reference.
>> > >> >>
>> > >> >> By way of example, the following pseudocode would filter inbound
>> > >> >> connections at XDP if there is no corresponding service listening for
>> > >> >> the traffic:
>> > >> >>
>> > >> >>   struct bpf_sock_tuple tuple;
>> > >> >>   struct bpf_sock_ops *sk;
>> > >> >>
>> > >> >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
>> > >> >>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
>> > >> >>   if (!sk) {
>> > >> >>     // Couldn't find a socket listening for this traffic. Drop.
>> > >> >>     return TC_ACT_SHOT;
>> > >> >>   }
>> > >> >>   bpf_sk_release(sk, 0);
>> > >> >>   return TC_ACT_OK;
>> > >> >>
>> > >> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>> > >> >> ---
>> > >>
>> > >> ...
>> > >>
>> > >> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
>> > >> >>  };
>> > >> >>  #endif
>> > >> >>
>> > >> >> +struct sock *
>> > >> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
>> > >> > Would it be possible to have another version that
>> > >> > returns a sk without taking its refcnt?
>> > >> > It may have performance benefit.
>> > >>
>> > >> Not really. The sockets are not RCU-protected, and established sockets
>> > >> may be torn down without notice. If we don't take a reference, there's
>> > >> no guarantee that the socket will continue to exist for the duration
>> > >> of running the BPF program.
>> > >>
>> > >> From what I follow, the comment below has a hidden implication which
>> > >> is that sockets without SOCK_RCU_FREE, eg established sockets, may be
>> > >> directly freed regardless of RCU.
>> > > Right, SOCK_RCU_FREE sk is the one I am concern about.
>> > > For example, TCP_LISTEN socket does not require taking a refcnt
>> > > now.  Doing a bpf_sk_lookup() may have a rather big
>> > > impact on handling TCP syn flood.  or the usual intention
>> > > is to redirect instead of passing it up to the stack?
>> >
>> > I see, if you're only interested in listen sockets then probably this
>> > series could be extended with a new flag, eg something like
>> > BPF_F_SK_FIND_LISTENERS which restricts the set of possible sockets
>> > found to only listen sockets, then the implementation would call into
>> > __inet_lookup_listener() instead of inet_lookup(). The presence of
>> > that flag in the relevant register during CALL instruction would show
>> > that the verifier should not reference-track the result, then there'd
>> > need to be a check on the release to ensure that this unreferenced
>> > socket is never released. Just a thought, completely untested and I
>> > could still be missing some detail..
>> >
>> > That said, I don't completely follow how you would expect to handle
>> > the traffic for sockets that are already established - the helper
>> > would no longer find those sockets, so you wouldn't know whether to
>> > pass the traffic up the stack for established traffic or not.
>>
>> I think Martin has a valid concern here that if somebody starts using
>> this helper on the rx traffic the bpf program (via these two new
>> helpers) will be doing refcnt++ and refcnt-- even for listener
>> sockets which will cause synflood to suffer.
>> One can argue that this is bpf author mistake, but without fixes
>> (and api changes) to the helper the programmer doesn't really have a way
>> of avoiding this situation.
>> Also udp sockets don't need refcnt at all.
>> How about we split this single helper into three:
>> - bpf_sk_lookup_tcp_established() that will return refcnt-ed socket
>> and has to be bpf_sk_release()d by the program.
>> - bpf_sk_lookup_tcp_listener() that doesn't refcnt, since progs
>> run in rcu.
>> - bpf_sk_lookup_udp() that also doesn't refcnt.
>> The logic you want to put into this helper can be easily
>> replicated with these three helpers and the whole thing will
>> be much faster.
>> Thoughts?
> Just came to my mind.
>
> or can we explore something like:
>
> On the bpf_sk_lookup() side, use __inet[6]_lookup()
> and __udp[46]_lib_lookup() instead.  That should
> only take refcnt if it has to.
>
> On the bpf_sk_release() side, it skips refcnt--
> if sk is SOCK_RCU_FREE.

Reflecting the discussion from IOVisor call:

I voiced a concern with the above proposal by Alexei that it leaks
kernel implementation detail (established sockets are refcnted) into
the BPF API.

Martin's proposal here addresses this concern. We can require every
sk_lookup() to be matched with a bpf_sk_release(); then, inside
bpf_sk_release(), we can deal with the details of whether any freeing
is actually required.

It's still useful to split the helpers out into bpf_sk_lookup_tcp()
and bpf_sk_lookup_udp() because then we don't need to deal with the
forward-compatibility concern of adding support for different socket
types (eg SCTP). That said, the TCP established/listener split does
not have an immediate user, so we don't need to split these at this
time. If there is a use case for only finding the listener sockets, we
can always add a flag to the bpf_sk_lookup_tcp() helper to only find
the listener sockets.
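
For illustration, the cover-letter example might then end up looking
roughly like this (a sketch assuming the split-by-protocol helpers
discussed above; exact signatures are still to be determined):

  struct bpf_sock_tuple tuple;
  struct bpf_sock_ops *sk;

  populate_tuple(ctx, &tuple);  // Extract the 5tuple from the packet
  sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0);
  if (!sk)
    sk = bpf_sk_lookup_udp(ctx, &tuple, sizeof tuple, netns, 0);
  if (!sk) {
    // No socket listening for this traffic. Drop.
    return TC_ACT_SHOT;
  }
  // Always paired with a successful lookup; the release decides
  // internally whether a refcnt drop is actually needed.
  bpf_sk_release(sk, 0);
  return TC_ACT_OK;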

I'll respin, thanks for the feedback all.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 00/11] Add socket lookup support
  2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
                   ` (10 preceding siblings ...)
  2018-05-09 21:07 ` [RFC bpf-next 11/11] Documentation: Describe bpf " Joe Stringer
@ 2018-05-16 19:05 ` Joe Stringer
  2018-05-16 20:04   ` Alexei Starovoitov
  11 siblings, 1 reply; 26+ messages in thread
From: Joe Stringer @ 2018-05-16 19:05 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john fastabend, tgraf, Martin KaFai Lau

On 9 May 2018 at 14:06, Joe Stringer <joe@wand.net.nz> wrote:
> This series proposes a new helper for the BPF API which allows BPF programs to
> perform lookups for sockets in a network namespace. This would allow programs
> to determine early on in processing whether the stack is expecting to receive
> the packet, and perform some action (eg drop, forward somewhere) based on this
> information.
>
> The series is structured roughly into:
> * Misc refactor
> * Add the socket pointer type
> * Add reference tracking to ensure that socket references are freed
> * Extend the BPF API to add sk_lookup() / sk_release() functions
> * Add tests/documentation
>
> The helper proposed in this series includes a parameter for a tuple which must
> be filled in by the caller to determine the socket to look up. The simplest
> case would be filling with the contents of the packet, ie mapping the packet's
> 5-tuple into the parameter. In common cases, it may alternatively be useful to
> reverse the direction of the tuple and perform a lookup, to find the socket
> that initiates this connection; and if the BPF program ever performs a form of
> IP address translation, it may further be useful to be able to look up
> arbitrary tuples that are not based upon the packet, but instead based on state
> held in BPF maps or hardcoded in the BPF program.
>
> Currently, access into the socket's fields are limited to those which are
> otherwise already accessible, and are restricted to read-only access.
>
> A few open points:
> * Currently, the lookup interface only returns either a valid socket or a NULL
>   pointer. This means that if there is any kind of issue with the tuple, such
>   as it provides an unsupported protocol number, or the socket can't be found,
>   then we are unable to differentiate these cases from one another. One natural
>   approach to improve this could be to return an ERR_PTR from the
>   bpf_sk_lookup() helper. This would be more complicated but maybe it's
>   worthwhile.

This suggestion would add a lot of complexity, and there aren't many
legitimately different error cases. They are:
* Unsupported socket type
* Cannot find netns
* Tuple argument is the wrong size
* Can't find socket

If we split the helpers into protocol-specific types, the first one
would be addressed. The last one is addressed by returning NULL. It
seems like a reasonable compromise to me to return NULL also in the
middle two cases as well, and rely on the BPF writer to provide valid
arguments.

> * No ordering is defined between sockets. If the tuple could find multiple
>   sockets, then it will arbitrarily return one. It is up to the caller to
>   handle this. If we wish to handle this more reliably in future, we could
>   encode an ordering preference in the flags field.

This doesn't need to be addressed in this series; there is scope for
handling these cases when a use case arises.

> * Currently this helper is only defined for TC hook point, but it should also
>   be valid at XDP and perhaps some other hooks.

It's easy to add XDP support on demand; the initial implementation doesn't need it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 00/11] Add socket lookup support
  2018-05-16 19:05 ` [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
@ 2018-05-16 20:04   ` Alexei Starovoitov
  0 siblings, 0 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2018-05-16 20:04 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john fastabend, tgraf, Martin KaFai Lau

On Wed, May 16, 2018 at 12:05:06PM -0700, Joe Stringer wrote:
> >
> > A few open points:
> > * Currently, the lookup interface only returns either a valid socket or a NULL
> >   pointer. This means that if there is any kind of issue with the tuple, such
> >   as it provides an unsupported protocol number, or the socket can't be found,
> >   then we are unable to differentiate these cases from one another. One natural
> >   approach to improve this could be to return an ERR_PTR from the
> >   bpf_sk_lookup() helper. This would be more complicated but maybe it's
> >   worthwhile.
> 
> This suggestion would add a lot of complexity, and there's not many
> legitimately different error cases. There's:
> * Unsupported socket type
> * Cannot find netns
> * Tuple argument is the wrong size
> * Can't find socket
> 
> If we split the helpers into protocol-specific types, the first one
> would be addressed. The last one is addressed by returning NULL. It
> seems like a reasonable compromise to me to return NULL also in the
> middle two cases as well, and rely on the BPF writer to provide valid
> arguments.
> 
> > * No ordering is defined between sockets. If the tuple could find multiple
> >   sockets, then it will arbitrarily return one. It is up to the caller to
> >   handle this. If we wish to handle this more reliably in future, we could
> >   encode an ordering preference in the flags field.
> 
> Doesn't need to be addressed with this series, there is scope for
> addressing these cases when the use case arises.

Thanks for summarizing the conf call discussion.
Looking forward to non-rfc patches :)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 04/11] bpf: Add PTR_TO_SOCKET verifier type
  2018-05-15  2:37   ` Alexei Starovoitov
@ 2018-05-16 23:56     ` Joe Stringer
  0 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-16 23:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Joe Stringer, daniel, netdev, ast, john fastabend, Martin KaFai Lau

On 14 May 2018 at 19:37, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, May 09, 2018 at 02:07:02PM -0700, Joe Stringer wrote:
>> Teach the verifier a little bit about a new type of pointer, a
>> PTR_TO_SOCKET. This pointer type is accessed from BPF through the
>> 'struct bpf_sock' structure.
>>
>> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>> ---
>>  include/linux/bpf.h          | 19 +++++++++-
>>  include/linux/bpf_verifier.h |  2 ++
>>  kernel/bpf/verifier.c        | 86 ++++++++++++++++++++++++++++++++++++++------
>>  net/core/filter.c            | 30 +++++++++-------
>>  4 files changed, 114 insertions(+), 23 deletions(-)
>
> Ack for patches 1-3. In this one few nits:
>
>> @@ -1723,6 +1752,16 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>>               err = check_packet_access(env, regno, off, size, false);
>>               if (!err && t == BPF_READ && value_regno >= 0)
>>                       mark_reg_unknown(env, regs, value_regno);
>> +
>> +     } else if (reg->type == PTR_TO_SOCKET) {
>> +             if (t == BPF_WRITE) {
>> +                     verbose(env, "cannot write into socket\n");
>> +                     return -EACCES;
>> +             }
>> +             err = check_sock_access(env, regno, off, size, t);
>> +             if (!err && t == BPF_READ && value_regno >= 0)
>
> t == BPF_READ check is unnecessary.
>
>> @@ -5785,7 +5845,13 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
>>
>>       if (ret == 0)
>>               /* program is valid, convert *(u32*)(ctx + off) accesses */
>> -             ret = convert_ctx_accesses(env);
>> +             ret = convert_ctx_accesses(env, env->ops->convert_ctx_access,
>> +                                        PTR_TO_CTX);
>> +
>> +     if (ret == 0)
>> +             /* Convert *(u32*)(sock_ops + off) accesses */
>> +             ret = convert_ctx_accesses(env, bpf_sock_convert_ctx_access,
>> +                                        PTR_TO_SOCKET);
>
> Overall looks great.
> Only this part is missing for PTR_TO_SOCKET:
>      } else if (dst_reg_type != *prev_dst_type &&
>                 (dst_reg_type == PTR_TO_CTX ||
>                  *prev_dst_type == PTR_TO_CTX)) {
>              verbose(env, "same insn cannot be used with different pointers\n");
>              return -EINVAL;
> similar logic has to be added.
> Otherwise the following will be accepted:
>
> R1 = sock_ptr
> goto X;
> ...
> R1 = some_other_valid_ptr;
> goto X;
> ...
>
> R2 = *(u32 *)(R1 + 0);
> this will be rewritten for first branch,
> but it's wrong for second.
>

Thanks for the review, will address these comments.
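
For the archives, a minimal sketch of how that check might be
generalized so that PTR_TO_SOCKET gets the same treatment as
PTR_TO_CTX; the helper names (type_needs_fixup(), reg_type_mismatch())
are placeholders of mine, not from the patch:

/* Pointer types whose accesses get rewritten by convert_ctx_accesses(),
 * so a single load/store insn must not mix them with other pointer types.
 */
static bool type_needs_fixup(enum bpf_reg_type type)
{
        return type == PTR_TO_CTX || type == PTR_TO_SOCKET;
}

static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
{
        return src != prev && (type_needs_fixup(src) ||
                               type_needs_fixup(prev));
}

/* ...and in do_check(), for both the LDX and STX cases: */

        } else if (reg_type_mismatch(dst_reg_type, *prev_dst_type)) {
                verbose(env, "same insn cannot be used with different pointers\n");
                return -EINVAL;
        }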

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC bpf-next 06/11] bpf: Add reference tracking to verifier
  2018-05-15  3:04   ` Alexei Starovoitov
@ 2018-05-17  1:05     ` Joe Stringer
  0 siblings, 0 replies; 26+ messages in thread
From: Joe Stringer @ 2018-05-17  1:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Joe Stringer, daniel, netdev, ast, john fastabend, Martin KaFai Lau

On 14 May 2018 at 20:04, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, May 09, 2018 at 02:07:04PM -0700, Joe Stringer wrote:
>> Allow helper functions to acquire a reference and return it into a
>> register. Specific pointer types such as the PTR_TO_SOCKET will
>> implicitly represent such a reference. The verifier must ensure that
>> these references are released exactly once in each path through the
>> program.
>>
>> To achieve this, this commit assigns an id to the pointer and tracks it
>> in the 'bpf_func_state', then when the function or program exits,
>> verifies that all of the acquired references have been freed. When the
>> pointer is passed to a function that frees the reference, it is removed
>> from the 'bpf_func_state` and all existing copies of the pointer in
>> registers are marked invalid.
>>
>> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>> ---
>>  include/linux/bpf_verifier.h |  18 ++-
>>  kernel/bpf/verifier.c        | 295 ++++++++++++++++++++++++++++++++++++++++---
>>  2 files changed, 292 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 9dcd87f1d322..8dbee360b3ec 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -104,6 +104,11 @@ struct bpf_stack_state {
>>       u8 slot_type[BPF_REG_SIZE];
>>  };
>>
>> +struct bpf_reference_state {
>> +     int id;
>> +     int insn_idx; /* allocation insn */
>
> the insn_idx is for more verbose messages, right?
> It doesn't seem to affect the safety of algorithm.
> Please add a comment to clarify that.

Yup, will do.

>> +/* Acquire a pointer id from the env and update the state->refs to include
>> + * this new pointer reference.
>> + * On success, returns a valid pointer id to associate with the register
>> + * On failure, returns a negative errno.
>> + */
>> +static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
>> +{
>> +     struct bpf_func_state *state = cur_func(env);
>> +     int new_ofs = state->acquired_refs;
>> +     int id, err;
>> +
>> +     err = realloc_reference_state(state, state->acquired_refs + 1, true);
>> +     if (err)
>> +             return err;
>> +     id = ++env->id_gen;
>> +     state->refs[new_ofs].id = id;
>> +     state->refs[new_ofs].insn_idx = insn_idx;
>
> I thought that we may avoid this extra 'ref_state' array if we store
> 'id' into 'aux' array which is one to one to array of instructions
> and avoid this expensive reallocs, but then I realized we can go
> through the same instruction that returns a pointer to socket
> multiple times and every time it needs to be different 'id' and
> tracked indepdently, so yeah. All that infra is necessary.
> Would be good to document the algorithm a bit more.

Good point, I'll add these details to the bpf_reference_state definition.
Will consider other areas that could receive some docs attention.
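
Roughly along these lines, perhaps (comment wording is mine, not
final):

struct bpf_reference_state {
        /* Each acquired reference (eg a socket returned by the lookup
         * helper) gets a unique id from env->id_gen at acquire time. The
         * id lives in the per-function state rather than a per-instruction
         * array because the same call instruction can be reached on
         * multiple verification paths, and each acquisition must be
         * tracked independently.
         */
        int id;
        /* Instruction index where the reference was acquired; used only
         * to make verifier error messages more helpful, it does not
         * affect the safety of the algorithm.
         */
        int insn_idx;
};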

>> @@ -2498,6 +2711,15 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>>                       return err;
>>       }
>>
>> +     /* If the function is a release() function, mark all copies of the same
>> +      * pointer as "freed" in all registers and in the stack.
>> +      */
>> +     if (is_release_function(func_id)) {
>> +             err = release_reference(env);
>
> I think this can be improved if check_func_arg() stores ptr_id into meta.
> Then this loop
>  for (i = BPF_REG_1; i < BPF_REG_6; i++) {
>        if (reg_is_refcounted(&regs[i])) {
> in release_reference() won't be needed.

That's a nice cleanup.
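
Something like this is what I have in mind (fragments only, names
adapted from the discussion rather than final):

struct bpf_call_arg_meta {
        /* ...existing fields... */
        int ptr_id;     /* id of the refcounted pointer arg, 0 if none */
};

/* check_func_arg(): stash the id when the argument register holds a
 * refcounted pointer such as PTR_TO_SOCKET.
 */
        if (reg_is_refcounted(reg)) {
                if (meta->ptr_id)
                        return -EFAULT; /* only one refcounted arg expected */
                meta->ptr_id = reg->id;
        }

/* check_helper_call(): release by id, no more scanning of R1-R5. */
        if (is_release_function(func_id)) {
                err = release_reference(env, meta.ptr_id);
                if (err)
                        return err;
        }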

> Also the macros from the previous patch look ugly, but considering this patch
> I guess it's justified. At least I don't see a better way of doing it.

Completely agree, ugly, but I also didn't see a great alternative.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread

Thread overview: 26+ messages
2018-05-09 21:06 [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
2018-05-09 21:06 ` [RFC bpf-next 01/11] bpf: Add iterator for spilled registers Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 02/11] bpf: Simplify ptr_min_max_vals adjustment Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 03/11] bpf: Generalize ptr_or_null regs check Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 04/11] bpf: Add PTR_TO_SOCKET verifier type Joe Stringer
2018-05-15  2:37   ` Alexei Starovoitov
2018-05-16 23:56     ` Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 05/11] bpf: Macrofy stack state copy Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 06/11] bpf: Add reference tracking to verifier Joe Stringer
2018-05-15  3:04   ` Alexei Starovoitov
2018-05-17  1:05     ` Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF Joe Stringer
2018-05-11  5:00   ` Martin KaFai Lau
2018-05-11 21:08     ` Joe Stringer
2018-05-11 21:41       ` Martin KaFai Lau
2018-05-12  0:54         ` Joe Stringer
2018-05-15  3:16           ` Alexei Starovoitov
2018-05-15 16:48             ` Martin KaFai Lau
2018-05-16 18:55               ` Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 08/11] selftests/bpf: Add tests for reference tracking Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 09/11] libbpf: Support loading individual progs Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 10/11] selftests/bpf: Add C tests for reference tracking Joe Stringer
2018-05-09 21:07 ` [RFC bpf-next 11/11] Documentation: Describe bpf " Joe Stringer
2018-05-15  3:19   ` Alexei Starovoitov
2018-05-16 19:05 ` [RFC bpf-next 00/11] Add socket lookup support Joe Stringer
2018-05-16 20:04   ` Alexei Starovoitov
