* [PATCH bpf-next 0/3] bpf: Make the verifier recognize llvm register allocation patterns.
@ 2020-10-06 20:09 Alexei Starovoitov
  2020-10-06 20:09 ` [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments Alexei Starovoitov
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-06 20:09 UTC (permalink / raw)
  To: davem; +Cc: daniel, john.fastabend, netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Make two verifier improvements:
- The llvm register allocator may use two different registers to represent the
same virtual register. Teach the verifier to recognize that pattern.
- Track bounded scalar spill/fill.

The 'profiler' test in patch 3 will fail to load without patches 1 and 2.

Alexei Starovoitov (2):
  bpf: Propagate scalar ranges through register assignments.
  selftests/bpf: Add profiler test

Yonghong Song (1):
  bpf: Track spill/fill of bounded scalars.

 kernel/bpf/verifier.c                         |  54 +-
 .../testing/selftests/bpf/prog_tests/align.c  |  16 +-
 .../selftests/bpf/prog_tests/test_profiler.c  |  72 ++
 tools/testing/selftests/bpf/progs/profiler.h  | 177 ++++
 .../selftests/bpf/progs/profiler.inc.h        | 969 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/profiler1.c |   6 +
 tools/testing/selftests/bpf/progs/profiler2.c |   6 +
 tools/testing/selftests/bpf/progs/profiler3.c |   6 +
 .../bpf/verifier/direct_packet_access.c       |   2 +-
 9 files changed, 1298 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_profiler.c
 create mode 100644 tools/testing/selftests/bpf/progs/profiler.h
 create mode 100644 tools/testing/selftests/bpf/progs/profiler.inc.h
 create mode 100644 tools/testing/selftests/bpf/progs/profiler1.c
 create mode 100644 tools/testing/selftests/bpf/progs/profiler2.c
 create mode 100644 tools/testing/selftests/bpf/progs/profiler3.c

-- 
2.23.0



* [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-06 20:09 [PATCH bpf-next 0/3] bpf: Make the verifier recognize llvm register allocation patterns Alexei Starovoitov
@ 2020-10-06 20:09 ` Alexei Starovoitov
  2020-10-07  1:56   ` Andrii Nakryiko
  2020-10-07 23:44   ` John Fastabend
  2020-10-06 20:09 ` [PATCH bpf-next 2/3] bpf: Track spill/fill of bounded scalars Alexei Starovoitov
  2020-10-06 20:09 ` [PATCH bpf-next 3/3] selftests/bpf: Add profiler test Alexei Starovoitov
  2 siblings, 2 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-06 20:09 UTC (permalink / raw)
  To: davem; +Cc: daniel, john.fastabend, netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

The llvm register allocator may use two different registers to represent the
same virtual register. In such a case the following pattern can be observed:
1047: (bf) r9 = r6
1048: (a5) if r6 < 0x1000 goto pc+1
1050: ...
1051: (a5) if r9 < 0x2 goto pc+66
1052: ...
1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */

In order to track this information without backtracking, allocate an ID
for scalars in a similar way to what is done in find_good_pkt_pointers().

When the verifier encounters the r9 = r6 assignment it will assign the same ID
to both registers. Later, if either register's range is narrowed via a
conditional jump, propagate the register state into the other register.
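
As an illustration (not taken from this patch), restricted C along the
following lines can produce such a pattern once llvm splits the live range
of 'len'; the program type, helper choice and array size below are arbitrary
placeholders for the sketch:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char buf[0x1000]; /* global array, accessed below with a variable offset */

SEC("socket")
int example(void *ctx)
{
	__u32 len = bpf_get_prandom_u32(); /* unknown scalar, say r6 */

	/* llvm may emit "r9 = r6" here and test r6 and r9 separately */
	if (len >= 0x1000)
		return 0;
	if (len < 2)
		return 0;
	/* the copy must carry both bounds for this access to be provable */
	return buf[len];
}

char _license[] SEC("license") = "GPL";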

Clear register ID in adjust_reg_min_max_vals() for any alu instruction.

Newly allocated register ID is ignored for scalars in regsafe() and doesn't
affect state pruning. mark_reg_unknown() also clears the ID.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c                         | 38 +++++++++++++++++++
 .../testing/selftests/bpf/prog_tests/align.c  | 16 ++++----
 .../bpf/verifier/direct_packet_access.c       |  2 +-
 3 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 01120acab09a..09e17b483b0b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6432,6 +6432,8 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 	src_reg = NULL;
 	if (dst_reg->type != SCALAR_VALUE)
 		ptr_reg = dst_reg;
+	else
+		dst_reg->id = 0;
 	if (BPF_SRC(insn->code) == BPF_X) {
 		src_reg = &regs[insn->src_reg];
 		if (src_reg->type != SCALAR_VALUE) {
@@ -6565,6 +6567,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				/* case: R1 = R2
 				 * copy register state to dest reg
 				 */
+				if (src_reg->type == SCALAR_VALUE)
+					src_reg->id = ++env->id_gen;
 				*dst_reg = *src_reg;
 				dst_reg->live |= REG_LIVE_WRITTEN;
 				dst_reg->subreg_def = DEF_NOT_SUBREG;
@@ -7365,6 +7369,30 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
 	return true;
 }
 
+static void find_equal_scalars(struct bpf_verifier_state *vstate,
+			       struct bpf_reg_state *known_reg)
+{
+	struct bpf_func_state *state;
+	struct bpf_reg_state *reg;
+	int i, j;
+
+	for (i = 0; i <= vstate->curframe; i++) {
+		state = vstate->frame[i];
+		for (j = 0; j < MAX_BPF_REG; j++) {
+			reg = &state->regs[j];
+			if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
+				*reg = *known_reg;
+		}
+
+		bpf_for_each_spilled_reg(j, state, reg) {
+			if (!reg)
+				continue;
+			if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
+				*reg = *known_reg;
+		}
+	}
+}
+
 static int check_cond_jmp_op(struct bpf_verifier_env *env,
 			     struct bpf_insn *insn, int *insn_idx)
 {
@@ -7493,6 +7521,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 				reg_combine_min_max(&other_branch_regs[insn->src_reg],
 						    &other_branch_regs[insn->dst_reg],
 						    src_reg, dst_reg, opcode);
+			if (src_reg->id) {
+				find_equal_scalars(this_branch, src_reg);
+				find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
+			}
+
 		}
 	} else if (dst_reg->type == SCALAR_VALUE) {
 		reg_set_min_max(&other_branch_regs[insn->dst_reg],
@@ -7500,6 +7533,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 					opcode, is_jmp32);
 	}
 
+	if (dst_reg->type == SCALAR_VALUE && dst_reg->id) {
+		find_equal_scalars(this_branch, dst_reg);
+		find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
+	}
+
 	/* detect if R == 0 where R is returned from bpf_map_lookup_elem().
 	 * NOTE: these optimizations below are related with pointer comparison
 	 *       which will never be JMP32.
diff --git a/tools/testing/selftests/bpf/prog_tests/align.c b/tools/testing/selftests/bpf/prog_tests/align.c
index c548aded6585..56a414ce5504 100644
--- a/tools/testing/selftests/bpf/prog_tests/align.c
+++ b/tools/testing/selftests/bpf/prog_tests/align.c
@@ -195,13 +195,13 @@ static struct bpf_align_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 		.matches = {
 			{7, "R3_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"},
-			{8, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"},
+			{8, "R4_w=inv(id=1,umax_value=255,var_off=(0x0; 0xff))"},
 			{9, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"},
-			{10, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"},
+			{10, "R4_w=inv(id=2,umax_value=255,var_off=(0x0; 0xff))"},
 			{11, "R4_w=inv(id=0,umax_value=510,var_off=(0x0; 0x1fe))"},
-			{12, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"},
+			{12, "R4_w=inv(id=3,umax_value=255,var_off=(0x0; 0xff))"},
 			{13, "R4_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"},
-			{14, "R4_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))"},
+			{14, "R4_w=inv(id=4,umax_value=255,var_off=(0x0; 0xff))"},
 			{15, "R4_w=inv(id=0,umax_value=2040,var_off=(0x0; 0x7f8))"},
 			{16, "R4_w=inv(id=0,umax_value=4080,var_off=(0x0; 0xff0))"},
 		},
@@ -518,7 +518,7 @@ static struct bpf_align_test tests[] = {
 			 * the total offset is 4-byte aligned and meets the
 			 * load's requirements.
 			 */
-			{20, "R5=pkt(id=1,off=0,r=4,umin_value=2,umax_value=1034,var_off=(0x2; 0x7fc)"},
+			{20, "R5=pkt(id=2,off=0,r=4,umin_value=2,umax_value=1034,var_off=(0x2; 0x7fc)"},
 
 		},
 	},
@@ -561,18 +561,18 @@ static struct bpf_align_test tests[] = {
 			/* Adding 14 makes R6 be (4n+2) */
 			{11, "R6_w=inv(id=0,umin_value=14,umax_value=74,var_off=(0x2; 0x7c))"},
 			/* Subtracting from packet pointer overflows ubounds */
-			{13, "R5_w=pkt(id=1,off=0,r=8,umin_value=18446744073709551542,umax_value=18446744073709551602,var_off=(0xffffffffffffff82; 0x7c)"},
+			{13, "R5_w=pkt(id=2,off=0,r=8,umin_value=18446744073709551542,umax_value=18446744073709551602,var_off=(0xffffffffffffff82; 0x7c)"},
 			/* New unknown value in R7 is (4n), >= 76 */
 			{15, "R7_w=inv(id=0,umin_value=76,umax_value=1096,var_off=(0x0; 0x7fc))"},
 			/* Adding it to packet pointer gives nice bounds again */
-			{16, "R5_w=pkt(id=2,off=0,r=0,umin_value=2,umax_value=1082,var_off=(0x2; 0xfffffffc)"},
+			{16, "R5_w=pkt(id=3,off=0,r=0,umin_value=2,umax_value=1082,var_off=(0x2; 0xfffffffc)"},
 			/* At the time the word size load is performed from R5,
 			 * its total fixed offset is NET_IP_ALIGN + reg->off (0)
 			 * which is 2.  Then the variable offset is (4n+2), so
 			 * the total offset is 4-byte aligned and meets the
 			 * load's requirements.
 			 */
-			{20, "R5=pkt(id=2,off=0,r=4,umin_value=2,umax_value=1082,var_off=(0x2; 0xfffffffc)"},
+			{20, "R5=pkt(id=3,off=0,r=4,umin_value=2,umax_value=1082,var_off=(0x2; 0xfffffffc)"},
 		},
 	},
 };
diff --git a/tools/testing/selftests/bpf/verifier/direct_packet_access.c b/tools/testing/selftests/bpf/verifier/direct_packet_access.c
index 2c5fbe7bcd27..ae72536603fe 100644
--- a/tools/testing/selftests/bpf/verifier/direct_packet_access.c
+++ b/tools/testing/selftests/bpf/verifier/direct_packet_access.c
@@ -529,7 +529,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "invalid access to packet, off=0 size=8, R5(id=1,off=0,r=0)",
+	.errstr = "invalid access to packet, off=0 size=8, R5(id=2,off=0,r=0)",
 	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
-- 
2.23.0



* [PATCH bpf-next 2/3] bpf: Track spill/fill of bounded scalars.
  2020-10-06 20:09 [PATCH bpf-next 0/3] bpf: Make the verifier recognize llvm register allocation patterns Alexei Starovoitov
  2020-10-06 20:09 ` [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments Alexei Starovoitov
@ 2020-10-06 20:09 ` Alexei Starovoitov
  2020-10-07  3:35   ` Andrii Nakryiko
  2020-10-06 20:09 ` [PATCH bpf-next 3/3] selftests/bpf: Add profiler test Alexei Starovoitov
  2 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-06 20:09 UTC (permalink / raw)
  To: davem; +Cc: daniel, john.fastabend, netdev, bpf, kernel-team

From: Yonghong Song <yhs@fb.com>

Under register pressure llvm may spill registers with bounds onto the stack.
The verifier has to track them through spill/fill, otherwise many kinds of
bound errors will be seen. The spill/fill of induction variables was already
handled. This patch extends that logic from tracking the spill/fill of a
constant to any bounded register. There is no need to track the spill/fill of
unbounded registers, since no new information can be retrieved from the stack
during register fill.
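
For illustration only (the instruction numbers, offsets and exact opcodes
below are made up), the kind of pattern that is now tracked looks roughly
like:

100: (79) r6 = *(u64 *)(r1 +0)
101: (25) if r6 > 0x100 goto pc+20 /* fallthrough: r6 has umax_value=0x100 */
102: (7b) *(u64 *)(r10 -8) = r6    /* spill of a bounded scalar */
...
110: (79) r6 = *(u64 *)(r10 -8)    /* fill: the bounds are preserved */
111: (0f) r2 += r6                 /* pointer arithmetic needs r6's bounds */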

Though the extra stack state could make state pruning less effective, no
adverse effects were seen from this patch on selftests and on cilium programs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 09e17b483b0b..7553ef14c2b1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2227,6 +2227,20 @@ static bool register_is_const(struct bpf_reg_state *reg)
 	return reg->type == SCALAR_VALUE && tnum_is_const(reg->var_off);
 }
 
+static bool __is_scalar_unbounded(struct bpf_reg_state *reg)
+{
+	return tnum_is_unknown(reg->var_off) &&
+	       reg->smin_value == S64_MIN && reg->smax_value == S64_MAX &&
+	       reg->umin_value == 0 && reg->umax_value == U64_MAX &&
+	       reg->s32_min_value == S32_MIN && reg->s32_max_value == S32_MAX &&
+	       reg->u32_min_value == 0 && reg->u32_max_value == U32_MAX;
+}
+
+static bool register_is_bounded(struct bpf_reg_state *reg)
+{
+	return reg->type == SCALAR_VALUE && !__is_scalar_unbounded(reg);
+}
+
 static bool __is_pointer_value(bool allow_ptr_leaks,
 			       const struct bpf_reg_state *reg)
 {
@@ -2278,7 +2292,7 @@ static int check_stack_write(struct bpf_verifier_env *env,
 	if (value_regno >= 0)
 		reg = &cur->regs[value_regno];
 
-	if (reg && size == BPF_REG_SIZE && register_is_const(reg) &&
+	if (reg && size == BPF_REG_SIZE && register_is_bounded(reg) &&
 	    !register_is_null(reg) && env->bpf_capable) {
 		if (dst_reg != BPF_REG_FP) {
 			/* The backtracking logic can only recognize explicit
-- 
2.23.0



* [PATCH bpf-next 3/3] selftests/bpf: Add profiler test
  2020-10-06 20:09 [PATCH bpf-next 0/3] bpf: Make the verifier recognize llvm register allocation patterns Alexei Starovoitov
  2020-10-06 20:09 ` [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments Alexei Starovoitov
  2020-10-06 20:09 ` [PATCH bpf-next 2/3] bpf: Track spill/fill of bounded scalars Alexei Starovoitov
@ 2020-10-06 20:09 ` Alexei Starovoitov
  2020-10-07  1:22   ` Jakub Kicinski
  2 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-06 20:09 UTC (permalink / raw)
  To: davem; +Cc: daniel, john.fastabend, netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

The main purpose of the profiler test is to check different llvm code
generation patterns to make sure the verifier can load these large programs.

Note that the profiler.inc.h test doesn't follow strict kernel coding style.
The code was formatted in kernel style, but variable declarations are
kept as-is to preserve the original llvm IR patterns.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 .../selftests/bpf/prog_tests/test_profiler.c  |  72 ++
 tools/testing/selftests/bpf/progs/profiler.h  | 177 ++++
 .../selftests/bpf/progs/profiler.inc.h        | 969 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/profiler1.c |   6 +
 tools/testing/selftests/bpf/progs/profiler2.c |   6 +
 tools/testing/selftests/bpf/progs/profiler3.c |   6 +
 6 files changed, 1236 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_profiler.c
 create mode 100644 tools/testing/selftests/bpf/progs/profiler.h
 create mode 100644 tools/testing/selftests/bpf/progs/profiler.inc.h
 create mode 100644 tools/testing/selftests/bpf/progs/profiler1.c
 create mode 100644 tools/testing/selftests/bpf/progs/profiler2.c
 create mode 100644 tools/testing/selftests/bpf/progs/profiler3.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_profiler.c b/tools/testing/selftests/bpf/prog_tests/test_profiler.c
new file mode 100644
index 000000000000..4ca275101ee0
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_profiler.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include <test_progs.h>
+#include "progs/profiler.h"
+#include "profiler1.skel.h"
+#include "profiler2.skel.h"
+#include "profiler3.skel.h"
+
+static int sanity_run(struct bpf_program *prog)
+{
+	struct bpf_prog_test_run_attr test_attr = {};
+	__u64 args[] = {1, 2, 3};
+	__u32 duration = 0;
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	test_attr.prog_fd = prog_fd;
+	test_attr.ctx_in = args;
+	test_attr.ctx_size_in = sizeof(args);
+	err = bpf_prog_test_run_xattr(&test_attr);
+	if (CHECK(err || test_attr.retval, "test_run",
+		  "err %d errno %d retval %d duration %d\n",
+		  err, errno, test_attr.retval, duration))
+		return -1;
+	return 0;
+}
+
+void test_test_profiler(void)
+{
+	struct profiler1 *profiler1_skel = NULL;
+	struct profiler2 *profiler2_skel = NULL;
+	struct profiler3 *profiler3_skel = NULL;
+	__u32 duration = 0;
+	int err;
+
+	profiler1_skel = profiler1__open_and_load();
+	if (CHECK(!profiler1_skel, "profiler1_skel_load", "profiler1 skeleton failed\n"))
+		goto cleanup;
+
+	err = profiler1__attach(profiler1_skel);
+	if (CHECK(err, "profiler1_attach", "profiler1 attach failed: %d\n", err))
+		goto cleanup;
+
+	if (sanity_run(profiler1_skel->progs.raw_tracepoint__sched_process_exec))
+		goto cleanup;
+
+	profiler2_skel = profiler2__open_and_load();
+	if (CHECK(!profiler2_skel, "profiler2_skel_load", "profiler2 skeleton failed\n"))
+		goto cleanup;
+
+	err = profiler2__attach(profiler2_skel);
+	if (CHECK(err, "profiler2_attach", "profiler2 attach failed: %d\n", err))
+		goto cleanup;
+
+	if (sanity_run(profiler2_skel->progs.raw_tracepoint__sched_process_exec))
+		goto cleanup;
+
+	profiler3_skel = profiler3__open_and_load();
+	if (CHECK(!profiler3_skel, "profiler3_skel_load", "profiler3 skeleton failed\n"))
+		goto cleanup;
+
+	err = profiler3__attach(profiler3_skel);
+	if (CHECK(err, "profiler3_attach", "profiler3 attach failed: %d\n", err))
+		goto cleanup;
+
+	if (sanity_run(profiler3_skel->progs.raw_tracepoint__sched_process_exec))
+		goto cleanup;
+cleanup:
+	profiler1__destroy(profiler1_skel);
+	profiler2__destroy(profiler2_skel);
+	profiler3__destroy(profiler3_skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/profiler.h b/tools/testing/selftests/bpf/progs/profiler.h
new file mode 100644
index 000000000000..3bac4fdd4bdf
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/profiler.h
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#pragma once
+
+#define TASK_COMM_LEN 16
+#define MAX_ANCESTORS 4
+#define MAX_PATH 256
+#define KILL_TARGET_LEN 64
+#define CTL_MAXNAME 10
+#define MAX_ARGS_LEN 4096
+#define MAX_FILENAME_LEN 512
+#define MAX_ENVIRON_LEN 8192
+#define MAX_PATH_DEPTH 32
+#define MAX_FILEPATH_LENGTH (MAX_PATH_DEPTH * MAX_PATH)
+#define MAX_CGROUPS_PATH_DEPTH 8
+
+#define MAX_METADATA_PAYLOAD_LEN TASK_COMM_LEN
+
+#define MAX_CGROUP_PAYLOAD_LEN \
+	(MAX_PATH * 2 + (MAX_PATH * MAX_CGROUPS_PATH_DEPTH))
+
+#define MAX_CAP_PAYLOAD_LEN (MAX_METADATA_PAYLOAD_LEN + MAX_CGROUP_PAYLOAD_LEN)
+
+#define MAX_SYSCTL_PAYLOAD_LEN \
+	(MAX_METADATA_PAYLOAD_LEN + MAX_CGROUP_PAYLOAD_LEN + CTL_MAXNAME + MAX_PATH)
+
+#define MAX_KILL_PAYLOAD_LEN \
+	(MAX_METADATA_PAYLOAD_LEN + MAX_CGROUP_PAYLOAD_LEN + TASK_COMM_LEN + \
+	 KILL_TARGET_LEN)
+
+#define MAX_EXEC_PAYLOAD_LEN \
+	(MAX_METADATA_PAYLOAD_LEN + MAX_CGROUP_PAYLOAD_LEN + MAX_FILENAME_LEN + \
+	 MAX_ARGS_LEN + MAX_ENVIRON_LEN)
+
+#define MAX_FILEMOD_PAYLOAD_LEN \
+	(MAX_METADATA_PAYLOAD_LEN + MAX_CGROUP_PAYLOAD_LEN + MAX_FILEPATH_LENGTH + \
+	 MAX_FILEPATH_LENGTH)
+
+enum data_type {
+	INVALID_EVENT,
+	EXEC_EVENT,
+	FORK_EVENT,
+	KILL_EVENT,
+	SYSCTL_EVENT,
+	FILEMOD_EVENT,
+	MAX_DATA_TYPE_EVENT
+};
+
+enum filemod_type {
+	FMOD_OPEN,
+	FMOD_LINK,
+	FMOD_SYMLINK,
+};
+
+struct ancestors_data_t {
+	pid_t ancestor_pids[MAX_ANCESTORS];
+	uint32_t ancestor_exec_ids[MAX_ANCESTORS];
+	uint64_t ancestor_start_times[MAX_ANCESTORS];
+	uint32_t num_ancestors;
+};
+
+struct var_metadata_t {
+	enum data_type type;
+	pid_t pid;
+	uint32_t exec_id;
+	uid_t uid;
+	gid_t gid;
+	uint64_t start_time;
+	uint32_t cpu_id;
+	uint64_t bpf_stats_num_perf_events;
+	uint64_t bpf_stats_start_ktime_ns;
+	uint8_t comm_length;
+};
+
+struct cgroup_data_t {
+	ino_t cgroup_root_inode;
+	ino_t cgroup_proc_inode;
+	uint64_t cgroup_root_mtime;
+	uint64_t cgroup_proc_mtime;
+	uint16_t cgroup_root_length;
+	uint16_t cgroup_proc_length;
+	uint16_t cgroup_full_length;
+	int cgroup_full_path_root_pos;
+};
+
+struct var_sysctl_data_t {
+	struct var_metadata_t meta;
+	struct cgroup_data_t cgroup_data;
+	struct ancestors_data_t ancestors_info;
+	uint8_t sysctl_val_length;
+	uint16_t sysctl_path_length;
+	char payload[MAX_SYSCTL_PAYLOAD_LEN];
+};
+
+struct var_kill_data_t {
+	struct var_metadata_t meta;
+	struct cgroup_data_t cgroup_data;
+	struct ancestors_data_t ancestors_info;
+	pid_t kill_target_pid;
+	int kill_sig;
+	uint32_t kill_count;
+	uint64_t last_kill_time;
+	uint8_t kill_target_name_length;
+	uint8_t kill_target_cgroup_proc_length;
+	char payload[MAX_KILL_PAYLOAD_LEN];
+	size_t payload_length;
+};
+
+struct var_exec_data_t {
+	struct var_metadata_t meta;
+	struct cgroup_data_t cgroup_data;
+	pid_t parent_pid;
+	uint32_t parent_exec_id;
+	uid_t parent_uid;
+	uint64_t parent_start_time;
+	uint16_t bin_path_length;
+	uint16_t cmdline_length;
+	uint16_t environment_length;
+	char payload[MAX_EXEC_PAYLOAD_LEN];
+};
+
+struct var_fork_data_t {
+	struct var_metadata_t meta;
+	pid_t parent_pid;
+	uint32_t parent_exec_id;
+	uint64_t parent_start_time;
+	char payload[MAX_METADATA_PAYLOAD_LEN];
+};
+
+struct var_filemod_data_t {
+	struct var_metadata_t meta;
+	struct cgroup_data_t cgroup_data;
+	enum filemod_type fmod_type;
+	unsigned int dst_flags;
+	uint32_t src_device_id;
+	uint32_t dst_device_id;
+	ino_t src_inode;
+	ino_t dst_inode;
+	uint16_t src_filepath_length;
+	uint16_t dst_filepath_length;
+	char payload[MAX_FILEMOD_PAYLOAD_LEN];
+};
+
+struct profiler_config_struct {
+	bool fetch_cgroups_from_bpf;
+	ino_t cgroup_fs_inode;
+	ino_t cgroup_login_session_inode;
+	uint64_t kill_signals_mask;
+	ino_t inode_filter;
+	uint32_t stale_info_secs;
+	bool use_variable_buffers;
+	bool read_environ_from_exec;
+	bool enable_cgroup_v1_resolver;
+};
+
+struct bpf_func_stats_data {
+	uint64_t time_elapsed_ns;
+	uint64_t num_executions;
+	uint64_t num_perf_events;
+};
+
+struct bpf_func_stats_ctx {
+	uint64_t start_time_ns;
+	struct bpf_func_stats_data* bpf_func_stats_data_val;
+};
+
+enum bpf_function_id {
+	profiler_bpf_proc_sys_write,
+	profiler_bpf_sched_process_exec,
+	profiler_bpf_sched_process_exit,
+	profiler_bpf_sys_enter_kill,
+	profiler_bpf_do_filp_open_ret,
+	profiler_bpf_sched_process_fork,
+	profiler_bpf_vfs_link,
+	profiler_bpf_vfs_symlink,
+	profiler_bpf_max_function_id
+};
diff --git a/tools/testing/selftests/bpf/progs/profiler.inc.h b/tools/testing/selftests/bpf/progs/profiler.inc.h
new file mode 100644
index 000000000000..00578311a423
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/profiler.inc.h
@@ -0,0 +1,969 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include <vmlinux.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "profiler.h"
+
+#ifndef NULL
+#define NULL 0
+#endif
+
+#define O_WRONLY 00000001
+#define O_RDWR 00000002
+#define O_DIRECTORY 00200000
+#define __O_TMPFILE 020000000
+#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
+#define MAX_ERRNO 4095
+#define S_IFMT 00170000
+#define S_IFSOCK 0140000
+#define S_IFLNK 0120000
+#define S_IFREG 0100000
+#define S_IFBLK 0060000
+#define S_IFDIR 0040000
+#define S_IFCHR 0020000
+#define S_IFIFO 0010000
+#define S_ISUID 0004000
+#define S_ISGID 0002000
+#define S_ISVTX 0001000
+#define S_ISLNK(m) (((m)&S_IFMT) == S_IFLNK)
+#define S_ISDIR(m) (((m)&S_IFMT) == S_IFDIR)
+#define S_ISCHR(m) (((m)&S_IFMT) == S_IFCHR)
+#define S_ISBLK(m) (((m)&S_IFMT) == S_IFBLK)
+#define S_ISFIFO(m) (((m)&S_IFMT) == S_IFIFO)
+#define S_ISSOCK(m) (((m)&S_IFMT) == S_IFSOCK)
+#define IS_ERR_VALUE(x) (unsigned long)(void*)(x) >= (unsigned long)-MAX_ERRNO
+
+#define KILL_DATA_ARRAY_SIZE 8
+
+struct var_kill_data_arr_t {
+	struct var_kill_data_t array[KILL_DATA_ARRAY_SIZE];
+};
+
+union any_profiler_data_t {
+	struct var_exec_data_t var_exec;
+	struct var_kill_data_t var_kill;
+	struct var_sysctl_data_t var_sysctl;
+	struct var_filemod_data_t var_filemod;
+	struct var_fork_data_t var_fork;
+	struct var_kill_data_arr_t var_kill_data_arr;
+};
+
+volatile struct profiler_config_struct bpf_config = {};
+
+#define FETCH_CGROUPS_FROM_BPF (bpf_config.fetch_cgroups_from_bpf)
+#define CGROUP_FS_INODE (bpf_config.cgroup_fs_inode)
+#define CGROUP_LOGIN_SESSION_INODE \
+	(bpf_config.cgroup_login_session_inode)
+#define KILL_SIGNALS (bpf_config.kill_signals_mask)
+#define STALE_INFO (bpf_config.stale_info_secs)
+#define INODE_FILTER (bpf_config.inode_filter)
+#define READ_ENVIRON_FROM_EXEC (bpf_config.read_environ_from_exec)
+#define ENABLE_CGROUP_V1_RESOLVER (bpf_config.enable_cgroup_v1_resolver)
+
+struct kernfs_iattrs___52 {
+	struct iattr ia_iattr;
+};
+
+struct kernfs_node___52 {
+	union /* kernfs_node_id */ {
+		struct {
+			u32 ino;
+			u32 generation;
+		};
+		u64 id;
+	} id;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, u32);
+	__type(value, union any_profiler_data_t);
+} data_heap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+} events SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, KILL_DATA_ARRAY_SIZE);
+	__type(key, u32);
+	__type(value, struct var_kill_data_arr_t);
+} var_tpid_to_data SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, profiler_bpf_max_function_id);
+	__type(key, u32);
+	__type(value, struct bpf_func_stats_data);
+} bpf_func_stats SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, u32);
+	__type(value, bool);
+	__uint(max_entries, 16);
+} allowed_devices SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, u64);
+	__type(value, bool);
+	__uint(max_entries, 1024);
+} allowed_file_inodes SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, u64);
+	__type(value, bool);
+	__uint(max_entries, 1024);
+} allowed_directory_inodes SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, u32);
+	__type(value, bool);
+	__uint(max_entries, 16);
+} disallowed_exec_inodes SEC(".maps");
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
+#endif
+
+static INLINE bool IS_ERR(const void* ptr)
+{
+	return IS_ERR_VALUE((unsigned long)ptr);
+}
+
+static INLINE u32 get_userspace_pid()
+{
+	return bpf_get_current_pid_tgid() >> 32;
+}
+
+static INLINE bool is_init_process(u32 tgid)
+{
+	return tgid == 1 || tgid == 0;
+}
+
+static INLINE unsigned long
+probe_read_lim(void* dst, void* src, unsigned long len, unsigned long max)
+{
+	len = len < max ? len : max;
+	if (len > 1) {
+		if (bpf_probe_read(dst, len, src))
+			return 0;
+	} else if (len == 1) {
+		if (bpf_probe_read(dst, 1, src))
+			return 0;
+	}
+	return len;
+}
+
+static INLINE int get_var_spid_index(struct var_kill_data_arr_t* arr_struct,
+				     int spid)
+{
+#ifdef UNROLL
+#pragma unroll
+#endif
+	for (int i = 0; i < ARRAY_SIZE(arr_struct->array); i++)
+		if (arr_struct->array[i].meta.pid == spid)
+			return i;
+	return -1;
+}
+
+static INLINE void populate_ancestors(struct task_struct* task,
+				      struct ancestors_data_t* ancestors_data)
+{
+	struct task_struct* parent = task;
+	u32 num_ancestors, ppid;
+
+	ancestors_data->num_ancestors = 0;
+#ifdef UNROLL
+#pragma unroll
+#endif
+	for (num_ancestors = 0; num_ancestors < MAX_ANCESTORS; num_ancestors++) {
+		parent = BPF_CORE_READ(parent, real_parent);
+		if (parent == NULL)
+			break;
+		ppid = BPF_CORE_READ(parent, tgid);
+		if (is_init_process(ppid))
+			break;
+		ancestors_data->ancestor_pids[num_ancestors] = ppid;
+		ancestors_data->ancestor_exec_ids[num_ancestors] =
+			BPF_CORE_READ(parent, self_exec_id);
+		ancestors_data->ancestor_start_times[num_ancestors] =
+			BPF_CORE_READ(parent, start_time);
+		ancestors_data->num_ancestors = num_ancestors;
+	}
+}
+
+static INLINE void* read_full_cgroup_path(struct kernfs_node* cgroup_node,
+					  struct kernfs_node* cgroup_root_node,
+					  void* payload,
+					  int* root_pos)
+{
+	void* payload_start = payload;
+	size_t filepart_length;
+
+#ifdef UNROLL
+#pragma unroll
+#endif
+	for (int i = 0; i < MAX_CGROUPS_PATH_DEPTH; i++) {
+		filepart_length =
+			bpf_probe_read_str(payload, MAX_PATH, BPF_CORE_READ(cgroup_node, name));
+		if (!cgroup_node)
+			return payload;
+		if (cgroup_node == cgroup_root_node)
+			*root_pos = payload - payload_start;
+		if (filepart_length <= MAX_PATH) {
+			barrier_var(filepart_length);
+			payload += filepart_length;
+		}
+		cgroup_node = BPF_CORE_READ(cgroup_node, parent);
+	}
+	return payload;
+}
+
+static ino_t get_inode_from_kernfs(struct kernfs_node* node)
+{
+	struct kernfs_node___52* node52 = (void*)node;
+
+	if (bpf_core_field_exists(node52->id.ino)) {
+		barrier_var(node52);
+		return BPF_CORE_READ(node52, id.ino);
+	} else {
+		barrier_var(node);
+		return (u64)BPF_CORE_READ(node, id);
+	}
+}
+
+int pids_cgrp_id = 1;
+
+static INLINE void* populate_cgroup_info(struct cgroup_data_t* cgroup_data,
+					 struct task_struct* task,
+					 void* payload)
+{
+	struct kernfs_node* root_kernfs =
+		BPF_CORE_READ(task, nsproxy, cgroup_ns, root_cset, dfl_cgrp, kn);
+	struct kernfs_node* proc_kernfs = BPF_CORE_READ(task, cgroups, dfl_cgrp, kn);
+
+	if (ENABLE_CGROUP_V1_RESOLVER) {
+#ifdef UNROLL
+#pragma unroll
+#endif
+		for (int i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+			struct cgroup_subsys_state* subsys =
+				BPF_CORE_READ(task, cgroups, subsys[i]);
+			if (subsys != NULL) {
+				int subsys_id = BPF_CORE_READ(subsys, ss, id);
+				if (subsys_id == pids_cgrp_id) {
+					proc_kernfs = BPF_CORE_READ(subsys, cgroup, kn);
+					root_kernfs = BPF_CORE_READ(subsys, ss, root, kf_root, kn);
+					break;
+				}
+			}
+		}
+	}
+
+	cgroup_data->cgroup_root_inode = get_inode_from_kernfs(root_kernfs);
+	cgroup_data->cgroup_proc_inode = get_inode_from_kernfs(proc_kernfs);
+
+	if (bpf_core_field_exists(root_kernfs->iattr->ia_mtime)) {
+		cgroup_data->cgroup_root_mtime =
+			BPF_CORE_READ(root_kernfs, iattr, ia_mtime.tv_nsec);
+		cgroup_data->cgroup_proc_mtime =
+			BPF_CORE_READ(proc_kernfs, iattr, ia_mtime.tv_nsec);
+	} else {
+		struct kernfs_iattrs___52* root_iattr =
+			(struct kernfs_iattrs___52*)BPF_CORE_READ(root_kernfs, iattr);
+		cgroup_data->cgroup_root_mtime =
+			BPF_CORE_READ(root_iattr, ia_iattr.ia_mtime.tv_nsec);
+
+		struct kernfs_iattrs___52* proc_iattr =
+			(struct kernfs_iattrs___52*)BPF_CORE_READ(proc_kernfs, iattr);
+		cgroup_data->cgroup_proc_mtime =
+			BPF_CORE_READ(proc_iattr, ia_iattr.ia_mtime.tv_nsec);
+	}
+
+	cgroup_data->cgroup_root_length = 0;
+	cgroup_data->cgroup_proc_length = 0;
+	cgroup_data->cgroup_full_length = 0;
+
+	size_t cgroup_root_length =
+		bpf_probe_read_str(payload, MAX_PATH, BPF_CORE_READ(root_kernfs, name));
+	barrier_var(cgroup_root_length);
+	if (cgroup_root_length <= MAX_PATH) {
+		barrier_var(cgroup_root_length);
+		cgroup_data->cgroup_root_length = cgroup_root_length;
+		payload += cgroup_root_length;
+	}
+
+	size_t cgroup_proc_length =
+		bpf_probe_read_str(payload, MAX_PATH, BPF_CORE_READ(proc_kernfs, name));
+	barrier_var(cgroup_proc_length);
+	if (cgroup_proc_length <= MAX_PATH) {
+		barrier_var(cgroup_proc_length);
+		cgroup_data->cgroup_proc_length = cgroup_proc_length;
+		payload += cgroup_proc_length;
+	}
+
+	if (FETCH_CGROUPS_FROM_BPF) {
+		cgroup_data->cgroup_full_path_root_pos = -1;
+		void* payload_end_pos = read_full_cgroup_path(proc_kernfs, root_kernfs, payload,
+							      &cgroup_data->cgroup_full_path_root_pos);
+		cgroup_data->cgroup_full_length = payload_end_pos - payload;
+		payload = payload_end_pos;
+	}
+
+	return (void*)payload;
+}
+
+static INLINE void* populate_var_metadata(struct var_metadata_t* metadata,
+					  struct task_struct* task,
+					  u32 pid, void* payload)
+{
+	u64 uid_gid = bpf_get_current_uid_gid();
+
+	metadata->uid = (u32)uid_gid;
+	metadata->gid = uid_gid >> 32;
+	metadata->pid = pid;
+	metadata->exec_id = BPF_CORE_READ(task, self_exec_id);
+	metadata->start_time = BPF_CORE_READ(task, start_time);
+	metadata->comm_length = 0;
+
+	size_t comm_length = bpf_core_read_str(payload, TASK_COMM_LEN, &task->comm);
+	barrier_var(comm_length);
+	if (comm_length <= TASK_COMM_LEN) {
+		barrier_var(comm_length);
+		metadata->comm_length = comm_length;
+		payload += comm_length;
+	}
+
+	return (void*)payload;
+}
+
+static INLINE struct var_kill_data_t*
+get_var_kill_data(struct pt_regs* ctx, int spid, int tpid, int sig)
+{
+	int zero = 0;
+	struct var_kill_data_t* kill_data = bpf_map_lookup_elem(&data_heap, &zero);
+
+	if (kill_data == NULL)
+		return NULL;
+	struct task_struct* task = (struct task_struct*)bpf_get_current_task();
+
+	void* payload = populate_var_metadata(&kill_data->meta, task, spid, kill_data->payload);
+	payload = populate_cgroup_info(&kill_data->cgroup_data, task, payload);
+	size_t payload_length = payload - (void*)kill_data->payload;
+	kill_data->payload_length = payload_length;
+	populate_ancestors(task, &kill_data->ancestors_info);
+	kill_data->meta.type = KILL_EVENT;
+	kill_data->kill_target_pid = tpid;
+	kill_data->kill_sig = sig;
+	kill_data->kill_count = 1;
+	kill_data->last_kill_time = bpf_ktime_get_ns();
+	return kill_data;
+}
+
+static INLINE int trace_var_sys_kill(void* ctx, int tpid, int sig)
+{
+	if ((KILL_SIGNALS & (1ULL << sig)) == 0)
+		return 0;
+
+	u32 spid = get_userspace_pid();
+	struct var_kill_data_arr_t* arr_struct = bpf_map_lookup_elem(&var_tpid_to_data, &tpid);
+
+	if (arr_struct == NULL) {
+		struct var_kill_data_t* kill_data = get_var_kill_data(ctx, spid, tpid, sig);
+		int zero = 0;
+
+		if (kill_data == NULL)
+			return 0;
+		arr_struct = bpf_map_lookup_elem(&data_heap, &zero);
+		if (arr_struct == NULL)
+			return 0;
+		bpf_probe_read(&arr_struct->array[0], sizeof(arr_struct->array[0]), kill_data);
+	} else {
+		int index = get_var_spid_index(arr_struct, spid);
+
+		if (index == -1) {
+			struct var_kill_data_t* kill_data =
+				get_var_kill_data(ctx, spid, tpid, sig);
+			if (kill_data == NULL)
+				return 0;
+#ifdef UNROLL
+#pragma unroll
+#endif
+			for (int i = 0; i < ARRAY_SIZE(arr_struct->array); i++)
+				if (arr_struct->array[i].meta.pid == 0) {
+					bpf_probe_read(&arr_struct->array[i],
+						       sizeof(arr_struct->array[i]), kill_data);
+					bpf_map_update_elem(&var_tpid_to_data, &tpid,
+							    arr_struct, 0);
+
+					return 0;
+				}
+			return 0;
+		}
+
+		struct var_kill_data_t* kill_data = &arr_struct->array[index];
+
+		u64 delta_sec =
+			(bpf_ktime_get_ns() - kill_data->last_kill_time) / 1000000000;
+
+		if (delta_sec < STALE_INFO) {
+			kill_data->kill_count++;
+			kill_data->last_kill_time = bpf_ktime_get_ns();
+			bpf_probe_read(&arr_struct->array[index],
+				       sizeof(arr_struct->array[index]),
+				       kill_data);
+		} else {
+			struct var_kill_data_t* kill_data =
+				get_var_kill_data(ctx, spid, tpid, sig);
+			if (kill_data == NULL)
+				return 0;
+			bpf_probe_read(&arr_struct->array[index],
+				       sizeof(arr_struct->array[index]),
+				       kill_data);
+		}
+	}
+	bpf_map_update_elem(&var_tpid_to_data, &tpid, arr_struct, 0);
+	return 0;
+}
+
+static INLINE void bpf_stats_enter(struct bpf_func_stats_ctx* bpf_stat_ctx,
+				   enum bpf_function_id func_id)
+{
+	int func_id_key = func_id;
+
+	bpf_stat_ctx->start_time_ns = bpf_ktime_get_ns();
+	bpf_stat_ctx->bpf_func_stats_data_val =
+		bpf_map_lookup_elem(&bpf_func_stats, &func_id_key);
+	if (bpf_stat_ctx->bpf_func_stats_data_val)
+		bpf_stat_ctx->bpf_func_stats_data_val->num_executions++;
+}
+
+static INLINE void bpf_stats_exit(struct bpf_func_stats_ctx* bpf_stat_ctx)
+{
+	if (bpf_stat_ctx->bpf_func_stats_data_val)
+		bpf_stat_ctx->bpf_func_stats_data_val->time_elapsed_ns +=
+			bpf_ktime_get_ns() - bpf_stat_ctx->start_time_ns;
+}
+
+static INLINE void
+bpf_stats_pre_submit_var_perf_event(struct bpf_func_stats_ctx* bpf_stat_ctx,
+				    struct var_metadata_t* meta)
+{
+	if (bpf_stat_ctx->bpf_func_stats_data_val) {
+		bpf_stat_ctx->bpf_func_stats_data_val->num_perf_events++;
+		meta->bpf_stats_num_perf_events =
+			bpf_stat_ctx->bpf_func_stats_data_val->num_perf_events;
+	}
+	meta->bpf_stats_start_ktime_ns = bpf_stat_ctx->start_time_ns;
+	meta->cpu_id = bpf_get_smp_processor_id();
+}
+
+static INLINE size_t
+read_absolute_file_path_from_dentry(struct dentry* filp_dentry, void* payload)
+{
+	size_t length = 0;
+	size_t filepart_length;
+	struct dentry* parent_dentry;
+
+#ifdef UNROLL
+#pragma unroll
+#endif
+	for (int i = 0; i < MAX_PATH_DEPTH; i++) {
+		filepart_length = bpf_probe_read_str(payload, MAX_PATH,
+						     BPF_CORE_READ(filp_dentry, d_name.name));
+		barrier_var(filepart_length);
+		if (filepart_length > MAX_PATH)
+			break;
+		barrier_var(filepart_length);
+		payload += filepart_length;
+		length += filepart_length;
+
+		parent_dentry = BPF_CORE_READ(filp_dentry, d_parent);
+		if (filp_dentry == parent_dentry)
+			break;
+		filp_dentry = parent_dentry;
+	}
+
+	return length;
+}
+
+static INLINE bool
+is_ancestor_in_allowed_inodes(struct dentry* filp_dentry)
+{
+	struct dentry* parent_dentry;
+#ifdef UNROLL
+#pragma unroll
+#endif
+	for (int i = 0; i < MAX_PATH_DEPTH; i++) {
+		u64 dir_ino = BPF_CORE_READ(filp_dentry, d_inode, i_ino);
+		bool* allowed_dir = bpf_map_lookup_elem(&allowed_directory_inodes, &dir_ino);
+
+		if (allowed_dir != NULL)
+			return true;
+		parent_dentry = BPF_CORE_READ(filp_dentry, d_parent);
+		if (filp_dentry == parent_dentry)
+			break;
+		filp_dentry = parent_dentry;
+	}
+	return false;
+}
+
+static INLINE bool is_dentry_allowed_for_filemod(struct dentry* file_dentry,
+						 u32* device_id,
+						 u64* file_ino)
+{
+	u32 dev_id = BPF_CORE_READ(file_dentry, d_sb, s_dev);
+	*device_id = dev_id;
+	bool* allowed_device = bpf_map_lookup_elem(&allowed_devices, &dev_id);
+
+	if (allowed_device == NULL)
+		return false;
+
+	u64 ino = BPF_CORE_READ(file_dentry, d_inode, i_ino);
+	*file_ino = ino;
+	bool* allowed_file = bpf_map_lookup_elem(&allowed_file_inodes, &ino);
+
+	if (allowed_file == NULL)
+		if (!is_ancestor_in_allowed_inodes(BPF_CORE_READ(file_dentry, d_parent)))
+			return false;
+	return true;
+}
+
+SEC("kprobe/proc_sys_write")
+ssize_t BPF_KPROBE(kprobe__proc_sys_write,
+		   struct file* filp, const char* buf,
+		   size_t count, loff_t* ppos)
+{
+	struct bpf_func_stats_ctx stats_ctx;
+	bpf_stats_enter(&stats_ctx, profiler_bpf_proc_sys_write);
+
+	u32 pid = get_userspace_pid();
+	int zero = 0;
+	struct var_sysctl_data_t* sysctl_data =
+		bpf_map_lookup_elem(&data_heap, &zero);
+	if (!sysctl_data)
+		goto out;
+
+	struct task_struct* task = (struct task_struct*)bpf_get_current_task();
+	sysctl_data->meta.type = SYSCTL_EVENT;
+	void* payload = populate_var_metadata(&sysctl_data->meta, task, pid, sysctl_data->payload);
+	payload = populate_cgroup_info(&sysctl_data->cgroup_data, task, payload);
+
+	populate_ancestors(task, &sysctl_data->ancestors_info);
+
+	sysctl_data->sysctl_val_length = 0;
+	sysctl_data->sysctl_path_length = 0;
+
+	size_t sysctl_val_length = bpf_probe_read_str(payload, CTL_MAXNAME, buf);
+	barrier_var(sysctl_val_length);
+	if (sysctl_val_length <= CTL_MAXNAME) {
+		barrier_var(sysctl_val_length);
+		sysctl_data->sysctl_val_length = sysctl_val_length;
+		payload += sysctl_val_length;
+	}
+
+	size_t sysctl_path_length = bpf_probe_read_str(payload, MAX_PATH,
+						       BPF_CORE_READ(filp, f_path.dentry, d_name.name));
+	barrier_var(sysctl_path_length);
+	if (sysctl_path_length <= MAX_PATH) {
+		barrier_var(sysctl_path_length);
+		sysctl_data->sysctl_path_length = sysctl_path_length;
+		payload += sysctl_path_length;
+	}
+
+	bpf_stats_pre_submit_var_perf_event(&stats_ctx, &sysctl_data->meta);
+	unsigned long data_len = payload - (void*)sysctl_data;
+	data_len = data_len > sizeof(struct var_sysctl_data_t)
+		? sizeof(struct var_sysctl_data_t)
+		: data_len;
+	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, sysctl_data, data_len);
+out:
+	bpf_stats_exit(&stats_ctx);
+	return 0;
+}
+
+SEC("tracepoint/syscalls/sys_enter_kill")
+int tracepoint__syscalls__sys_enter_kill(struct trace_event_raw_sys_enter* ctx)
+{
+	struct bpf_func_stats_ctx stats_ctx;
+
+	bpf_stats_enter(&stats_ctx, profiler_bpf_sys_enter_kill);
+	int pid = ctx->args[0];
+	int sig = ctx->args[1];
+	int ret = trace_var_sys_kill(ctx, pid, sig);
+	bpf_stats_exit(&stats_ctx);
+	return ret;
+};
+
+SEC("raw_tracepoint/sched_process_exit")
+int raw_tracepoint__sched_process_exit(void* ctx)
+{
+	int zero = 0;
+	struct bpf_func_stats_ctx stats_ctx;
+	bpf_stats_enter(&stats_ctx, profiler_bpf_sched_process_exit);
+
+	u32 tpid = get_userspace_pid();
+
+	struct var_kill_data_arr_t* arr_struct = bpf_map_lookup_elem(&var_tpid_to_data, &tpid);
+	struct var_kill_data_t* kill_data = bpf_map_lookup_elem(&data_heap, &zero);
+
+	if (arr_struct == NULL || kill_data == NULL)
+		goto out;
+
+	struct task_struct* task = (struct task_struct*)bpf_get_current_task();
+	struct kernfs_node* proc_kernfs = BPF_CORE_READ(task, cgroups, dfl_cgrp, kn);
+
+#ifdef UNROLL
+#pragma unroll
+#endif
+	for (int i = 0; i < ARRAY_SIZE(arr_struct->array); i++) {
+		struct var_kill_data_t* past_kill_data = &arr_struct->array[i];
+
+		if (past_kill_data != NULL && past_kill_data->kill_target_pid == tpid) {
+			bpf_probe_read(kill_data, sizeof(*past_kill_data), past_kill_data);
+			void* payload = kill_data->payload;
+			size_t offset = kill_data->payload_length;
+			if (offset >= MAX_METADATA_PAYLOAD_LEN + MAX_CGROUP_PAYLOAD_LEN)
+				return 0;
+			payload += offset;
+
+			kill_data->kill_target_name_length = 0;
+			kill_data->kill_target_cgroup_proc_length = 0;
+
+			size_t comm_length = bpf_core_read_str(payload, TASK_COMM_LEN, &task->comm);
+			barrier_var(comm_length);
+			if (comm_length <= TASK_COMM_LEN) {
+				barrier_var(comm_length);
+				kill_data->kill_target_name_length = comm_length;
+				payload += comm_length;
+			}
+
+			size_t cgroup_proc_length = bpf_probe_read_str(payload, KILL_TARGET_LEN,
+								       BPF_CORE_READ(proc_kernfs, name));
+			barrier_var(cgroup_proc_length);
+			if (cgroup_proc_length <= KILL_TARGET_LEN) {
+				barrier_var(cgroup_proc_length);
+				kill_data->kill_target_cgroup_proc_length = cgroup_proc_length;
+				payload += cgroup_proc_length;
+			}
+
+			bpf_stats_pre_submit_var_perf_event(&stats_ctx, &kill_data->meta);
+			unsigned long data_len = (void*)payload - (void*)kill_data;
+			data_len = data_len > sizeof(struct var_kill_data_t)
+				? sizeof(struct var_kill_data_t)
+				: data_len;
+			bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, kill_data, data_len);
+		}
+	}
+	bpf_map_delete_elem(&var_tpid_to_data, &tpid);
+out:
+	bpf_stats_exit(&stats_ctx);
+	return 0;
+}
+
+SEC("raw_tracepoint/sched_process_exec")
+int raw_tracepoint__sched_process_exec(struct bpf_raw_tracepoint_args* ctx)
+{
+	struct bpf_func_stats_ctx stats_ctx;
+	bpf_stats_enter(&stats_ctx, profiler_bpf_sched_process_exec);
+
+	struct linux_binprm* bprm = (struct linux_binprm*)ctx->args[2];
+	u64 inode = BPF_CORE_READ(bprm, file, f_inode, i_ino);
+
+	bool* should_filter_binprm = bpf_map_lookup_elem(&disallowed_exec_inodes, &inode);
+	if (should_filter_binprm != NULL)
+		goto out;
+
+	int zero = 0;
+	struct var_exec_data_t* proc_exec_data = bpf_map_lookup_elem(&data_heap, &zero);
+	if (!proc_exec_data)
+		goto out;
+
+	if (INODE_FILTER && inode != INODE_FILTER)
+		return 0;
+
+	u32 pid = get_userspace_pid();
+	struct task_struct* task = (struct task_struct*)bpf_get_current_task();
+
+	proc_exec_data->meta.type = EXEC_EVENT;
+	proc_exec_data->bin_path_length = 0;
+	proc_exec_data->cmdline_length = 0;
+	proc_exec_data->environment_length = 0;
+	void* payload = populate_var_metadata(&proc_exec_data->meta, task, pid,
+					      proc_exec_data->payload);
+	payload = populate_cgroup_info(&proc_exec_data->cgroup_data, task, payload);
+
+	struct task_struct* parent_task = BPF_CORE_READ(task, real_parent);
+	proc_exec_data->parent_pid = BPF_CORE_READ(parent_task, tgid);
+	proc_exec_data->parent_uid = BPF_CORE_READ(parent_task, real_cred, uid.val);
+	proc_exec_data->parent_exec_id = BPF_CORE_READ(parent_task, self_exec_id);
+	proc_exec_data->parent_start_time = BPF_CORE_READ(parent_task, start_time);
+
+	const char* filename = BPF_CORE_READ(bprm, filename);
+	size_t bin_path_length = bpf_probe_read_str(payload, MAX_FILENAME_LEN, filename);
+	barrier_var(bin_path_length);
+	if (bin_path_length <= MAX_FILENAME_LEN) {
+		barrier_var(bin_path_length);
+		proc_exec_data->bin_path_length = bin_path_length;
+		payload += bin_path_length;
+	}
+
+	void* arg_start = (void*)BPF_CORE_READ(task, mm, arg_start);
+	void* arg_end = (void*)BPF_CORE_READ(task, mm, arg_end);
+	unsigned int cmdline_length = probe_read_lim(payload, arg_start,
+						     arg_end - arg_start, MAX_ARGS_LEN);
+
+	if (cmdline_length <= MAX_ARGS_LEN) {
+		barrier_var(cmdline_length);
+		proc_exec_data->cmdline_length = cmdline_length;
+		payload += cmdline_length;
+	}
+
+	if (READ_ENVIRON_FROM_EXEC) {
+		void* env_start = (void*)BPF_CORE_READ(task, mm, env_start);
+		void* env_end = (void*)BPF_CORE_READ(task, mm, env_end);
+		unsigned long env_len = probe_read_lim(payload, env_start,
+						       env_end - env_start, MAX_ENVIRON_LEN);
+		if (cmdline_length <= MAX_ENVIRON_LEN) {
+			proc_exec_data->environment_length = env_len;
+			payload += env_len;
+		}
+	}
+
+	bpf_stats_pre_submit_var_perf_event(&stats_ctx, &proc_exec_data->meta);
+	unsigned long data_len = payload - (void*)proc_exec_data;
+	data_len = data_len > sizeof(struct var_exec_data_t)
+		? sizeof(struct var_exec_data_t)
+		: data_len;
+	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, proc_exec_data, data_len);
+out:
+	bpf_stats_exit(&stats_ctx);
+	return 0;
+}
+
+SEC("kretprobe/do_filp_open")
+int kprobe_ret__do_filp_open(struct pt_regs* ctx)
+{
+	struct bpf_func_stats_ctx stats_ctx;
+	bpf_stats_enter(&stats_ctx, profiler_bpf_do_filp_open_ret);
+
+	struct file* filp = (struct file*)PT_REGS_RC_CORE(ctx);
+
+	if (filp == NULL || IS_ERR(filp))
+		goto out;
+	unsigned int flags = BPF_CORE_READ(filp, f_flags);
+	if ((flags & (O_RDWR | O_WRONLY)) == 0)
+		goto out;
+	if ((flags & O_TMPFILE) > 0)
+		goto out;
+	struct inode* file_inode = BPF_CORE_READ(filp, f_inode);
+	umode_t mode = BPF_CORE_READ(file_inode, i_mode);
+	if (S_ISDIR(mode) || S_ISCHR(mode) || S_ISBLK(mode) || S_ISFIFO(mode) ||
+	    S_ISSOCK(mode))
+		goto out;
+
+	struct dentry* filp_dentry = BPF_CORE_READ(filp, f_path.dentry);
+	u32 device_id = 0;
+	u64 file_ino = 0;
+	if (!is_dentry_allowed_for_filemod(filp_dentry, &device_id, &file_ino))
+		goto out;
+
+	int zero = 0;
+	struct var_filemod_data_t* filemod_data = bpf_map_lookup_elem(&data_heap, &zero);
+	if (!filemod_data)
+		goto out;
+
+	u32 pid = get_userspace_pid();
+	struct task_struct* task = (struct task_struct*)bpf_get_current_task();
+
+	filemod_data->meta.type = FILEMOD_EVENT;
+	filemod_data->fmod_type = FMOD_OPEN;
+	filemod_data->dst_flags = flags;
+	filemod_data->src_inode = 0;
+	filemod_data->dst_inode = file_ino;
+	filemod_data->src_device_id = 0;
+	filemod_data->dst_device_id = device_id;
+	filemod_data->src_filepath_length = 0;
+	filemod_data->dst_filepath_length = 0;
+
+	void* payload = populate_var_metadata(&filemod_data->meta, task, pid,
+					      filemod_data->payload);
+	payload = populate_cgroup_info(&filemod_data->cgroup_data, task, payload);
+
+	size_t len = read_absolute_file_path_from_dentry(filp_dentry, payload);
+	barrier_var(len);
+	if (len <= MAX_FILEPATH_LENGTH) {
+		barrier_var(len);
+		payload += len;
+		filemod_data->dst_filepath_length = len;
+	}
+	bpf_stats_pre_submit_var_perf_event(&stats_ctx, &filemod_data->meta);
+	unsigned long data_len = payload - (void*)filemod_data;
+	data_len = data_len > sizeof(*filemod_data) ? sizeof(*filemod_data) : data_len;
+	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, filemod_data, data_len);
+out:
+	bpf_stats_exit(&stats_ctx);
+	return 0;
+}
+
+SEC("kprobe/vfs_link")
+int BPF_KPROBE(kprobe__vfs_link,
+	       struct dentry* old_dentry, struct inode* dir,
+	       struct dentry* new_dentry, struct inode** delegated_inode)
+{
+	struct bpf_func_stats_ctx stats_ctx;
+	bpf_stats_enter(&stats_ctx, profiler_bpf_vfs_link);
+
+	u32 src_device_id = 0;
+	u64 src_file_ino = 0;
+	u32 dst_device_id = 0;
+	u64 dst_file_ino = 0;
+	if (!is_dentry_allowed_for_filemod(old_dentry, &src_device_id, &src_file_ino) &&
+	    !is_dentry_allowed_for_filemod(new_dentry, &dst_device_id, &dst_file_ino))
+		goto out;
+
+	int zero = 0;
+	struct var_filemod_data_t* filemod_data = bpf_map_lookup_elem(&data_heap, &zero);
+	if (!filemod_data)
+		goto out;
+
+	u32 pid = get_userspace_pid();
+	struct task_struct* task = (struct task_struct*)bpf_get_current_task();
+
+	filemod_data->meta.type = FILEMOD_EVENT;
+	filemod_data->fmod_type = FMOD_LINK;
+	filemod_data->dst_flags = 0;
+	filemod_data->src_inode = src_file_ino;
+	filemod_data->dst_inode = dst_file_ino;
+	filemod_data->src_device_id = src_device_id;
+	filemod_data->dst_device_id = dst_device_id;
+	filemod_data->src_filepath_length = 0;
+	filemod_data->dst_filepath_length = 0;
+
+	void* payload = populate_var_metadata(&filemod_data->meta, task, pid,
+					      filemod_data->payload);
+	payload = populate_cgroup_info(&filemod_data->cgroup_data, task, payload);
+
+	size_t len = read_absolute_file_path_from_dentry(old_dentry, payload);
+	barrier_var(len);
+	if (len <= MAX_FILEPATH_LENGTH) {
+		barrier_var(len);
+		payload += len;
+		filemod_data->src_filepath_length = len;
+	}
+
+	len = read_absolute_file_path_from_dentry(new_dentry, payload);
+	barrier_var(len);
+	if (len <= MAX_FILEPATH_LENGTH) {
+		barrier_var(len);
+		payload += len;
+		filemod_data->dst_filepath_length = len;
+	}
+
+	bpf_stats_pre_submit_var_perf_event(&stats_ctx, &filemod_data->meta);
+	unsigned long data_len = payload - (void*)filemod_data;
+	data_len = data_len > sizeof(*filemod_data) ? sizeof(*filemod_data) : data_len;
+	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, filemod_data, data_len);
+out:
+	bpf_stats_exit(&stats_ctx);
+	return 0;
+}
+
+SEC("kprobe/vfs_symlink")
+int BPF_KPROBE(kprobe__vfs_symlink, struct inode* dir, struct dentry* dentry,
+	       const char* oldname)
+{
+	struct bpf_func_stats_ctx stats_ctx;
+	bpf_stats_enter(&stats_ctx, profiler_bpf_vfs_symlink);
+
+	u32 dst_device_id = 0;
+	u64 dst_file_ino = 0;
+	if (!is_dentry_allowed_for_filemod(dentry, &dst_device_id, &dst_file_ino))
+		goto out;
+
+	int zero = 0;
+	struct var_filemod_data_t* filemod_data = bpf_map_lookup_elem(&data_heap, &zero);
+	if (!filemod_data)
+		goto out;
+
+	u32 pid = get_userspace_pid();
+	struct task_struct* task = (struct task_struct*)bpf_get_current_task();
+
+	filemod_data->meta.type = FILEMOD_EVENT;
+	filemod_data->fmod_type = FMOD_SYMLINK;
+	filemod_data->dst_flags = 0;
+	filemod_data->src_inode = 0;
+	filemod_data->dst_inode = dst_file_ino;
+	filemod_data->src_device_id = 0;
+	filemod_data->dst_device_id = dst_device_id;
+	filemod_data->src_filepath_length = 0;
+	filemod_data->dst_filepath_length = 0;
+
+	void* payload = populate_var_metadata(&filemod_data->meta, task, pid,
+					      filemod_data->payload);
+	payload = populate_cgroup_info(&filemod_data->cgroup_data, task, payload);
+
+	size_t len = bpf_probe_read_str(payload, MAX_FILEPATH_LENGTH, oldname);
+	barrier_var(len);
+	if (len <= MAX_FILEPATH_LENGTH) {
+		barrier_var(len);
+		payload += len;
+		filemod_data->src_filepath_length = len;
+	}
+	len = read_absolute_file_path_from_dentry(dentry, payload);
+	barrier_var(len);
+	if (len <= MAX_FILEPATH_LENGTH) {
+		barrier_var(len);
+		payload += len;
+		filemod_data->dst_filepath_length = len;
+	}
+	bpf_stats_pre_submit_var_perf_event(&stats_ctx, &filemod_data->meta);
+	unsigned long data_len = payload - (void*)filemod_data;
+	data_len = data_len > sizeof(*filemod_data) ? sizeof(*filemod_data) : data_len;
+	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, filemod_data, data_len);
+out:
+	bpf_stats_exit(&stats_ctx);
+	return 0;
+}
+
+SEC("raw_tracepoint/sched_process_fork")
+int raw_tracepoint__sched_process_fork(struct bpf_raw_tracepoint_args* ctx)
+{
+	struct bpf_func_stats_ctx stats_ctx;
+	bpf_stats_enter(&stats_ctx, profiler_bpf_sched_process_fork);
+
+	int zero = 0;
+	struct var_fork_data_t* fork_data = bpf_map_lookup_elem(&data_heap, &zero);
+	if (!fork_data)
+		goto out;
+
+	struct task_struct* parent = (struct task_struct*)ctx->args[0];
+	struct task_struct* child = (struct task_struct*)ctx->args[1];
+	fork_data->meta.type = FORK_EVENT;
+
+	void* payload = populate_var_metadata(&fork_data->meta, child,
+					      BPF_CORE_READ(child, pid), fork_data->payload);
+	fork_data->parent_pid = BPF_CORE_READ(parent, pid);
+	fork_data->parent_exec_id = BPF_CORE_READ(parent, self_exec_id);
+	fork_data->parent_start_time = BPF_CORE_READ(parent, start_time);
+	bpf_stats_pre_submit_var_perf_event(&stats_ctx, &fork_data->meta);
+
+	unsigned long data_len = payload - (void*)fork_data;
+	data_len = data_len > sizeof(*fork_data) ? sizeof(*fork_data) : data_len;
+	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, fork_data, data_len);
+out:
+	bpf_stats_exit(&stats_ctx);
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/profiler1.c b/tools/testing/selftests/bpf/progs/profiler1.c
new file mode 100644
index 000000000000..0f32a3cbf556
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/profiler1.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#define barrier_var(var) /**/
+/* undef #define UNROLL */
+#define INLINE /**/
+#include "profiler.inc.h"
diff --git a/tools/testing/selftests/bpf/progs/profiler2.c b/tools/testing/selftests/bpf/progs/profiler2.c
new file mode 100644
index 000000000000..4df9088bfc00
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/profiler2.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#define barrier_var(var) asm volatile("" : "=r"(var) : "0"(var))
+#define UNROLL
+#define INLINE __always_inline
+#include "profiler.inc.h"
diff --git a/tools/testing/selftests/bpf/progs/profiler3.c b/tools/testing/selftests/bpf/progs/profiler3.c
new file mode 100644
index 000000000000..6249fc31ccb0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/profiler3.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#define barrier_var(var) /**/
+#define UNROLL
+#define INLINE __noinline
+#include "profiler.inc.h"
-- 
2.23.0



* Re: [PATCH bpf-next 3/3] selftests/bpf: Add profiler test
  2020-10-06 20:09 ` [PATCH bpf-next 3/3] selftests/bpf: Add profiler test Alexei Starovoitov
@ 2020-10-07  1:22   ` Jakub Kicinski
  2020-10-07  1:35     ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2020-10-07  1:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, john.fastabend, netdev, bpf, kernel-team

On Tue,  6 Oct 2020 13:09:55 -0700 Alexei Starovoitov wrote:
> +static ino_t get_inode_from_kernfs(struct kernfs_node* node)

nit: my bot suggests this may be missing an "INLINE" since it's a
     static function in a header

> +{
> +	struct kernfs_node___52* node52 = (void*)node;
> +
> +	if (bpf_core_field_exists(node52->id.ino)) {
> +		barrier_var(node52);
> +		return BPF_CORE_READ(node52, id.ino);
> +	} else {
> +		barrier_var(node);
> +		return (u64)BPF_CORE_READ(node, id);
> +	}
> +}


* Re: [PATCH bpf-next 3/3] selftests/bpf: Add profiler test
  2020-10-07  1:22   ` Jakub Kicinski
@ 2020-10-07  1:35     ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-07  1:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Daniel Borkmann, John Fastabend,
	Network Development, bpf, Kernel Team

On Tue, Oct 6, 2020 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue,  6 Oct 2020 13:09:55 -0700 Alexei Starovoitov wrote:
> > +static ino_t get_inode_from_kernfs(struct kernfs_node* node)
>
> nit: my bot suggests this may be missing an "INLINE" since it's a
>      static function in a header

false positive. it's not a header.


* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-06 20:09 ` [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments Alexei Starovoitov
@ 2020-10-07  1:56   ` Andrii Nakryiko
  2020-10-07  2:18     ` Alexei Starovoitov
  2020-10-07 23:44   ` John Fastabend
  1 sibling, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2020-10-07  1:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, john fastabend, Networking,
	bpf, Kernel Team

On Tue, Oct 6, 2020 at 1:14 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> The llvm register allocator may use two different registers representing the
> same virtual register. In such case the following pattern can be observed:
> 1047: (bf) r9 = r6
> 1048: (a5) if r6 < 0x1000 goto pc+1
> 1050: ...
> 1051: (a5) if r9 < 0x2 goto pc+66
> 1052: ...
> 1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
>
> In order to track this information without backtracking allocate ID
> for scalars in a similar way as it's done for find_good_pkt_pointers().
>
> When the verifier encounters r9 = r6 assignment it will assign the same ID
> to both registers. Later if either register range is narrowed via conditional
> jump propagate the register state into the other register.
>
> Clear register ID in adjust_reg_min_max_vals() for any alu instruction.
>
> Newly allocated register ID is ignored for scalars in regsafe() and doesn't
> affect state pruning. mark_reg_unknown() also clears the ID.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

I couldn't find the problem with the logic, though it's quite
non-obvious at times that reg->id will be cleared on BPF_END/BPF_NEG
and few other operations. But I think naming of this function can be
improved, see below.

Also, profiler.c is great, but it would still be nice to add selftest
to test_verifier that will explicitly test the logic in this patch

>  kernel/bpf/verifier.c                         | 38 +++++++++++++++++++
>  .../testing/selftests/bpf/prog_tests/align.c  | 16 ++++----
>  .../bpf/verifier/direct_packet_access.c       |  2 +-
>  3 files changed, 47 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 01120acab09a..09e17b483b0b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6432,6 +6432,8 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
>         src_reg = NULL;
>         if (dst_reg->type != SCALAR_VALUE)
>                 ptr_reg = dst_reg;
> +       else
> +               dst_reg->id = 0;
>         if (BPF_SRC(insn->code) == BPF_X) {
>                 src_reg = &regs[insn->src_reg];
>                 if (src_reg->type != SCALAR_VALUE) {
> @@ -6565,6 +6567,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
>                                 /* case: R1 = R2
>                                  * copy register state to dest reg
>                                  */
> +                               if (src_reg->type == SCALAR_VALUE)
> +                                       src_reg->id = ++env->id_gen;
>                                 *dst_reg = *src_reg;
>                                 dst_reg->live |= REG_LIVE_WRITTEN;
>                                 dst_reg->subreg_def = DEF_NOT_SUBREG;
> @@ -7365,6 +7369,30 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
>         return true;
>  }
>
> +static void find_equal_scalars(struct bpf_verifier_state *vstate,
> +                              struct bpf_reg_state *known_reg)

this is double-misleading name:

1) it's not just "find", but also "update" (or rather the purpose of
this function is specifically to update registers, not find them, as
we don't really return found register)
2) "equal" is not exactly true either. You can have two scalar
register with exactly the same state, but they might not share ->id.
So it's less about being equal, rather being "linked" by assignment.

> +{
> +       struct bpf_func_state *state;
> +       struct bpf_reg_state *reg;
> +       int i, j;
> +
> +       for (i = 0; i <= vstate->curframe; i++) {
> +               state = vstate->frame[i];
> +               for (j = 0; j < MAX_BPF_REG; j++) {
> +                       reg = &state->regs[j];
> +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
> +                               *reg = *known_reg;
> +               }
> +
> +               bpf_for_each_spilled_reg(j, state, reg) {
> +                       if (!reg)
> +                               continue;
> +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
> +                               *reg = *known_reg;
> +               }
> +       }
> +}
> +
>  static int check_cond_jmp_op(struct bpf_verifier_env *env,
>                              struct bpf_insn *insn, int *insn_idx)
>  {
> @@ -7493,6 +7521,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>                                 reg_combine_min_max(&other_branch_regs[insn->src_reg],
>                                                     &other_branch_regs[insn->dst_reg],
>                                                     src_reg, dst_reg, opcode);
> +                       if (src_reg->id) {
> +                               find_equal_scalars(this_branch, src_reg);
> +                               find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
> +                       }
> +
>                 }
>         } else if (dst_reg->type == SCALAR_VALUE) {
>                 reg_set_min_max(&other_branch_regs[insn->dst_reg],
> @@ -7500,6 +7533,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>                                         opcode, is_jmp32);
>         }
>
> +       if (dst_reg->type == SCALAR_VALUE && dst_reg->id) {
> +               find_equal_scalars(this_branch, dst_reg);
> +               find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);

will this cover the case above where reg_combine_min_max() can update
dst_reg's as well? Even if yes, it probably would be more
straightforward to call appropriate updates in the respective if
branches (it's just a single line for each register, so not like it's
duplicating tons of code). It will make reasoning about this logic
easier, IMO. Also, moving reg->id check into find_equal_scalars()
would make the above suggestion even cleaner.

> +       }
> +
>         /* detect if R == 0 where R is returned from bpf_map_lookup_elem().
>          * NOTE: these optimizations below are related with pointer comparison
>          *       which will never be JMP32.

[...]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-07  1:56   ` Andrii Nakryiko
@ 2020-10-07  2:18     ` Alexei Starovoitov
  2020-10-07  3:31       ` Andrii Nakryiko
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-07  2:18 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, john fastabend, Networking,
	bpf, Kernel Team

On Tue, Oct 06, 2020 at 06:56:14PM -0700, Andrii Nakryiko wrote:
> On Tue, Oct 6, 2020 at 1:14 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > The llvm register allocator may use two different registers representing the
> > same virtual register. In such case the following pattern can be observed:
> > 1047: (bf) r9 = r6
> > 1048: (a5) if r6 < 0x1000 goto pc+1
> > 1050: ...
> > 1051: (a5) if r9 < 0x2 goto pc+66
> > 1052: ...
> > 1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
> >
> > In order to track this information without backtracking allocate ID
> > for scalars in a similar way as it's done for find_good_pkt_pointers().
> >
> > When the verifier encounters r9 = r6 assignment it will assign the same ID
> > to both registers. Later if either register range is narrowed via conditional
> > jump propagate the register state into the other register.
> >
> > Clear register ID in adjust_reg_min_max_vals() for any alu instruction.
> >
> > Newly allocated register ID is ignored for scalars in regsafe() and doesn't
> > affect state pruning. mark_reg_unknown() also clears the ID.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> 
> I couldn't find the problem with the logic, though it's quite
> non-obvious at times that reg->id will be cleared on BPF_END/BPF_NEG
> and few other operations. But I think naming of this function can be
> improved, see below.
> 
> Also, profiler.c is great, but it would still be nice to add selftest
> to test_verifier that will explicitly test the logic in this patch

the test align.c actually does the id checking better than I expected.
I'm planning to add more asm tests in the follow up.

> >  kernel/bpf/verifier.c                         | 38 +++++++++++++++++++
> >  .../testing/selftests/bpf/prog_tests/align.c  | 16 ++++----
> >  .../bpf/verifier/direct_packet_access.c       |  2 +-
> >  3 files changed, 47 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 01120acab09a..09e17b483b0b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -6432,6 +6432,8 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> >         src_reg = NULL;
> >         if (dst_reg->type != SCALAR_VALUE)
> >                 ptr_reg = dst_reg;
> > +       else
> > +               dst_reg->id = 0;
> >         if (BPF_SRC(insn->code) == BPF_X) {
> >                 src_reg = &regs[insn->src_reg];
> >                 if (src_reg->type != SCALAR_VALUE) {
> > @@ -6565,6 +6567,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >                                 /* case: R1 = R2
> >                                  * copy register state to dest reg
> >                                  */
> > +                               if (src_reg->type == SCALAR_VALUE)
> > +                                       src_reg->id = ++env->id_gen;
> >                                 *dst_reg = *src_reg;
> >                                 dst_reg->live |= REG_LIVE_WRITTEN;
> >                                 dst_reg->subreg_def = DEF_NOT_SUBREG;
> > @@ -7365,6 +7369,30 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
> >         return true;
> >  }
> >
> > +static void find_equal_scalars(struct bpf_verifier_state *vstate,
> > +                              struct bpf_reg_state *known_reg)
> 
> this is double-misleading name:
> 
> 1) it's not just "find", but also "update" (or rather the purpose of
> this function is specifically to update registers, not find them, as
> we don't really return found register)
> 2) "equal" is not exactly true either. You can have two scalar
> register with exactly the same state, but they might not share ->id.
> So it's less about being equal, rather being "linked" by assignment.

I don't think I can agree.
We already have find_good_pkt_pointers() that also updates,
so 'find' fits better than 'update'.
'linked' is also wrong. The regs are exactly equal.
In case of pkt and other pointers two regs will have the same id
as well, but they will not be equal. Here these two scalars are equal
otherwise doing *reg = *known_reg would be wrong.

> > +{
> > +       struct bpf_func_state *state;
> > +       struct bpf_reg_state *reg;
> > +       int i, j;
> > +
> > +       for (i = 0; i <= vstate->curframe; i++) {
> > +               state = vstate->frame[i];
> > +               for (j = 0; j < MAX_BPF_REG; j++) {
> > +                       reg = &state->regs[j];
> > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
> > +                               *reg = *known_reg;
> > +               }
> > +
> > +               bpf_for_each_spilled_reg(j, state, reg) {
> > +                       if (!reg)
> > +                               continue;
> > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
> > +                               *reg = *known_reg;
> > +               }
> > +       }
> > +}
> > +
> >  static int check_cond_jmp_op(struct bpf_verifier_env *env,
> >                              struct bpf_insn *insn, int *insn_idx)
> >  {
> > @@ -7493,6 +7521,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> >                                 reg_combine_min_max(&other_branch_regs[insn->src_reg],
> >                                                     &other_branch_regs[insn->dst_reg],
> >                                                     src_reg, dst_reg, opcode);
> > +                       if (src_reg->id) {
> > +                               find_equal_scalars(this_branch, src_reg);
> > +                               find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
> > +                       }
> > +
> >                 }
> >         } else if (dst_reg->type == SCALAR_VALUE) {
> >                 reg_set_min_max(&other_branch_regs[insn->dst_reg],
> > @@ -7500,6 +7533,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> >                                         opcode, is_jmp32);
> >         }
> >
> > +       if (dst_reg->type == SCALAR_VALUE && dst_reg->id) {
> > +               find_equal_scalars(this_branch, dst_reg);
> > +               find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
> 
> will this cover the case above where reg_combine_min_max() can update
> dst_reg's as well? 

yes.

> Even if yes, it probably would be more
> straightforward to call appropriate updates in the respective if
> branches (it's just a single line for each register, so not like it's
> duplicating tons of code). 

You mean inside reg_set_min_max() and inside reg_combine_min_max() ?
That won't work because find_equal_scalars() needs access to the whole
bpf_verifier_state and not just bpf_reg_state.

> It will make reasoning about this logic
> easier, IMO. Also, moving reg->id check into find_equal_scalars()
> would make the above suggestion even cleaner.

I don't think so. I think checking for type == SCALAR && dst_reg->id != 0
should be done outside of that function. It makes the logic cleaner.
For the same reason we check type outside of find_good_pkt_pointers().

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-07  2:18     ` Alexei Starovoitov
@ 2020-10-07  3:31       ` Andrii Nakryiko
  2020-10-07  4:15         ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2020-10-07  3:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, john fastabend, Networking,
	bpf, Kernel Team

On Tue, Oct 6, 2020 at 7:18 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Oct 06, 2020 at 06:56:14PM -0700, Andrii Nakryiko wrote:
> > On Tue, Oct 6, 2020 at 1:14 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > From: Alexei Starovoitov <ast@kernel.org>
> > >
> > > The llvm register allocator may use two different registers representing the
> > > same virtual register. In such case the following pattern can be observed:
> > > 1047: (bf) r9 = r6
> > > 1048: (a5) if r6 < 0x1000 goto pc+1
> > > 1050: ...
> > > 1051: (a5) if r9 < 0x2 goto pc+66
> > > 1052: ...
> > > 1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
> > >
> > > In order to track this information without backtracking allocate ID
> > > for scalars in a similar way as it's done for find_good_pkt_pointers().
> > >
> > > When the verifier encounters r9 = r6 assignment it will assign the same ID
> > > to both registers. Later if either register range is narrowed via conditional
> > > jump propagate the register state into the other register.
> > >
> > > Clear register ID in adjust_reg_min_max_vals() for any alu instruction.
> > >
> > > Newly allocated register ID is ignored for scalars in regsafe() and doesn't
> > > affect state pruning. mark_reg_unknown() also clears the ID.
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> >
> > I couldn't find the problem with the logic, though it's quite
> > non-obvious at times that reg->id will be cleared on BPF_END/BPF_NEG
> > and few other operations. But I think naming of this function can be
> > improved, see below.
> >
> > Also, profiler.c is great, but it would still be nice to add selftest
> > to test_verifier that will explicitly test the logic in this patch
>
> the test align.c actually does the id checking better than I expected.
> I'm planning to add more asm tests in the follow up.
>

ok

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> > >  kernel/bpf/verifier.c                         | 38 +++++++++++++++++++
> > >  .../testing/selftests/bpf/prog_tests/align.c  | 16 ++++----
> > >  .../bpf/verifier/direct_packet_access.c       |  2 +-
> > >  3 files changed, 47 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 01120acab09a..09e17b483b0b 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -6432,6 +6432,8 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> > >         src_reg = NULL;
> > >         if (dst_reg->type != SCALAR_VALUE)
> > >                 ptr_reg = dst_reg;
> > > +       else
> > > +               dst_reg->id = 0;
> > >         if (BPF_SRC(insn->code) == BPF_X) {
> > >                 src_reg = &regs[insn->src_reg];
> > >                 if (src_reg->type != SCALAR_VALUE) {
> > > @@ -6565,6 +6567,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > >                                 /* case: R1 = R2
> > >                                  * copy register state to dest reg
> > >                                  */
> > > +                               if (src_reg->type == SCALAR_VALUE)
> > > +                                       src_reg->id = ++env->id_gen;
> > >                                 *dst_reg = *src_reg;
> > >                                 dst_reg->live |= REG_LIVE_WRITTEN;
> > >                                 dst_reg->subreg_def = DEF_NOT_SUBREG;
> > > @@ -7365,6 +7369,30 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
> > >         return true;
> > >  }
> > >
> > > +static void find_equal_scalars(struct bpf_verifier_state *vstate,
> > > +                              struct bpf_reg_state *known_reg)
> >
> > this is double-misleading name:
> >
> > 1) it's not just "find", but also "update" (or rather the purpose of
> > this function is specifically to update registers, not find them, as
> > we don't really return found register)
> > 2) "equal" is not exactly true either. You can have two scalar
> > register with exactly the same state, but they might not share ->id.
> > So it's less about being equal, rather being "linked" by assignment.
>
> I don't think I can agree.
> We already have find_good_pkt_pointers() that also updates,
> so 'find' fits better than 'update'.

find_good_pkt_pointers() has a similarly confusing name, but sure,
consistency rules

> 'linked' is also wrong. The regs are exactly equal.
> In case of pkt and other pointers two regs will have the same id
> as well, but they will not be equal. Here these two scalars are equal
> otherwise doing *reg = *known_reg would be wrong.

Ok, I guess it also means that "reg->type == SCALAR_VALUE" checks
below are unnecessary as well, because if known_reg->id matches, that
means register states are exactly the same.

>
> > > +{
> > > +       struct bpf_func_state *state;
> > > +       struct bpf_reg_state *reg;
> > > +       int i, j;
> > > +
> > > +       for (i = 0; i <= vstate->curframe; i++) {
> > > +               state = vstate->frame[i];
> > > +               for (j = 0; j < MAX_BPF_REG; j++) {
> > > +                       reg = &state->regs[j];
> > > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
> > > +                               *reg = *known_reg;
> > > +               }
> > > +
> > > +               bpf_for_each_spilled_reg(j, state, reg) {
> > > +                       if (!reg)
> > > +                               continue;
> > > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
> > > +                               *reg = *known_reg;
> > > +               }
> > > +       }
> > > +}
> > > +
> > >  static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > >                              struct bpf_insn *insn, int *insn_idx)
> > >  {
> > > @@ -7493,6 +7521,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > >                                 reg_combine_min_max(&other_branch_regs[insn->src_reg],
> > >                                                     &other_branch_regs[insn->dst_reg],
> > >                                                     src_reg, dst_reg, opcode);
> > > +                       if (src_reg->id) {
> > > +                               find_equal_scalars(this_branch, src_reg);
> > > +                               find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
> > > +                       }
> > > +
> > >                 }
> > >         } else if (dst_reg->type == SCALAR_VALUE) {
> > >                 reg_set_min_max(&other_branch_regs[insn->dst_reg],
> > > @@ -7500,6 +7533,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > >                                         opcode, is_jmp32);
> > >         }
> > >
> > > +       if (dst_reg->type == SCALAR_VALUE && dst_reg->id) {
> > > +               find_equal_scalars(this_branch, dst_reg);
> > > +               find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
> >
> > will this cover the case above where reg_combine_min_max() can update
> > dst_reg's as well?
>
> yes.
>
> > Even if yes, it probably would be more
> > straightforward to call appropriate updates in the respective if
> > branches (it's just a single line for each register, so not like it's
> > duplicating tons of code).
>
> You mean inside reg_set_min_max() and inside reg_combine_min_max() ?
> That won't work because find_equal_scalars() needs access to the whole
> bpf_verifier_state and not just bpf_reg_state.

No, I meant something like this, few lines above:

if (BPF_SRC(insn->code) == BPF_X) {

    if (dst_reg->type == SCALAR_VALUE && src_reg->type == SCALAR_VALUE) {
        if (...)
        else if (...)
        else

        /* both src/dst regs in both this/other branches could have
been updated */
        find_equal_scalars(this_branch, src_reg);
        find_equal_scalars(this_branch, dst_reg);
        find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg])
        find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg])
    }
} else if (dst_reg->type == SCALAR_VALUE) {
    reg_set_min_max(...);

    /* only dst_reg in both branches could have been updated */
    find_equal_scalars(this_branch, dst_reg);
    find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
}


This keeps find_equal_scalars() for relevant registers very close to
places where those registers are updated, instead of jumping back and
forth between the complicated if  after it, and double-checking under
which circumstances dst_reg can be updated, for example.

>
> > It will make reasoning about this logic
> > easier, IMO. Also, moving reg->id check into find_equal_scalars()
> > would make the above suggestion even cleaner.
>
> I don't think so. I think checking for type == SCALAR && dst_reg->id != 0
> should be done outside of that function. It makes the logic cleaner.
> For the same reason we check type outside of find_good_pkt_pointers().

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: Track spill/fill of bounded scalars.
  2020-10-06 20:09 ` [PATCH bpf-next 2/3] bpf: Track spill/fill of bounded scalars Alexei Starovoitov
@ 2020-10-07  3:35   ` Andrii Nakryiko
  0 siblings, 0 replies; 18+ messages in thread
From: Andrii Nakryiko @ 2020-10-07  3:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, john fastabend, Networking,
	bpf, Kernel Team

On Tue, Oct 6, 2020 at 1:10 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Yonghong Song <yhs@fb.com>
>
> Under register pressure the llvm may spill registers with bounds into the stack.
> The verifier has to track them through spill/fill otherwise many kinds of bound
> errors will be seen. The spill/fill of induction variables was already
> happening. This patch extends this logic from tracking spill/fill of a constant
> into any bounded register. There is no need to track spill/fill of unbounded,
> since no new information will be retrieved from the stack during register fill.
>
> Though extra stack difference could cause state pruning to be less effective, no
> adverse affects were seen from this patch on selftests and on cilium programs.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  kernel/bpf/verifier.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>

[...]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-07  3:31       ` Andrii Nakryiko
@ 2020-10-07  4:15         ` Alexei Starovoitov
  2020-10-07  4:42           ` Andrii Nakryiko
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-07  4:15 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, john fastabend, Networking,
	bpf, Kernel Team

On Tue, Oct 06, 2020 at 08:31:23PM -0700, Andrii Nakryiko wrote:
> 
> > 'linked' is also wrong. The regs are exactly equal.
> > In case of pkt and other pointers two regs will have the same id
> > as well, but they will not be equal. Here these two scalars are equal
> > otherwise doing *reg = *known_reg would be wrong.
> 
> Ok, I guess it also means that "reg->type == SCALAR_VALUE" checks
> below are unnecessary as well, because if known_reg->id matches, that
> means register states are exactly the same.
> > > > +               for (j = 0; j < MAX_BPF_REG; j++) {
> > > > +                       reg = &state->regs[j];
> > > > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)

Right. The type check is technically unnecessary. It's a safety net in case id
assignment goes wrong plus it makes it easier to understand the logic.

> > > Even if yes, it probably would be more
> > > straightforward to call appropriate updates in the respective if
> > > branches (it's just a single line for each register, so not like it's
> > > duplicating tons of code).
> >
> > You mean inside reg_set_min_max() and inside reg_combine_min_max() ?
> > That won't work because find_equal_scalars() needs access to the whole
> > bpf_verifier_state and not just bpf_reg_state.
> 
> No, I meant something like this, few lines above:
> 
> if (BPF_SRC(insn->code) == BPF_X) {
> 
>     if (dst_reg->type == SCALAR_VALUE && src_reg->type == SCALAR_VALUE) {
>         if (...)
>         else if (...)
>         else
> 
>         /* both src/dst regs in both this/other branches could have
> been updated */
>         find_equal_scalars(this_branch, src_reg);
>         find_equal_scalars(this_branch, dst_reg);
>         find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg])
>         find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg])
>     }
> } else if (dst_reg->type == SCALAR_VALUE) {
>     reg_set_min_max(...);
> 
>     /* only dst_reg in both branches could have been updated */
>     find_equal_scalars(this_branch, dst_reg);
>     find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
> }
> 
> 
> This keeps find_equal_scalars() for relevant registers very close to
> places where those registers are updated, instead of jumping back and
> forth between the complicated if  after it, and double-checking under
> which circumstances dst_reg can be updated, for example.

I see it differently.
I don't like moving if (reg->id) into find_equal_scalars(). Otherwise it would
have to be named something like try_find_equal_scalars(). And even with such
"try_" prefix it's still not clean. It's my general dislike of defensive
programming. I prefer all functions to be imperative: "do" vs "try_do".
There are exceptions to the rule, of course, like kfree() that accepts NULL.
That's fine.
In this case I think if (type == SCALAR && id != 0) should be done by the caller.
Note that's different from __update_reg_bounds().
There the bounds may or may not change, but the action is performed.
What you're proposing is to make find_equal_scalars() accept any kind
of register and do the action only if the argument is an actual scalar
and its "id != 0". That's exactly the defensive programming
that I feel makes programmers sloppier.
Note that's not the same as mark_reg_unknown() doing
if (WARN_ON(regno >= MAX_BPF_REG)) check. I hope the difference is clear.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-07  4:15         ` Alexei Starovoitov
@ 2020-10-07  4:42           ` Andrii Nakryiko
  2020-10-07  5:13             ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2020-10-07  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, john fastabend, Networking,
	bpf, Kernel Team

On Tue, Oct 6, 2020 at 9:15 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Oct 06, 2020 at 08:31:23PM -0700, Andrii Nakryiko wrote:
> >
> > > 'linked' is also wrong. The regs are exactly equal.
> > > In case of pkt and other pointers two regs will have the same id
> > > as well, but they will not be equal. Here these two scalars are equal
> > > otherwise doing *reg = *known_reg would be wrong.
> >
> > Ok, I guess it also means that "reg->type == SCALAR_VALUE" checks
> > below are unnecessary as well, because if known_reg->id matches, that
> > means register states are exactly the same.
> > > > > +               for (j = 0; j < MAX_BPF_REG; j++) {
> > > > > +                       reg = &state->regs[j];
> > > > > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
>
> Right. The type check is technically unnecessary. It's a safety net in case id
> assignment goes wrong plus it makes it easier to understand the logic.
>
> > > > Even if yes, it probably would be more
> > > > straightforward to call appropriate updates in the respective if
> > > > branches (it's just a single line for each register, so not like it's
> > > > duplicating tons of code).
> > >
> > > You mean inside reg_set_min_max() and inside reg_combine_min_max() ?
> > > That won't work because find_equal_scalars() needs access to the whole
> > > bpf_verifier_state and not just bpf_reg_state.
> >
> > No, I meant something like this, few lines above:
> >
> > if (BPF_SRC(insn->code) == BPF_X) {
> >
> >     if (dst_reg->type == SCALAR_VALUE && src_reg->type == SCALAR_VALUE) {
> >         if (...)
> >         else if (...)
> >         else
> >
> >         /* both src/dst regs in both this/other branches could have
> > been updated */
> >         find_equal_scalars(this_branch, src_reg);
> >         find_equal_scalars(this_branch, dst_reg);
> >         find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg])
> >         find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg])
> >     }
> > } else if (dst_reg->type == SCALAR_VALUE) {
> >     reg_set_min_max(...);
> >
> >     /* only dst_reg in both branches could have been updated */
> >     find_equal_scalars(this_branch, dst_reg);
> >     find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
> > }
> >
> >
> > This keeps find_equal_scalars() for relevant registers very close to
> > places where those registers are updated, instead of jumping back and
> > forth between the complicated if  after it, and double-checking under
> > which circumstances dst_reg can be updated, for example.
>
> I see it differently.
> I don't like moving if (reg->id) into find_equal_scalars(). Otherwise it would
> have to be named something like try_find_equal_scalars(). And even with such
> "try_" prefix it's still not clean. It's my general dislike of defensive
> programming. I prefer all functions to be imperative: "do" vs "try_do".
> There are exception from the rule, of course. Like kfree() that accepts NULL.
> That's fine.
> In this case I think if (type == SCALAR && id != 0) should be done by the caller.

There is no need to do the (type == SCALAR) check, see the pseudo-code above.
In all cases where find_equal_scalars() is called we already know that
the register is a SCALAR.

As for `if (reg->id)` being moved inside find_equal_scalars(), I
didn't mean it as a defensive measure. It just allows keeping the
higher-level logic in check_cond_jmp_op() a bit more linear.

Also, regarding "try_find_equal_scalars". It's not try/attempt to do
this, it's do it, similarly to __update_reg_bounds() you explained
below. It's just that known_reg->id == 0 is a guarantee that there are no
other equal registers, so we can skip the work. But of course one can
look at this differently. I just prefer less nested ifs, if it's
possible to avoid them.

But all this is not that important. I suggested, you declined, let's move on.

> Note that's different from __update_reg_bounds().
> There the bounds may or may not change, but the action is performed.
> What you're proposing it to make find_equal_scalars() accept any kind
> of register and do the action only if argument is actual scalar
> and its "id != 0". That's exactly the defensive programming
> that I feel make programmers sloppier.

:) I see a little bit of an irony between this anti-defensive
programming manifesto and "safety net in case id assignment goes
wrong" above.

> Note that's not the same as mark_reg_unknown() doing
> if (WARN_ON(regno >= MAX_BPF_REG)) check. I hope the difference is clear.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-07  4:42           ` Andrii Nakryiko
@ 2020-10-07  5:13             ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-07  5:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, john fastabend, Networking,
	bpf, Kernel Team

On Tue, Oct 06, 2020 at 09:42:18PM -0700, Andrii Nakryiko wrote:
> > I see it differently.
> > I don't like moving if (reg->id) into find_equal_scalars(). Otherwise it would
> > have to be named something like try_find_equal_scalars(). And even with such
> > "try_" prefix it's still not clean. It's my general dislike of defensive
> > programming. I prefer all functions to be imperative: "do" vs "try_do".
> > There are exception from the rule, of course. Like kfree() that accepts NULL.
> > That's fine.
> > In this case I think if (type == SCALAR && id != 0) should be done by the caller.
> 
> There is no need to do (type == SCALAR) check, see pseudo-code above.
> In all cases where find_equal_scalars() is called we know already that
> register is SCALAR.
> 
> As for `if (reg->id)` being moved inside find_equal_scalars(). I
> didn't mean it as a defensive measure. It just allows to keep
> higher-level logic in check_cond_jmp_op() a bit more linear.
> 
> Also, regarding "try_find_equal_scalars". It's not try/attempt to do
> this, it's do it, similarly to __update_reg_bounds() you explained
> below. It's just known_reg->id == 0 is a guarantee that there are no
> other equal registers, so we can skip the work. But of course one can
> look at this differently. I just prefer less nested ifs, if it's
> possible to avoid them.
> 
> But all this is not that important. I suggested, you declined, let's move on.
> 
> > Note that's different from __update_reg_bounds().
> > There the bounds may or may not change, but the action is performed.
> > What you're proposing it to make find_equal_scalars() accept any kind
> > of register and do the action only if argument is actual scalar
> > and its "id != 0". That's exactly the defensive programming
> > that I feel make programmers sloppier.
> 
> :) I see a little bit of an irony between this anti-defensive
> programming manifesto and "safety net in case id assignment goes
> wrong" above.
> 
> > Note that's not the same as mark_reg_unknown() doing
> > if (WARN_ON(regno >= MAX_BPF_REG)) check. I hope the difference is clear.

Looks like the difference between defensive programming and safety net checks
was not clear. The safety net in mark_reg_unknown() will be triggered when
things really go wrong. I don't think I've ever seen it trigger in production
code. I only saw it during development when my code was badly broken. That
check is there to prevent security issues in case a bug sneaks in. Defensive
programming lets a function accept incorrect arguments. That's the normal
behavior of such a function. Because of such a design choice programmers will
routinely pass invalid args. That's kfree() checking for NULL, and it's the
only exception I can remember in the kernel code base. Arguably NULL is not
an invalid value in this case. When people talk about defensive programming
the NULL check is brought up as an example, but I think it's important to
understand it at a deeper level.
Letting a function accept any register only to
> prefer less nested ifs, if it's possible to avoid them
is the same thing. It's making the code sloppier for the esthetics of less
nested if-s.
There are plenty of projects and people that don't mind such a coding style
and find it easier to program. That's a disagreement in coding philosophy.
It's ok to disagree, but it's important to understand those coding differences.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-06 20:09 ` [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments Alexei Starovoitov
  2020-10-07  1:56   ` Andrii Nakryiko
@ 2020-10-07 23:44   ` John Fastabend
  2020-10-07 23:55     ` John Fastabend
  1 sibling, 1 reply; 18+ messages in thread
From: John Fastabend @ 2020-10-07 23:44 UTC (permalink / raw)
  To: Alexei Starovoitov, davem
  Cc: daniel, john.fastabend, netdev, bpf, kernel-team

Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> The llvm register allocator may use two different registers representing the
> same virtual register. In such case the following pattern can be observed:
> 1047: (bf) r9 = r6
> 1048: (a5) if r6 < 0x1000 goto pc+1
> 1050: ...
> 1051: (a5) if r9 < 0x2 goto pc+66
> 1052: ...
> 1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
> 
> In order to track this information without backtracking allocate ID
> for scalars in a similar way as it's done for find_good_pkt_pointers().
> 
> When the verifier encounters r9 = r6 assignment it will assign the same ID
> to both registers. Later if either register range is narrowed via conditional
> jump propagate the register state into the other register.
> 
> Clear register ID in adjust_reg_min_max_vals() for any alu instruction.

Do we also need to clear the register ID on reg0 for CALL ops into a
helper?

Looks like check_helper_call might mark reg0 as a scalar, but I don't
see where it would clear the reg->id? Did I miss it. Either way maybe
a comment here would help make it obvious how CALLs are handled?

Thanks,
John

> 
> Newly allocated register ID is ignored for scalars in regsafe() and doesn't
> affect state pruning. mark_reg_unknown() also clears the ID.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  kernel/bpf/verifier.c                         | 38 +++++++++++++++++++
>  .../testing/selftests/bpf/prog_tests/align.c  | 16 ++++----
>  .../bpf/verifier/direct_packet_access.c       |  2 +-
>  3 files changed, 47 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 01120acab09a..09e17b483b0b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6432,6 +6432,8 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
>  	src_reg = NULL;
>  	if (dst_reg->type != SCALAR_VALUE)
>  		ptr_reg = dst_reg;
> +	else
> +		dst_reg->id = 0;
>  	if (BPF_SRC(insn->code) == BPF_X) {
>  		src_reg = &regs[insn->src_reg];
>  		if (src_reg->type != SCALAR_VALUE) {
> @@ -6565,6 +6567,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  				/* case: R1 = R2
>  				 * copy register state to dest reg
>  				 */
> +				if (src_reg->type == SCALAR_VALUE)
> +					src_reg->id = ++env->id_gen;
>  				*dst_reg = *src_reg;
>  				dst_reg->live |= REG_LIVE_WRITTEN;
>  				dst_reg->subreg_def = DEF_NOT_SUBREG;
> @@ -7365,6 +7369,30 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
>  	return true;
>  }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-07 23:44   ` John Fastabend
@ 2020-10-07 23:55     ` John Fastabend
  2020-10-08  1:45       ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: John Fastabend @ 2020-10-07 23:55 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov, davem
  Cc: daniel, john.fastabend, netdev, bpf, kernel-team

John Fastabend wrote:
> Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> > 
> > The llvm register allocator may use two different registers representing the
> > same virtual register. In such case the following pattern can be observed:
> > 1047: (bf) r9 = r6
> > 1048: (a5) if r6 < 0x1000 goto pc+1
> > 1050: ...
> > 1051: (a5) if r9 < 0x2 goto pc+66
> > 1052: ...
> > 1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
> > 
> > In order to track this information without backtracking allocate ID
> > for scalars in a similar way as it's done for find_good_pkt_pointers().
> > 
> > When the verifier encounters r9 = r6 assignment it will assign the same ID
> > to both registers. Later if either register range is narrowed via conditional
> > jump propagate the register state into the other register.
> > 
> > Clear register ID in adjust_reg_min_max_vals() for any alu instruction.
> 
> Do we also need to clear the register ID on reg0 for CALL ops into a
> helper?
> 
> Looks like check_helper_call might mark reg0 as a scalar, but I don't
> see where it would clear the reg->id? Did I miss it. Either way maybe
> a comment here would help make it obvious how CALLs are handled?
> 
> Thanks,
> John

OK sorry for the noise found it right after hitting send. Any call to
mark_reg_unknown will zero the id.


/* Mark a register as having a completely unknown (scalar) value. */
static void __mark_reg_unknown(const struct bpf_verifier_env *env,
			       struct bpf_reg_state *reg)
{
	/*
	 * Clear type, id, off, and union(map_ptr, range) and
	 * padding between 'type' and union
	 */
	memset(reg, 0, offsetof(struct bpf_reg_state, var_off));


And check_helper_call() does,

	/* update return register (already marked as written above) */
	if (fn->ret_type == RET_INTEGER) {
		/* sets type to SCALAR_VALUE */
		mark_reg_unknown(env, regs, BPF_REG_0);

so looks good to me. In the check_func_call() case the if is_global
branch will mark_reg_unknown(). The other case only seems to do a
clear_caller_saved_regs though. Is that enough?

.John


> 
> > 
> > Newly allocated register ID is ignored for scalars in regsafe() and doesn't
> > affect state pruning. mark_reg_unknown() also clears the ID.
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  kernel/bpf/verifier.c                         | 38 +++++++++++++++++++
> >  .../testing/selftests/bpf/prog_tests/align.c  | 16 ++++----
> >  .../bpf/verifier/direct_packet_access.c       |  2 +-
> >  3 files changed, 47 insertions(+), 9 deletions(-)
> > 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 01120acab09a..09e17b483b0b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -6432,6 +6432,8 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> >  	src_reg = NULL;
> >  	if (dst_reg->type != SCALAR_VALUE)
> >  		ptr_reg = dst_reg;
> > +	else
> > +		dst_reg->id = 0;
> >  	if (BPF_SRC(insn->code) == BPF_X) {
> >  		src_reg = &regs[insn->src_reg];
> >  		if (src_reg->type != SCALAR_VALUE) {
> > @@ -6565,6 +6567,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >  				/* case: R1 = R2
> >  				 * copy register state to dest reg
> >  				 */
> > +				if (src_reg->type == SCALAR_VALUE)
> > +					src_reg->id = ++env->id_gen;
> >  				*dst_reg = *src_reg;
> >  				dst_reg->live |= REG_LIVE_WRITTEN;
> >  				dst_reg->subreg_def = DEF_NOT_SUBREG;
> > @@ -7365,6 +7369,30 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
> >  	return true;
> >  }



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-07 23:55     ` John Fastabend
@ 2020-10-08  1:45       ` Alexei Starovoitov
  2020-10-08 15:18         ` John Fastabend
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-08  1:45 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, daniel, netdev, bpf, kernel-team

On Wed, Oct 07, 2020 at 04:55:24PM -0700, John Fastabend wrote:
> John Fastabend wrote:
> > Alexei Starovoitov wrote:
> > > From: Alexei Starovoitov <ast@kernel.org>
> > > 
> > > The llvm register allocator may use two different registers representing the
> > > same virtual register. In such case the following pattern can be observed:
> > > 1047: (bf) r9 = r6
> > > 1048: (a5) if r6 < 0x1000 goto pc+1
> > > 1050: ...
> > > 1051: (a5) if r9 < 0x2 goto pc+66
> > > 1052: ...
> > > 1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
> > > 
> > > In order to track this information without backtracking allocate ID
> > > for scalars in a similar way as it's done for find_good_pkt_pointers().
> > > 
> > > When the verifier encounters r9 = r6 assignment it will assign the same ID
> > > to both registers. Later if either register range is narrowed via conditional
> > > jump propagate the register state into the other register.
> > > 
> > > Clear register ID in adjust_reg_min_max_vals() for any alu instruction.
> > 
> > Do we also need to clear the register ID on reg0 for CALL ops into a
> > helper?

Thank you for asking all those questions. Much appreciate it!

> > 
> > Looks like check_helper_call might mark reg0 as a scalar, but I don't
> > see where it would clear the reg->id? Did I miss it. Either way maybe
> > a comment here would help make it obvious how CALLs are handled?
> > 
> > Thanks,
> > John
> 
> OK sorry for the noise found it right after hitting send. Any call to
> mark_reg_unknown will zero the id.


Right. The verifier uses mark_reg_unknown() in lots of places,
so I figured it doesn't make sense to list them all.

> 
> /* Mark a register as having a completely unknown (scalar) value. */
> static void __mark_reg_unknown(const struct bpf_verifier_env *env,
> 			       struct bpf_reg_state *reg)
> {
> 	/*
> 	 * Clear type, id, off, and union(map_ptr, range) and
> 	 * padding between 'type' and union
> 	 */
> 	memset(reg, 0, offsetof(struct bpf_reg_state, var_off));

Exactly, and the comment mentions 'id' too.

> 
> And check_helper_call() does,
> 
> 	/* update return register (already marked as written above) */
> 	if (fn->ret_type == RET_INTEGER) {
> 		/* sets type to SCALAR_VALUE */
> 		mark_reg_unknown(env, regs, BPF_REG_0);
> 
> so looks good to me. In the check_func_call() case the if is_global
> branch will mark_reg_unknown(). The other case only seems to do a
> clear_caller_saved_regs though. Is that enough?

clear_caller_saved_regs() -> mark_reg_not_init() -> __mark_reg_unknown().

I couldn't think of any other case where scalar's ID has to be cleared.
Any kind of assignment and r0 return do it as well.

We can clear id in r6 - r10 when we call a helper, but that's a bit
paranoid, since the registers are still valid and still equal.
Like:
r6 = r7
call foo
// after the call
if r6 > 5 goto
if r7 < 2 goto
// here both r6 and r7 will have bounds

I think it's good for the verifier to support that.

The other case with calls:

r1 = r2
call foo
  // and now inside the callee
  if r1 > 5 goto
  if r2 < 2 goto
  // here both r1 and r2 will have bounds

This case will also work.

Both cases are artificial and the verifier doesn't have to be that
smart, but it doesn't hurt and I don't think it's worth restricting.

I'll add two synthetic tests for these cases.

Any other case you can think of ?
I think some time in the past you've mentioned that you hit
exactly this greedy register alloc issue in your cilium programs.
Is it the case or am I misremembering?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-08  1:45       ` Alexei Starovoitov
@ 2020-10-08 15:18         ` John Fastabend
  2020-10-08 15:53           ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: John Fastabend @ 2020-10-08 15:18 UTC (permalink / raw)
  To: Alexei Starovoitov, John Fastabend
  Cc: davem, daniel, netdev, bpf, kernel-team

Alexei Starovoitov wrote:
> On Wed, Oct 07, 2020 at 04:55:24PM -0700, John Fastabend wrote:
> > John Fastabend wrote:
> > > Alexei Starovoitov wrote:
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > > 
> > > > The llvm register allocator may use two different registers representing the
> > > > same virtual register. In such case the following pattern can be observed:
> > > > 1047: (bf) r9 = r6
> > > > 1048: (a5) if r6 < 0x1000 goto pc+1
> > > > 1050: ...
> > > > 1051: (a5) if r9 < 0x2 goto pc+66
> > > > 1052: ...
> > > > 1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
> > > > 
> > > > In order to track this information without backtracking allocate ID
> > > > for scalars in a similar way as it's done for find_good_pkt_pointers().
> > > > 
> > > > When the verifier encounters r9 = r6 assignment it will assign the same ID
> > > > to both registers. Later if either register range is narrowed via conditional
> > > > jump propagate the register state into the other register.
> > > > 
> > > > Clear register ID in adjust_reg_min_max_vals() for any alu instruction.
> > > 
> > > Do we also need to clear the register ID on reg0 for CALL ops into a
> > > helper?
> 
> Thank you for asking all those questions. Much appreciate it!
> 
> > > 
> > > Looks like check_helper_call might mark reg0 as a scalar, but I don't
> > > see where it would clear the reg->id? Did I miss it. Either way maybe
> > > a comment here would help make it obvious how CALLs are handled?
> > > 
> > > Thanks,
> > > John
> > 
> > OK sorry for the noise found it right after hitting send. Any call to
> > mark_reg_unknown will zero the id.
> 
> 
> Right. The verifier uses mark_reg_unknown() in lots of places,
> so I figured it doesn't make sense to list them all.

Right.

> 
> > 
> > /* Mark a register as having a completely unknown (scalar) value. */
> > static void __mark_reg_unknown(const struct bpf_verifier_env *env,
> > 			       struct bpf_reg_state *reg)
> > {
> > 	/*
> > 	 * Clear type, id, off, and union(map_ptr, range) and
> > 	 * padding between 'type' and union
> > 	 */
> > 	memset(reg, 0, offsetof(struct bpf_reg_state, var_off));
> 
> Excatly and the comment mentions 'id' too.

Yep.

> 
> > 
> > And check_helper_call() does,
> > 
> > 	/* update return register (already marked as written above) */
> > 	if (fn->ret_type == RET_INTEGER) {
> > 		/* sets type to SCALAR_VALUE */
> > 		mark_reg_unknown(env, regs, BPF_REG_0);
> > 
> > so looks good to me. In the check_func_call() case the if is_global
> > branch will mark_reg_unknown(). The other case only seems to do a
> > clear_caller_saved_regs though. Is that enough?
> 
> clear_caller_saved_regs() -> mark_reg_not_init() -> __mark_reg_unknown().

+1

> 
> I couldn't think of any other case where scalar's ID has to be cleared.
> Any kind of assignment and r0 return do it as well.

How about a zero extending move?

 r1 = r2 <- r1.id = r2.id
 w1 = w1

that will narrow the bounds on r1 but r2 should not be narrowed? So
we need to zero the r1.id I believe. But, I don't see where we
would set r1.id = 0 in this case.

> 
> We can clear id in r6 - r10 when we call a helper, but that's a bit
> paranoid, since the registers are still valid and still equal.
> Like:
> r6 = r7
> call foo
> // after the call
> if r6 > 5 goto
> if r7 < 2 goto
> // here both r6 and r7 will have bounds
> 
> I think it's good for the verifier to support that.
> 
> The other case with calls:
> 
> r1 = r2
> call foo
>   // and now inside the callee
>   if r1 > 5 goto
>   if r2 < 2 goto
>   // here both r1 and r2 will have bounds
> 
> This case will also work.
> 
> Both cases are artificial and the verifier doesn't have to be that
> smart, but it doesn't hurt and I don't think it's worth to restrict.

Agreed, I don't see any advantage to restricting the above. I think adding
the restriction would just make it harder to follow.

> 
> I'll add two synthetic tests for these cases.

Thanks.

> 
> Any other case you can think of ?

Still churning on the above zero extending move. Also I thought
it was a bit odd that this wouldn't work,

 r1 = r2
 r0 = r1
 if r0 < 2 goto ...

then r0.id != r2.id because a new id is generated on the second
mov there. I don't actually care that much because I can't recall
seeing this pattern.

> I think some time in the past you've mentioned that you hit
> exactly this greedy register alloc issue in your cilium programs.
> Is it the case or am I misremembering?

Yes, I hit this a lot actually for whatever reason. Something
about the code I write maybe. It also tends to be inside a loop
so messing with volatiles doesn't help. The end result is I get
a handful of small asm blocks to force the compiler into generating
code the verifier doesn't trip up on. I was going to add that I think
the cover letter understates how much this should help.
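
For reference, the workarounds look roughly like this (a made-up sketch
with hypothetical names and bounds, not our real code, just to show the
shape; it assumes <bpf/bpf_helpers.h> for __always_inline and the helper
declaration):

/* Pin 'len' to a single register so llvm doesn't split its live range
 * into two registers across the bounds check (same trick as the
 * barrier_var() macro in the profiler test).
 */
static __always_inline int copy_bounded(void *dst, const void *src, __u64 len)
{
	asm volatile("" : "=r"(len) : "0"(len));
	if (len > 256)
		return 0;
	return bpf_probe_read_kernel(dst, len, src);
}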

I still need to try some of Yonghong's latest patches; maybe I'll
push this patch on my stack as well and see how much asm I can
delete.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.
  2020-10-08 15:18         ` John Fastabend
@ 2020-10-08 15:53           ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2020-10-08 15:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, daniel, netdev, bpf, kernel-team

On Thu, Oct 08, 2020 at 08:18:46AM -0700, John Fastabend wrote:
> > 
> > I couldn't think of any other case where scalar's ID has to be cleared.
> > Any kind of assignment and r0 return do it as well.
> 
> How about a zero extending move?
> 
>  r1 = r2 <- r1.id = r2.id
>  w1 = w1
> 
> that will narrow the bounds on r1 but r2 should not be narrowed? So
> we need to zero the r1.id I believe. But, I don't see where we
> would set r1.id = 0 in this case.

Excellent catch! Indeed. id should be cleared for 32-bit move.
Will fix.
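
Roughly like this in the 32-bit mov handling of check_alu_op()
(a sketch from memory, the surrounding lines may differ a bit):

	} else if (src_reg->type == SCALAR_VALUE) {
		*dst_reg = *src_reg;
		/* Make sure the id is cleared, otherwise dst_reg's
		 * min/max could later be incorrectly propagated into
		 * src_reg by find_equal_scalars().
		 */
		dst_reg->id = 0;
		dst_reg->live |= REG_LIVE_WRITTEN;
		dst_reg->subreg_def = env->insn_idx + 1;
	}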

> > 
> > Any other case you can think of ?
> 
> Still churning on the above zero extending move. Also I thought
> it was a bit odd that this wouldn't work,
> 
>  r1 = r2
>  r0 = r1
>  if r0 < 2 goto ...
> 
> then r0.id != r2.id because a new id is generated on the second
> mov there. I don't actually care that much because I can't recall
> seeing this pattern.

Right. Since it's easy to support this case I'll add it as well.
Though I've also never seen llvm generate code like this, and I don't
think it will, based on my understanding of regalloc.
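
The tweak is probably just to not allocate a fresh id when the src
already has one, something like this (untested sketch):

	/* Assign a fresh id only when src doesn't have one yet, so that
	 * "r1 = r2; r0 = r1" leaves r0, r1 and r2 all linked by the
	 * same id.
	 */
	if (src_reg->type == SCALAR_VALUE && !src_reg->id)
		src_reg->id = ++env->id_gen;
	*dst_reg = *src_reg;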

> > I think some time in the past you've mentioned that you hit
> > exactly this greedy register alloc issue in your cilium programs.
> > Is it the case or am I misremembering?
> 
> Yes, I hit this a lot actually for whatever reason. Something
> about the code I write maybe. It also tends to be inside a loop
> so messing with volatiles doesn't help. End result is I get
> a handful of small asm blocks to force compiler into generating
> code the verifier doesn't trip up on. I was going to add I think
> the cover letter understates how much this should help.

Yeah. We also see such patterns only inside loops with large loop bodies,
and especially in unrolled loops.
My understanding is that this is normal behavior of the greedy register
allocator, which introduces register copies for the split live ranges.
Yonghong sent me the link that explains the algorithm in detail:
http://llvm.org/devmtg/2018-04/slides/Yatsina-LLVM%20Greedy%20Register%20Allocator.pdf
Slide 137 and the following slides explain exactly this scenario.

In other words there is no way to tell llvm 'not to do this',
so we have to improve the verifier's smartness in such cases.

I'll add these details to the commit log.

> I still need to try some of Yonghong's latest patches maybe I'll
> push this patch on my stack as well and see how much asm I can
> delete.

2 out of the 3 patches have already landed. Please pull the latest llvm master.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-10-08 15:54 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-06 20:09 [PATCH bpf-next 0/3] bpf: Make the verifier recognize llvm register allocation patterns Alexei Starovoitov
2020-10-06 20:09 ` [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments Alexei Starovoitov
2020-10-07  1:56   ` Andrii Nakryiko
2020-10-07  2:18     ` Alexei Starovoitov
2020-10-07  3:31       ` Andrii Nakryiko
2020-10-07  4:15         ` Alexei Starovoitov
2020-10-07  4:42           ` Andrii Nakryiko
2020-10-07  5:13             ` Alexei Starovoitov
2020-10-07 23:44   ` John Fastabend
2020-10-07 23:55     ` John Fastabend
2020-10-08  1:45       ` Alexei Starovoitov
2020-10-08 15:18         ` John Fastabend
2020-10-08 15:53           ` Alexei Starovoitov
2020-10-06 20:09 ` [PATCH bpf-next 2/3] bpf: Track spill/fill of bounded scalars Alexei Starovoitov
2020-10-07  3:35   ` Andrii Nakryiko
2020-10-06 20:09 ` [PATCH bpf-next 3/3] selftests/bpf: Add profiler test Alexei Starovoitov
2020-10-07  1:22   ` Jakub Kicinski
2020-10-07  1:35     ` Alexei Starovoitov
