* [PATCH v5 bpf-next 0/4] bpf: Introduce may_goto and cond_break
@ 2024-03-05  4:52 Alexei Starovoitov
  2024-03-05  4:52 ` [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction Alexei Starovoitov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2024-03-05  4:52 UTC (permalink / raw)
  To: bpf
  Cc: daniel, andrii, martin.lau, memxor, eddyz87, john.fastabend, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

v4 -> v5:
- rewrote patch 1 to avoid fake may_goto_reg and use 'u32 may_goto_cnt' instead.
  This way may_goto handling is similar to bpf_loop() processing.
- fixed a bug in patch 2 where RANGE_WITHIN incorrectly treated
  rold->type == NOT_INIT as a safe signal.
- fixed negative offset computation in the cond_break macro in patch 3
- recompiled lib/glob.c as a bpf prog using bpf_arena and cond_break,
  and it works! It will be added as a selftest to the arena series.

v3 -> v4:
- fixed the drained issue reported by John.
  may_goto insn could be implemented with a sticky state (once
  the count reaches 0 it stays 0), but the verifier shouldn't assume that.
  It has to explore both branches.
  Arguably the drained iterator state shouldn't be there at all.
  bpf_iter_css_next() is not sticky. That can be fixed, but it would
  require auditing all iterators for stickiness; that's an orthogonal discussion.
- explained JMA name reasons in patch 1
- fixed test_progs-no_alu32 issue and added another test

v2 -> v3: Major change
- drop bpf_can_loop() kfunc and introduce may_goto instruction instead.
  A kfunc is a function call, while may_goto doesn't consume any registers,
  so LLVM can produce much better code due to lower register pressure.
- instead of counting up from zero to BPF_MAX_LOOPS, start at BPF_MAX_LOOPS
  and break out of the loop when the count reaches zero
- use may_goto instruction in cond_break macro
- recognize that 'exact' state comparison doesn't need to be truly exact.
  regsafe() should ignore precision and liveness marks, but range_within
  logic is safe to use while evaluating open coded iterators.

Alexei Starovoitov (4):
  bpf: Introduce may_goto instruction
  bpf: Recognize that two registers are safe when their ranges match
  bpf: Add cond_break macro
  selftests/bpf: Test may_goto

 include/linux/bpf_verifier.h                  |   2 +
 include/uapi/linux/bpf.h                      |   1 +
 kernel/bpf/core.c                             |   1 +
 kernel/bpf/disasm.c                           |   3 +
 kernel/bpf/verifier.c                         | 193 +++++++++++++-----
 tools/include/uapi/linux/bpf.h                |   1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../testing/selftests/bpf/bpf_experimental.h  |  12 ++
 .../bpf/progs/verifier_iterating_callbacks.c  | 103 +++++++++-
 9 files changed, 268 insertions(+), 49 deletions(-)

-- 
2.43.0



* [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction
  2024-03-05  4:52 [PATCH v5 bpf-next 0/4] bpf: Introduce may_goto and cond_break Alexei Starovoitov
@ 2024-03-05  4:52 ` Alexei Starovoitov
  2024-03-05 18:37   ` Andrii Nakryiko
  2024-03-05  4:52 ` [PATCH v5 bpf-next 2/4] bpf: Recognize that two registers are safe when their ranges match Alexei Starovoitov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2024-03-05  4:52 UTC (permalink / raw)
  To: bpf
  Cc: daniel, andrii, martin.lau, memxor, eddyz87, john.fastabend, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Introduce may_goto instruction that acts on a hidden bpf_iter_num, so that
bpf_iter_num_new(), bpf_iter_num_destroy() don't need to be called explicitly.
It can be used in any normal "for" or "while" loop, like

  for (i = zero; i < cnt; cond_break, i++) {

The verifier recognizes that may_goto is used in the program,
reserves additional 8 bytes of stack, initializes them in subprog
prologue, and replaces may_goto instruction with:
aux_reg = *(u64 *)(fp - 40)
if aux_reg == 0 goto pc+off
aux_reg += 1
*(u64 *)(fp - 40) = aux_reg
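
A rough C-level sketch of the resulting runtime behavior (illustration
only, not verifier output; 'hidden_cnt' is a made-up name for the value
kept in the reserved stack slot, reusing 'i', 'zero', 'cnt' from the
example above):

  u64 hidden_cnt = BPF_MAX_LOOPS; /* stored by the subprog prologue,
                                   * see the do_misc_fixups() rewrite below */
  for (i = zero; i < cnt; i++) {
          if (hidden_cnt == 0)
                  break;          /* the 'goto pc+off' out of the loop */
          hidden_cnt--;
          /* loop body */
  }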

may_goto instruction can be used by LLVM to implement __builtin_memcpy,
__builtin_strcmp.

may_goto is not a full substitute for bpf_for() macro.
bpf_for() doesn't have an induction variable that the verifier sees,
so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.

But when the code is written as:
for (i = 0; i < 100; cond_break, i++)
the verifier sees 'i' as a precise constant zero,
hence cond_break (aka may_goto) doesn't help to converge the loop.
A static or global variable can be used as a workaround:
static int zero = 0;
for (i = zero; i < 100; cond_break, i++) // works!

may_goto works well with arena pointers that don't need to be bounds-checked
on every iteration. Load/store from arena returns imprecise unbounded scalars.

Reserve new opcode BPF_JMP | BPF_JMA for may_goto insn.
JMA stands for "jump maybe", and "jump multipurpose", and "jump multi always".
Since goto_or_nop insn was proposed, it may use the same opcode.
may_goto vs goto_or_nop can be distinguished by src_reg:
code = BPF_JMP | BPF_JMA:
src_reg = 0 - may_goto
src_reg = 1 - goto_or_nop
We could have reused BPF_JMP | BPF_JA like:
src_reg = 0 - normal goto
src_reg = 1 - may_goto
src_reg = 2 - goto_or_nop
but JA is a real insn and it's unconditional, while may_goto and goto_or_nop
are pseudo instructions, and both are conditional. Hence it's better to
have a different opcode for them. Hence BPF_JMA.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_verifier.h   |   2 +
 include/uapi/linux/bpf.h       |   1 +
 kernel/bpf/core.c              |   1 +
 kernel/bpf/disasm.c            |   3 +
 kernel/bpf/verifier.c          | 156 ++++++++++++++++++++++++++-------
 tools/include/uapi/linux/bpf.h |   1 +
 6 files changed, 134 insertions(+), 30 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 84365e6dd85d..917ca603059b 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -449,6 +449,7 @@ struct bpf_verifier_state {
 	u32 jmp_history_cnt;
 	u32 dfs_depth;
 	u32 callback_unroll_depth;
+	u32 may_goto_cnt;
 };
 
 #define bpf_get_spilled_reg(slot, frame, mask)				\
@@ -619,6 +620,7 @@ struct bpf_subprog_info {
 	u32 start; /* insn idx of function entry point */
 	u32 linfo_idx; /* The idx to the main_prog->aux->linfo */
 	u16 stack_depth; /* max. stack depth used by this function */
+	u16 stack_extra;
 	bool has_tail_call: 1;
 	bool tail_call_reachable: 1;
 	bool has_ld_abs: 1;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a241f407c234..932ffef0dc88 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -42,6 +42,7 @@
 #define BPF_JSGE	0x70	/* SGE is signed '>=', GE in x86 */
 #define BPF_JSLT	0xc0	/* SLT is signed, '<' */
 #define BPF_JSLE	0xd0	/* SLE is signed, '<=' */
+#define BPF_JMA		0xe0	/* may_goto */
 #define BPF_CALL	0x80	/* function call */
 #define BPF_EXIT	0x90	/* function return */
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 71c459a51d9e..ba6101447b49 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1675,6 +1675,7 @@ bool bpf_opcode_in_insntable(u8 code)
 		[BPF_LD | BPF_IND | BPF_B] = true,
 		[BPF_LD | BPF_IND | BPF_H] = true,
 		[BPF_LD | BPF_IND | BPF_W] = true,
+		[BPF_JMP | BPF_JMA] = true,
 	};
 #undef BPF_INSN_3_TBL
 #undef BPF_INSN_2_TBL
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index 49940c26a227..598cd38af84c 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -322,6 +322,9 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 		} else if (insn->code == (BPF_JMP | BPF_JA)) {
 			verbose(cbs->private_data, "(%02x) goto pc%+d\n",
 				insn->code, insn->off);
+		} else if (insn->code == (BPF_JMP | BPF_JMA)) {
+			verbose(cbs->private_data, "(%02x) may_goto pc%+d\n",
+				insn->code, insn->off);
 		} else if (insn->code == (BPF_JMP32 | BPF_JA)) {
 			verbose(cbs->private_data, "(%02x) gotol pc%+d\n",
 				insn->code, insn->imm);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4dd84e13bbfe..226bb65f9c2c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1429,6 +1429,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->dfs_depth = src->dfs_depth;
 	dst_state->callback_unroll_depth = src->callback_unroll_depth;
 	dst_state->used_as_loop_entry = src->used_as_loop_entry;
+	dst_state->may_goto_cnt = src->may_goto_cnt;
 	for (i = 0; i <= src->curframe; i++) {
 		dst = dst_state->frame[i];
 		if (!dst) {
@@ -7880,6 +7881,11 @@ static int widen_imprecise_scalars(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static bool is_may_goto_insn(struct bpf_verifier_env *env, int insn_idx)
+{
+	return env->prog->insnsi[insn_idx].code == (BPF_JMP | BPF_JMA);
+}
+
 /* process_iter_next_call() is called when verifier gets to iterator's next
  * "method" (e.g., bpf_iter_num_next() for numbers iterator) call. We'll refer
  * to it as just "iter_next()" in comments below.
@@ -14871,11 +14877,35 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 	int err;
 
 	/* Only conditional jumps are expected to reach here. */
-	if (opcode == BPF_JA || opcode > BPF_JSLE) {
+	if (opcode == BPF_JA || opcode > BPF_JMA) {
 		verbose(env, "invalid BPF_JMP/JMP32 opcode %x\n", opcode);
 		return -EINVAL;
 	}
 
+	if (opcode == BPF_JMA) {
+		struct bpf_verifier_state *cur_st = env->cur_state, *queued_st, *prev_st;
+		int idx = *insn_idx;
+
+		if (insn->code != (BPF_JMP | BPF_JMA) ||
+		    insn->src_reg || insn->dst_reg || insn->imm || insn->off == 0) {
+			verbose(env, "invalid may_goto off %d imm %d\n",
+				insn->off, insn->imm);
+			return -EINVAL;
+		}
+		prev_st = find_prev_entry(env, cur_st->parent, idx);
+
+		/* branch out 'fallthrough' insn as a new state to explore */
+		queued_st = push_stack(env, idx + 1, idx, false);
+		if (!queued_st)
+			return -ENOMEM;
+
+		queued_st->may_goto_cnt++;
+		if (prev_st)
+			widen_imprecise_scalars(env, prev_st, queued_st);
+		*insn_idx += insn->off;
+		return 0;
+	}
+
 	/* check src2 operand */
 	err = check_reg_arg(env, insn->dst_reg, SRC_OP);
 	if (err)
@@ -15659,6 +15689,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 	default:
 		/* conditional jump with two edges */
 		mark_prune_point(env, t);
+		if (insn->code == (BPF_JMP | BPF_JMA))
+			mark_force_checkpoint(env, t);
 
 		ret = push_insn(t, t + 1, FALLTHROUGH, env);
 		if (ret)
@@ -17135,6 +17167,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 				}
 				goto skip_inf_loop_check;
 			}
+			if (is_may_goto_insn(env, insn_idx)) {
+				if (states_equal(env, &sl->state, cur, true)) {
+					update_loop_entry(cur, &sl->state);
+					goto hit;
+				}
+				goto skip_inf_loop_check;
+			}
 			if (calls_callback(env, insn_idx)) {
 				if (states_equal(env, &sl->state, cur, true))
 					goto hit;
@@ -17144,6 +17183,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			if (states_maybe_looping(&sl->state, cur) &&
 			    states_equal(env, &sl->state, cur, true) &&
 			    !iter_active_depths_differ(&sl->state, cur) &&
+			    sl->state.may_goto_cnt == cur->may_goto_cnt &&
 			    sl->state.callback_unroll_depth == cur->callback_unroll_depth) {
 				verbose_linfo(env, insn_idx, "; ");
 				verbose(env, "infinite loop detected at insn %d\n", insn_idx);
@@ -19408,7 +19448,10 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 	struct bpf_insn insn_buf[16];
 	struct bpf_prog *new_prog;
 	struct bpf_map *map_ptr;
-	int i, ret, cnt, delta = 0;
+	int i, ret, cnt, delta = 0, cur_subprog = 0;
+	struct bpf_subprog_info *subprogs = env->subprog_info;
+	u16 stack_depth = subprogs[cur_subprog].stack_depth;
+	u16 stack_depth_extra = 0;
 
 	if (env->seen_exception && !env->exception_callback_subprog) {
 		struct bpf_insn patch[] = {
@@ -19428,7 +19471,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 		mark_subprog_exc_cb(env, env->exception_callback_subprog);
 	}
 
-	for (i = 0; i < insn_cnt; i++, insn++) {
+	for (i = 0; i < insn_cnt;) {
 		/* Make divide-by-zero exceptions impossible. */
 		if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
 		    insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
@@ -19467,7 +19510,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta    += cnt - 1;
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		/* Implement LD_ABS and LD_IND with a rewrite, if supported by the program type. */
@@ -19487,7 +19530,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta    += cnt - 1;
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		/* Rewrite pointer arithmetic to mitigate speculation attacks. */
@@ -19502,7 +19545,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			aux = &env->insn_aux_data[i + delta];
 			if (!aux->alu_state ||
 			    aux->alu_state == BPF_ALU_NON_POINTER)
-				continue;
+				goto next_insn;
 
 			isneg = aux->alu_state & BPF_ALU_NEG_VALUE;
 			issrc = (aux->alu_state & BPF_ALU_SANITIZE) ==
@@ -19540,19 +19583,39 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta    += cnt - 1;
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
+		}
+
+		if (insn->code == (BPF_JMP | BPF_JMA)) {
+			int stack_off = -stack_depth - 8;
+
+			stack_depth_extra = 8;
+			insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_AX, BPF_REG_10, stack_off);
+			insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, insn->off + 2);
+			insn_buf[2] = BPF_ALU64_IMM(BPF_SUB, BPF_REG_AX, 1);
+			insn_buf[3] = BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_AX, stack_off);
+			cnt = 4;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+			goto next_insn;
 		}
 
 		if (insn->code != (BPF_JMP | BPF_CALL))
-			continue;
+			goto next_insn;
 		if (insn->src_reg == BPF_PSEUDO_CALL)
-			continue;
+			goto next_insn;
 		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
 			ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt);
 			if (ret)
 				return ret;
 			if (cnt == 0)
-				continue;
+				goto next_insn;
 
 			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
 			if (!new_prog)
@@ -19561,7 +19624,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta	 += cnt - 1;
 			env->prog = prog = new_prog;
 			insn	  = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		if (insn->imm == BPF_FUNC_get_route_realm)
@@ -19609,11 +19672,11 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 				}
 
 				insn->imm = ret + 1;
-				continue;
+				goto next_insn;
 			}
 
 			if (!bpf_map_ptr_unpriv(aux))
-				continue;
+				goto next_insn;
 
 			/* instead of changing every JIT dealing with tail_call
 			 * emit two extra insns:
@@ -19642,7 +19705,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta    += cnt - 1;
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		if (insn->imm == BPF_FUNC_timer_set_callback) {
@@ -19754,7 +19817,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 				delta    += cnt - 1;
 				env->prog = prog = new_prog;
 				insn      = new_prog->insnsi + i + delta;
-				continue;
+				goto next_insn;
 			}
 
 			BUILD_BUG_ON(!__same_type(ops->map_lookup_elem,
@@ -19785,31 +19848,31 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			switch (insn->imm) {
 			case BPF_FUNC_map_lookup_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_lookup_elem);
-				continue;
+				goto next_insn;
 			case BPF_FUNC_map_update_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_update_elem);
-				continue;
+				goto next_insn;
 			case BPF_FUNC_map_delete_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_delete_elem);
-				continue;
+				goto next_insn;
 			case BPF_FUNC_map_push_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_push_elem);
-				continue;
+				goto next_insn;
 			case BPF_FUNC_map_pop_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_pop_elem);
-				continue;
+				goto next_insn;
 			case BPF_FUNC_map_peek_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_peek_elem);
-				continue;
+				goto next_insn;
 			case BPF_FUNC_redirect_map:
 				insn->imm = BPF_CALL_IMM(ops->map_redirect);
-				continue;
+				goto next_insn;
 			case BPF_FUNC_for_each_map_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_for_each_callback);
-				continue;
+				goto next_insn;
 			case BPF_FUNC_map_lookup_percpu_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_lookup_percpu_elem);
-				continue;
+				goto next_insn;
 			}
 
 			goto patch_call_imm;
@@ -19837,7 +19900,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta    += cnt - 1;
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		/* Implement bpf_get_func_arg inline. */
@@ -19862,7 +19925,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta    += cnt - 1;
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		/* Implement bpf_get_func_ret inline. */
@@ -19890,7 +19953,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta    += cnt - 1;
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		/* Implement get_func_arg_cnt inline. */
@@ -19905,7 +19968,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		/* Implement bpf_get_func_ip inline. */
@@ -19920,7 +19983,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 
 		/* Implement bpf_kptr_xchg inline */
@@ -19938,7 +20001,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			delta    += cnt - 1;
 			env->prog = prog = new_prog;
 			insn      = new_prog->insnsi + i + delta;
-			continue;
+			goto next_insn;
 		}
 patch_call_imm:
 		fn = env->ops->get_func_proto(insn->imm, env->prog);
@@ -19952,6 +20015,39 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			return -EFAULT;
 		}
 		insn->imm = fn->func - __bpf_call_base;
+next_insn:
+		if (subprogs[cur_subprog + 1].start == i + delta + 1) {
+			subprogs[cur_subprog].stack_depth += stack_depth_extra;
+			subprogs[cur_subprog].stack_extra = stack_depth_extra;
+			cur_subprog++;
+			stack_depth = subprogs[cur_subprog].stack_depth;
+			stack_depth_extra = 0;
+		}
+		i++; insn++;
+	}
+
+	env->prog->aux->stack_depth = subprogs[0].stack_depth;
+	for (i = 0; i < env->subprog_cnt; i++) {
+		int subprog_start = subprogs[i].start, j;
+		int stack_slots = subprogs[i].stack_extra / 8;
+
+		if (stack_slots >= ARRAY_SIZE(insn_buf)) {
+			verbose(env, "verifier bug: stack_extra is too large\n");
+			return -EFAULT;
+		}
+
+		/* Add insns to subprog prologue to init extra stack */
+		for (j = 0; j < stack_slots; j++)
+			insn_buf[j] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
+						 -subprogs[i].stack_depth + j * 8, BPF_MAX_LOOPS);
+		if (j) {
+			insn_buf[j] = env->prog->insnsi[subprog_start];
+
+			new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, j + 1);
+			if (!new_prog)
+				return -ENOMEM;
+			env->prog = prog = new_prog;
+		}
 	}
 
 	/* Since poke tab is now finalized, publish aux to tracker. */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a241f407c234..932ffef0dc88 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -42,6 +42,7 @@
 #define BPF_JSGE	0x70	/* SGE is signed '>=', GE in x86 */
 #define BPF_JSLT	0xc0	/* SLT is signed, '<' */
 #define BPF_JSLE	0xd0	/* SLE is signed, '<=' */
+#define BPF_JMA		0xe0	/* may_goto */
 #define BPF_CALL	0x80	/* function call */
 #define BPF_EXIT	0x90	/* function return */
 
-- 
2.43.0



* [PATCH v5 bpf-next 2/4] bpf: Recognize that two registers are safe when their ranges match
  2024-03-05  4:52 [PATCH v5 bpf-next 0/4] bpf: Introduce may_goto and cond_break Alexei Starovoitov
  2024-03-05  4:52 ` [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction Alexei Starovoitov
@ 2024-03-05  4:52 ` Alexei Starovoitov
  2024-03-05 19:03   ` Andrii Nakryiko
  2024-03-05  4:52 ` [PATCH v5 bpf-next 3/4] bpf: Add cond_break macro Alexei Starovoitov
  2024-03-05  4:52 ` [PATCH v5 bpf-next 4/4] selftests/bpf: Test may_goto Alexei Starovoitov
  3 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2024-03-05  4:52 UTC (permalink / raw)
  To: bpf
  Cc: daniel, andrii, martin.lau, memxor, eddyz87, john.fastabend, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

When open coded iterators, bpf_loop or may_goto are used, the following two states
are equivalent and it is safe to prune the search:

cur state: fp-8_w=scalar(id=3,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))
old state: fp-8_rw=scalar(id=2,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))

In other words "exact" state match should ignore liveness and precision marks,
since open coded iterator logic didn't complete their propagation,
but range_within logic that applies to scalars, ptr_to_mem, map_value, pkt_ptr
is safe to rely on.
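
For the example above, range_within(old, cur) holds because
old->umin (1) <= cur->umin (2) and old->umax (11) >= cur->umax (11),
i.e. every value the cur state can take was already covered when the old
state was explored, so pruning is safe even though the liveness marks
('w' vs 'rw') and the scalar ids (2 vs 3) differ.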

Avoid doing such a comparison when the regular infinite loop detection logic is
used, otherwise the bounded loop logic will declare such an "infinite loop" a false
positive. One such example is progs/verifier_loops1.c not_an_inifinite_loop().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 226bb65f9c2c..74b55d5571c7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16254,8 +16254,8 @@ static int check_btf_info(struct bpf_verifier_env *env,
 }
 
 /* check %cur's range satisfies %old's */
-static bool range_within(struct bpf_reg_state *old,
-			 struct bpf_reg_state *cur)
+static bool range_within(const struct bpf_reg_state *old,
+			 const struct bpf_reg_state *cur)
 {
 	return old->umin_value <= cur->umin_value &&
 	       old->umax_value >= cur->umax_value &&
@@ -16419,21 +16419,26 @@ static bool regs_exact(const struct bpf_reg_state *rold,
 	       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
 }
 
+enum exact_level {
+	NOT_EXACT,
+	EXACT,
+	RANGE_WITHIN
+};
+
 /* Returns true if (rold safe implies rcur safe) */
 static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
-		    struct bpf_reg_state *rcur, struct bpf_idmap *idmap, bool exact)
+		    struct bpf_reg_state *rcur, struct bpf_idmap *idmap,
+		    enum exact_level exact)
 {
-	if (exact)
+	if (exact == EXACT)
 		return regs_exact(rold, rcur, idmap);
 
-	if (!(rold->live & REG_LIVE_READ))
+	if (!(rold->live & REG_LIVE_READ) && exact != RANGE_WITHIN)
 		/* explored state didn't use this */
 		return true;
-	if (rold->type == NOT_INIT)
+	if (rold->type == NOT_INIT && exact != RANGE_WITHIN)
 		/* explored state can't have used this */
 		return true;
-	if (rcur->type == NOT_INIT)
-		return false;
 
 	/* Enforce that register types have to match exactly, including their
 	 * modifiers (like PTR_MAYBE_NULL, MEM_RDONLY, etc), as a general
@@ -16468,7 +16473,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
 			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
 			       check_scalar_ids(rold->id, rcur->id, idmap);
 		}
-		if (!rold->precise)
+		if (!rold->precise && exact != RANGE_WITHIN)
 			return true;
 		/* Why check_ids() for scalar registers?
 		 *
@@ -16579,7 +16584,7 @@ static struct bpf_reg_state *scalar_reg_for_stack(struct bpf_verifier_env *env,
 }
 
 static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
-		      struct bpf_func_state *cur, struct bpf_idmap *idmap, bool exact)
+		      struct bpf_func_state *cur, struct bpf_idmap *idmap, enum exact_level exact)
 {
 	int i, spi;
 
@@ -16743,7 +16748,7 @@ static bool refsafe(struct bpf_func_state *old, struct bpf_func_state *cur,
  * the current state will reach 'bpf_exit' instruction safely
  */
 static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_state *old,
-			      struct bpf_func_state *cur, bool exact)
+			      struct bpf_func_state *cur, enum exact_level exact)
 {
 	int i;
 
@@ -16770,7 +16775,7 @@ static void reset_idmap_scratch(struct bpf_verifier_env *env)
 static bool states_equal(struct bpf_verifier_env *env,
 			 struct bpf_verifier_state *old,
 			 struct bpf_verifier_state *cur,
-			 bool exact)
+			 enum exact_level exact)
 {
 	int i;
 
@@ -17144,7 +17149,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * => unsafe memory access at 11 would not be caught.
 			 */
 			if (is_iter_next_insn(env, insn_idx)) {
-				if (states_equal(env, &sl->state, cur, true)) {
+				if (states_equal(env, &sl->state, cur, RANGE_WITHIN)) {
 					struct bpf_func_state *cur_frame;
 					struct bpf_reg_state *iter_state, *iter_reg;
 					int spi;
@@ -17168,20 +17173,20 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 				goto skip_inf_loop_check;
 			}
 			if (is_may_goto_insn(env, insn_idx)) {
-				if (states_equal(env, &sl->state, cur, true)) {
+				if (states_equal(env, &sl->state, cur, RANGE_WITHIN)) {
 					update_loop_entry(cur, &sl->state);
 					goto hit;
 				}
 				goto skip_inf_loop_check;
 			}
 			if (calls_callback(env, insn_idx)) {
-				if (states_equal(env, &sl->state, cur, true))
+				if (states_equal(env, &sl->state, cur, RANGE_WITHIN))
 					goto hit;
 				goto skip_inf_loop_check;
 			}
 			/* attempt to detect infinite loop to avoid unnecessary doomed work */
 			if (states_maybe_looping(&sl->state, cur) &&
-			    states_equal(env, &sl->state, cur, true) &&
+			    states_equal(env, &sl->state, cur, EXACT) &&
 			    !iter_active_depths_differ(&sl->state, cur) &&
 			    sl->state.may_goto_cnt == cur->may_goto_cnt &&
 			    sl->state.callback_unroll_depth == cur->callback_unroll_depth) {
@@ -17239,7 +17244,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 		 */
 		loop_entry = get_loop_entry(&sl->state);
 		force_exact = loop_entry && loop_entry->branches > 0;
-		if (states_equal(env, &sl->state, cur, force_exact)) {
+		if (states_equal(env, &sl->state, cur, force_exact ? RANGE_WITHIN : NOT_EXACT)) {
 			if (force_exact)
 				update_loop_entry(cur, loop_entry);
 hit:
-- 
2.43.0



* [PATCH v5 bpf-next 3/4] bpf: Add cond_break macro
  2024-03-05  4:52 [PATCH v5 bpf-next 0/4] bpf: Introduce may_goto and cond_break Alexei Starovoitov
  2024-03-05  4:52 ` [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction Alexei Starovoitov
  2024-03-05  4:52 ` [PATCH v5 bpf-next 2/4] bpf: Recognize that two registers are safe when their ranges match Alexei Starovoitov
@ 2024-03-05  4:52 ` Alexei Starovoitov
  2024-03-05  4:52 ` [PATCH v5 bpf-next 4/4] selftests/bpf: Test may_goto Alexei Starovoitov
  3 siblings, 0 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2024-03-05  4:52 UTC (permalink / raw)
  To: bpf
  Cc: daniel, andrii, martin.lau, memxor, eddyz87, john.fastabend, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Use may_goto instruction to implement cond_break macro.
Ideally the macro should be written as:
  asm volatile goto(".byte 0xe5;
                     .byte 0;
                     .short %l[l_break] ...
                     .long 0;
but LLVM doesn't support a 2-byte PC-relative fixup yet.
Hence use
  asm volatile goto(".byte 0xe5;
                     .byte 0;
                     .long %l[l_break] ...
                     .short 0;
that produces correct asm on little endian.
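
For reference, a sketch (not part of the patch) of how those bytes map
onto struct bpf_insn on a little endian target, assuming the BPF_JMA
opcode from patch 1:

  .byte 0xe5  -> code    = BPF_JMP | BPF_JMA
  .byte 0     -> dst_reg = 0, src_reg = 0
  .long X     -> off     = X (its low 16 bits); the high 16 bits of the
                 .long land in the low half of imm and are zero because
                 X is masked with 0xffff
  .short 0    -> high half of imm = 0

where X = (%l[l_break] - 1b - 8) / 8 is the branch distance in
instructions from the end of the may_goto insn to the l_break label.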

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/bpf_experimental.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 0d749006d107..bc9a0832ae72 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -326,6 +326,18 @@ l_true:												\
        })
 #endif
 
+#define cond_break					\
+	({ __label__ l_break, l_continue;		\
+	 asm volatile goto("1:.byte 0xe5;			\
+		      .byte 0;				\
+		      .long ((%l[l_break] - 1b - 8) / 8) & 0xffff;	\
+		      .short 0"				\
+		      :::: l_break);			\
+	goto l_continue;				\
+	l_break: break;					\
+	l_continue:;					\
+	})
+
 #ifndef bpf_nop_mov
 #define bpf_nop_mov(var) \
 	asm volatile("%[reg]=%[reg]"::[reg]"r"((short)var))
-- 
2.43.0



* [PATCH v5 bpf-next 4/4] selftests/bpf: Test may_goto
  2024-03-05  4:52 [PATCH v5 bpf-next 0/4] bpf: Introduce may_goto and cond_break Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2024-03-05  4:52 ` [PATCH v5 bpf-next 3/4] bpf: Add cond_break macro Alexei Starovoitov
@ 2024-03-05  4:52 ` Alexei Starovoitov
  3 siblings, 0 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2024-03-05  4:52 UTC (permalink / raw)
  To: bpf
  Cc: daniel, andrii, martin.lau, memxor, eddyz87, john.fastabend, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Add tests for may_goto instruction via cond_break macro.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../bpf/progs/verifier_iterating_callbacks.c  | 103 +++++++++++++++++-
 2 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 1a63996c0304..cb810a98e78f 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -3,3 +3,4 @@
 exceptions				 # JIT does not support calling kfunc bpf_throw				       (exceptions)
 get_stack_raw_tp                         # user_stack corrupted user stack                                             (no backchain userspace)
 stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
+verifier_iterating_callbacks
diff --git a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
index 5905e036e0ea..04cdbce4652f 100644
--- a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
+++ b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
@@ -1,8 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
-
-#include <linux/bpf.h>
-#include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"
+#include "bpf_experimental.h"
 
 struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
@@ -239,4 +237,103 @@ int bpf_loop_iter_limit_nested(void *unused)
 	return 1000 * a + b + c;
 }
 
+#define ARR_SZ 1000000
+int zero;
+char arr[ARR_SZ];
+
+SEC("socket")
+__success __retval(0xd495cdc0)
+int cond_break1(const void *ctx)
+{
+	unsigned long i;
+	unsigned int sum = 0;
+
+	for (i = zero; i < ARR_SZ; cond_break, i++)
+		sum += i;
+	for (i = zero; i < ARR_SZ; i++) {
+		barrier_var(i);
+		sum += i + arr[i];
+		cond_break;
+	}
+
+	return sum;
+}
+
+SEC("socket")
+__success __retval(999000000)
+int cond_break2(const void *ctx)
+{
+	int i, j;
+	int sum = 0;
+
+	for (i = zero; i < 1000; cond_break, i++)
+		for (j = zero; j < 1000; j++) {
+			sum += i + j;
+			cond_break;
+		}
+
+	return sum;
+}
+
+static __noinline int loop(void)
+{
+	int i, sum = 0;
+
+	for (i = zero; i <= 1000000; i++, cond_break)
+		sum += i;
+
+	return sum;
+}
+
+SEC("socket")
+__success __retval(0x6a5a2920)
+int cond_break3(const void *ctx)
+{
+	return loop();
+}
+
+SEC("socket")
+__success __retval(1)
+int cond_break4(const void *ctx)
+{
+	int cnt = zero;
+
+	for (;;) {
+		/* should eventually break out of the loop */
+		cond_break;
+		cnt++;
+	}
+	/* if we looped a bit, it's a success */
+	return cnt > 1 ? 1 : 0;
+}
+
+static __noinline int static_subprog(void)
+{
+	int cnt = zero;
+
+	for (;;) {
+		cond_break;
+		cnt++;
+	}
+
+	return cnt;
+}
+
+SEC("socket")
+__success __retval(1)
+int cond_break5(const void *ctx)
+{
+	int cnt1 = zero, cnt2;
+
+	for (;;) {
+		cond_break;
+		cnt1++;
+	}
+
+	cnt2 = static_subprog();
+
+	/* main and subprog have to loop a bit */
+	return cnt1 > 1 && cnt2 > 1 ? 1 : 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.43.0



* Re: [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction
  2024-03-05  4:52 ` [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction Alexei Starovoitov
@ 2024-03-05 18:37   ` Andrii Nakryiko
  2024-03-05 20:52     ` Alexei Starovoitov
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-05 18:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, daniel, andrii, martin.lau, memxor, eddyz87, john.fastabend,
	kernel-team

On Mon, Mar 4, 2024 at 8:52 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Introduce may_goto instruction that acts on a hidden bpf_iter_num, so that
> bpf_iter_num_new(), bpf_iter_num_destroy() don't need to be called explicitly.

bpf_iter_num was probably an inspiration, but I think by now the
analogy is pretty weak. bpf_iter_num_next() returns NULL or pointer to
int (i.e., it returns some usable value), while may_goto jumps or not.
So it's not just implicit new/destroy. The above doesn't confuse me,
but I wonder if someone less familiar with iterators would be confused
by the above?

> It can be used in any normal "for" or "while" loop, like
>
>   for (i = zero; i < cnt; cond_break, i++) {
>
> The verifier recognizes that may_goto is used in the program,
> reserves additional 8 bytes of stack, initializes them in subprog
> prologue, and replaces may_goto instruction with:
> aux_reg = *(u64 *)(fp - 40)
> if aux_reg == 0 goto pc+off
> aux_reg += 1

`aux_reg -= 1`?

> *(u64 *)(fp - 40) = aux_reg
>
> may_goto instruction can be used by LLVM to implement __builtin_memcpy,
> __builtin_strcmp.
>
> may_goto is not a full substitute for bpf_for() macro.
> bpf_for() doesn't have induction variable that verifiers sees,
> so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.
>
> But when the code is written as:
> for (i = 0; i < 100; cond_break, i++)
> the verifier see 'i' as precise constant zero,
> hence cond_break (aka may_goto) doesn't help to converge the loop.
> A static or global variable can be used as a workaround:
> static int zero = 0;
> for (i = zero; i < 100; cond_break, i++) // works!
>
> may_goto works well with arena pointers that don't need to be bounds-checked
> on every iteration. Load/store from arena returns imprecise unbounded scalars.
>
> Reserve new opcode BPF_JMP | BPF_JMA for may_goto insn.
> JMA stands for "jump maybe", and "jump multipurpose", and "jump multi always".
> Since goto_or_nop insn was proposed, it may use the same opcode.
> may_goto vs goto_or_nop can be distinguished by src_reg:
> code = BPF_JMP | BPF_JMA:
> src_reg = 0 - may_goto
> src_reg = 1 - goto_or_nop
> We could have reused BPF_JMP | BPF_JA like:
> src_reg = 0 - normal goto
> src_reg = 1 - may_goto
> src_reg = 2 - goto_or_nop
> but JA is a real insn and it's unconditional, while may_goto and goto_or_nop
> are pseudo instructions, and both are conditional. Hence it's better to
> have a different opcode for them. Hence BPF_JMA.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpf_verifier.h   |   2 +
>  include/uapi/linux/bpf.h       |   1 +
>  kernel/bpf/core.c              |   1 +
>  kernel/bpf/disasm.c            |   3 +
>  kernel/bpf/verifier.c          | 156 ++++++++++++++++++++++++++-------
>  tools/include/uapi/linux/bpf.h |   1 +
>  6 files changed, 134 insertions(+), 30 deletions(-)
>

Not a huge fan of BPF_JMA, but there is no clear naming winner.
BPF_JAUX, BPF_JPSEUDO, BPF_JMAYBE, would be a bit more
greppable/recognizable, but it's not a big deal.

Left few nits below, but overall LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 84365e6dd85d..917ca603059b 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -449,6 +449,7 @@ struct bpf_verifier_state {
>         u32 jmp_history_cnt;
>         u32 dfs_depth;
>         u32 callback_unroll_depth;
> +       u32 may_goto_cnt;

naming nit: seems like we consistently use "depth" terminology for
bpf_loop and open-coded iters, any reason to deviate with "cnt"
terminology here?

>  };
>
>  #define bpf_get_spilled_reg(slot, frame, mask)                         \

[...]

>
> +static bool is_may_goto_insn(struct bpf_verifier_env *env, int insn_idx)
> +{
> +       return env->prog->insnsi[insn_idx].code == (BPF_JMP | BPF_JMA);
> +}
> +
>  /* process_iter_next_call() is called when verifier gets to iterator's next
>   * "method" (e.g., bpf_iter_num_next() for numbers iterator) call. We'll refer
>   * to it as just "iter_next()" in comments below.
> @@ -14871,11 +14877,35 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>         int err;
>
>         /* Only conditional jumps are expected to reach here. */
> -       if (opcode == BPF_JA || opcode > BPF_JSLE) {
> +       if (opcode == BPF_JA || opcode > BPF_JMA) {
>                 verbose(env, "invalid BPF_JMP/JMP32 opcode %x\n", opcode);
>                 return -EINVAL;
>         }
>
> +       if (opcode == BPF_JMA) {
> +               struct bpf_verifier_state *cur_st = env->cur_state, *queued_st, *prev_st;
> +               int idx = *insn_idx;
> +
> +               if (insn->code != (BPF_JMP | BPF_JMA) ||
> +                   insn->src_reg || insn->dst_reg || insn->imm || insn->off == 0) {
> +                       verbose(env, "invalid may_goto off %d imm %d\n",
> +                               insn->off, insn->imm);
> +                       return -EINVAL;
> +               }
> +               prev_st = find_prev_entry(env, cur_st->parent, idx);
> +
> +               /* branch out 'fallthrough' insn as a new state to explore */
> +               queued_st = push_stack(env, idx + 1, idx, false);
> +               if (!queued_st)
> +                       return -ENOMEM;
> +
> +               queued_st->may_goto_cnt++;
> +               if (prev_st)
> +                       widen_imprecise_scalars(env, prev_st, queued_st);
> +               *insn_idx += insn->off;
> +               return 0;
> +       }
> +
>         /* check src2 operand */
>         err = check_reg_arg(env, insn->dst_reg, SRC_OP);
>         if (err)
> @@ -15659,6 +15689,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
>         default:
>                 /* conditional jump with two edges */
>                 mark_prune_point(env, t);
> +               if (insn->code == (BPF_JMP | BPF_JMA))

maybe use is_may_goto_insn() here for consistency?

> +                       mark_force_checkpoint(env, t);
>
>                 ret = push_insn(t, t + 1, FALLTHROUGH, env);
>                 if (ret)

[...]

>  patch_call_imm:
>                 fn = env->ops->get_func_proto(insn->imm, env->prog);
> @@ -19952,6 +20015,39 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>                         return -EFAULT;
>                 }
>                 insn->imm = fn->func - __bpf_call_base;
> +next_insn:
> +               if (subprogs[cur_subprog + 1].start == i + delta + 1) {
> +                       subprogs[cur_subprog].stack_depth += stack_depth_extra;
> +                       subprogs[cur_subprog].stack_extra = stack_depth_extra;
> +                       cur_subprog++;
> +                       stack_depth = subprogs[cur_subprog].stack_depth;
> +                       stack_depth_extra = 0;
> +               }
> +               i++; insn++;

Is there a code path where we don't do i++, insn++? From a cursory look
at this loop, I think we always do this, so I'm not sure why `i++, insn++`
had to be moved from the for() clause?

But if I missed it and we have to do these increments here, these are
two separate statements, so let's put them on separate lines?

> +       }
> +
> +       env->prog->aux->stack_depth = subprogs[0].stack_depth;
> +       for (i = 0; i < env->subprog_cnt; i++) {
> +               int subprog_start = subprogs[i].start, j;
> +               int stack_slots = subprogs[i].stack_extra / 8;
> +
> +               if (stack_slots >= ARRAY_SIZE(insn_buf)) {
> +                       verbose(env, "verifier bug: stack_extra is too large\n");
> +                       return -EFAULT;
> +               }
> +
> +               /* Add insns to subprog prologue to init extra stack */
> +               for (j = 0; j < stack_slots; j++)
> +                       insn_buf[j] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
> +                                                -subprogs[i].stack_depth + j * 8, BPF_MAX_LOOPS);
> +               if (j) {
> +                       insn_buf[j] = env->prog->insnsi[subprog_start];
> +
> +                       new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, j + 1);
> +                       if (!new_prog)
> +                               return -ENOMEM;
> +                       env->prog = prog = new_prog;
> +               }

this code is sort of generic (you don't assume just 0 or 1 extra
slots), but then it initializes each extra slot with BPF_MAX_LOOPS,
which doesn't look generic at all. So it's neither as simple as it
could be nor generic, really...

Maybe let's add WARN_ON if stack_extra>1 (so we catch it if we ever
extend this), but otherwise just have a simple and easier to follow

if (stack_slots) {
    insn_buf[0] = BPF_ST_MEM(..., BPF_MAX_LOOPS);
    /* bpf_patch_insn_data() replaces instruction,
     * so we need to copy first actual insn to preserve it (it's not
that obvious)
     */
    insn_buf[1] = env->prog->insnsi[subprog_start];
    ... patch ...
}

It's pretty minor, overall, but definitely caused some pause for me.

>         }
>
>         /* Since poke tab is now finalized, publish aux to tracker. */
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index a241f407c234..932ffef0dc88 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -42,6 +42,7 @@
>  #define BPF_JSGE       0x70    /* SGE is signed '>=', GE in x86 */
>  #define BPF_JSLT       0xc0    /* SLT is signed, '<' */
>  #define BPF_JSLE       0xd0    /* SLE is signed, '<=' */
> +#define BPF_JMA                0xe0    /* may_goto */
>  #define BPF_CALL       0x80    /* function call */
>  #define BPF_EXIT       0x90    /* function return */
>
> --
> 2.43.0
>


* Re: [PATCH v5 bpf-next 2/4] bpf: Recognize that two registers are safe when their ranges match
  2024-03-05  4:52 ` [PATCH v5 bpf-next 2/4] bpf: Recognize that two registers are safe when their ranges match Alexei Starovoitov
@ 2024-03-05 19:03   ` Andrii Nakryiko
  2024-03-05 21:18     ` Alexei Starovoitov
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-05 19:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, daniel, andrii, martin.lau, memxor, eddyz87, john.fastabend,
	kernel-team

On Mon, Mar 4, 2024 at 8:52 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> When open code iterators, bpf_loop or may_goto is used the following two states
> are equivalent and safe to prune the search:
>
> cur state: fp-8_w=scalar(id=3,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))
> old state: fp-8_rw=scalar(id=2,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))
>
> In other words "exact" state match should ignore liveness and precision marks,
> since open coded iterator logic didn't complete their propagation,
> but range_within logic that applies to scalars, ptr_to_mem, map_value, pkt_ptr
> is safe to rely on.
>
> Avoid doing such comparison when regular infinite loop detection logic is used,
> otherwise bounded loop logic will declare such "infinite loop" as false
> positive. Such example is in progs/verifier_loops1.c not_an_inifinite_loop().
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

You haven't updated stacksafe() to work with enums (the code still
assumes bool), please adjust.

pw-bot: cr

Also, I wonder if we actually need three cases. We need EXACT only for
infinite loop detection, right? I wonder if that infinite loop
detection is actually that useful in practice, I rarely encounter it
(if at all), and it constantly causes issues.

What if we just dropped infinite loop detection logic altogether and
have NOT_EXACT vs RANGE_WITHIN logic only?

>  kernel/bpf/verifier.c | 39 ++++++++++++++++++++++-----------------
>  1 file changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 226bb65f9c2c..74b55d5571c7 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -16254,8 +16254,8 @@ static int check_btf_info(struct bpf_verifier_env *env,
>  }
>
>  /* check %cur's range satisfies %old's */
> -static bool range_within(struct bpf_reg_state *old,
> -                        struct bpf_reg_state *cur)
> +static bool range_within(const struct bpf_reg_state *old,
> +                        const struct bpf_reg_state *cur)
>  {
>         return old->umin_value <= cur->umin_value &&
>                old->umax_value >= cur->umax_value &&
> @@ -16419,21 +16419,26 @@ static bool regs_exact(const struct bpf_reg_state *rold,
>                check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
>  }
>
> +enum exact_level {
> +       NOT_EXACT,
> +       EXACT,
> +       RANGE_WITHIN
> +};
> +
>  /* Returns true if (rold safe implies rcur safe) */
>  static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> -                   struct bpf_reg_state *rcur, struct bpf_idmap *idmap, bool exact)
> +                   struct bpf_reg_state *rcur, struct bpf_idmap *idmap,
> +                   enum exact_level exact)
>  {
> -       if (exact)
> +       if (exact == EXACT)
>                 return regs_exact(rold, rcur, idmap);
>
> -       if (!(rold->live & REG_LIVE_READ))
> +       if (!(rold->live & REG_LIVE_READ) && exact != RANGE_WITHIN)

imo, `exact == NOT_EXACT` is easier to follow (we excluded EXACT above)

>                 /* explored state didn't use this */
>                 return true;
> -       if (rold->type == NOT_INIT)
> +       if (rold->type == NOT_INIT && exact != RANGE_WITHIN)

ditto

>                 /* explored state can't have used this */
>                 return true;
> -       if (rcur->type == NOT_INIT)
> -               return false;

do we need to remove this? in which case will it do the wrong thing?

>
>         /* Enforce that register types have to match exactly, including their
>          * modifiers (like PTR_MAYBE_NULL, MEM_RDONLY, etc), as a general
> @@ -16468,7 +16473,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
>                         return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
>                                check_scalar_ids(rold->id, rcur->id, idmap);
>                 }
> -               if (!rold->precise)
> +               if (!rold->precise && exact != RANGE_WITHIN)

same, I think explicit `exact == NOT_EXACT` is easier to follow

>                         return true;
>                 /* Why check_ids() for scalar registers?
>                  *
> @@ -16579,7 +16584,7 @@ static struct bpf_reg_state *scalar_reg_for_stack(struct bpf_verifier_env *env,
>  }
>
>  static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
> -                     struct bpf_func_state *cur, struct bpf_idmap *idmap, bool exact)
> +                     struct bpf_func_state *cur, struct bpf_idmap *idmap, enum exact_level exact)
>  {
>         int i, spi;
>
> @@ -16743,7 +16748,7 @@ static bool refsafe(struct bpf_func_state *old, struct bpf_func_state *cur,
>   * the current state will reach 'bpf_exit' instruction safely
>   */
>  static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_state *old,
> -                             struct bpf_func_state *cur, bool exact)
> +                             struct bpf_func_state *cur, enum exact_level exact)
>  {
>         int i;
>
> @@ -16770,7 +16775,7 @@ static void reset_idmap_scratch(struct bpf_verifier_env *env)
>  static bool states_equal(struct bpf_verifier_env *env,
>                          struct bpf_verifier_state *old,
>                          struct bpf_verifier_state *cur,
> -                        bool exact)
> +                        enum exact_level exact)
>  {
>         int i;
>
> @@ -17144,7 +17149,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>                          * => unsafe memory access at 11 would not be caught.
>                          */
>                         if (is_iter_next_insn(env, insn_idx)) {
> -                               if (states_equal(env, &sl->state, cur, true)) {
> +                               if (states_equal(env, &sl->state, cur, RANGE_WITHIN)) {
>                                         struct bpf_func_state *cur_frame;
>                                         struct bpf_reg_state *iter_state, *iter_reg;
>                                         int spi;
> @@ -17168,20 +17173,20 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>                                 goto skip_inf_loop_check;
>                         }
>                         if (is_may_goto_insn(env, insn_idx)) {
> -                               if (states_equal(env, &sl->state, cur, true)) {
> +                               if (states_equal(env, &sl->state, cur, RANGE_WITHIN)) {
>                                         update_loop_entry(cur, &sl->state);
>                                         goto hit;
>                                 }
>                                 goto skip_inf_loop_check;
>                         }
>                         if (calls_callback(env, insn_idx)) {
> -                               if (states_equal(env, &sl->state, cur, true))
> +                               if (states_equal(env, &sl->state, cur, RANGE_WITHIN))
>                                         goto hit;
>                                 goto skip_inf_loop_check;
>                         }
>                         /* attempt to detect infinite loop to avoid unnecessary doomed work */
>                         if (states_maybe_looping(&sl->state, cur) &&
> -                           states_equal(env, &sl->state, cur, true) &&
> +                           states_equal(env, &sl->state, cur, EXACT) &&
>                             !iter_active_depths_differ(&sl->state, cur) &&
>                             sl->state.may_goto_cnt == cur->may_goto_cnt &&
>                             sl->state.callback_unroll_depth == cur->callback_unroll_depth) {
> @@ -17239,7 +17244,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>                  */
>                 loop_entry = get_loop_entry(&sl->state);
>                 force_exact = loop_entry && loop_entry->branches > 0;
> -               if (states_equal(env, &sl->state, cur, force_exact)) {
> +               if (states_equal(env, &sl->state, cur, force_exact ? RANGE_WITHIN : NOT_EXACT)) {
>                         if (force_exact)
>                                 update_loop_entry(cur, loop_entry);
>  hit:
> --
> 2.43.0
>


* Re: [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction
  2024-03-05 18:37   ` Andrii Nakryiko
@ 2024-03-05 20:52     ` Alexei Starovoitov
  0 siblings, 0 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2024-03-05 20:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Eddy Z, John Fastabend, Kernel Team

On Tue, Mar 5, 2024 at 10:38 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Mar 4, 2024 at 8:52 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Introduce may_goto instruction that acts on a hidden bpf_iter_num, so that
> > bpf_iter_num_new(), bpf_iter_num_destroy() don't need to be called explicitly.
>
> bpf_iter_num was probably an inspiration, but I think by now the
> analogy is pretty weak. bpf_iter_num_next() returns NULL or pointer to
> int (i.e., it returns some usable value), while may_goto jumps or not.
> So it's not just implicit new/destroy. The above doesn't confuse me,
> but I wonder if someone less familiar with iterators would be confused
> by the above?

Agree. Will reword.

>
> > It can be used in any normal "for" or "while" loop, like
> >
> >   for (i = zero; i < cnt; cond_break, i++) {
> >
> > The verifier recognizes that may_goto is used in the program,
> > reserves additional 8 bytes of stack, initializes them in subprog
> > prologue, and replaces may_goto instruction with:
> > aux_reg = *(u64 *)(fp - 40)
> > if aux_reg == 0 goto pc+off
> > aux_reg += 1
>
> `aux_reg -= 1`?

+1 for -1

>
> > *(u64 *)(fp - 40) = aux_reg
> >
> > may_goto instruction can be used by LLVM to implement __builtin_memcpy,
> > __builtin_strcmp.
> >
> > may_goto is not a full substitute for bpf_for() macro.
> > bpf_for() doesn't have induction variable that verifiers sees,
> > so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.
> >
> > But when the code is written as:
> > for (i = 0; i < 100; cond_break, i++)
> > the verifier see 'i' as precise constant zero,
> > hence cond_break (aka may_goto) doesn't help to converge the loop.
> > A static or global variable can be used as a workaround:
> > static int zero = 0;
> > for (i = zero; i < 100; cond_break, i++) // works!
> >
> > may_goto works well with arena pointers that don't need to be bounds-checked
> > on every iteration. Load/store from arena returns imprecise unbounded scalars.
> >
> > Reserve new opcode BPF_JMP | BPF_JMA for may_goto insn.
> > JMA stands for "jump maybe", and "jump multipurpose", and "jump multi always".
> > Since goto_or_nop insn was proposed, it may use the same opcode.
> > may_goto vs goto_or_nop can be distinguished by src_reg:
> > code = BPF_JMP | BPF_JMA:
> > src_reg = 0 - may_goto
> > src_reg = 1 - goto_or_nop
> > We could have reused BPF_JMP | BPF_JA like:
> > src_reg = 0 - normal goto
> > src_reg = 1 - may_goto
> > src_reg = 2 - goto_or_nop
> > but JA is a real insn and it's unconditional, while may_goto and goto_or_nop
> > are pseudo instructions, and both are conditional. Hence it's better to
> > have a different opcode for them. Hence BPF_JMA.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  include/linux/bpf_verifier.h   |   2 +
> >  include/uapi/linux/bpf.h       |   1 +
> >  kernel/bpf/core.c              |   1 +
> >  kernel/bpf/disasm.c            |   3 +
> >  kernel/bpf/verifier.c          | 156 ++++++++++++++++++++++++++-------
> >  tools/include/uapi/linux/bpf.h |   1 +
> >  6 files changed, 134 insertions(+), 30 deletions(-)
> >
>
> Not a huge fan of BPF_JMA, but there is no clear naming winner.
> BPF_JAUX, BPF_JPSEUDO, BPF_JMAYBE, would be a bit more
> greppable/recognizable, but it's not a big deal.

In the next version I'm planning to go with BPF_JCOND
to describe a new class of conditional pseudo jumps.
A comment hopefully will be good enough to avoid confusion
with existing conditional jumps (all of them except BPF_JA).
I will also add
enum {
 BPF_MAY_GOTO = 0,
};
and check that src_reg is equal to that.
Then in the future it can be extended with:
 BPF_NOP_OR_GOTO = 1,

> Left few nits below, but overall LGTM
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 84365e6dd85d..917ca603059b 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -449,6 +449,7 @@ struct bpf_verifier_state {
> >         u32 jmp_history_cnt;
> >         u32 dfs_depth;
> >         u32 callback_unroll_depth;
> > +       u32 may_goto_cnt;
>
> naming nit: seems like we consistently use "depth" terminology for
> bpf_loop and open-coded iters, any reason to deviate with "cnt"
> terminology here?

sure.

> >  };
> >
> >  #define bpf_get_spilled_reg(slot, frame, mask)                         \
>
> [...]
>
> >
> > +static bool is_may_goto_insn(struct bpf_verifier_env *env, int insn_idx)
> > +{
> > +       return env->prog->insnsi[insn_idx].code == (BPF_JMP | BPF_JMA);
> > +}
> > +
> >  /* process_iter_next_call() is called when verifier gets to iterator's next
> >   * "method" (e.g., bpf_iter_num_next() for numbers iterator) call. We'll refer
> >   * to it as just "iter_next()" in comments below.
> > @@ -14871,11 +14877,35 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> >         int err;
> >
> >         /* Only conditional jumps are expected to reach here. */
> > -       if (opcode == BPF_JA || opcode > BPF_JSLE) {
> > +       if (opcode == BPF_JA || opcode > BPF_JMA) {
> >                 verbose(env, "invalid BPF_JMP/JMP32 opcode %x\n", opcode);
> >                 return -EINVAL;
> >         }
> >
> > +       if (opcode == BPF_JMA) {
> > +               struct bpf_verifier_state *cur_st = env->cur_state, *queued_st, *prev_st;
> > +               int idx = *insn_idx;
> > +
> > +               if (insn->code != (BPF_JMP | BPF_JMA) ||
> > +                   insn->src_reg || insn->dst_reg || insn->imm || insn->off == 0) {
> > +                       verbose(env, "invalid may_goto off %d imm %d\n",
> > +                               insn->off, insn->imm);
> > +                       return -EINVAL;
> > +               }
> > +               prev_st = find_prev_entry(env, cur_st->parent, idx);
> > +
> > +               /* branch out 'fallthrough' insn as a new state to explore */
> > +               queued_st = push_stack(env, idx + 1, idx, false);
> > +               if (!queued_st)
> > +                       return -ENOMEM;
> > +
> > +               queued_st->may_goto_cnt++;
> > +               if (prev_st)
> > +                       widen_imprecise_scalars(env, prev_st, queued_st);
> > +               *insn_idx += insn->off;
> > +               return 0;
> > +       }
> > +
> >         /* check src2 operand */
> >         err = check_reg_arg(env, insn->dst_reg, SRC_OP);
> >         if (err)
> > @@ -15659,6 +15689,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> >         default:
> >                 /* conditional jump with two edges */
> >                 mark_prune_point(env, t);
> > +               if (insn->code == (BPF_JMP | BPF_JMA))
>
> maybe use is_may_goto_insn() here for consistency?

+1

> > +                       mark_force_checkpoint(env, t);
> >
> >                 ret = push_insn(t, t + 1, FALLTHROUGH, env);
> >                 if (ret)
>
> [...]
>
> >  patch_call_imm:
> >                 fn = env->ops->get_func_proto(insn->imm, env->prog);
> > @@ -19952,6 +20015,39 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                         return -EFAULT;
> >                 }
> >                 insn->imm = fn->func - __bpf_call_base;
> > +next_insn:
> > +               if (subprogs[cur_subprog + 1].start == i + delta + 1) {
> > +                       subprogs[cur_subprog].stack_depth += stack_depth_extra;
> > +                       subprogs[cur_subprog].stack_extra = stack_depth_extra;
> > +                       cur_subprog++;
> > +                       stack_depth = subprogs[cur_subprog].stack_depth;
> > +                       stack_depth_extra = 0;
> > +               }
> > +               i++; insn++;
>
> Is there a code path where we don't do i++, insn++? From cursory look
> at this loop, I think we always do this, so not sure why `i++, insn++`
> had to be moved from for() clause?

I was worried about missing a 'continue' that should have been replaced
with 'goto next_insn'. A leftover 'continue' would still pass all tests
except the may_goto ones, and the arena patch set adds another hunk to
this loop that is written with 'continue'.
Technically we could keep 'i++; insn++;' in the for() clause
if we code review any future additions carefully.
In other words, the stack_extra logic plus the i and insn increments
are part of the same logical block: it's an action that should
be done after each insn, hence better to keep them together,
either inside for() as I did in v1/v2 or here toward the last '}'.
The latter is more canonical C.
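
To illustrate the structural point with a toy example (plain userspace C,
not verifier code): every path through the loop body has to reach the
shared per-iteration epilogue, so early exits jump to the label instead of
using 'continue':

  #include <stdio.h>

  int main(void)
  {
          int i;

          for (i = 0; i < 4;) {
                  if (i & 1)
                          goto next_insn; /* 'continue' would skip the epilogue */
                  printf("processed %d\n", i);
  next_insn:
                  /* shared epilogue: the stack_extra-style bookkeeping and the
                   * increments live here, so they run for every iteration
                   */
                  i++;
          }
          return 0;
  }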

>
> But if I missed it and we have to do these increments here, these are
> two separate statements, so let's put them on separate lines?

sure.

>
> > +       }
> > +
> > +       env->prog->aux->stack_depth = subprogs[0].stack_depth;
> > +       for (i = 0; i < env->subprog_cnt; i++) {
> > +               int subprog_start = subprogs[i].start, j;
> > +               int stack_slots = subprogs[i].stack_extra / 8;
> > +
> > +               if (stack_slots >= ARRAY_SIZE(insn_buf)) {
> > +                       verbose(env, "verifier bug: stack_extra is too large\n");
> > +                       return -EFAULT;
> > +               }
> > +
> > +               /* Add insns to subprog prologue to init extra stack */
> > +               for (j = 0; j < stack_slots; j++)
> > +                       insn_buf[j] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
> > +                                                -subprogs[i].stack_depth + j * 8, BPF_MAX_LOOPS);
> > +               if (j) {
> > +                       insn_buf[j] = env->prog->insnsi[subprog_start];
> > +
> > +                       new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, j + 1);
> > +                       if (!new_prog)
> > +                               return -ENOMEM;
> > +                       env->prog = prog = new_prog;
> > +               }
>
> this code is sort of generic (you don't assume just 0 or 1 extra
> slots), but then it initializes each extra slot with BPF_MAX_LOOPS,
> which doesn't look generic at all. So it's neither as simple as it
> could be nor generic, really...
>
> Maybe let's add WARN_ON if stack_extra>1 (so we catch it if we ever
> extend this), but otherwise just have a simple and easier to follow
>
> if (stack_slots) {
>     insn_buf[0] = BPF_ST_MEM(..., BPF_MAX_LOOPS);
>     /* bpf_patch_insn_data() replaces instruction,
>      * so we need to copy first actual insn to preserve it (it's not
> that obvious)
>      */
>     insn_buf[1] = env->prog->insnsi[subprog_start];
>     ... patch ...
> }
>
> It's pretty minor, overall, but definitely caused some pause for me.

I had it like that in v2, but then thought that I might
try to use this to fix the tail_call-from-subprogs issue discussed
in a separate thread, so I changed it to a generic zero init in v3,
but then went with the BPF_MAX_LOOPS init in v4.
Looking at it now, hard coding to stack_slots == 1 is indeed better.
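
Something along these lines, i.e. a sketch of the hard-coded single-slot
variant (reusing the error style of the existing stack_extra check):

  if (stack_slots) {
          if (stack_slots > 1) {
                  verbose(env, "verifier bug: stack_extra is too large\n");
                  return -EFAULT;
          }
          /* init the may_goto counter slot to BPF_MAX_LOOPS */
          insn_buf[0] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
                                   -subprogs[i].stack_depth, BPF_MAX_LOOPS);
          /* bpf_patch_insn_data() replaces the insn, so preserve the original
           * first insn of the subprog after the init
           */
          insn_buf[1] = env->prog->insnsi[subprog_start];

          new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, 2);
          if (!new_prog)
                  return -ENOMEM;
          env->prog = prog = new_prog;
  }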

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 bpf-next 2/4] bpf: Recognize that two registers are safe when their ranges match
  2024-03-05 19:03   ` Andrii Nakryiko
@ 2024-03-05 21:18     ` Alexei Starovoitov
  2024-03-05 21:57       ` Andrii Nakryiko
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2024-03-05 21:18 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Eddy Z, John Fastabend, Kernel Team

On Tue, Mar 5, 2024 at 11:03 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Mar 4, 2024 at 8:52 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > When open coded iterators, bpf_loop or may_goto are used, the following two states
> > are equivalent and it is safe to prune the search:
> >
> > cur state: fp-8_w=scalar(id=3,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))
> > old state: fp-8_rw=scalar(id=2,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))
> >
> > In other words "exact" state match should ignore liveness and precision marks,
> > since open coded iterator logic didn't complete their propagation,
> > but range_within logic that applies to scalars, ptr_to_mem, map_value, pkt_ptr
> > is safe to rely on.
> >
> > Avoid doing such a comparison when the regular infinite loop detection logic is used,
> > otherwise the bounded loop logic will declare such an "infinite loop" as a false
> > positive. Such an example is in progs/verifier_loops1.c not_an_inifinite_loop().
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
>
> You haven't updated stacksafe() to work with enums (the code still
> assumes bool), please adjust.

great catch.

> pw-bot: cr
>
> Also, I wonder if we actually need three cases. We need EXACT only for
> infinite loop detection, right? I wonder if that infinite loop
> detection is actually that useful in practice, I rarely encounter it
> (if at all), and it constantly causes issues.
>
> What if we just dropped infinite loop detection logic altogether and
> have NOT_EXACT vs RANGE_WITHIN logic only?

The infinite loop detection logic is there for a better user experience.
With:
time ./test_verifier 145
#145/p calls: conditional call 6 OK
Summary: 1 PASSED, 0 SKIPPED, 0 FAILED

real    0m0.191s
user    0m0.001s
sys    0m0.053s

without (on prod kernel):
time ./test_verifier 145
#145/p calls: conditional call 6 FAIL
Unexpected error message!
    EXP: infinite loop detected
    RES: BPF program is too large. Processed 1000001 insn
verification time 5182443 usec
stack depth 0+0
processed 1000001 insns (limit 1000000) max_states_per_insn 4
total_states 33336 peak_states 33336 mark_read 1

real    0m5.375s
user    0m0.002s
sys    0m4.700s

I hope that eventually we can replace the 1M insn limit with
"if the verifier is making progress, keep going".
Then states_maybe_looping() will be replaced with something else.
It should address non-converging bpf_loop/open-coded iterators
that hit the 1M limit right now while walking the same insns over and over again.
Something like "did we visit this insn 1k times?" may be enough.

> >  kernel/bpf/verifier.c | 39 ++++++++++++++++++++++-----------------
> >  1 file changed, 22 insertions(+), 17 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 226bb65f9c2c..74b55d5571c7 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -16254,8 +16254,8 @@ static int check_btf_info(struct bpf_verifier_env *env,
> >  }
> >
> >  /* check %cur's range satisfies %old's */
> > -static bool range_within(struct bpf_reg_state *old,
> > -                        struct bpf_reg_state *cur)
> > +static bool range_within(const struct bpf_reg_state *old,
> > +                        const struct bpf_reg_state *cur)
> >  {
> >         return old->umin_value <= cur->umin_value &&
> >                old->umax_value >= cur->umax_value &&
> > @@ -16419,21 +16419,26 @@ static bool regs_exact(const struct bpf_reg_state *rold,
> >                check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
> >  }
> >
> > +enum exact_level {
> > +       NOT_EXACT,
> > +       EXACT,
> > +       RANGE_WITHIN
> > +};
> > +
> >  /* Returns true if (rold safe implies rcur safe) */
> >  static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> > -                   struct bpf_reg_state *rcur, struct bpf_idmap *idmap, bool exact)
> > +                   struct bpf_reg_state *rcur, struct bpf_idmap *idmap,
> > +                   enum exact_level exact)
> >  {
> > -       if (exact)
> > +       if (exact == EXACT)
> >                 return regs_exact(rold, rcur, idmap);
> >
> > -       if (!(rold->live & REG_LIVE_READ))
> > +       if (!(rold->live & REG_LIVE_READ) && exact != RANGE_WITHIN)
>
> imo, `exact == NOT_EXACT` is easier to follow (we excluded EXACT above)

ok.

> >                 /* explored state didn't use this */
> >                 return true;
> > -       if (rold->type == NOT_INIT)
> > +       if (rold->type == NOT_INIT && exact != RANGE_WITHIN)
>
> ditto
>
> >                 /* explored state can't have used this */
> >                 return true;
> > -       if (rcur->type == NOT_INIT)
> > -               return false;
>
> do we need to remove this? in which case will it do the wrong thing?

Yes.
Consider rcur->type == rold->type == NOT_INIT with exact == RANGE_WITHIN:
regsafe() would return false, but should return true via:
        default:
                return regs_exact(rold, rcur, idmap);

Having said that, it's better to change the above two 'if'-s to
if (rold->type == NOT_INIT) {
    if (exact == NOT_EXACT || rcur->type == NOT_INIT)
      return true;
}
to avoid doing regs_exact() for the common case.
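
With both tweaks applied, the head of regsafe() would read roughly like
this (sketch):

  if (exact == EXACT)
          return regs_exact(rold, rcur, idmap);

  if (!(rold->live & REG_LIVE_READ) && exact == NOT_EXACT)
          /* explored state didn't use this */
          return true;
  if (rold->type == NOT_INIT) {
          if (exact == NOT_EXACT || rcur->type == NOT_INIT)
                  /* explored state can't have used this */
                  return true;
  }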

> >
> >         /* Enforce that register types have to match exactly, including their
> >          * modifiers (like PTR_MAYBE_NULL, MEM_RDONLY, etc), as a general
> > @@ -16468,7 +16473,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> >                         return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
> >                                check_scalar_ids(rold->id, rcur->id, idmap);
> >                 }
> > -               if (!rold->precise)
> > +               if (!rold->precise && exact != RANGE_WITHIN)
>
> same, I think explicit `exact == NOT_EXACT` is easier to follow

ok

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 bpf-next 2/4] bpf: Recognize that two registers are safe when their ranges match
  2024-03-05 21:18     ` Alexei Starovoitov
@ 2024-03-05 21:57       ` Andrii Nakryiko
  0 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-05 21:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Eddy Z, John Fastabend, Kernel Team

On Tue, Mar 5, 2024 at 1:18 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Mar 5, 2024 at 11:03 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Mar 4, 2024 at 8:52 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > From: Alexei Starovoitov <ast@kernel.org>
> > >
> > > When open coded iterators, bpf_loop or may_goto are used, the following two states
> > > are equivalent and it is safe to prune the search:
> > >
> > > cur state: fp-8_w=scalar(id=3,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))
> > > old state: fp-8_rw=scalar(id=2,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))
> > >
> > > In other words "exact" state match should ignore liveness and precision marks,
> > > since open coded iterator logic didn't complete their propagation,
> > > but range_within logic that applies to scalars, ptr_to_mem, map_value, pkt_ptr
> > > is safe to rely on.
> > >
> > > Avoid doing such a comparison when the regular infinite loop detection logic is used,
> > > otherwise the bounded loop logic will declare such an "infinite loop" as a false
> > > positive. Such an example is in progs/verifier_loops1.c not_an_inifinite_loop().
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> >
> > You haven't updated stacksafe() to work with enums (the code still
> > assumes bool), please adjust.
>
> great catch.
>
> > pw-bot: cr
> >
> > Also, I wonder if we actually need three cases. We need EXACT only for
> > infinite loop detection, right? I wonder if that infinite loop
> > detection is actually that useful in practice, I rarely encounter it
> > (if at all), and it constantly causes issues.
> >
> > What if we just dropped infinite loop detection logic altogether and
> > have NOT_EXACT vs RANGE_WITHIN logic only?
>
> The infinite loop logic is for better user experience.
> With:
> time ./test_verifier 145
> #145/p calls: conditional call 6 OK
> Summary: 1 PASSED, 0 SKIPPED, 0 FAILED
>
> real    0m0.191s
> user    0m0.001s
> sys    0m0.053s
>
> without (on prod kernel):
> time ./test_verifier 145
> #145/p calls: conditional call 6 FAIL
> Unexpected error message!
>     EXP: infinite loop detected
>     RES: BPF program is too large. Processed 1000001 insn
> verification time 5182443 usec
> stack depth 0+0
> processed 1000001 insns (limit 1000000) max_states_per_insn 4
> total_states 33336 peak_states 33336 mark_read 1
>
> real    0m5.375s
> user    0m0.002s
> sys    0m4.700s
>
> I hope that eventually we can replace the 1M insn limit with
> "if the verifier is making progress, keep going".
> Then states_maybe_looping() will be replaced with something else.
> It should address non-converging bpf_loop/open iterators
> that hit 1M right now while walking the same insns over and over again.
> Something like "did we visit this insn 1k times" may be enough.

Yeah, my hopes of getting rid of that check were low anyway, but it
keeps causing problems... Alright, fine, we'll have to live with a
more complicated tri-state exactness check, unfortunately.

>
> > >  kernel/bpf/verifier.c | 39 ++++++++++++++++++++++-----------------
> > >  1 file changed, 22 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 226bb65f9c2c..74b55d5571c7 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -16254,8 +16254,8 @@ static int check_btf_info(struct bpf_verifier_env *env,
> > >  }
> > >
> > >  /* check %cur's range satisfies %old's */
> > > -static bool range_within(struct bpf_reg_state *old,
> > > -                        struct bpf_reg_state *cur)
> > > +static bool range_within(const struct bpf_reg_state *old,
> > > +                        const struct bpf_reg_state *cur)
> > >  {
> > >         return old->umin_value <= cur->umin_value &&
> > >                old->umax_value >= cur->umax_value &&
> > > @@ -16419,21 +16419,26 @@ static bool regs_exact(const struct bpf_reg_state *rold,
> > >                check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
> > >  }
> > >
> > > +enum exact_level {
> > > +       NOT_EXACT,
> > > +       EXACT,
> > > +       RANGE_WITHIN
> > > +};
> > > +
> > >  /* Returns true if (rold safe implies rcur safe) */
> > >  static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> > > -                   struct bpf_reg_state *rcur, struct bpf_idmap *idmap, bool exact)
> > > +                   struct bpf_reg_state *rcur, struct bpf_idmap *idmap,
> > > +                   enum exact_level exact)
> > >  {
> > > -       if (exact)
> > > +       if (exact == EXACT)
> > >                 return regs_exact(rold, rcur, idmap);
> > >
> > > -       if (!(rold->live & REG_LIVE_READ))
> > > +       if (!(rold->live & REG_LIVE_READ) && exact != RANGE_WITHIN)
> >
> > imo, `exact == NOT_EXACT` is easier to follow (we excluded EXACT above)
>
> ok.
>
> > >                 /* explored state didn't use this */
> > >                 return true;
> > > -       if (rold->type == NOT_INIT)
> > > +       if (rold->type == NOT_INIT && exact != RANGE_WITHIN)
> >
> > ditto
> >
> > >                 /* explored state can't have used this */
> > >                 return true;
> > > -       if (rcur->type == NOT_INIT)
> > > -               return false;
> >
> > do we need to remove this? in which case will it do the wrong thing?
>
> Yes.
> Consider rcur->type == rold->type == NOT_INIT with exact == RANGE_WITHIN:
> regsafe() would return false, but should return true via:
>         default:
>                 return regs_exact(rold, rcur, idmap);

oh, very subtle and error-prone, so yeah, I like the explicit check below more

>
> Having said that, it's better to change the above two 'if'-s to
> if (rold->type == NOT_INIT) {
>     if (exact == NOT_EXACT || rcur->type == NOT_INIT)
>       return true;
> }
> to avoid doing regs_exact() for the common case.

sure, give it a try, but I've got to say that the logic is now quite
hard to follow with the tri-state "exactness"

>
> > >
> > >         /* Enforce that register types have to match exactly, including their
> > >          * modifiers (like PTR_MAYBE_NULL, MEM_RDONLY, etc), as a general
> > > @@ -16468,7 +16473,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> > >                         return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
> > >                                check_scalar_ids(rold->id, rcur->id, idmap);
> > >                 }
> > > -               if (!rold->precise)
> > > +               if (!rold->precise && exact != RANGE_WITHIN)
> >
> > same, I think explicit `exact == NOT_EXACT` is easier to follow
>
> ok

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-03-05 21:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-05  4:52 [PATCH v5 bpf-next 0/4] bpf: Introduce may_goto and cond_break Alexei Starovoitov
2024-03-05  4:52 ` [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction Alexei Starovoitov
2024-03-05 18:37   ` Andrii Nakryiko
2024-03-05 20:52     ` Alexei Starovoitov
2024-03-05  4:52 ` [PATCH v5 bpf-next 2/4] bpf: Recognize that two registers are safe when their ranges match Alexei Starovoitov
2024-03-05 19:03   ` Andrii Nakryiko
2024-03-05 21:18     ` Alexei Starovoitov
2024-03-05 21:57       ` Andrii Nakryiko
2024-03-05  4:52 ` [PATCH v5 bpf-next 3/4] bpf: Add cond_break macro Alexei Starovoitov
2024-03-05  4:52 ` [PATCH v5 bpf-next 4/4] selftests/bpf: Test may_goto Alexei Starovoitov
