* [BUG] verifier escape with iteration helpers (bpf_loop, ...)
@ 2023-07-07 14:04 Andrew Werner
  2023-07-07 16:44 ` Alexei Starovoitov
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Werner @ 2023-07-07 14:04 UTC (permalink / raw)
  To: bpf
  Cc: Andrei Matei, Tamir Duberstein, joannelkoong, kernel-team,
	andrii.nakryiko, song

My coworkers and I have been exploring tracing use cases with bpf, and
we uncovered some programs we did not expect to pass verification.
We believe the verifier's logic is incorrect for the
`bpf_loop` and `bpf_for_each_map_elem` helpers (I suspect also
`bpf_user_ringbuf_drain`, but I haven't verified it). These helpers are
interesting in that they take a callback function and a context pointer
that is passed into the callback; they are the helpers that perform some
form of iteration. The observation is that the verifier doesn't take into
account possible changes the callback makes to the context. I'm interested
in understanding better how the verifier is supposed to work in these cases,
and in collaborating on fixing the underlying problem. Consider the following
probes (note: this will crash the machine if triggered):

```C
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

typedef struct context {
    char* buf;
} context_t;

static long loop_callback(__u32 idx, context_t* ctx)
{
    if (idx == 0) {
        ctx->buf = (char*)(0xDEAD);
        return 0;
    }
    if (bpf_probe_read_user(ctx->buf, 8, (void*)(0xBADC0FFEE))) {
        return 1;
    }
    return 0;
}

SEC("uprobe/bpf_loop_bug")
int BPF_KPROBE(bpf_loop_bug, void* foo_ptr)
{
    __u64 buf = 0;
    context_t context = (context_t) { .buf = (char*)(&buf) };
    bpf_loop(100, loop_callback, &context, 0);
    return 0;
}
```

As far as I can tell, the verifier only verifies the callback against the
state of the context value as it is initially passed in. This can allow
pointers to be erroneously treated as valid pointers into map data when in
reality they can point anywhere. The example probes use this to write
arbitrary data to arbitrary memory addresses.

From my reading of the verifier code, it seems that exactly one
iteration of the loop is checked. The important call, as I understand it, is
`__check_func_call`[1]. That logic adjusts the verifier state
to move the arguments into the appropriate registers, then continues
verification into the body of the function; but when the verifier hits
a return, it simply goes back to the caller rather than verifying the next
iteration of the loop. Am I missing something?

When it comes to fixing the problem, I don't quite know where to start.
Perhaps these iteration callbacks ought to be treated more like global functions
-- you can't always make assumptions about the state of the data behind the
context pointer. Treating the context pointer as totally opaque seems bad from
a usability perspective. Maybe there's a way to attempt to verify the callback
body while treating all or part of the context as read-only, and then, if that
fails, go back and assume nothing about that part of the context structure.
What is the right way to think about plugging this hole?

Somewhat tangentially, it seems that if I later interact with data in the
context that may have been mutated, I get a rather opaque verifier error about
the context type itself: `reg type unsupported for arg#1 function
loop_callback#33`. I'd be interested to understand more deeply why this combined
version fails verification, and how to improve the error. See the
following probe:

```C
SEC("uprobe/both")
int BPF_KPROBE(bpf_for_each_map_elem_bug, void *foo_ptr) {
    __u64 buf = 0;
    context_t context = (context_t) {.buf = (char *)(&buf) };
    bpf_for_each_map_elem(&ARRAY, for_each_map_elem_callback, &context, 0);
    bpf_loop(100, loop_callback, &context, 0);
    return 0;
}
```

Just for completeness, here's one using `bpf_for_each_map_elem`:

```C
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __type(key, __u32);
    __type(value, __u64);
    __uint(max_entries, 100);
} ARRAY SEC(".maps");

static long for_each_map_elem_callback(
    struct bpf_map* map,
    const __u32* key,
    __u64* value,
    context_t* ctx
) {
    if (*(__u64*)(ctx->buf) == 0) {
        ctx->buf = (char*)(0xDEAD);
        return 0;
    }
    if (bpf_probe_read_user(ctx->buf, 8, (void*)(0xBADC0FFEE))) {
        return 1;
    }
    return 0;
}

SEC("uprobe/for_each_map_elem_bug")
int BPF_KPROBE(for_each_map_elem_bug, void* foo_ptr)
{
    __u64 buf = 0;
    context_t context = (context_t) { .buf = (char*)(&buf) };
    bpf_for_each_map_elem(&ARRAY, for_each_map_elem_callback, &context, 0);
    bpf_loop(100, loop_callback, &context, 0);
    return 0;
}
```
[1]: https://github.com/torvalds/linux/blob/5133c9e51de41bfa902153888e11add3342ede18/kernel/bpf/verifier.c#L8668

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-07-07 14:04 [BUG] verifier escape with iteration helpers (bpf_loop, ...) Andrew Werner
@ 2023-07-07 16:44 ` Alexei Starovoitov
  2023-07-07 18:08   ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2023-07-07 16:44 UTC (permalink / raw)
  To: Andrew Werner
  Cc: bpf, Andrei Matei, Tamir Duberstein, Joanne Koong, kernel-team,
	Andrii Nakryiko, Song Liu

On Fri, Jul 7, 2023 at 7:04 AM Andrew Werner <awerner32@gmail.com> wrote:
>
> When it comes to fixing the problem, I don't quite know where to start.
> Perhaps these iteration callbacks ought to be treated more like global functions
> -- you can't always make assumptions about the state of the data in the context
> pointer. Treating the context pointer as totally opaque seems bad from
> a usability
> perspective. Maybe there's a way to attempt to verify the function
> body of the function
> by treating all or part of the context as read-only, and then if that
> fails, go back and
> assume nothing about that part of the context structure. What is the
> right way to
> think about plugging this hole?

'treat as global' might be a way to fix it, but it will likely break
some setups, since people pass pointers in a context and the current
global func verification doesn't support that.
I think the simplest fix would be to disallow writes into any pointers
within a ctx. Writes to scalars should still be allowed.
Much more complex fix would be to verify callbacks as
process_iter_next_call(). See giant comment next to it.
But since we need a fix for stable I'd try the simple approach first.
Could you try to implement that?


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-07-07 16:44 ` Alexei Starovoitov
@ 2023-07-07 18:08   ` Andrii Nakryiko
  2023-07-07 18:21     ` Andrew Werner
  2023-09-17 21:37     ` Eduard Zingerman
  0 siblings, 2 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-07-07 18:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrew Werner, bpf, Andrei Matei, Tamir Duberstein, Joanne Koong,
	kernel-team, Song Liu

On Fri, Jul 7, 2023 at 9:44 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jul 7, 2023 at 7:04 AM Andrew Werner <awerner32@gmail.com> wrote:
> >
> > When it comes to fixing the problem, I don't quite know where to start.
> > Perhaps these iteration callbacks ought to be treated more like global functions
> > -- you can't always make assumptions about the state of the data in the context
> > pointer. Treating the context pointer as totally opaque seems bad from
> > a usability
> > perspective. Maybe there's a way to attempt to verify the function
> > body of the function
> > by treating all or part of the context as read-only, and then if that
> > fails, go back and
> > assume nothing about that part of the context structure. What is the
> > right way to
> > think about plugging this hole?
>
> 'treat as global' might be a way to fix it, but it will likely break
> some setups, since people passing pointers in a context and current
> global func verification doesn't support that.

yeah, the intended use case is to return something from callbacks
through context pointer. So it will definitely break real uses.

> I think the simplest fix would be to disallow writes into any pointers
> within a ctx. Writes to scalars should still be allowed.

It might be a partial mitigation, but even with SCALARs there could be
problems due to assumed SCALAR range, which will be invalid if
callback is never called or called many times.

> Much more complex fix would be to verify callbacks as
> process_iter_next_call(). See giant comment next to it.

yep, that seems like the right way forward. We need to simulate 0, 1,
2, .... executions of callbacks until we validate and exhaust all
possible context modification permutations, just like open-coded
iterators do it

> But since we need a fix for stable I'd try the simple approach first.
> Could you try to implement that?


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-07-07 18:08   ` Andrii Nakryiko
@ 2023-07-07 18:21     ` Andrew Werner
  2023-09-17 21:37     ` Eduard Zingerman
  1 sibling, 0 replies; 52+ messages in thread
From: Andrew Werner @ 2023-07-07 18:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Andrei Matei, Tamir Duberstein,
	Joanne Koong, kernel-team, Song Liu

On Fri, Jul 7, 2023 at 2:08 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Jul 7, 2023 at 9:44 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Jul 7, 2023 at 7:04 AM Andrew Werner <awerner32@gmail.com> wrote:
> > >
> > > When it comes to fixing the problem, I don't quite know where to start.
> > > Perhaps these iteration callbacks ought to be treated more like global functions
> > > -- you can't always make assumptions about the state of the data in the context
> > > pointer. Treating the context pointer as totally opaque seems bad from
> > > a usability
> > > perspective. Maybe there's a way to attempt to verify the function
> > > body of the function
> > > by treating all or part of the context as read-only, and then if that
> > > fails, go back and
> > > assume nothing about that part of the context structure. What is the
> > > right way to
> > > think about plugging this hole?
> >
> > 'treat as global' might be a way to fix it, but it will likely break
> > some setups, since people passing pointers in a context and current
> > global func verification doesn't support that.
>
> yeah, the intended use case is to return something from callbacks
> through context pointer. So it will definitely break real uses.

Definitely, this would break the probes we're using.

>
> > I think the simplest fix would be to disallow writes into any pointers
> > within a ctx. Writes to scalars should still be allowed.
>
> It might be a partial mitigation, but even with SCALARs there could be
> problems due to assumed SCALAR range, which will be invalid if
> callback is never called or called many times.

Indeed, when the bug was first found, it was because an offset scalar
in the context was modified and then added to an unmodified pointer.

> > Much more complex fix would be to verify callbacks as
> > process_iter_next_call(). See giant comment next to it.
>
> yep, that seems like the right way forward. We need to simulate 0, 1,
> 2, .... executions of callbacks until we validate and exhaust all
> possible context modification permutations, just like open-coded
> iterators do it

I'll give process_iter_next_call() a shot, and see if I run into trouble.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-07-07 18:08   ` Andrii Nakryiko
  2023-07-07 18:21     ` Andrew Werner
@ 2023-09-17 21:37     ` Eduard Zingerman
  2023-09-17 22:09       ` Kumar Kartikeya Dwivedi
  2023-09-19 23:14       ` Andrii Nakryiko
  1 sibling, 2 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-17 21:37 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: Andrew Werner, bpf, Andrei Matei, Tamir Duberstein, Joanne Koong,
	kernel-team, Song Liu

On Fri, 2023-07-07 at 11:08 -0700, Andrii Nakryiko wrote:
> On Fri, Jul 7, 2023 at 9:44 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Fri, Jul 7, 2023 at 7:04 AM Andrew Werner <awerner32@gmail.com> wrote:
> > > 
> > > When it comes to fixing the problem, I don't quite know where to start.
> > > Perhaps these iteration callbacks ought to be treated more like global functions
> > > -- you can't always make assumptions about the state of the data in the context
> > > pointer. Treating the context pointer as totally opaque seems bad from
> > > a usability
> > > perspective. Maybe there's a way to attempt to verify the function
> > > body of the function
> > > by treating all or part of the context as read-only, and then if that
> > > fails, go back and
> > > assume nothing about that part of the context structure. What is the
> > > right way to
> > > think about plugging this hole?
> > 
> > 'treat as global' might be a way to fix it, but it will likely break
> > some setups, since people passing pointers in a context and current
> > global func verification doesn't support that.
> 
> yeah, the intended use case is to return something from callbacks
> through context pointer. So it will definitely break real uses.
> 
> > I think the simplest fix would be to disallow writes into any pointers
> > within a ctx. Writes to scalars should still be allowed.
> 
> It might be a partial mitigation, but even with SCALARs there could be
> problems due to assumed SCALAR range, which will be invalid if
> callback is never called or called many times.
> 
> > Much more complex fix would be to verify callbacks as
> > process_iter_next_call(). See giant comment next to it.
> 
> yep, that seems like the right way forward. We need to simulate 0, 1,
> 2, .... executions of callbacks until we validate and exhaust all
> possible context modification permutations, just like open-coded
> iterators do it
> 
> > But since we need a fix for stable I'd try the simple approach first.
> > Could you try to implement that?

Hi All,

This issue seems stalled, so I took a look over the weekend.
I have a work-in-progress fix; please let me know if you don't agree
with the direction I'm taking or if I'm stepping on anyone's toes.

After some analysis I decided to go with Alexei's suggestion and
implement something similar to iterators for a selected set of helper
functions that accept "looping" callbacks, such as:
- bpf_loop
- bpf_for_each_map_elem
- bpf_user_ringbuf_drain
- bpf_timer_set_callback (?)

The sketch of the fix looks as follows:
- extend struct bpf_func_state with callback iterator state information:
  - active/non-active flag
  - depth
  - r1-r5 state at the entry to callback;
- extend __check_func_call() to setup callback iterator state when
  call to suitable helper function is processed;
- extend BPF_EXIT processing (prepare_func_exit()) to queue new
  callback visit upon return from iterating callback
  (similar to process_iter_next_call());
- extend is_state_visited() to account for callback iterator hits in a
  way similar to regular iterators;
- extend backtrack_insn() to correctly react to jumps from callback
  exit to callback entry.
  
I have a patch (at the end of this email) that correctly recognizes
the bpf_loop example in this thread as unsafe. However, this patch has
a few deficiencies:

- verif_scale_strobemeta_bpf_loop selftest is not passing, because
  the read_map_var function invoked from the callback continuously
  increments the 'payload' pointer used in subsequent iterations.

  In order to handle such code I need to add upper-bound tracking
  for the iteration depth (when such a bound can be deduced).

- loop detection is broken for a simple callback like the one below:

  static int loop_callback_infinite(__u32 idx, __u64 *ctx)
  {
      for (;;)
          (*ctx)++;
      return 0;
  }
  
  To handle such code I need to change is_state_visited() to do
  callback iterator loop/hit detection on exit from the callback
  (returns are not prune points at the moment); currently it is done
  on entry.

- the callback below currently causes state explosion:

  static int precise1_callback(__u32 idx, struct precise1_ctx *ctx)
  {
      if (idx == 0)
          ctx->a = 1;
      if (idx == 1 && ctx->a == 1)
          ctx->b = 1;
      return 0;
  }
  
  I'm not sure yet what to do about this; there are several possibilities:
  - tweak the order in which states are visited (need to think about it);
  - check states in bpf_verifier_env::head (not explored yet) for
    equivalence and avoid enqueuing duplicate states.
    
I'll proceed with addressing the issues above on Monday.

Thanks,
Eduard

---

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a3236651ec64..5589f55e42ba 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -70,6 +70,17 @@ enum bpf_iter_state {
 	BPF_ITER_STATE_DRAINED,
 };
 
+struct bpf_iter {
+	/* BTF container and BTF type ID describing
+	 * struct bpf_iter_<type> of an iterator state
+	 */
+	struct btf *btf;
+	u32 btf_id;
+	/* packing following two fields to fit iter state into 16 bytes */
+	enum bpf_iter_state state:2;
+	int depth:30;
+};
+
 struct bpf_reg_state {
 	/* Ordering of fields matters.  See states_equal() */
 	enum bpf_reg_type type;
@@ -115,16 +126,7 @@ struct bpf_reg_state {
 		} dynptr;
 
 		/* For bpf_iter stack slots */
-		struct {
-			/* BTF container and BTF type ID describing
-			 * struct bpf_iter_<type> of an iterator state
-			 */
-			struct btf *btf;
-			u32 btf_id;
-			/* packing following two fields to fit iter state into 16 bytes */
-			enum bpf_iter_state state:2;
-			int depth:30;
-		} iter;
+		struct bpf_iter iter;
 
 		/* Max size from any of the above. */
 		struct {
@@ -300,6 +302,8 @@ struct bpf_func_state {
 	bool in_callback_fn;
 	struct tnum callback_ret_range;
 	bool in_async_callback_fn;
+	struct bpf_iter callback_iter;
+	struct bpf_reg_state callback_entry_state[MAX_BPF_REG];
 
 	/* The following fields should be last. See copy_func_state() */
 	int acquired_refs;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dbba2b806017..e79a4bec4461 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2357,6 +2357,7 @@ static void init_func_state(struct bpf_verifier_env *env,
 	state->callback_ret_range = tnum_range(0, 0);
 	init_reg_state(env, state);
 	mark_verifier_state_scratched(env);
+	state->callback_iter.state = BPF_ITER_STATE_INVALID;
 }
 
 /* Similar to push_stack(), but for async callbacks */
@@ -3599,6 +3600,39 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 			}
 		} else if (opcode == BPF_EXIT) {
 			bool r0_precise;
+			int subprog;
+
+			/* Jump from 'exit' to start of the same subprogram,
+			 * this happens for callback iteration, e.g.:
+			 *
+			 *   int cb_func(...) {
+			 *   start:
+			 *     ...
+			 *     return 0; // <-- BPF_EXIT processing in do_check()
+			 *               //     adds a state with IP set to 'start'.
+			 *   }
+			 *   bpf_loop(..., cb_func, ...);
+			 *
+			 * Clear r1-r5 as in the callback case above, but do
+			 * not change frame number.
+			 */
+			if (subseq_idx >= 0 &&
+			    ((subprog = find_subprog(env, subseq_idx)) >= 0) &&
+			    idx >= subseq_idx &&
+			    (subprog >= env->subprog_cnt ||
+			     idx < env->subprog_info[subprog + 1].start)) {
+				if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS) {
+					verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
+					WARN_ONCE(1, "verifier backtracking bug");
+					return -EFAULT;
+				}
+				if (bt_stack_mask(bt) != 0)
+					return -ENOTSUPP;
+				/* clear r1-r5 in callback subprog's mask */
+				for (i = BPF_REG_1; i <= BPF_REG_5; i++)
+					bt_clear_reg(bt, i);
+				return 0;
+			}
 
 			if (bt_reg_mask(bt) & BPF_REGMASK_ARGS) {
 				/* if backtracing was looking for registers R1-R5
@@ -8869,13 +8903,17 @@ static int set_callee_state(struct bpf_verifier_env *env,
 			    struct bpf_func_state *caller,
 			    struct bpf_func_state *callee, int insn_idx);
 
+static int set_loop_callback_state(struct bpf_verifier_env *env,
+				   struct bpf_func_state *caller,
+				   struct bpf_func_state *callee, int insn_idx);
+
 static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx, int subprog,
 			     set_callee_state_fn set_callee_state_cb)
 {
 	struct bpf_verifier_state *state = env->cur_state;
 	struct bpf_func_state *caller, *callee;
-	int err;
+	int err, i;
 
 	if (state->curframe + 1 >= MAX_CALL_FRAMES) {
 		verbose(env, "the call stack of %d frames is too deep\n",
@@ -8972,7 +9010,6 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			*insn_idx /* callsite */,
 			state->curframe + 1 /* frameno within this callchain */,
 			subprog /* subprog number within this prog */);
-
 	/* Transfer references to the callee */
 	err = copy_reference_state(callee, caller);
 	if (err)
@@ -8982,6 +9019,14 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	if (err)
 		goto err_out;
 
+	if (set_callee_state_cb == set_loop_callback_state) {
+		callee->callback_iter.state = BPF_ITER_STATE_ACTIVE;
+		callee->callback_iter.depth = 0;
+		for (i = BPF_REG_0; i < MAX_BPF_REG; ++i)
+			copy_register_state(&callee->callback_entry_state[i],
+					    &callee->regs[i]);
+	}
+
 	clear_caller_saved_regs(env, caller->regs);
 
 	/* only increment it after check_reg_arg() finished */
@@ -9256,7 +9301,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 	struct bpf_verifier_state *state = env->cur_state;
 	struct bpf_func_state *caller, *callee;
 	struct bpf_reg_state *r0;
-	int err;
+	int err, i;
 
 	callee = state->frame[state->curframe];
 	r0 = &callee->regs[BPF_REG_0];
@@ -9301,6 +9346,25 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 			return err;
 	}
 
+	if (callee->in_callback_fn && callee->callback_iter.state == BPF_ITER_STATE_ACTIVE) {
+		struct bpf_verifier_state *queued_st;
+		struct bpf_func_state *queued_callee;
+		struct bpf_iter *queued_iter;
+
+		queued_st = push_stack(env, env->subprog_info[callee->subprogno].start,
+				       *insn_idx, false);
+		if (!queued_st)
+			return -ENOMEM;
+
+		queued_callee = queued_st->frame[callee->frameno];
+		queued_iter = &queued_callee->callback_iter;
+		queued_iter->depth++;
+
+		for (i = BPF_REG_0; i < MAX_BPF_REG; ++i)
+			copy_register_state(&queued_callee->regs[i],
+					    &queued_callee->callback_entry_state[i]);
+	}
+
 	*insn_idx = callee->callsite + 1;
 	if (env->log.level & BPF_LOG_LEVEL) {
 		verbose(env, "returning from callee:\n");
@@ -16112,6 +16176,15 @@ static bool is_iter_next_insn(struct bpf_verifier_env *env, int insn_idx)
 	return env->insn_aux_data[insn_idx].is_iter_next;
 }
 
+static bool is_callback_iter_entry(struct bpf_verifier_env *env, int insn_idx)
+{
+	struct bpf_func_state *cur_frame = cur_func(env);
+
+	return cur_frame->in_callback_fn &&
+	       env->subprog_info[cur_frame->subprogno].start == insn_idx &&
+	       cur_frame->callback_iter.state != BPF_ITER_STATE_INVALID;
+}
+
 /* is_state_visited() handles iter_next() (see process_iter_next_call() for
  * terminology) calls specially: as opposed to bounded BPF loops, it *expects*
  * states to match, which otherwise would look like an infinite loop. So while
@@ -16190,6 +16263,9 @@ static bool iter_active_depths_differ(struct bpf_verifier_state *old, struct bpf
 			if (cur_slot->iter.depth != slot->iter.depth)
 				return true;
 		}
+		if (state->callback_iter.state == BPF_ITER_STATE_ACTIVE &&
+		    state->callback_iter.depth != cur->frame[fr]->callback_iter.depth)
+			return true;
 	}
 	return false;
 }
@@ -16277,6 +16353,12 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 				}
 				goto skip_inf_loop_check;
 			}
+			if (is_callback_iter_entry(env, insn_idx)) {
+				if (states_equal(env, &sl->state, cur) &&
+				    cur_func(env)->callback_iter.state == BPF_ITER_STATE_ACTIVE)
+					goto hit;
+				goto skip_inf_loop_check;
+			}
 			/* attempt to detect infinite loop to avoid unnecessary doomed work */
 			if (states_maybe_looping(&sl->state, cur) &&
 			    states_equal(env, &sl->state, cur) &&



* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-17 21:37     ` Eduard Zingerman
@ 2023-09-17 22:09       ` Kumar Kartikeya Dwivedi
  2023-09-18 13:06         ` Eduard Zingerman
  2023-09-19 23:14       ` Andrii Nakryiko
  1 sibling, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-09-17 22:09 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Alexei Starovoitov, Andrew Werner, bpf,
	Andrei Matei, Tamir Duberstein, Joanne Koong, kernel-team,
	Song Liu

On Sun, 17 Sept 2023 at 23:37, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2023-07-07 at 11:08 -0700, Andrii Nakryiko wrote:
> > On Fri, Jul 7, 2023 at 9:44 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Jul 7, 2023 at 7:04 AM Andrew Werner <awerner32@gmail.com> wrote:
> > > >
> > > > When it comes to fixing the problem, I don't quite know where to start.
> > > > Perhaps these iteration callbacks ought to be treated more like global functions
> > > > -- you can't always make assumptions about the state of the data in the context
> > > > pointer. Treating the context pointer as totally opaque seems bad from
> > > > a usability
> > > > perspective. Maybe there's a way to attempt to verify the function
> > > > body of the function
> > > > by treating all or part of the context as read-only, and then if that
> > > > fails, go back and
> > > > assume nothing about that part of the context structure. What is the
> > > > right way to
> > > > think about plugging this hole?
> > >
> > > 'treat as global' might be a way to fix it, but it will likely break
> > > some setups, since people passing pointers in a context and current
> > > global func verification doesn't support that.
> >
> > yeah, the intended use case is to return something from callbacks
> > through context pointer. So it will definitely break real uses.
> >
> > > I think the simplest fix would be to disallow writes into any pointers
> > > within a ctx. Writes to scalars should still be allowed.
> >
> > It might be a partial mitigation, but even with SCALARs there could be
> > problems due to assumed SCALAR range, which will be invalid if
> > callback is never called or called many times.
> >
> > > Much more complex fix would be to verify callbacks as
> > > process_iter_next_call(). See giant comment next to it.
> >
> > yep, that seems like the right way forward. We need to simulate 0, 1,
> > 2, .... executions of callbacks until we validate and exhaust all
> > possible context modification permutations, just like open-coded
> > iterators do it
> >
> > > But since we need a fix for stable I'd try the simple approach first.
> > > Could you try to implement that?
>
> Hi All,
>
> This issue seems stalled, so I took a look over the weekend.
> I have a work in progress fix, please let me know if you don't agree
> with direction I'm taking or if I'm stepping on anyone's toes.
>

I was planning to get to this eventually, but it's great if you are
looking to do it.

> After some analysis I decided to go with Alexei's suggestion and
> implement something similar to iterators for selected set of helper
> functions that accept "looping" callbacks, such as:
> - bpf_loop
> - bpf_for_each_map_elem
> - bpf_user_ringbuf_drain
> - bpf_timer_set_callback (?)
>

The last one won't require this, I think. The callback only runs once,
and asynchronously.
Others you are missing are bpf_find_vma and the bpf_rbtree_add kfunc.
While the latter is not an iterator, it can invoke the same callback
an unknown number of times.

The other major thing that needs to be checked is the case where the
callback executes zero times. There is some discussion on this in [0],
where this bug was originally reported. Basically, we need to explore
a path where the callback execution does not happen and ensure the
program is still safe.

[0]: https://lore.kernel.org/bpf/CAP01T74cOJzo3xQcW6weURH+mYRQ7kAWMqQOgtd_ymSbhrOoMQ@mail.gmail.com

You could also consider taking the selftests from this link; some of
them allow completely breaking the safety properties of the verifier.

> The sketch of the fix looks as follows:
> - extend struct bpf_func_state with callback iterator state information:
>   - active/non-active flag
>   - depth
>   - r1-r5 state at the entry to callback;
> - extend __check_func_call() to setup callback iterator state when
>   call to suitable helper function is processed;
> - extend BPF_EXIT processing (prepare_func_exit()) to queue new
>   callback visit upon return from iterating callback
>   (similar to process_iter_next_call());
> - extend is_state_visited() to account for callback iterator hits in a
>   way similar to regular iterators;
> - extend backtrack_insn() to correctly react to jumps from callback
>   exit to callback entry.
>
> I have a patch (at the end of this email) that correctly recognizes
> the bpf_loop example in this thread as unsafe. However this patch has
> a few deficiencies:
>
> - verif_scale_strobemeta_bpf_loop selftest is not passing, because
>   read_map_var function invoked from the callback continuously
>   increments 'payload' pointer used in subsequent iterations.
>
>   In order to handle such code I need to add an upper bound tracking
>   for iteration depth (when such bound could be deduced).
>

Yes, either the loop converges before the upper limit is hit, or we
keep unrolling it until we exhaust the deduced loop count passed to
the bpf_loop helper. But we will need to mark the loop count argument
register as precise when we make use of its value in such a case.

> - loop detection is broken for simple callback as below:
>
>   static int loop_callback_infinite(__u32 idx, __u64 *data)
>   {
>       for (;;)
>           (*ctx)++;
>       return 0;
>   }
>
>   To handle such code I need to change is_state_visited() to do
>   callback iterator loop/hit detection on exit from callback
>   (returns are not prune points at the moment), currently it is done
>   on entry.
>
> - the callback as below currently causes state explosion:
>
>   static int precise1_callback(__u32 idx, struct precise1_ctx *ctx)
>   {
>       if (idx == 0)
>           ctx->a = 1;
>       if (idx == 1 && ctx->a == 1)
>           ctx->b = 1;
>       return 0;
>   }
>
>   I'm not sure yet what to do about this, there are several possibilities:
>   - tweak the order in which states are visited (need to think about it);
>   - check states in bpf_verifier_env::head (not explored yet) for
>     equivalence and avoid enqueuing duplicate states.
>
>
> I'll proceed addressing the issues above on Monday.
>

I think there is a class of programs that are nevertheless going to be
broken now, as bpf_loop and others basically permitted incorrect code
to pass through. And the iteration until we arrive at a fixpoint won't
be able to cover all classes of loop bodies, so I think we will need
to make a tradeoff in terms of expressibility vs verifier complexity.
In general, cases like this with branches/conditional exit from the
loop body which do not converge will lead to state explosion anyway.

> Thanks,
> Eduard
>
> ---
>
> [...]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-17 22:09       ` Kumar Kartikeya Dwivedi
@ 2023-09-18 13:06         ` Eduard Zingerman
  2023-09-19 16:28           ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-18 13:06 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Andrii Nakryiko, Alexei Starovoitov, Andrew Werner, bpf,
	Andrei Matei, Tamir Duberstein, Joanne Koong, kernel-team,
	Song Liu

On Mon, 2023-09-18 at 00:09 +0200, Kumar Kartikeya Dwivedi wrote:
[...]
> 
> I was planning to get to this eventually, but it's great if you are
> looking to do it.
> 
> > After some analysis I decided to go with Alexei's suggestion and
> > implement something similar to iterators for selected set of helper
> > functions that accept "looping" callbacks, such as:
> > - bpf_loop
> > - bpf_for_each_map_elem
> > - bpf_user_ringbuf_drain
> > - bpf_timer_set_callback (?)
> > 
> 
> The last one won't require this, I think. The callback only runs once,
> and asynchronously.
> Others you are missing are bpf_find_vma and the bpf_rbtree_add kfunc.
> While the latter is not an iterator, it can invoke the same callback
> an unknown number of times.

Noted, thank you.

> The other major thing that needs to be checked is cases where callback
> will be executed zero times. There is some discussion on this in [0],
> where this bug was reported originally. Basically, we need to explore
> a path where the callback execution does not happen and ensure the
> program is still safe.
> 
> [0]: https://lore.kernel.org/bpf/CAP01T74cOJzo3xQcW6weURH+mYRQ7kAWMqQOgtd_ymSbhrOoMQ@mail.gmail.com
> 
> You could also consider taking the selftests from this link, some of
> them allow completely breaking safety properties of the verifier.

Thank you, I missed this thread.

[...]
> > I have a patch (at the end of this email) that correctly recognizes
> > the bpf_loop example in this thread as unsafe. However this patch has
> > a few deficiencies:
> > 
> > - verif_scale_strobemeta_bpf_loop selftest is not passing, because
> >   read_map_var function invoked from the callback continuously
> >   increments 'payload' pointer used in subsequent iterations.
> > 
> >   In order to handle such code I need to add an upper bound tracking
> >   for iteration depth (when such bound could be deduced).
> > 
> 
> Yes, either the loop converges before the upper limit is hit, or we
> keep unrolling it until we exhaust the deduced loop count passed to
> the bpf_loop helper. But we will need to mark the loop count argument
> register as precise when we make use of its value in such a case.

Yes, marking the counter as precise makes sense, but I think it would
happen automatically. My current plan is to mark the index register as
bounded and use depth tracking only internally, to decide whether to
schedule another "active" iteration state. If that is not sufficient,
I will try assigning specific values to the loop counter (I hesitate
to do that, to avoid conflicts with convergence detection).

[...]
> > - the callback as below currently causes state explosion:
> > 
> >   static int precise1_callback(__u32 idx, struct precise1_ctx *ctx)
> >   {
> >       if (idx == 0)
> >           ctx->a = 1;
> >       if (idx == 1 && ctx->a == 1)
> >           ctx->b = 1;
> >       return 0;
> >   }
> > 
> >   I'm not sure yet what to do about this, there are several possibilities:
> >   - tweak the order in which states are visited (need to think about it);
> >   - check states in bpf_verifier_env::head (not explored yet) for
> >     equivalence and avoid enqueuing duplicate states.
> > 
> > 
> > I'll proceed addressing the issues above on Monday.
> > 
> 
> I think there is a class of programs that are nevertheless going to be
> broken now, as bpf_loop and others basically permitted incorrect code
> to pass through. And the iteration until we arrive at a fixpoint won't
> be able to cover all classes of loop bodies, so I think we will need
> to make a tradeoff in terms of expressibility vs verifier complexity.
> In general, cases like this with branches/conditional exit from the
> loop body which do not converge will lead to state explosion anyway.

This might be the case, however I found that specific issue with
"precise1_callback" could be resolved by using mark_force_checkpoint()
to mark iteration back-edges (same way it is done for regular iterators).
So maybe impact won't be that bad.

Thanks,
Eduard


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-18 13:06         ` Eduard Zingerman
@ 2023-09-19 16:28           ` Eduard Zingerman
  2023-09-19 23:02             ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-19 16:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

It looks like I hit a related but slightly different bug with bpf_iter_next().
Consider the following example:

    SEC("fentry/" SYS_PREFIX "sys_nanosleep")
    int num_iter_bug(const void *ctx) {
        struct bpf_iter_num it;                 // fp[-8] below
        __u64 val = 0;                          // fp[-16] in the below
        __u64 *ptr = &val;                      // r7 below
        __u64 rnd = bpf_get_current_pid_tgid(); // r6 below
        void *v;
    
        bpf_iter_num_new(&it, 0, 10);
        while ((v = bpf_iter_num_next(&it))) {
            rnd++;
            if (rnd == 42) {
                ptr = (void*)(0xdead);
                continue;
            }
            bpf_probe_read_user(ptr, 8, (void*)(0xdeadbeef));
        }
        bpf_iter_num_destroy(&it);
        return 0;
    }

(Unfortunately, it had to be converted to assembly to avoid compiler
 clobbering loop structure, complete test case is at the end of the email).
 
The example is not safe because of 0xdead being a possible `ptr` value.
However, currently it is marked as safe.

This happens because of states_equal() usage for iterator convergence
detection:

    static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
        ...
    	while (sl)
    		states_cnt++;
    		if (sl->state.insn_idx != insn_idx)
    			goto next;
    
    		if (sl->state.branches)
                ...
    			if (is_iter_next_insn(env, insn_idx)) {
    				if (states_equal(env, &sl->state, cur)) {
                        ...
    					if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE)
    						goto hit;
    				}
    				goto skip_inf_loop_check;
    			}
        ...

With some additional logging I see that the following states are
considered equal:

    13: (85) call bpf_iter_num_next#59908
    ...
    at is_iter_next_insn(insn 13):
      old state:
         R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=fp-16 R10=fp0
         fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
      cur state:
         R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=57005
         R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
    states_equal()?: true

Note that R7=fp-16 in old state vs R7_w=57005 in cur state.
The registers are considered equal because R7 does not have a read mark.
However, read marks are not yet finalized for the old state because
sl->state.branches != 0. (Note: precision marks are not finalized
either, which should also be a problem, but that requires another example.)

A possible fix is to add a special flag to states_equal() and to
conditionally ignore the logic related to liveness and precision when
this flag is set; the flag would be set for the is_iter_next_insn()
branch above.

---

/* BTF FUNC records are not generated for kfuncs referenced
 * from inline assembly. These records are necessary for
 * libbpf to link the program. The function below is a hack
 * to ensure that BTF FUNC records are generated.
 */
void __kfunc_btf_root(void)
{
	bpf_iter_num_new(0, 0, 0);
	bpf_iter_num_next(0);
	bpf_iter_num_destroy(0);
}

SEC("fentry/" SYS_PREFIX "sys_nanosleep")
__naked int num_iter_bug(const void *ctx)
{
	asm volatile (
		// r7 = &fp[-16]
		// fp[-16] = 0
		"r7 = r10;"
		"r7 += -16;"
		"r0 = 0;"
		"*(u64 *)(r7 + 0) = r0;"
		// r6 = bpf_get_current_pid_tgid()
		"call %[bpf_get_current_pid_tgid];"
		"r6 = r0;"
		// bpf_iter_num_new(&fp[-8], 0, 10)
		"r1 = r10;"
		"r1 += -8;"
		"r2 = 0;"
		"r3 = 10;"
		"call %[bpf_iter_num_new];"
		// while (bpf_iter_num_next(&fp[-8])) {
		//   r6++
		//   if (r6 == 42) {
		//     r7 = 0xdead
		//     continue;
		//   }
		//   bpf_probe_read_user(r7, 8, 0xdeadbeef)
		// }
	"1:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_next];"
		"if r0 == 0 goto 2f;"
		"r6 += 1;"
		"if r6 != 42 goto 3f;"
		"r7 = 0xdead;"
		"goto 1b;"
	"3:"
		"r1 = r7;"
		"r2 = 8;"
		"r3 = 0xdeadbeef;"
		"call %[bpf_probe_read_user];"
		"goto 1b;"
	"2:"
		// bpf_iter_num_destroy(&fp[-8])
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_destroy];"
		// return 0
		"r0 = 0;"
		"exit;"
		:
		: __imm(bpf_get_current_pid_tgid),
		  __imm(bpf_iter_num_new),
		  __imm(bpf_iter_num_next),
		  __imm(bpf_iter_num_destroy),
		  __imm(bpf_probe_read_user)
		: __clobber_all
	);
}


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-19 16:28           ` Eduard Zingerman
@ 2023-09-19 23:02             ` Andrii Nakryiko
  2023-09-20  0:19               ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-19 23:02 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, Sep 19, 2023 at 9:28 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> It looks like I hit a related but slightly different bug with bpf_iter_next().
> Consider the following example:
>
>     SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>     int num_iter_bug(const void *ctx) {
>         struct bpf_iter_num it;                 // fp[-8] below
>         __u64 val = 0;                          // fp[-16] in the below
>         __u64 *ptr = &val;                      // r7 below
>         __u64 rnd = bpf_get_current_pid_tgid(); // r6 below
>         void *v;
>
>         bpf_iter_num_new(&it, 0, 10);
>         while ((v = bpf_iter_num_next(&it))) {
>             rnd++;
>             if (rnd == 42) {
>                 ptr = (void*)(0xdead);
>                 continue;
>             }
>             bpf_probe_read_user(ptr, 8, (void*)(0xdeadbeef));
>         }
>         bpf_iter_num_destroy(&it);
>         return 0;
>     }
>
> (Unfortunately, it had to be converted to assembly to avoid compiler
>  clobbering loop structure, complete test case is at the end of the email).
>
> The example is not safe because of 0xdead being a possible `ptr` value.
> However, currently it is marked as safe.
>
> This happens because of states_equal() usage for iterator convergence
> detection:
>
>     static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>         ...
>         while (sl)
>                 states_cnt++;
>                 if (sl->state.insn_idx != insn_idx)
>                         goto next;
>
>                 if (sl->state.branches)
>                 ...
>                         if (is_iter_next_insn(env, insn_idx)) {
>                                 if (states_equal(env, &sl->state, cur)) {
>                         ...
>                                         if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE)
>                                                 goto hit;
>                                 }
>                                 goto skip_inf_loop_check;
>                         }
>         ...
>
> With some additional logging I see that the following states are
> considered equal:
>
>     13: (85) call bpf_iter_num_next#59908
>     ...
>     at is_iter_next_insn(insn 13):
>       old state:
>          R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=fp-16 R10=fp0
>          fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
>       cur state:
>          R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=57005
>          R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
>     states_equal()?: true
>
> Note that R7=fp-16 in old state vs R7_w=57005 in cur state.
> The registers are considered equal because R7 does not have a read mark.
> However read marks are not yet finalized for old state because
> sl->state.branches != 0. (Note: precision marks are not finalized as
> well, which should be a problem, but this requires another example).

I originally convinced myself that non-finalized precision marking is
fine; see the comment next to process_iter_next_call() for the reasoning.
If you can put together a dedicated example for precision, that would be great.

As for read marks... Yep, that's a bummer. That r7 is not used after
the loop, so it is not marked as read, and it's basically ignored for
loop-invariant checking, which is definitely not what was intended.

>
> A possible fix is to add a special flag to states_equal() and
> conditionally ignore logic related to liveness and precision when this
> flag is set. Set this flag for is_iter_next_insn() branch above.

Probably not completely ignore liveness (and maybe precision, but
let's do it one step at a time), but instead treat a register or stack
slot as relevant if it is marked as written or read in either of the
old/new states? Basically, if some register/slot at the end of the
iteration is read or modified, it must be important for the loop
invariant, so even if the parent state says it's not read, we still
treat it as read. Can you please try that with just read/write marks
and see if anything fails?


>
> ---
>
> /* BTF FUNC records are not generated for kfuncs referenced
>  * from inline assembly. These records are necessary for
>  * libbpf to link the program. The function below is a hack
>  * to ensure that BTF FUNC records are generated.
>  */
> void __kfunc_btf_root(void)
> {
>         bpf_iter_num_new(0, 0, 0);
>         bpf_iter_num_next(0);
>         bpf_iter_num_destroy(0);
> }
>
> SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> __naked int num_iter_bug(const void *ctx)
> {
>         asm volatile (
>                 // r7 = &fp[-16]
>                 // fp[-16] = 0
>                 "r7 = r10;"
>                 "r7 += -16;"
>                 "r0 = 0;"
>                 "*(u64 *)(r7 + 0) = r0;"
>                 // r6 = bpf_get_current_pid_tgid()
>                 "call %[bpf_get_current_pid_tgid];"
>                 "r6 = r0;"
>                 // bpf_iter_num_new(&fp[-8], 0, 10)
>                 "r1 = r10;"
>                 "r1 += -8;"
>                 "r2 = 0;"
>                 "r3 = 10;"
>                 "call %[bpf_iter_num_new];"
>                 // while (bpf_iter_num_next(&fp[-8])) {
>                 //   r6++
>                 //   if (r6 == 42) {
>                 //     r7 = 0xdead
>                 //     continue;
>                 //   }
>                 //   bpf_probe_read_user(r7, 8, 0xdeadbeef)
>                 // }
>         "1:"
>                 "r1 = r10;"
>                 "r1 += -8;"
>                 "call %[bpf_iter_num_next];"
>                 "if r0 == 0 goto 2f;"
>                 "r6 += 1;"
>                 "if r6 != 42 goto 3f;"
>                 "r7 = 0xdead;"
>                 "goto 1b;"
>         "3:"
>                 "r1 = r7;"
>                 "r2 = 8;"
>                 "r3 = 0xdeadbeef;"
>                 "call %[bpf_probe_read_user];"
>                 "goto 1b;"
>         "2:"
>                 // bpf_iter_num_destroy(&fp[-8])
>                 "r1 = r10;"
>                 "r1 += -8;"
>                 "call %[bpf_iter_num_destroy];"
>                 // return 0
>                 "r0 = 0;"
>                 "exit;"
>                 :
>                 : __imm(bpf_get_current_pid_tgid),
>                   __imm(bpf_iter_num_new),
>                   __imm(bpf_iter_num_next),
>                   __imm(bpf_iter_num_destroy),
>                   __imm(bpf_probe_read_user)
>                 : __clobber_all
>         );
> }


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-17 21:37     ` Eduard Zingerman
  2023-09-17 22:09       ` Kumar Kartikeya Dwivedi
@ 2023-09-19 23:14       ` Andrii Nakryiko
  2023-09-20  0:06         ` Eduard Zingerman
  1 sibling, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-19 23:14 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu

On Sun, Sep 17, 2023 at 2:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2023-07-07 at 11:08 -0700, Andrii Nakryiko wrote:
> > On Fri, Jul 7, 2023 at 9:44 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Jul 7, 2023 at 7:04 AM Andrew Werner <awerner32@gmail.com> wrote:
> > > >
> > > > When it comes to fixing the problem, I don't quite know where to start.
> > > > Perhaps these iteration callbacks ought to be treated more like global functions
> > > > -- you can't always make assumptions about the state of the data in the context
> > > > pointer. Treating the context pointer as totally opaque seems bad from
> > > > a usability
> > > > perspective. Maybe there's a way to attempt to verify the function
> > > > body of the function
> > > > by treating all or part of the context as read-only, and then if that
> > > > fails, go back and
> > > > assume nothing about that part of the context structure. What is the
> > > > right way to
> > > > think about plugging this hole?
> > >
> > > 'treat as global' might be a way to fix it, but it will likely break
> > > some setups, since people passing pointers in a context and current
> > > global func verification doesn't support that.
> >
> > yeah, the intended use case is to return something from callbacks
> > through context pointer. So it will definitely break real uses.
> >
> > > I think the simplest fix would be to disallow writes into any pointers
> > > within a ctx. Writes to scalars should still be allowed.
> >
> > It might be a partial mitigation, but even with SCALARs there could be
> > problems due to assumed SCALAR range, which will be invalid if
> > callback is never called or called many times.
> >
> > > Much more complex fix would be to verify callbacks as
> > > process_iter_next_call(). See giant comment next to it.
> >
> > yep, that seems like the right way forward. We need to simulate 0, 1,
> > 2, .... executions of callbacks until we validate and exhaust all
> > possible context modification permutations, just like open-coded
> > iterators do it
> >
> > > But since we need a fix for stable I'd try the simple approach first.
> > > Could you try to implement that?
>
> Hi All,
>
> This issue seems stalled, so I took a look over the weekend.
> I have a work in progress fix, please let me know if you don't agree
> with direction I'm taking or if I'm stepping on anyone's toes.
>
> After some analysis I decided to go with Alexei's suggestion and
> implement something similar to iterators for selected set of helper
> functions that accept "looping" callbacks, such as:
> - bpf_loop
> - bpf_for_each_map_elem
> - bpf_user_ringbuf_drain
> - bpf_timer_set_callback (?)

As Kumar mentioned, pretty much all helpers with callbacks are either
0-1, or 0-1-many cases, so all of them (except for the async callback
ones that shouldn't be accepting parent stack pointers) should be
validated.

>
> The sketch of the fix looks as follows:
> - extend struct bpf_func_state with callback iterator state information:
>   - active/non-active flag

not sure why you need this flag

>   - depth

yep, this seems necessary

>   - r1-r5 state at the entry to callback;

not sure why you need this, this should be part of func_state already?


> - extend __check_func_call() to setup callback iterator state when
>   call to suitable helper function is processed;

this logic is "like iterator", but it's not exactly the same, so I
don't think we should reuse bpf_iter state (as you can see above with
active/non-active flag, for example)

> - extend BPF_EXIT processing (prepare_func_exit()) to queue new
>   callback visit upon return from iterating callback
>   (similar to process_iter_next_call());

as mentioned above, we should simulate "no callback called" situation
as well, don't forget about that

> - extend is_state_visited() to account for callback iterator hits in a
>   way similar to regular iterators;
> - extend backtrack_insn() to correctly react to jumps from callback
>   exit to callback entry.
>
> I have a patch (at the end of this email) that correctly recognizes
> the bpf_loop example in this thread as unsafe. However this patch has
> a few deficiencies:
>
> - verif_scale_strobemeta_bpf_loop selftest is not passing, because
>   read_map_var function invoked from the callback continuously
>   increments 'payload' pointer used in subsequent iterations.
>
>   In order to handle such code I need to add an upper bound tracking
>   for iteration depth (when such bound could be deduced).

If the example is broken and really can get out of bounds, we should
fix the example to be provably within bounds no matter how many
iterations it is called with.

>
> - loop detection is broken for simple callback as below:
>
>   static int loop_callback_infinite(__u32 idx, __u64 *data)
>   {
>       for (;;)
>           (*ctx)++;
>       return 0;
>   }
>
>   To handle such code I need to change is_state_visited() to do
>   callback iterator loop/hit detection on exit from callback
>   (returns are not prune points at the moment), currently it is done
>   on entry.

I'm a bit confused. What's ctx in the above example? And why doesn't
loop detection catch the for(;;) loop right now?

>
> - the callback as below currently causes state explosion:
>
>   static int precise1_callback(__u32 idx, struct precise1_ctx *ctx)
>   {
>       if (idx == 0)
>           ctx->a = 1;
>       if (idx == 1 && ctx->a == 1)
>           ctx->b = 1;
>       return 0;
>   }

Why state explosion? There should be a bunch of different branches
(idx 0, 1, or something else, times ctx->a being 1 or not, times
ctx->b being 1 or not), but that is still a fixed number of states.
Do you know what causes the explosion?

>
>   I'm not sure yet what to do about this, there are several possibilities:
>   - tweak the order in which states are visited (need to think about it);
>   - check states in bpf_verifier_env::head (not explored yet) for
>     equivalence and avoid enqueuing duplicate states.
>
> I'll proceed addressing the issues above on Monday.
>
> Thanks,
> Eduard
>
> ---
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index a3236651ec64..5589f55e42ba 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -70,6 +70,17 @@ enum bpf_iter_state {
>         BPF_ITER_STATE_DRAINED,
>  };
>
> +struct bpf_iter {
> +       /* BTF container and BTF type ID describing
> +        * struct bpf_iter_<type> of an iterator state
> +        */
> +       struct btf *btf;
> +       u32 btf_id;
> +       /* packing following two fields to fit iter state into 16 bytes */
> +       enum bpf_iter_state state:2;
> +       int depth:30;
> +};
> +
>  struct bpf_reg_state {
>         /* Ordering of fields matters.  See states_equal() */
>         enum bpf_reg_type type;
> @@ -115,16 +126,7 @@ struct bpf_reg_state {
>                 } dynptr;
>
>                 /* For bpf_iter stack slots */
> -               struct {
> -                       /* BTF container and BTF type ID describing
> -                        * struct bpf_iter_<type> of an iterator state
> -                        */
> -                       struct btf *btf;
> -                       u32 btf_id;
> -                       /* packing following two fields to fit iter state into 16 bytes */
> -                       enum bpf_iter_state state:2;
> -                       int depth:30;
> -               } iter;
> +               struct bpf_iter iter;

Let's not do this; conceptually the processes are similar, but bpf_iter
is one thing, and this callback validation is another. Let's not
conflate the two.

>
>                 /* Max size from any of the above. */
>                 struct {

[...]


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-19 23:14       ` Andrii Nakryiko
@ 2023-09-20  0:06         ` Eduard Zingerman
  2023-09-20 16:37           ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-20  0:06 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu

On Tue, 2023-09-19 at 16:14 -0700, Andrii Nakryiko wrote:
[...]
> > Hi All,
> > 
> > This issue seems stalled, so I took a look over the weekend.
> > I have a work in progress fix, please let me know if you don't agree
> > with direction I'm taking or if I'm stepping on anyone's toes.
> > 
> > After some analysis I decided to go with Alexei's suggestion and
> > implement something similar to iterators for selected set of helper
> > functions that accept "looping" callbacks, such as:
> > - bpf_loop
> > - bpf_for_each_map_elem
> > - bpf_user_ringbuf_drain
> > - bpf_timer_set_callback (?)

Hi Andrii, thank you for taking a look.

> As Kumar mentioned, pretty much all helpers with callbacks are either
> 0-1, or 0-1-many cases, so all of them (except for the async callback
> ones that shouldn't be accepting parent stack pointers) should be
> validated.

Yes, makes sense. I need to finish the fix for bpf_loop first, as it
seems like a good representative of many possible issues.

> > The sketch of the fix looks as follows:
> > - extend struct bpf_func_state with callback iterator state information:
> >   - active/non-active flag
> 
> not sure why you need this flag

Note:
  I have a better version of the fix locally but will share it a bit later.
  It actually depends on the states_equal() discussion in the sibling thread.

In my updated version I use non-zero depth as the active-iteration
indicator, so there is no separate flag in the end.

> >   - depth
> 
> yep, this seems necessary
> 
> >   - r1-r5 state at the entry to callback;
> 
> not sure why you need this, this should be part of func_state already?

This was a bit tricky, but I think I figured out an acceptable solution
w/o extra copies of r1-r5. The tricky part is the structure of
check_helper_call():
- collect arguments 'meta' info & check arguments
- call __check_func_call():
  - setup frame for callback;
  - schedule next instruction index to be callback entry;
- reset r1-r5 in caller's frame;
- set r0 in caller's frame.

The problem is that check_helper_call() resets the caller's r1-r5
immediately. I figured that this reset could instead be done when
processing the callback's BPF_EXIT => no extra copy needed.

> > - extend __check_func_call() to setup callback iterator state when
> >   call to suitable helper function is processed;
> 
> this logic is "like iterator", but it's not exactly the same, so I
> don't think we should reuse bpf_iter state (as you can see above with
> active/non-active flag, for example)

Yes, I agree, already removed this locally.

> > - extend BPF_EXIT processing (prepare_func_exit()) to queue new
> >   callback visit upon return from iterating callback
> >   (similar to process_iter_next_call());
> 
> as mentioned above, we should simulate "no callback called" situation
> as well, don't forget about that

Yeap

> > - extend is_state_visited() to account for callback iterator hits in a
> >   way similar to regular iterators;
> > - extend backtrack_insn() to correctly react to jumps from callback
> >   exit to callback entry.
> > 
> > I have a patch (at the end of this email) that correctly recognizes
> > the bpf_loop example in this thread as unsafe. However this patch has
> > a few deficiencies:
> > 
> > - verif_scale_strobemeta_bpf_loop selftest is not passing, because
> >   read_map_var function invoked from the callback continuously
> >   increments 'payload' pointer used in subsequent iterations.
> > 
> >   In order to handle such code I need to add an upper bound tracking
> >   for iteration depth (when such bound could be deduced).
> 
> if the example is broken and really can get out of bounds, we should
> fix an example to be provable within bounds no matter how many
> iterations it was called with

For that particular case the number of iterations guarantees that the
payload pointer will not go out of bounds. It is bumped up 4 bytes on
each iteration, but the iteration count and the buffer size are
correlated to avoid out-of-bounds access.

> > - loop detection is broken for simple callback as below:
> > 
> >   static int loop_callback_infinite(__u32 idx, __u64 *data)
> >   {
> >       for (;;)
> >           (*ctx)++;
> >       return 0;
> >   }
> > 
> >   To handle such code I need to change is_state_visited() to do
> >   callback iterator loop/hit detection on exit from callback
> >   (returns are not prune points at the moment), currently it is done
> >   on entry.
> 
> I'm a bit confused. What's ctx in the above example? And why loop
> detection doesn't detect for(;;) loop right now?

It's an implementation detail for the fix sketch shared in the parent
email. It can catch cases like this:

    ... any insn ...;
    for (;;)
        (*ctx)++;
    return 0;

But cannot catch case like this:

    for (;;)
        (*ctx)++;
    return 0;

In that sketch I jump to the callback start from the callback return,
and the callback entry needs two checks:
- iteration convergence;
- simple looping.
Because of the code structure only the iteration convergence check was
done. Locally, I fixed this issue by jumping to the callback call
instruction instead.

> > - the callback as below currently causes state explosion:
> > 
> >   static int precise1_callback(__u32 idx, struct precise1_ctx *ctx)
> >   {
> >       if (idx == 0)
> >           ctx->a = 1;
> >       if (idx == 1 && ctx->a == 1)
> >           ctx->b = 1;
> >       return 0;
> >   }
> 
> why state explosion? there should be a bunch of different branches
> (idx 0, 1, something else x ctx->a = 1 or not 1 and ctx->b being 1 or
> not), but it should be a fixed number of states? Do you know what
> causes the explosion?

I forgot to do mark_force_checkpoint() at callback entry. Fixed locally.

> 
> > 
> >   I'm not sure yet what to do about this, there are several possibilities:
> >   - tweak the order in which states are visited (need to think about it);
> >   - check states in bpf_verifier_env::head (not explored yet) for
> >     equivalence and avoid enqueuing duplicate states.
> > 
> > I'll proceed addressing the issues above on Monday.
> > 
> > Thanks,
> > Eduard
> > 
> > ---
> > 
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index a3236651ec64..5589f55e42ba 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -70,6 +70,17 @@ enum bpf_iter_state {
> >         BPF_ITER_STATE_DRAINED,
> >  };
> > 
> > +struct bpf_iter {
> > +       /* BTF container and BTF type ID describing
> > +        * struct bpf_iter_<type> of an iterator state
> > +        */
> > +       struct btf *btf;
> > +       u32 btf_id;
> > +       /* packing following two fields to fit iter state into 16 bytes */
> > +       enum bpf_iter_state state:2;
> > +       int depth:30;
> > +};
> > +
> >  struct bpf_reg_state {
> >         /* Ordering of fields matters.  See states_equal() */
> >         enum bpf_reg_type type;
> > @@ -115,16 +126,7 @@ struct bpf_reg_state {
> >                 } dynptr;
> > 
> >                 /* For bpf_iter stack slots */
> > -               struct {
> > -                       /* BTF container and BTF type ID describing
> > -                        * struct bpf_iter_<type> of an iterator state
> > -                        */
> > -                       struct btf *btf;
> > -                       u32 btf_id;
> > -                       /* packing following two fields to fit iter state into 16 bytes */
> > -                       enum bpf_iter_state state:2;
> > -                       int depth:30;
> > -               } iter;
> > +               struct bpf_iter iter;
> 
> Let's not do this, conceptually processes are similar, but bpf_iter is
> one thing, and this callback validation is another thing. Let's not
> conflate things.
> 
> > 
> >                 /* Max size from any of the above. */
> >                 struct {
> 
> [...]


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-19 23:02             ` Andrii Nakryiko
@ 2023-09-20  0:19               ` Eduard Zingerman
  2023-09-20 16:20                 ` Eduard Zingerman
  2023-09-21  9:14                 ` Alexei Starovoitov
  0 siblings, 2 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-20  0:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, 2023-09-19 at 16:02 -0700, Andrii Nakryiko wrote:
> On Tue, Sep 19, 2023 at 9:28 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > It looks like I hit a related but slightly different bug with bpf_iter_next().
> > Consider the following example:
> > 
> >     SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> >     int num_iter_bug(const void *ctx) {
> >         struct bpf_iter_num it;                 // fp[-8] below
> >         __u64 val = 0;                          // fp[-16] in the below
> >         __u64 *ptr = &val;                      // r7 below
> >         __u64 rnd = bpf_get_current_pid_tgid(); // r6 below
> >         void *v;
> > 
> >         bpf_iter_num_new(&it, 0, 10);
> >         while ((v = bpf_iter_num_next(&it))) {
> >             rnd++;
> >             if (rnd == 42) {
> >                 ptr = (void*)(0xdead);
> >                 continue;
> >             }
> >             bpf_probe_read_user(ptr, 8, (void*)(0xdeadbeef));
> >         }
> >         bpf_iter_num_destroy(&it);
> >         return 0;
> >     }
> > 
> > (Unfortunately, it had to be converted to assembly to avoid compiler
> >  clobbering loop structure, complete test case is at the end of the email).
> > 
> > The example is not safe because of 0xdead being a possible `ptr` value.
> > However, currently it is marked as safe.
> > 
> > This happens because of states_equal() usage for iterator convergence
> > detection:
> > 
> >     static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> >         ...
> >         while (sl)
> >                 states_cnt++;
> >                 if (sl->state.insn_idx != insn_idx)
> >                         goto next;
> > 
> >                 if (sl->state.branches)
> >                 ...
> >                         if (is_iter_next_insn(env, insn_idx)) {
> >                                 if (states_equal(env, &sl->state, cur)) {
> >                         ...
> >                                         if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE)
> >                                                 goto hit;
> >                                 }
> >                                 goto skip_inf_loop_check;
> >                         }
> >         ...
> > 
> > With some additional logging I see that the following states are
> > considered equal:
> > 
> >     13: (85) call bpf_iter_num_next#59908
> >     ...
> >     at is_iter_next_insn(insn 13):
> >       old state:
> >          R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=fp-16 R10=fp0
> >          fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
> >       cur state:
> >          R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=57005
> >          R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
> >     states_equal()?: true
> > 
> > Note that R7=fp-16 in old state vs R7_w=57005 in cur state.
> > The registers are considered equal because R7 does not have a read mark.
> > However, read marks are not yet finalized for the old state because
> > sl->state.branches != 0. (Note: precision marks are not finalized
> > either, which should also be a problem, but that requires another example.)
> 
> I originally convinced myself that non-finalized precision marking is
> fine, see the comment next to process_iter_next_call() for reasoning.
> If you can have a dedicated example for precision that would be great.
> 
> As for read marks... Yep, that's a bummer. That r7 is not used after
> the loop, so is not marked as read, and it's basically ignored for
> loop invariant checking, which is definitely not what was intended.

Liveness is a precondition for all subsequent checks, so an example
involving precision would also involve liveness. Here is a combined
example:

    r8 = 0
    fp[-16] = 0
    r7 = -16
    r6 = bpf_get_current_pid_tgid()
    bpf_iter_num_new(&fp[-8], 0, 10)
    while (bpf_iter_num_next(&fp[-8])) {
      r6++
      if (r6 != 42) {
        r7 = -32
        continue;
      }
      r0 = r10
      r0 += r7
      r8 = *(u64 *)(r0 + 0)
    }
    bpf_iter_num_destroy(&fp[-8])
    return r8

(Complete source code is at the end of the email).

The call to bpf_iter_num_next() is reachable with r7 values -16 and -32.
State with r7=-16 is visited first, at which point r7 has no read mark
and is not marked precise.
State with r7=-32 is visited second:
- states_equal() for is_iter_next_insn() should ignore absence of
  REG_LIVE_READ mark on r7, otherwise both states would be considered
  identical;
- states_equal() for is_iter_next_insn() should ignore absence of
  precision mark on r7, otherwise both states would be considered
  identical.

> > A possible fix is to add a special flag to states_equal() and
> > conditionally ignore logic related to liveness and precision when this
> > flag is set. Set this flag for is_iter_next_insn() branch above.
> 
> Probably not completely ignore liveness (and maybe precision, but
> let's do it one step at a time), but only at least one of the
> registers or stack slots are marked as written or read in one of
> old/new states? Basically, if some register/slot at the end of
> iteration is read or modified, it must be important for loop
> invariant, so even if the parent state says it's not read, we still
> treat it as read. Can you please try that with just read/write marks
> and see if anything fails?

Unfortunately for the example above neither liveness nor precision
marks are present when states are compared, e.g. here is the
(extended) log:

12: (85) call bpf_iter_num_next#52356
...
is_iter_next (12):
  old:
     R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=-16 R8_r=0 R10=fp0
     fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
  new:
     R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=-32 R8=0 R10=fp0
     fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
> hit

Note that for the old state the r7 value is not really important for
reaching the exit; the verification trace looks as follows:
1. ... start ...
2. bpf_iter_num_next(&fp[-8]) ;; r7=-16, checkpoint saved
3. r6++
4. r7 = -32
5. bpf_iter_num_next(&fp[-8]) ;; push new state with r7=-32
6. bpf_iter_num_destroy(&fp[-8])
7. return r8

The r7 liveness and precision are important for the state scheduled at (5),
but they are not known yet, so when both states are compared the marks
are absent both in cur and in old.

---

/* BTF FUNC records are not generated for kfuncs referenced
 * from inline assembly. These records are necessary for
 * libbpf to link the program. The function below is a hack
 * to ensure that BTF FUNC records are generated.
 */
void __kfunc_btf_root(void)
{
	bpf_iter_num_new(0, 0, 0);
	bpf_iter_num_next(0);
	bpf_iter_num_destroy(0);
}

SEC("fentry/" SYS_PREFIX "sys_nanosleep")
__failure
__msg("20: (79) r8 = *(u64 *)(r0 +0)")
__msg("invalid read from stack R0 off=-32 size=8")
__naked int iter_next_exact(const void *ctx)
{
	asm volatile (
		"r8 = 0;"
		"*(u64 *)(r10 - 16) = r8;"
		"r7 = -16;"
		"call %[bpf_get_current_pid_tgid];"
		"r6 = r0;"
		"r1 = r10;"
		"r1 += -8;"
		"r2 = 0;"
		"r3 = 10;"
		"call %[bpf_iter_num_new];"
	"1:"
		"r1 = r10;"
		"r1 += -8;\n"
		"call %[bpf_iter_num_next];"
		"if r0 == 0 goto 2f;"
		"r6 += 1;"
		"if r6 != 42 goto 3f;"
		"r7 = -32;"
		"goto 1b;\n"
	"3:"
		"r0 = r10;"
		"r0 += r7;"
		"r8 = *(u64 *)(r0 + 0);"
		"goto 1b;\n"
	"2:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_destroy];"
		"r0 = r8;"
		"exit;"
		:
		: __imm(bpf_get_current_pid_tgid),
		  __imm(bpf_iter_num_new),
		  __imm(bpf_iter_num_next),
		  __imm(bpf_iter_num_destroy),
		  __imm(bpf_probe_read_user)
		: __clobber_all
	);
}



* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-20  0:19               ` Eduard Zingerman
@ 2023-09-20 16:20                 ` Eduard Zingerman
  2023-09-20 16:57                   ` Andrii Nakryiko
  2023-09-21  9:14                 ` Alexei Starovoitov
  1 sibling, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-20 16:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Wed, 2023-09-20 at 03:19 +0300, Eduard Zingerman wrote:
[...]
> Liveness is a precondition for all subsequent checks, so example
> involving precision would also involve liveness. Here is a combined
> example:
> 
>     r8 = 0
>     fp[-16] = 0
>     r7 = -16
>     r6 = bpf_get_current_pid_tgid()
>     bpf_iter_num_new(&fp[-8], 0, 10)
>     while (bpf_iter_num_next(&fp[-8])) {
>       r6++
>       if (r6 != 42) {
>         r7 = -32
>         continue;
>       }
>       r0 = r10
>       r0 += r7
>       r8 = *(u64 *)(r0 + 0)
>     }
>     bpf_iter_num_destroy(&fp[-8])
>     return r8
> 
> (Complete source code is at the end of the email).
> 
> The call to bpf_iter_num_next() is reachable with r7 values -16 and -32.
> State with r7=-16 is visited first, at which point r7 has no read mark
> and is not marked precise.
> State with r7=-32 is visited second:
> - states_equal() for is_iter_next_insn() should ignore absence of
>   REG_LIVE_READ mark on r7, otherwise both states would be considered
>   identical;
> - states_equal() for is_iter_next_insn() should ignore absence of
>   precision mark on r7, otherwise both states would be considered
>   identical.
> 
> > > A possible fix is to add a special flag to states_equal() and
> > > conditionally ignore logic related to liveness and precision when this
> > > flag is set. Set this flag for is_iter_next_insn() branch above.
> > 
> > Probably not completely ignore liveness (and maybe precision, but
> > let's do it one step at a time), but only at least one of the
> > registers or stack slots are marked as written or read in one of
> > old/new states? Basically, if some register/slot at the end of
> > iteration is read or modified, it must be important for loop
> > invariant, so even if the parent state says it's not read, we still
> > treat it as read. Can you please try that with just read/write marks
> > and see if anything fails?
> 
> Unfortunately for the example above neither liveness nor precision
> marks are present when states are compared, e.g. here is the
> (extended) log:
> 
> 12: (85) call bpf_iter_num_next#52356
> ...
> is_iter_next (12):
>   old:
>      R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=-16 R8_r=0 R10=fp0
>      fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
>   new:
>      R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=-32 R8=0 R10=fp0
>      fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
> > hit

It might be the case that I misread the situation a bit, because there
is also R6, which is actually used to pick the branch. It has a read
mark but no precision mark, because the jump was not predicted.

In general, if we hit an old state with .branches > 0:
- either there is an execution path "old -> safe exit";
- or there is a loop (not necessarily because of iterator) and old is
  a [grand]parent of the "cur" state.

In either case, the "cur" state is safe to prune if:
- it is possible to show that it would take the same jump branches as
  the "old" state;
- and the iteration depth differs.

All registers that took part in conditional jumps should already have
read marks in old state. However, not all jumps could be predicted,
thus not all such registers are marked as precise.

Unfortunately, treating read marks as precision marks is not sufficient.
For the particular example above:

    old: ... R6_r=scalar(id=1) ...
    cur: ... R6=42 ...

Here range_within() check in regsafe() would return true and old and
cur would still be considered identical:

    static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
                struct bpf_reg_state *rcur, struct bpf_idmap *idmap)
    {
        ...
        switch (base_type(rold->type)) {
        case SCALAR_VALUE:
            ...
            if (!rold->precise)
                return true;
            ...
            return range_within(rold, rcur) &&
                   tnum_in(rold->var_off, rcur->var_off) &&
                   check_scalar_ids(rold->id, rcur->id, idmap);
        case ...
        }
        ...
    }
    
So it appears that there is no need to ignore liveness checks after all,
but I don't see a way to make non-exact scalar comparisons at the moment.

> Note that for old state r7 value is not really important to get to exit,
> the verification trace looks as follows:
> 1. ... start ...
> 2. bpf_iter_num_next(&fp[-8]) ;; r7=-16, checkpoint saved
> 3. r6++
> 4. r7 = -32
> 5. bpf_iter_num_next(&fp[-8]) ;; push new state with r7=-32
> 6. bpf_iter_num_destroy(&fp[-8])
> 7. return 8
> 
> The r7 liveness and precision is important for state scheduled at (5)
> but it is not known yet, so when both states are compared marks are
> absent in cur and in old.
> 
> ---
> 
> /* BTF FUNC records are not generated for kfuncs referenced
>  * from inline assembly. These records are necessary for
>  * libbpf to link the program. The function below is a hack
>  * to ensure that BTF FUNC records are generated.
>  */
> void __kfunc_btf_root(void)
> {
> 	bpf_iter_num_new(0, 0, 0);
> 	bpf_iter_num_next(0);
> 	bpf_iter_num_destroy(0);
> }
> 
> SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> __failure
> __msg("20: (79) r8 = *(u64 *)(r0 +0)")
> __msg("invalid read from stack R0 off=-32 size=8")
> __naked int iter_next_exact(const void *ctx)
> {
> 	asm volatile (
> 		"r8 = 0;"
> 		"*(u64 *)(r10 - 16) = r8;"
> 		"r7 = -16;"
> 		"call %[bpf_get_current_pid_tgid];"
> 		"r6 = r0;"
> 		"r1 = r10;"
> 		"r1 += -8;"
> 		"r2 = 0;"
> 		"r3 = 10;"
> 		"call %[bpf_iter_num_new];"
> 	"1:"
> 		"r1 = r10;"
> 		"r1 += -8;\n"
> 		"call %[bpf_iter_num_next];"
> 		"if r0 == 0 goto 2f;"
> 		"r6 += 1;"
> 		"if r6 != 42 goto 3f;"
> 		"r7 = -32;"
> 		"goto 1b;\n"
> 	"3:"
> 		"r0 = r10;"
> 		"r0 += r7;"
> 		"r8 = *(u64 *)(r0 + 0);"
> 		"goto 1b;\n"
> 	"2:"
> 		"r1 = r10;"
> 		"r1 += -8;"
> 		"call %[bpf_iter_num_destroy];"
> 		"r0 = r8;"
> 		"exit;"
> 		:
> 		: __imm(bpf_get_current_pid_tgid),
> 		  __imm(bpf_iter_num_new),
> 		  __imm(bpf_iter_num_next),
> 		  __imm(bpf_iter_num_destroy),
> 		  __imm(bpf_probe_read_user)
> 		: __clobber_all
> 	);
> }
> 



* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-20  0:06         ` Eduard Zingerman
@ 2023-09-20 16:37           ` Andrii Nakryiko
  2023-09-20 17:13             ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-20 16:37 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu

On Tue, Sep 19, 2023 at 5:06 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2023-09-19 at 16:14 -0700, Andrii Nakryiko wrote:
> [...]
> > > Hi All,
> > >
> > > This issue seems stalled, so I took a look over the weekend.
> > > I have a work in progress fix, please let me know if you don't agree
> > > with direction I'm taking or if I'm stepping on anyone's toes.
> > >
> > > After some analysis I decided to go with Alexei's suggestion and
> > > implement something similar to iterators for selected set of helper
> > > functions that accept "looping" callbacks, such as:
> > > - bpf_loop
> > > - bpf_for_each_map_elem
> > > - bpf_user_ringbuf_drain
> > > - bpf_timer_set_callback (?)
>
> Hi Andrii, thank you for taking a look.
>
> > As Kumar mentioned, pretty much all helpers with callbacks are either
> > 0-1, or 0-1-many cases, so all of them (except for the async callback
> > ones that shouldn't be accepting parent stack pointers) should be
> > validated.
>
> Yes, makes sense. I need to finish the fix for bpf_loop first, as it
> seems like a good representative for many possible issues.

yes, it's the most generic 0-1-many case

>
> > > The sketch of the fix looks as follows:
> > > - extend struct bpf_func_state with callback iterator state information:
> > >   - active/non-active flag
> >
> > not sure why you need this flag
>
> Note:
>   I have a better version of the fix locally but will share it a bit later.
>   It actually depends on states_equal() discussion in the sibling thread.
>
> In my upgraded version I use non-zero depth as an indicator,
> so no separate flag is needed in the end.

great

>
> > >   - depth
> >
> > yep, this seems necessary
> >
> > >   - r1-r5 state at the entry to callback;
> >
> > not sure why you need this, this should be part of func_state already?
>
> This was a bit tricky but I think I figured an acceptable solution w/o
> extra copies for r1-r5. The tricky part is the structure of
> check_helper_call():
> - collect arguments 'meta' info & check arguments
> - call __check_func_call():
>   - setup frame for callback;
>   - schedule next instruction index to be callback entry;
> - reset r1-r5 in caller's frame;
> - set r0 in caller's frame.
>
> The problem is that check_helper_call() resets caller's r1-r5
> immediately. I figured that this reset could be done at BPF_EXIT
> processing for callback instead => no extra copy needed.
>

I guess then r0 setting would have to happen at BPF_EXIT as well,
right? Is that a problem?

> > > - extend __check_func_call() to setup callback iterator state when
> > >   call to suitable helper function is processed;
> >
> > this logic is "like iterator", but it's not exactly the same, so I
> > don't think we should reuse bpf_iter state (as you can see above with
> > active/non-active flag, for example)
>
> Yes, I agree, already removed this locally.

cool

>
> > > - extend BPF_EXIT processing (prepare_func_exit()) to queue new
> > >   callback visit upon return from iterating callback
> > >   (similar to process_iter_next_call());
> >
> > as mentioned above, we should simulate "no callback called" situation
> > as well, don't forget about that
>
> Yeap
>
> > > - extend is_state_visited() to account for callback iterator hits in a
> > >   way similar to regular iterators;
> > > - extend backtrack_insn() to correctly react to jumps from callback
> > >   exit to callback entry.
> > >
> > > I have a patch (at the end of this email) that correctly recognizes
> > > the bpf_loop example in this thread as unsafe. However this patch has
> > > a few deficiencies:
> > >
> > > - verif_scale_strobemeta_bpf_loop selftest is not passing, because
> > >   read_map_var function invoked from the callback continuously
> > >   increments 'payload' pointer used in subsequent iterations.
> > >
> > >   In order to handle such code I need to add an upper bound tracking
> > >   for iteration depth (when such bound could be deduced).
> >
> > if the example is broken and really can get out of bounds, we should
> > fix an example to be provable within bounds no matter how many
> > iterations it was called with
>
> For that particular case number of iterations guarantees that payload
> pointer will not get out of bounds. It is bumped up 4 bytes on each
> iteration, but the number of iterations and buffer size correlate to
> avoid out of bound access.

ok, good, I was trying to avoid deducing bounds on the number of
iterations or anything like that.

>
> > > - loop detection is broken for simple callback as below:
> > >
> > >   static int loop_callback_infinite(__u32 idx, __u64 *data)
> > >   {
> > >       for (;;)
> > >           (*ctx)++;
> > >       return 0;
> > >   }
> > >
> > >   To handle such code I need to change is_state_visited() to do
> > >   callback iterator loop/hit detection on exit from callback
> > >   (returns are not prune points at the moment), currently it is done
> > >   on entry.
> >
> > I'm a bit confused. What's ctx in the above example? And why loop
> > detection doesn't detect for(;;) loop right now?
>
> It's an implementation detail for the fix sketch shared in the parent
> email. It can catch cases like this:
>
>     ... any insn ...;
>     for (;;)
>         (*ctx)++;
>     return 0;
>
> But cannot catch case like this:
>
>     for (;;)
>         (*ctx)++;
>     return 0;
>
> In that sketch I jump to the callback start from callback return and
> callback entry needs two checks:
> - iteration convergence
> - simple looping
> Because of the code structure only iteration convergence check was done.
> Locally, I fixed this issue by jumping to the callback call instruction
> instead.

wouldn't this be a problem for just any subprog if we don't check the
looping condition on the entry instruction? Perhaps that's a separate
issue that needs a generic fix?

>
> > > - the callback as below currently causes state explosion:
> > >
> > >   static int precise1_callback(__u32 idx, struct precise1_ctx *ctx)
> > >   {
> > >       if (idx == 0)
> > >           ctx->a = 1;
> > >       if (idx == 1 && ctx->a == 1)
> > >           ctx->b = 1;
> > >       return 0;
> > >   }
> >
> > why state explosion? there should be a bunch of different branches
> > (idx 0, 1, something else x ctx->a = 1 or not 1 and ctx->b being 1 or
> > not), but it should be a fixed number of states? Do you know what
> > causes the explosion?
>
> I forgot to do mark_force_checkpoint() at callback entry. Fixed locally.

ok, makes sense

>
> >
> > >
> > >   I'm not sure yet what to do about this, there are several possibilities:
> > >   - tweak the order in which states are visited (need to think about it);
> > >   - check states in bpf_verifier_env::head (not explored yet) for
> > >     equivalence and avoid enqueuing duplicate states.
> > >
> > > I'll proceed addressing the issues above on Monday.
> > >
> > > Thanks,
> > > Eduard
> > >
> > > ---
> > >
> > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > index a3236651ec64..5589f55e42ba 100644
> > > --- a/include/linux/bpf_verifier.h
> > > +++ b/include/linux/bpf_verifier.h
> > > @@ -70,6 +70,17 @@ enum bpf_iter_state {
> > >         BPF_ITER_STATE_DRAINED,
> > >  };
> > >
> > > +struct bpf_iter {
> > > +       /* BTF container and BTF type ID describing
> > > +        * struct bpf_iter_<type> of an iterator state
> > > +        */
> > > +       struct btf *btf;
> > > +       u32 btf_id;
> > > +       /* packing following two fields to fit iter state into 16 bytes */
> > > +       enum bpf_iter_state state:2;
> > > +       int depth:30;
> > > +};
> > > +
> > >  struct bpf_reg_state {
> > >         /* Ordering of fields matters.  See states_equal() */
> > >         enum bpf_reg_type type;
> > > @@ -115,16 +126,7 @@ struct bpf_reg_state {
> > >                 } dynptr;
> > >
> > >                 /* For bpf_iter stack slots */
> > > -               struct {
> > > -                       /* BTF container and BTF type ID describing
> > > -                        * struct bpf_iter_<type> of an iterator state
> > > -                        */
> > > -                       struct btf *btf;
> > > -                       u32 btf_id;
> > > -                       /* packing following two fields to fit iter state into 16 bytes */
> > > -                       enum bpf_iter_state state:2;
> > > -                       int depth:30;
> > > -               } iter;
> > > +               struct bpf_iter iter;
> >
> > Let's not do this, conceptually processes are similar, but bpf_iter is
> > one thing, and this callback validation is another thing. Let's not
> > conflate things.
> >
> > >
> > >                 /* Max size from any of the above. */
> > >                 struct {
> >
> > [...]
>


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-20 16:20                 ` Eduard Zingerman
@ 2023-09-20 16:57                   ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-20 16:57 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Wed, Sep 20, 2023 at 9:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2023-09-20 at 03:19 +0300, Eduard Zingerman wrote:
> [...]
> > Liveness is a precondition for all subsequent checks, so example
> > involving precision would also involve liveness. Here is a combined
> > example:
> >
> >     r8 = 0
> >     fp[-16] = 0
> >     r7 = -16
> >     r6 = bpf_get_current_pid_tgid()
> >     bpf_iter_num_new(&fp[-8], 0, 10)
> >     while (bpf_iter_num_next(&fp[-8])) {
> >       r6++
> >       if (r6 != 42) {
> >         r7 = -32
> >         continue;
> >       }
> >       r0 = r10
> >       r0 += r7
> >       r8 = *(u64 *)(r0 + 0)
> >     }
> >     bpf_iter_num_destroy(&fp[-8])
> >     return r8
> >
> > (Complete source code is at the end of the email).
> >
> > The call to bpf_iter_num_next() is reachable with r7 values -16 and -32.
> > State with r7=-16 is visited first, at which point r7 has no read mark
> > and is not marked precise.
> > State with r7=-32 is visited second:
> > - states_equal() for is_iter_next_insn() should ignore absence of
> >   REG_LIVE_READ mark on r7, otherwise both states would be considered
> >   identical;
> > - states_equal() for is_iter_next_insn() should ignore absence of
> >   precision mark on r7, otherwise both states would be considered
> >   identical.
> >
> > > > A possible fix is to add a special flag to states_equal() and
> > > > conditionally ignore logic related to liveness and precision when this
> > > > flag is set. Set this flag for is_iter_next_insn() branch above.
> > >
> > > Probably not completely ignore liveness (and maybe precision, but
> > > let's do it one step at a time), but only at least one of the
> > > registers or stack slots are marked as written or read in one of
> > > old/new states? Basically, if some register/slot at the end of
> > > iteration is read or modified, it must be important for loop
> > > invariant, so even if the parent state says it's not read, we still
> > > treat it as read. Can you please try that with just read/write marks
> > > and see if anything fails?
> >
> > Unfortunately for the example above neither liveness nor precision
> > marks are present when states are compared, e.g. here is the
> > (extended) log:
> >
> > 12: (85) call bpf_iter_num_next#52356
> > ...
> > is_iter_next (12):
> >   old:
> >      R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=-16 R8_r=0 R10=fp0
> >      fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
> >   new:
> >      R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=-32 R8=0 R10=fp0
> >      fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
> > > hit
>
> It might be the case that I misread the situation a bit, because there
> is also R6, which is actually used to pick the branch. It has a read
> mark but no precision mark, because the jump was not predicted.
>
> In general, if we hit an old state with .branches > 0:
> - either there is an execution path "old -> safe exit";
> - or there is a loop (not necessarily because of iterator) and old is
>   a [grand]parent of the "cur" state.
>
> In either case, "cur" state is safe to prune if:
> - it is possible to show that it would take same jump branches as
>   "old" state;
> - and iteration depth differs.
>
> All registers that took part in conditional jumps should already have
> read marks in old state. However, not all jumps could be predicted,
> thus not all such registers are marked as precise.
>
> Unfortunately, treating read marks as precision marks is not sufficient.
> For the particular example above:
>
>     old: ... R6_r=scalar(id=1) ...
>     cur: ... R6=42 ...
>
> Here range_within() check in regsafe() would return true and old and
> cur would still be considered identical:
>
>     static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
>                 struct bpf_reg_state *rcur, struct bpf_idmap *idmap)
>     {
>         ...
>         switch (base_type(rold->type)) {
>         case SCALAR_VALUE:
>             ...
>             if (!rold->precise)
>                 return true;
>             ...
>             return range_within(rold, rcur) &&
>                    tnum_in(rold->var_off, rcur->var_off) &&
>                    check_scalar_ids(rold->id, rcur->id, idmap);
>         case ...
>         }
>         ...
>     }
>
> So it appears that there is no need to ignore liveness checks after all,
> but I don't see a way to make non-exact scalar comparisons at the moment.

Yeah, me neither. We can try something drastic like marking all
registers in the state as precise at the beginning of
iteration/callback execution. But I suspect that will pessimize
verification a lot and might start rejecting valid cases.

Don't know, I'm still trying to think of something that would work.

>
> > Note that for old state r7 value is not really important to get to exit,
> > the verification trace looks as follows:
> > 1. ... start ...
> > 2. bpf_iter_num_next(&fp[-8]) ;; r7=-16, checkpoint saved
> > 3. r6++
> > 4. r7 = -32
> > 5. bpf_iter_num_next(&fp[-8]) ;; push new state with r7=-32
> > 6. bpf_iter_num_destroy(&fp[-8])
> > 7. return 8
> >
> > The r7 liveness and precision is important for state scheduled at (5)
> > but it is not known yet, so when both states are compared marks are
> > absent in cur and in old.
> >
> > ---
> >
> > /* BTF FUNC records are not generated for kfuncs referenced
> >  * from inline assembly. These records are necessary for
> >  * libbpf to link the program. The function below is a hack
> >  * to ensure that BTF FUNC records are generated.
> >  */
> > void __kfunc_btf_root(void)
> > {
> >       bpf_iter_num_new(0, 0, 0);
> >       bpf_iter_num_next(0);
> >       bpf_iter_num_destroy(0);
> > }
> >
> > SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> > __failure
> > __msg("20: (79) r8 = *(u64 *)(r0 +0)")
> > __msg("invalid read from stack R0 off=-32 size=8")
> > __naked int iter_next_exact(const void *ctx)
> > {
> >       asm volatile (
> >               "r8 = 0;"
> >               "*(u64 *)(r10 - 16) = r8;"
> >               "r7 = -16;"
> >               "call %[bpf_get_current_pid_tgid];"
> >               "r6 = r0;"
> >               "r1 = r10;"
> >               "r1 += -8;"
> >               "r2 = 0;"
> >               "r3 = 10;"
> >               "call %[bpf_iter_num_new];"
> >       "1:"
> >               "r1 = r10;"
> >               "r1 += -8;\n"
> >               "call %[bpf_iter_num_next];"
> >               "if r0 == 0 goto 2f;"
> >               "r6 += 1;"
> >               "if r6 != 42 goto 3f;"
> >               "r7 = -32;"
> >               "goto 1b;\n"
> >       "3:"
> >               "r0 = r10;"
> >               "r0 += r7;"
> >               "r8 = *(u64 *)(r0 + 0);"
> >               "goto 1b;\n"
> >       "2:"
> >               "r1 = r10;"
> >               "r1 += -8;"
> >               "call %[bpf_iter_num_destroy];"
> >               "r0 = r8;"
> >               "exit;"
> >               :
> >               : __imm(bpf_get_current_pid_tgid),
> >                 __imm(bpf_iter_num_new),
> >                 __imm(bpf_iter_num_next),
> >                 __imm(bpf_iter_num_destroy),
> >                 __imm(bpf_probe_read_user)
> >               : __clobber_all
> >       );
> > }
> >
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-20 16:37           ` Andrii Nakryiko
@ 2023-09-20 17:13             ` Eduard Zingerman
  0 siblings, 0 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-20 17:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu

On Wed, 2023-09-20 at 09:37 -0700, Andrii Nakryiko wrote:
> On Tue, Sep 19, 2023 at 5:06 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
[...]
> > This was a bit tricky but I think I figured an acceptable solution w/o
> > extra copies for r1-r5. The tricky part is the structure of
> > check_helper_call():
> > - collect arguments 'meta' info & check arguments
> > - call __check_func_call():
> >   - setup frame for callback;
> >   - schedule next instruction index to be callback entry;
> > - reset r1-r5 in caller's frame;
> > - set r0 in caller's frame.
> > 
> > The problem is that check_helper_call() resets caller's r1-r5
> > immediately. I figured that this reset could be done at BPF_EXIT
> > processing for callback instead => no extra copy needed.
> > 
> 
> I guess then r0 setting would have to happen at BPF_EXIT as well,
> right? Is that a problem?

Ideally yes, r0 should be set at BPF_EXIT, but that would require:
- splitting check_helper_call() in two parts;
- separate handling for helpers that don't call callbacks.

For now I decided against it, and r0 in the caller's frame is modified
immediately. This is safe because the check_helper_call() logic does not
rely on the r0 value (and check_helper_call() would be called again and
again for each new iteration). But it is a hack, and maybe a change in
the check_helper_call() structure is indeed necessary. I leave it out
for now as a secondary concern.

[...]
> > > > - loop detection is broken for simple callback as below:
> > > > 
> > > >   static int loop_callback_infinite(__u32 idx, __u64 *data)
> > > >   {
> > > >       for (;;)
> > > >           (*ctx)++;
> > > >       return 0;
> > > >   }
> > > > 
> > > >   To handle such code I need to change is_state_visited() to do
> > > >   callback iterator loop/hit detection on exit from callback
> > > >   (returns are not prune points at the moment), currently it is done
> > > >   on entry.
> > > 
> > > I'm a bit confused. What's ctx in the above example? And why loop
> > > detection doesn't detect for(;;) loop right now?
> > 
> > It's an implementation detail for the fix sketch shared in the parent
> > email. It can catch cases like this:
> > 
> >     ... any insn ...;
> >     for (;;)
> >         (*ctx)++;
> >     return 0;
> > 
> > But cannot catch case like this:
> > 
> >     for (;;)
> >         (*ctx)++;
> >     return 0;
> > 
> > In that sketch I jump to the callback start from callback return and
> > callback entry needs two checks:
> > - iteration convergence
> > - simple looping
> > Because of the code structure only iteration convergence check was done.
> > Locally, I fixed this issue by jumping to the callback call instruction
> > instead.
> 
> wouldn't this be a problem for just any subprog if we don't check the
> looping condition on the entry instruction? Perhaps that's a separate
> issue that needs generic fix?

This didn't occur to me. In the following example loop detection
indeed does not work; however, the verifier still bails out correctly
upon hitting the instruction processing limit:

    SEC("fentry/" SYS_PREFIX "sys_nanosleep")
    __failure
    int iter_next_infinite_loop(const void *ctx)
    {
    	struct bpf_iter_num it;

    	bpf_iter_num_new(&it, 0, 10);
    	for (;;)
    		bpf_iter_num_next(&it);
    	bpf_iter_num_destroy(&it);
    	return 0;
    }

[...]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-20  0:19               ` Eduard Zingerman
  2023-09-20 16:20                 ` Eduard Zingerman
@ 2023-09-21  9:14                 ` Alexei Starovoitov
  2023-09-21 11:03                   ` Eduard Zingerman
  1 sibling, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2023-09-21  9:14 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, Sep 19, 2023 at 5:19 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> > >
> > > Note that R7=fp-16 in old state vs R7_w=57005 in cur state.
> > > The registers are considered equal because R7 does not have a read mark.
> > > However read marks are not yet finalized for old state because
> > > sl->state.branches != 0.

By "liveness marks are not finalized" you mean that
regs don't have LIVE_DONE?
That shouldn't matter to state comparison.
It only needs to see LIVE_READ.
LIVE_DONE is a sanity check for liveness algorithm only.
It doesn't affect correctness.

> > > (Note: precision marks are not finalized as
> > > well, which should be a problem, but this requires another example).
> >
> > I originally convinced myself that non-finalized precision marking is
> > fine, see the comment next to process_iter_next_call() for reasoning.
> > If you can have a dedicated example for precision that would be great.
> >
> > As for read marks... Yep, that's a bummer. That r7 is not used after
> > the loop, so is not marked as read, and it's basically ignored for
> > loop invariant checking, which is definitely not what was intended.
>
> Liveness is a precondition for all subsequent checks, so example
> involving precision would also involve liveness. Here is a combined
> example:
>
>     r8 = 0
>     fp[-16] = 0
>     r7 = -16
>     r6 = bpf_get_current_pid_tgid()
>     bpf_iter_num_new(&fp[-8], 0, 10)
>     while (bpf_iter_num_next(&fp[-8])) {
>       r6++
>       if (r6 != 42) {
>         r7 = -32
>         continue;
>       }
>       r0 = r10
>       r0 += r7
>       r8 = *(u64 *)(r0 + 0)
>     }
>     bpf_iter_num_destroy(&fp[-8])
>     return r8
>
> (Complete source code is at the end of the email).
>
> The call to bpf_iter_num_next() is reachable with r7 values -16 and -32.
> State with r7=-16 is visited first, at which point r7 has no read mark
> and is not marked precise.
> State with r7=-32 is visited second:
> - states_equal() for is_iter_next_insn() should ignore absence of
>   REG_LIVE_READ mark on r7, otherwise both states would be considered
>   identical;

I think assuming live_read on all regs in regsafe() when
a cb or iter body is being processed will have a big impact
on processed insns.
I suspect the underlying issue is different.

In the first pass of the body with r7=-16 the 'r0 += r7'
insn should have added the read mark to r7,
so is_iter_next_insn after 2nd pass with r7=-32 should reach
case SCALAR_VALUE in regsafe().
There we need to do something with lack of precision in r7=-16,
but assume that is fixed, the 3rd iter of the loop should continue
and 'r0 += r7' in the _3rd_ iter of the loop should propagate
read mark all the way to r7=-32 reg.
I mean the parentage chain between regs should link
regs of iterations.
I believe the process_iter_next_call() logic maintains
correct parentage chain, since it's just a push_stack()
and is_state_visited() should be connecting parents.
So in the case of the iter body the issue with the above loop is only
the missing precision,
but for your cb verification code I suspect the issue is due
to a lack of parentage chain, so liveness is not propagated?

I could be completely off the mark.
The discussion is hard to follow.
It's probably time to post a raw patch and continue with code.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-21  9:14                 ` Alexei Starovoitov
@ 2023-09-21 11:03                   ` Eduard Zingerman
  2023-09-21 12:56                     ` Alexei Starovoitov
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-21 11:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, 2023-09-21 at 02:14 -0700, Alexei Starovoitov wrote:
> On Tue, Sep 19, 2023 at 5:19 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > > > 
> > > > Note that R7=fp-16 in old state vs R7_w=57005 in cur state.
> > > > The registers are considered equal because R7 does not have a read mark.
> > > > However read marks are not yet finalized for old state because
> > > > sl->state.branches != 0.
> 
> By "liveness marks are not finalized" you mean that
> regs don't have LIVE_DONE?
> That shouldn't matter to state comparison.
> It only needs to see LIVE_READ.
> LIVE_DONE is a sanity check for liveness algorithm only.
> It doesn't affect correctness.

No, actually I mean that the state where R7 is read had not been seen
yet, and thus the read mark had not been propagated to the parent state.
See below.

> > > > (Note: precision marks are not finalized as
> > > > well, which should be a problem, but this requires another example).
> > > 
> > > I originally convinced myself that non-finalized precision marking is
> > > fine, see the comment next to process_iter_next_call() for reasoning.
> > > If you can have a dedicated example for precision that would be great.
> > > 
> > > As for read marks... Yep, that's a bummer. That r7 is not used after
> > > the loop, so is not marked as read, and it's basically ignored for
> > > loop invariant checking, which is definitely not what was intended.
> > 
> > Liveness is a precondition for all subsequent checks, so example
> > involving precision would also involve liveness. Here is a combined
> > example:
> > 
> >     r8 = 0
> >     fp[-16] = 0
> >     r7 = -16
> >     r6 = bpf_get_current_pid_tgid()
> >     bpf_iter_num_new(&fp[-8], 0, 10)
> >     while (bpf_iter_num_next(&fp[-8])) {
> >       r6++
> >       if (r6 != 42) {
> >         r7 = -32
> >         continue;
> >       }
> >       r0 = r10
> >       r0 += r7
> >       r8 = *(u64 *)(r0 + 0)
> >     }
> >     bpf_iter_num_destroy(&fp[-8])
> >     return r8
> > 
> > (Complete source code is at the end of the email).
> > 
> > The call to bpf_iter_num_next() is reachable with r7 values -16 and -32.
> > State with r7=-16 is visited first, at which point r7 has no read mark
> > and is not marked precise.
> > State with r7=-32 is visited second:
> > - states_equal() for is_iter_next_insn() should ignore absence of
> >   REG_LIVE_READ mark on r7, otherwise both states would be considered
> >   identical;
> 
> I think assuming live_read on all regs in regsafe() when
> cb or iter body is being processed will have big impact
> on processed insns.
> I suspect the underlying issue is different.
> 
> In the first pass of the body with r7=-16 the 'r0 += r7'
> insn should have added the read mark to r7,
> so is_iter_next_insn after 2nd pass with r7=-32 should reach
> case SCALAR_VALUE in regsafe().
> There we need to do something with lack of precision in r7=-16,
> but assume that is fixed, the 3rd iter of the loop should continue
> and 'r0 += r7' in the _3rd_ iter of the loop should propagate
> read mark all the way to r7=-32 reg.

I repeat the complete example with added instruction numbers in the
end of the email. As of now verifier takes the following paths:

  First:
    // First path is for "drained" iterator and is not very interesting.
    0: (b7) r8 = 0                        ; R8_w=0
       ...
    2: (b7) r7 = -16                      ; R7_w=-16
       ...
    4: (bf) r6 = r0                       ; R0_w=scalar(id=1) R6_w=scalar(id=1)
       ...
    12: (85) call bpf_iter_num_next#63182
       ; R0_w=0 fp-8=iter_num(ref_id=2,state=drained,depth=0) refs=2
    13: (15) if r0 == 0x0 goto pc+8
    22: (bf) r1 = r10
       ...
    26: (95) exit
  
  Second:
    // Note: at 13 a first checkpoint with "active" iterator state is reached
    //       this checkpoint is created for R7=-16 w/o read mark.
    from 12 to 13: R0_w=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R6=scalar(id=1)
                   R7=-16 R8=0 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1)
                   fp-16=00000000 refs=2
    13: (15) if r0 == 0x0 goto pc+8       ; R0_w=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) refs=2
    14: (07) r6 += 1                      ; R6=scalar() refs=2
    15: (55) if r6 != 0x2a goto pc+2      ; R6=42 refs=2
    16: (b7) r7 = -32                     ; R7_w=-32 refs=2
    17: (05) goto pc-8
    10: (bf) r1 = r10                     ; R1_w=fp0 R10=fp0 refs=2
    11: (07) r1 += -8
      is_iter_next (12):
        old:
           R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=-16 R8_r=0 R10=fp0
           fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
        cur:
           R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=-32
           R8=0 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
      > hit
    12: safe
    // At this point R7=-32 but it still lacks read or precision marks,
    // thus states_equal() called from is_state_visited() for is_iter_next_insn()
    // branch returns true. (I added some logging to make it clear above)
    
  Third:
    // This is r6 != 42 branch, it hits same checkpoint as "second" path.
    // Note that this branch has "r0 += r7" and will propagate read mark
    // for R7 to the "old" checkpoint but that's too late, R7=-32 state
    // has already been pruned.
    from 15 to 18: R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R6=scalar()
                   R7=-16 R8=0 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1)
                   fp-16=00000000 refs=2
    18: (bf) r0 = r10                     ; R0_w=fp0 R10=fp0 refs=2
    19: (0f) r0 += r7
    20: (79) r8 = *(u64 *)(r0 +0)         ; R0_w=fp-16 R8_w=P0 fp-16=00000000 refs=2
    21: (05) goto pc-12
    10: (bf) r1 = r10                     ; R1_w=fp0 R10=fp0 refs=2
    11: (07) r1 += -8
      is_iter_next (12):
        old:
           R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7_r=P-16 R8_r=0 R10=fp0
           fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16_r=00000000 refs=2
        cur:
           R0_w=fp-16 R1_w=fp-8 R6=scalar() R7=-16 R8_w=P0 R10=fp0
           fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
      > hit
    12: safe

Here on the second path the state with R7=-32 is pruned and this is
not safe. Registers that differ in this state are R6 and R7. I can
update the example so that there would be no difference in R6, so
let's ignore that for now. As for R7, in the "old" state it is not
marked read because children states of "old" that read R7 had not been
visited yet.

> I mean the parentage chain between regs should link
> regs of iterations.
> I believe the process_iter_next_call() logic maintains
> correct parentage chain, since it's just a push_stack()
> and is_state_visited() should be connecting parents.
> So in case of iter body the issue with above loop is only
> in missing precision,
> but for your cb verification code I suspect the issue is due
> to lack of parentage chain and liveness is not propagated ?

Right, the parentage chain is preserved by both process_iter_next_call()
and the cb processing code, as both use push_stack().

> I could be completely off the mark.
> The discussion is hard to follow.
> It's probably time to post raw patch and continue with code.

The "fix" that I have locally is not really a fix. It forces "exact"
state comparisons for the is_iter_next_insn() case:
1. liveness marks are ignored;
2. precision marks are ignored;
3. scalars are compared using regs_exact().

It breaks a number of tests, because iterator "entry" states
are no longer equal and the upper iteration bound is not tracked:
- iters/simplest_loop
- iters/iter_obfuscate_counter
- iters/testmod_seq_truncated
- iters/num
- iters/testmod_seq
- linked_list/pop_front_off
- linked_list/pop_back_off

The log above shows that (1) and (2) are inevitable.
(But they might be relaxed if some data flow analysis that marks
 registers which *might* be precise or read is done before the
 main verification pass.)
For (3), I'm not sure and need to think a bit more, but the "range_within"
logic seems fishy for states with .branches > 0.

--- complete example with insn numbers ---

SEC("fentry/" SYS_PREFIX "sys_nanosleep")
__failure
__msg("20: (79) r8 = *(u64 *)(r0 +0)")
__msg("invalid read from stack R0 off=-32 size=8")
__naked int iter_next_exact(void)
{
	/* r8 = 0
	 * fp[-16] = 0
	 * r7 = -16
	 * r6 = bpf_get_current_pid_tgid()
	 * bpf_iter_num_new(&fp[-8], 0, 10)
	 * while (bpf_iter_num_next(&fp[-8])) {
	 *   r6++
	 *   if (r6 == 42) {
	 *     r7 = -32
	 *     continue;
	 *   }
	 *   r0 = r10
	 *   r0 += r7
	 *   r8 = *(u64 *)(r0 + 0)
	 * }
	 * bpf_iter_num_destroy(&fp[-8])
	 * return r8
	 */
	asm volatile (
		"r8 = 0;"                            // 0
		"*(u64 *)(r10 - 16) = r8;"           // 1
		"r7 = -16;"                          // 2
		"call %[bpf_get_current_pid_tgid];"  // 3
		"r6 = r0;"                           // 4
		"r1 = r10;"                          // 5
		"r1 += -8;"                          // 6
		"r2 = 0;"                            // 7
		"r3 = 10;"                           // 8
		"call %[bpf_iter_num_new];"          // 9
	"1:"
		"r1 = r10;"                          // 10
		"r1 += -8;\n"                        // 11
		"call %[bpf_iter_num_next];"         // 12
		"if r0 == 0 goto 2f;"                // 13
		"r6 += 1;"                           // 14
		"if r6 != 42 goto 3f;"               // 15
		"r7 = -32;"                          // 16
		"goto 1b;\n"                         // 17
	"3:"
		"r0 = r10;"                          // 18
		"r0 += r7;"                          // 19
		"r8 = *(u64 *)(r0 + 0);"             // 20
		"goto 1b;\n"                         // 21
	"2:"
		"r1 = r10;"                          // 22
		"r1 += -8;"                          // 23
		"call %[bpf_iter_num_destroy];"      // 24
		"r0 = r8;"                           // 25
		"exit;"                              // 26
		:
		: __imm(bpf_get_current_pid_tgid),
		  __imm(bpf_iter_num_new),
		  __imm(bpf_iter_num_next),
		  __imm(bpf_iter_num_destroy),
		  __imm(bpf_probe_read_user)
		: __clobber_all
	);
}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-21 11:03                   ` Eduard Zingerman
@ 2023-09-21 12:56                     ` Alexei Starovoitov
  2023-09-21 16:23                       ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2023-09-21 12:56 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, Sep 21, 2023 at 4:03 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> I repeat the complete example with added instruction numbers in the
> end of the email. As of now verifier takes the following paths:
>
>   First:
>     // First path is for "drained" iterator and is not very interesting.
>     0: (b7) r8 = 0                        ; R8_w=0
>        ...
>     2: (b7) r7 = -16                      ; R7_w=-16
>        ...
>     4: (bf) r6 = r0                       ; R0_w=scalar(id=1) R6_w=scalar(id=1)
>        ...
>     12: (85) call bpf_iter_num_next#63182
>        ; R0_w=0 fp-8=iter_num(ref_id=2,state=drained,depth=0) refs=2
>     13: (15) if r0 == 0x0 goto pc+8
>     22: (bf) r1 = r10
>        ...
>     26: (95) exit
>
>   Second:
>     // Note: at 13 a first checkpoint with "active" iterator state is reached
>     //       this checkpoint is created for R7=-16 w/o read mark.

Thanks for the detailed example.
The analysis in my previous email was based on the C code,
where I assumed the asm generated by the compiler for
if (r6 != 42)
would be
if (unlikely(r6 != 42))
with the fallthrough insn after the 'if' insn being 'r0 = r10'.
Now I see that the asm matches if (likely(r6 != 42)).
I suspect if you use that in C code you wouldn't need to
write the test in asm.
Just a thought.

>     from 12 to 13: R0_w=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R6=scalar(id=1)
>                    R7=-16 R8=0 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1)
>                    fp-16=00000000 refs=2
>     13: (15) if r0 == 0x0 goto pc+8       ; R0_w=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) refs=2
>     14: (07) r6 += 1                      ; R6=scalar() refs=2
>     15: (55) if r6 != 0x2a goto pc+2      ; R6=42 refs=2
>     16: (b7) r7 = -32                     ; R7_w=-32 refs=2
>     17: (05) goto pc-8
>     10: (bf) r1 = r10                     ; R1_w=fp0 R10=fp0 refs=2
>     11: (07) r1 += -8
>       is_iter_next (12):
>         old:
>            R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=-16 R8_r=0 R10=fp0
>            fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
>         cur:
>            R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=-32
>            R8=0 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
>       > hit
>     12: safe
>     // At this point R7=-32 but it still lacks read or precision marks,
>     // thus states_equal() called from is_state_visited() for is_iter_next_insn()
>     // branch returns true. (I added some logging to make it clear above)

Maybe instead of brute-forcing all regs to live and precise
we can add an iter.depth check to stacksafe() such
that depth=0 != depth=1, but
depth=1 == depthN ?
(and a tweak to iter_active_depths_differ() as well)

Then in the above r7 will be 'equivalent', but fp-8 will differ,
then the state with r7=-32 won't be pruned
and it will address this particular example ? or not ?

Even if it does the gut feel that it won't be enough though,
but I wanted to mention this to brainstorm, since below:

> The "fix" that I have locally is not really a fix. It forces "exact"
> states comparisons for is_iter_next_insn() case:
> 1. liveness marks are ignored;
> 2. precision marks are ignored;
> 3. scalars are compared using regs_exact().

is too drastic.

> It breaks a number of tests, because iterator "entry" states
> are no longer equal and the upper iteration bound is not tracked:
> - iters/simplest_loop

that's a clear sign that forcing 1,2,3 is a dead end.

Another idea is to add another state.branches specifically for loop body
and keep iterating the body until branches==0.
Same concept as the verifier uses for the whole prog, but localized
to a loop to make sure we don't declare 'equivalent' state
until all paths in the loop body are explored.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-21 12:56                     ` Alexei Starovoitov
@ 2023-09-21 16:23                       ` Eduard Zingerman
  2023-09-21 16:35                         ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-21 16:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, 2023-09-21 at 05:56 -0700, Alexei Starovoitov wrote:
[...]
> Now I see that asm matches if (likely(r6 != 42)).
> I suspect if you use that in C code you wouldn't need to
> write the test in asm.
> Just a thought.

Thank you, this does change test behavior; however, the compiler still
decides to partially unroll the loop for whatever reason. Will stick to
asm snippets for now.

> Maybe instead of brute forcing all regs to live and precise
> we can add iter.depth check to stacksafe() such
> that depth=0 != depth=1, but
> depth=1 == depthN ?
> (and a tweak to iter_active_depths_differ() as well)
> 
> Then in the above r7 will be 'equivalent', but fp-8 will differ,
> then the state with r7=-32 won't be pruned
> and it will address this particular example ? or not ?

This does help for the particular example, however a simple
modification can still trick the verifier:

     ...
     r6 = bpf_get_current_pid_tgid()
     bpf_iter_num_new(&fp[-8], 0, 10)
+    bpf_iter_num_next(&fp[-8])
     while (bpf_iter_num_next(&fp[-8])) {
       ...
     }
     ...

> Another idea is to add another state.branches specifically for loop body
> and keep iterating the body until branches==0.
> Same concept as the verifier uses for the whole prog, but localized
> to a loop to make sure we don't declare 'equivalent' state
> until all paths in the loop body are explored.

I'm not sure I understand the idea. If we count branches, there will
always be back-edges leading to new branches. Or do you suggest not
pruning "equivalent" loop states until all basic blocks in the loop are
visited? (So that all read marks are propagated and probably all
precision marks).

I still doubt the range_within() check for the is_iter_next_insn() case
but can't come up with a counterexample.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-21 16:23                       ` Eduard Zingerman
@ 2023-09-21 16:35                         ` Andrii Nakryiko
  2023-09-21 18:16                           ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-21 16:35 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, Sep 21, 2023 at 9:23 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2023-09-21 at 05:56 -0700, Alexei Starovoitov wrote:
> [...]
> > Now I see that asm matches if (likely(r6 != 42)).
> > I suspect if you use that in C code you wouldn't need to
> > write the test in asm.
> > Just a thought.
>
> Thank you this does change test behavior, however compiler still decides
> to partially unroll the loop for whatever reason. Will stick to asm
> snippets for now.
>
> > Maybe instead of brute forcing all regs to live and precise
> > we can add iter.depth check to stacksafe() such
> > that depth=0 != depth=1, but
> > depth=1 == depthN ?
> > (and a tweak to iter_active_depths_differ() as well)
> >
> > Then in the above r7 will be 'equivalent', but fp-8 will differ,
> > then the state with r7=-32 won't be pruned
> > and it will address this particular example ? or not ?
>
> This does help for the particular example, however a simple
> modification can still trick the verifier:
>
>      ...
>      r6 = bpf_get_current_pid_tgid()
>      bpf_iter_num_new(&fp[-8], 0, 10)
> +    bpf_iter_num_next(&fp[-8])
>      while (bpf_iter_num_next(&fp[-8])) {
>        ...
>      }
>      ...
>
> > Another idea is to add another state.branches specifically for loop body
> > and keep iterating the body until branches==0.
> > Same concept as the verifier uses for the whole prog, but localized
> > to a loop to make sure we don't declare 'equivalent' state
> > until all paths in the loop body are explored.
>
> I'm not sure I understand the idea. If we count branches there always
> would be back-edges leading to new branches. Or do you suggest to not
> prune "equivalent" loop states until all basic blocks in the loop are
> visited? (So that all read marks are propagated and probably all
> precision marks).
>

I've been thinking in a similar direction as Alexei, overnight. Here
are some raw thoughts.

I think the overall approach with iter verification is sound. If we
loop and see an equivalent state at the entry to next loop iteration,
then it's safe to assume doing many iterations is safe. The problem is
that delayed precision and read marks make this state equivalence
wrong in some case. So we need to find a solution that will ensure
that all precision and read marks are propagated to parent state
before making a decision about convergence.

The idea is to let verifier explore all code paths starting from
iteration #N, except the code paths that lead to looping into
iteration #N+1. I tried to do that by validating the NULL case first and
exiting from the loop on each iteration, but clearly that doesn't
capture all the cases, as Eduard has shown.

So what if we delay convergence state checks (and then further
exploration at iteration #N+1) using BFS instead of DFS? That is, when
we are about to start a new iteration and check state convergence, we
enqueue the current state to be processed later, after all the states
that "belong" to iteration #N are processed.

We should work out the exact details of how to do this hybrid BFS+DFS,
but let's think carefully about whether it will solve the problems.

I think this is conceptually similar to what Alexei proposed above.
Basically, we "unroll" loop verification iteration by iteration, but
make sure that we explore all the branching within iteration #N before
going one iteration deeper.

Let's think if there are any cases which wouldn't be handled. And then
think how to implement this elegantly (e.g., some sort of queue within
a parent state, which sounds similar to this separate "branches"
counter that Alexei is proposing above).


> I'm still doubt range_within() check for is_iter_next_insn() case but
> can't come up with counter example.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-21 16:35                         ` Andrii Nakryiko
@ 2023-09-21 18:16                           ` Eduard Zingerman
  2023-09-22  1:01                             ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-21 18:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, 2023-09-21 at 09:35 -0700, Andrii Nakryiko wrote:
> I've been thinking in a similar direction as Alexei, overnight. Here
> are some raw thoughts.
> 
> I think the overall approach with iter verification is sound. If we
> loop and see an equivalent state at the entry to next loop iteration,
> then it's safe to assume doing many iterations is safe. The problem is
> that delayed precision and read marks make this state equivalence
> wrong in some case. So we need to find a solution that will ensure
> that all precision and read marks are propagated to parent state
> before making a decision about convergence.
> 
> The idea is to let verifier explore all code paths starting from
> iteration #N, except the code paths that lead to looping into
> iteration #N+1. I tried to do that with validating NULL case first and
> exiting from loop on each iteration (first), but clearly that doesn't
> capture all the cases, as Eduard have shown.
>
> So what if we delay convergence state checks (and then further
> exploration at iteration #N+1) using BFS instead of DFS? That is, when
> we are about to start new iteration and check state convergence, we
> enqueue current state to be processed later after all the states that
> "belong" to iteration #N are processed.

This sounds correct if one iterator is involved.

> We should work out exact details on how to do this hybrid BFS+DFS, but
> let's think carefully if this will solve the problems?
> 
> I think this is conceptually similar to what Alexei proposed above.
> Basically, we "unroll" loop verification iteration by iteration, but
> make sure that we explore all the branching within iteration #N before
> going one iteration deeper.
> 
> Let's think if there are any cases which wouldn't be handled. And then
> think how to implement this elegantly (e.g., some sort of queue within
> a parent state, which sounds similar to this separate "branches"
> counter that Alexei is proposing above).

To better understand the suggestion, suppose there is some iterator
'I' and two states:
- state S1 where depth(I) == N and pending instruction is "next(I)"
- state S2 where depth(I) == N and pending instruction is *not* "next(I)"
In such situation state S2 should be verified first, right?
And in general, any state that is not at "next(I)" should be explored
before S1, right?

Such interpretation seems to be prone to deadlocks, e.g. suppose there
are two iterators: I1 and I2, and two states:
- state S1 with depth(I1) == N, depth(I2) == M, at instruction "next(I1)";
- state S2 with depth(I1) == N, depth(I2) == M, at instruction "next(I2)".

E.g. like in the following loop:

    for (;;) {
      if (<random condition>) {
        if (!next(it1)) break; // a
      } else {
        if (!next(it2)) break; // b
      }
      ...
    }

I think it is possible to get to two states here:
- (a) it1(active, depth 0), it2(active, depth 0) at insn "next(it1)"
- (b) it1(active, depth 0), it2(active, depth 0) at insn "next(it2)"

And it is unclear which one should be processed next.
Am I missing something?


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-21 18:16                           ` Eduard Zingerman
@ 2023-09-22  1:01                             ` Eduard Zingerman
  2023-09-22  2:48                               ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-22  1:01 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, 2023-09-21 at 21:16 +0300, Eduard Zingerman wrote:
> On Thu, 2023-09-21 at 09:35 -0700, Andrii Nakryiko wrote:
> > I've been thinking in a similar direction as Alexei, overnight. Here
> > are some raw thoughts.
> > 
> > I think the overall approach with iter verification is sound. If we
> > loop and see an equivalent state at the entry to next loop iteration,
> > then it's safe to assume doing many iterations is safe. The problem is
> > that delayed precision and read marks make this state equivalence
> > wrong in some case. So we need to find a solution that will ensure
> > that all precision and read marks are propagated to parent state
> > before making a decision about convergence.
> > 
> > The idea is to let verifier explore all code paths starting from
> > iteration #N, except the code paths that lead to looping into
> > iteration #N+1. I tried to do that with validating NULL case first and
> > exiting from loop on each iteration (first), but clearly that doesn't
> > capture all the cases, as Eduard have shown.
> > 
> > So what if we delay convergence state checks (and then further
> > exploration at iteration #N+1) using BFS instead of DFS? That is, when
> > we are about to start new iteration and check state convergence, we
> > enqueue current state to be processed later after all the states that
> > "belong" to iteration #N are processed.
> 
> This sounds correct if one iterator is involved.
> 
> > We should work out exact details on how to do this hybrid BFS+DFS, but
> > let's think carefully if this will solve the problems?
> > 
> > I think this is conceptually similar to what Alexei proposed above.
> > Basically, we "unroll" loop verification iteration by iteration, but
> > make sure that we explore all the branching within iteration #N before
> > going one iteration deeper.
> > 
> > Let's think if there are any cases which wouldn't be handled. And then
> > think how to implement this elegantly (e.g., some sort of queue within
> > a parent state, which sounds similar to this separate "branches"
> > counter that Alexei is proposing above).
> 
> To better understand the suggestion, suppose there is some iterator
> 'I' and two states:
> - state S1 where depth(I) == N and pending instruction is "next(I)"
> - state S2 where depth(I) == N and pending instruction is *not* "next(I)"
> In such situation state S2 should be verified first, right?
> And in general, any state that is not at "next(I)" should be explored
> before S1, right?
> 
> Such interpretation seems to be prone to deadlocks, e.g. suppose there
> are two iterators: I1 and I2, and two states:
> - state S1 with depth(I1) == N, depth(I2) == M, at instruction "next(I1)";
> - state S2 with depth(I1) == N, depth(I2) == M, at instruction "next(I2)".
> 
> E.g. like in the following loop:
> 
>     for (;;) {
>       if (<random condition>)
>         if (!next(it1)) break; // a
>       else
>         if (!next(it2)) break; // b
>       ...
>     }
> 
> I think it is possible to get to two states here:
> - (a) it1(active, depth 0), it2(active, depth 0) at insn "next(it1)"
> - (b) it1(active, depth 0), it2(active, depth 0) at insn "next(it2)"
> 
> And it is unclear which one should be processed next.
> Am I missing something?

Not sure if such ordering issues are real issues, or whether any of
the candidates could simply be picked.

How about a dumber solution which, imho, is easier to convince
oneself is correct (at least from a logical point of view, if not
necessarily implementation-wise): substitute the goal of finding exact
precision marks with the goal of finding conservative precision marks.
These marks could be used instead of the exact ones in states_equal()
when .branches > 0.

A straightforward way to compute such marks is to run a flow-sensitive
context/path-insensitive backwards DFA before do_check(). The result
of this DFA is a bitmask encoding which registers / stack slots may be
precise at each instruction. Information for instructions other than
is_iter_next_insn() could be discarded.

Looking at the tests that are currently failing with my local fixes
(which force exact state comparisons for .branches > 0) I think that
the DFA-based version would cover all of them.

This sure adds some code to the verifier; however, changing the
current state traversal logic from DFS to a combination of DFS+BFS is
a significant change as well.

Wdyt?


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-22  1:01                             ` Eduard Zingerman
@ 2023-09-22  2:48                               ` Andrii Nakryiko
  2023-09-22 18:36                                 ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-22  2:48 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, Sep 21, 2023 at 6:01 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2023-09-21 at 21:16 +0300, Eduard Zingerman wrote:
> > On Thu, 2023-09-21 at 09:35 -0700, Andrii Nakryiko wrote:
> > > I've been thinking in a similar direction as Alexei, overnight. Here
> > > are some raw thoughts.
> > >
> > > I think the overall approach with iter verification is sound. If we
> > > loop and see an equivalent state at the entry to next loop iteration,
> > > then it's safe to assume doing many iterations is safe. The problem is
> > > that delayed precision and read marks make this state equivalence
> > > wrong in some case. So we need to find a solution that will ensure
> > > that all precision and read marks are propagated to parent state
> > > before making a decision about convergence.
> > >
> > > The idea is to let verifier explore all code paths starting from
> > > iteration #N, except the code paths that lead to looping into
> > > iteration #N+1. I tried to do that with validating NULL case first and
> > > exiting from loop on each iteration (first), but clearly that doesn't
> > > capture all the cases, as Eduard have shown.
> > >
> > > So what if we delay convergence state checks (and then further
> > > exploration at iteration #N+1) using BFS instead of DFS? That is, when
> > > we are about to start new iteration and check state convergence, we
> > > enqueue current state to be processed later after all the states that
> > > "belong" to iteration #N are processed.
> >
> > This sounds correct if one iterator is involved.
> >
> > > We should work out exact details on how to do this hybrid BFS+DFS, but
> > > let's think carefully if this will solve the problems?
> > >
> > > I think this is conceptually similar to what Alexei proposed above.
> > > Basically, we "unroll" loop verification iteration by iteration, but
> > > make sure that we explore all the branching within iteration #N before
> > > going one iteration deeper.
> > >
> > > Let's think if there are any cases which wouldn't be handled. And then
> > > think how to implement this elegantly (e.g., some sort of queue within
> > > a parent state, which sounds similar to this separate "branches"
> > > counter that Alexei is proposing above).
> >
> > To better understand the suggestion, suppose there is some iterator
> > 'I' and two states:
> > - state S1 where depth(I) == N and pending instruction is "next(I)"
> > - state S2 where depth(I) == N and pending instruction is *not* "next(I)"
> > In such situation state S2 should be verified first, right?
> > And in general, any state that is not at "next(I)" should be explored
> > before S1, right?
> >
> > Such interpretation seems to be prone to deadlocks, e.g. suppose there
> > are two iterators: I1 and I2, and two states:
> > - state S1 with depth(I1) == N, depth(I2) == M, at instruction "next(I1)";
> > - state S2 with depth(I1) == N, depth(I2) == M, at instruction "next(I2)".
> >
> > E.g. like in the following loop:
> >
> >     for (;;) {
> >       if (<random condition>)
> >         if (!next(it1)) break; // a
> >       else
> >         if (!next(it2)) break; // b
> >       ...
> >     }
> >
> > I think it is possible to get to two states here:
> > - (a) it1(active, depth 0), it2(active, depth 0) at insn "next(it1)"
> > - (b) it1(active, depth 0), it2(active, depth 0) at insn "next(it2)"
> >
> > And it is unclear which one should be processed next.
> > Am I missing something?
>
> Not sure if such ordering issues a real issues or any of the
> candidates could be picked.
>

Yes, my gut feeling was that if this idea works at all, then ordering
for this won't matter. The question is if the idea itself is sound.
Basically, I need to convince myself that subsequent iterations won't
add any new read/precise marks. You are good at counter examples, so
maybe you can come up with an example where input state into iteration
#1 will get more precision marks only at iteration #2 or deeper. If
that's the case, then this whole idea of postponing loop convergence
checks probably doesn't work.

> How about a more dumb solution which, imho, is easier to convince
> oneself is correct (at-least from logical pov, not necessarily
> implementation): substitute the goal of finding exact precision marks
> with the goal of finding conservative precision marks. These marks
> could be used instead of exact for states_equal() when .branches > 0.
>
> A straightforward way to compute such marks is to run a flow-sensitive
> context/path-insensitive backwards DFA before do_check(). The result
> of this DFA is a bitmask encoding which registers / stack slots may be
> precise at each instruction. Information for instructions other than
> is_iter_next_insn() could be discarded.
>
> Looking at the tests that are currently failing with my local fixes
> (which force exact states comparions for .branches > 0) I think that
> DFA based version would cover all of them.
>
> This sure adds some code to the verifier, however changing current
> states traversal logic from DFS to combination of DFS+BFS is a
> significant change as well.

I don't think adding BFS sequencing is a lot of code. It's going to be
a small amount of simple code with potentially intricate behavior, but
code-wise it should be pretty minimal.

>
> Wdyt?

I can't say I understand what exactly you are proposing and how you
are going to determine these conservative precision marks. But I'd
like to learn some new ideas, so please elaborate :)


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-22  2:48                               ` Andrii Nakryiko
@ 2023-09-22 18:36                                 ` Eduard Zingerman
  2023-09-22 20:52                                   ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-22 18:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, 2023-09-21 at 19:48 -0700, Andrii Nakryiko wrote:
> Yes, my gut feeling was that if this idea works at all, then ordering
> for this won't matter. The question is if the idea itself is sound.
> Basically, I need to convince myself that subsequent iterations won't
> add any new read/precise marks. You are good at counter examples, so
> maybe you can come up with an example where input state into iteration
> #1 will get more precision marks only at iteration #2 or deeper. If
> that's the case, then this whole idea of postponing loop convergence
> checks probably doesn't work.

Consider the following code:

    1.  SEC("fentry/" SYS_PREFIX "sys_nanosleep")
    2.  int num_iter_bug(const void *ctx) {
    3.      struct bpf_iter_num it;
    4.      __u64 val = 0;
    5.      __u64 *ptr = &val;
    6.      __u64 rnd = bpf_get_current_pid_tgid();
    7.
    8.      bpf_iter_num_new(&it, 0, 10);
    9.      while (bpf_iter_num_next(&it)) {
    10.          rnd++;
    11.          if (rnd == 42) {
    12.              ptr = (void*)(0xdead);
    13.              continue;
    14.          }
    15.          if (!bpf_iter_num_next(&it))
    16.              break;
    17.          bpf_probe_read_user(ptr, 8, (void*)(0xdeadbeef));
    18.      }
    19.      bpf_iter_num_destroy(&it);
    20.      return 0;
    21. }

As far as I understand the following verification paths would be
explored (ignoring traces with drained iterators):
- entry:
  - 8,9,10,11,12,13:
    - at 9 checkpoint (C1) would be created with it{.depth=0,.state=active},ptr=&val;
    - at 11 branch (B1) would be created with it{.depth=0,.state=active},ptr=&val;
    - jump from 13 to 9 would be postponed with state
      it{.depth=0,.state=active},ptr=0xdead as proceeding would increase 'it' depth;
- jump from 11 to 15 (branch B1):
  - 15:
    - checkpoint would be created with it{.depth=0,.state=active};
    - jump from 15 to 17 would be postponed with state
      it{.depth=0,.state=active} as proceeding would increase 'it' depth.
- at this point verifier would run out of paths that don't increase
  iterators depth and there are two choices:
  - (a) jump from 13 to 9 with state it{.depth=0,.state=active},ptr=0xdead;
  - (b) jump from 15 to 17 with state it{.depth=0,.state=active},ptr=&val.
  If (a) were processed first, there would be no read mark for
  'ptr' in C1 yet, and the verifier would run into the same issue as
  with the original example.

Am I missing something, or is this a legitimate counter-example?

> I can't say I understand what exactly you are proposing and how you
> are going to determine these conservative precision marks. But I'd
> like to learn some new ideas, so please elaborate :)

I'll send a follow-up email, trying to figure out what to do with pointers.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-22 18:36                                 ` Eduard Zingerman
@ 2023-09-22 20:52                                   ` Andrii Nakryiko
  2023-09-25  1:01                                     ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-22 20:52 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Fri, Sep 22, 2023 at 11:36 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2023-09-21 at 19:48 -0700, Andrii Nakryiko wrote:
> > Yes, my gut feeling was that if this idea works at all, then ordering
> > for this won't matter. The question is if the idea itself is sound.
> > Basically, I need to convince myself that subsequent iterations won't
> > add any new read/precise marks. You are good at counter examples, so
> > maybe you can come up with an example where input state into iteration
> > #1 will get more precision marks only at iteration #2 or deeper. If
> > that's the case, then this whole idea of postponing loop convergence
> > checks probably doesn't work.
>
> Consider the following code:
>
>     1.  SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>     2.  int num_iter_bug(const void *ctx) {
>     3.      struct bpf_iter_num it;
>     4.      __u64 val = 0;
>     5.      __u64 *ptr = &val;
>     6.      __u64 rnd = bpf_get_current_pid_tgid();
>     7.
>     8.      bpf_iter_num_new(&it, 0, 10);
>     9.      while (bpf_iter_num_next(&it)) {
>     10.          rnd++;
>     11.          if (rnd == 42) {
>     12.              ptr = (void*)(0xdead);
>     13.              continue;
>     14.          }
>     15.          if (!bpf_iter_num_next(&it))
>     16.              break;
>     17.          bpf_probe_read_user(ptr, 8, (void*)(0xdeadbeef));
>     18.      }
>     19.      bpf_iter_num_destroy(&it);
>     20.      return 0;
>     21. }
>
> As far as I understand the following verification paths would be
> explored (ignoring traces with drained iterators):
> - entry:
>   - 8,9,10,11,12,13:
>     - at 9 checkpoint (C1) would be created with it{.depth=0,.state=active},ptr=&val;
>     - at 11 branch (B1) would be created with it{.depth=0,.state=active},ptr=&val;
>     - jump from 13 to 9 would be postponed with state
>       it{.depth=0,.state=active},ptr=0xdead as proceeding would increase 'it' depth;
> - jump from 11 to 15 (branch B1):
>   - 15:
>     - checkpoint would be created with it{.depth=0,.state=active};
>     - jump from 15 to 17 would be postponed with state
>       it{.depth=0,.state=active} as proceeding would increase 'it' depth.
> - at this point verifier would run out of paths that don't increase
>   iterators depth and there are two choices:
>   - (a) jump from 13 to 9 with state it{.depth=0,.state=active},ptr=0xdead;
>   - (b) jump from 15 to 17 with state it{.depth=0,.state=active},ptr=&val.
>   If (a) would be processed first there would be no read mark for
>   'ptr' in C1 yet and verifier runs into to the same issue as with
>   original example.
>
> Do I miss something or is it a legit counter-example?
>

I think it is, yes. With the use of

if (!bpf_iter_num_next(&it))
    break;

you can postpone any important markings as deep into iteration
verification as necessary. So yeah, the idea to explore code paths
level by level won't cover all the cases, unfortunately.

> > I can't say I understand what exactly you are proposing and how you
> > are going to determine these conservative precision marks. But I'd
> > like to learn some new ideas, so please elaborate :)
>
> I'll send a follow-up email, trying to figure out what to do with pointers.

Ok, sounds good.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-22 20:52                                   ` Andrii Nakryiko
@ 2023-09-25  1:01                                     ` Eduard Zingerman
  2023-09-26  0:33                                       ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-25  1:01 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Fri, 2023-09-22 at 13:52 -0700, Andrii Nakryiko wrote:
> > > I can't say I understand what exactly you are proposing and how you
> > > are going to determine these conservative precision marks. But I'd
> > > like to learn some new ideas, so please elaborate :)
> > 
> > I'll send a follow-up email, trying to figure out what to do with pointers.
> 
> Ok, sounds good.

(below explains why DFA was a bad idea, please proceed to the next
 section if that's not interesting).

My initial thought was that it would be possible to use a simple
live-variable DFA in order to compute conservative read and precise
marks. Like a textbook version:
- split code in basic blocks;
- for each instruction propagate liveness information backwards:
  - define USE(insn) to be a set of registers / stack slots read by
    this instruction;
  - define KILL(insn) to be a set of registers / stack slots written
    to by this instruction.
  - compute the set of live variables before insn as:
    LIVE_in(insn) = USE(insn) U (LIVE_out(insn) \ KILL(insn))
- compute basic block BB_out as a union of BB_in of each successor;
- etc... iterate in postorder until fixed point is reached.

However, this is a moot point because of the presence of pointers.
For a series of instructions like:
  "r1 = r10; r1 += -8; r1 = *(u64 *)(r1 + 0);"
Such an algorithm would need to know the possible range of values for r1,
meaning that before executing the liveness DFA some sort of value range
analysis would have to be run.
And these things might get complicated [1]
(I never implemented one, maybe it's not that bad).
So, basically that would be a mini-copy of the verifier but
implemented as a thing that computes fixed point instead of "tracing".
Probably a no-go.

[1] https://boxbase.org/entries/2017/oct/23/range_analysis_papers/

---

However, I think there are two more relatively simple options that
should give correct results.

Option A. "Exit or Loop"
------------------------

- Similar to Andrii's suggestion postpone exploration of
  bpf_iter_*_next() #N->#N+1 states for as long as there are other
  states to explore.
- When evaluating is_state_visited() for bpf_iter_*_next() #N->#N+1
  allow to use visited state with .branches > 0 to prune current
  state if:
  - states_equal() for visited and current states returns true;
  - all verification paths starting from current instruction index with
    current iterator ref_obj_id either run to safe exit or run back to
    current instruction index with current iterator ref_obj_id.

The last rule is needed to:
- Ensure presence of read and precision marks in a state used for pruning.
- Give a meaning to the precise scalars comparisons in regsafe() which
  I think is currently missing for bpf_iter_*_next() .branches > 0 case.
  Precise scalars comparison for visited states with .branches == 0
  could be read as: "using any value from a specific range it is
  possible to reach safe exit".
  Precise scalars comparison for looping states from above
  could be read as: "using any value from a specific range it is
  possible to reach either safe exit or get back to this instruction".

This algorithm should be able to handle simple iteration patterns like:

    bpf_iter_*_new(&it1, ...)
    while (bpf_iter_*_next(&it1)) { ... }
    bpf_iter_*_destroy(&it1)

And also handle nested iteration:

    bpf_iter_*_new(&it1, ...)
    while (bpf_iter_*_next(&it1)) {
      bpf_iter_*_new(&it2, ...)
      while (bpf_iter_*_next(&it2)) { ... }
      bpf_iter_*_destroy(&it2)
    }
    bpf_iter_*_destroy(&it1)

But it would fail to handle more complex patterns, e.g. interleaved
iteration:

    for (;;) {
      if (!bpf_iter_*_next(&it1))  // a
        break;
      if (!bpf_iter_*_next(&it2))  // b
        break;
      ...
    }

For any state originating in (a) there would always be a state in (b)
pending verification and vice versa. It would actually bail out even
if it1 == it2.

Option B. "Widening"
--------------------

The trivial fix for current .branches > 0 states comparison is to
force "exact" states comparisons for is_iter_next_insn() case:
1. Ignore liveness marks, as those are not finalized yet;
2. Ignore precision marks, as those are not finalized yet;
3. Use regs_exact() for scalars comparison.

This, however, is very restrictive as it fails to verify trivial
programs like (iters_looping/simplest_loop):

    sum = 0;
    bpf_iter_num_new(&it, 0, 10);
    while ((r0 = bpf_iter_num_next(&it)))
      sum += *(u32 *)r0;
    bpf_iter_num_destroy(&it);

Here, the ever-increasing bounds for "sum" prevent regs_exact() from
ever returning true.

One way to lift this limitation is to borrow something like the
"widening" trick from the abstract interpretation papers mentioned
earlier:
- suppose there is current state C and a visited is_iter_next_insn()
  state V with .branches > 0;
- suppose states_equal(V, C) returns true but exact states comparison
  returns false because there are scalar values not marked as precise
  that differ between V and C.
- copy C as C' and mark these scalars as __mark_reg_unbounded() in C';
- run verification for C':
  - if verification succeeds -- problem solved;
  - if verification fails -- discard C' and proceed from C.

Such an algorithm should succeed for programs that don't use widened
values in a "precise" context.

Looking at the test cases failing with the trivial fix, I think such
limitations would be similar to those already present (e.g. the big
comment in progs/iters.c:iter_obfuscate_counter would still apply).

---

Option B appears to be simpler to implement so I'm proceeding with it.
Will share progress on Monday. Please let me know if there are any
obvious flaws with what I shared above.

Thanks,
Eduard


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-25  1:01                                     ` Eduard Zingerman
@ 2023-09-26  0:33                                       ` Andrii Nakryiko
  2023-09-26 15:55                                         ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-26  0:33 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Sun, Sep 24, 2023 at 6:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2023-09-22 at 13:52 -0700, Andrii Nakryiko wrote:
> > > > I can't say I understand what exactly you are proposing and how you
> > > > are going to determine these conservative precision marks. But I'd
> > > > like to learn some new ideas, so please elaborate :)
> > >
> > > I'll send a follow-up email, trying to figure out what to do with pointers.
> >
> > Ok, sounds good.
>
> (below explains why DFA was a bad idea, please proceed to the next
>  section if that's not interesting).

ok, that's what I'm doing :)

>
> My initial thought was that it would be possible to use a simple
> live-variable DFA in order to compute conservative read and precise
> marks. Like a textbook version:
> - split code in basic blocks;
> - for each instruction propagate liveness information backwards:
>   - define USE(insn) to be a set of registers / stack slots read by
>     this instruction;
>   - define KILL(insn) to be a set of registers / stack slots written
>     to by this instruction.
>   - compute set of live variables before insn as:
>     LIVE_in(insn) = USE(insn) U (LIVE_out(insn) / KILL(insn))
> - compute basic block BB_out as a union of BB_in of each successor;
> - etc... iterate in postorder until fixed point is reached.
>
> However, this is a moot point because of presence of pointers.
> For series of instructions like:
>   "r1 = r10; r1 += -8; r1 = *(u64 *)(r1 + 0);"
> Such algorithm would need to know possible range of values for r1.
> Meaning that before executing liveness DFA some sort of value range
> analysis would have to be run.
> And these things might get complicated [1]
> (I never implemented one, maybe it's not that bad).
> So, basically that would be a mini-copy of the verifier but
> implemented as a thing that computes fixed point instead of "tracing".
> Probably a no-go.
>
> [1] https://boxbase.org/entries/2017/oct/23/range_analysis_papers/
>
> ---
>
> However, I think there are two more relatively simple options that
> should give correct results.
>
> Option A. "Exit or Loop"
> ------------------------
>
> - Similar to Andrii's suggestion postpone exploration of
>   bpf_iter_*_next() #N->#N+1 states for as long as there are other
>   states to explore.
> - When evaluating is_state_visited() for bpf_iter_*_next() #N->#N+1
>   allow to use visited state with .branches > 0 to prune current
>   state if:
>   - states_equal() for visited and current states returns true;
>   - all verfication paths starting from current instruction index with
>     current iterator ref_obj_id either run to safe exit or run back to
>     current instruction index with current iterator ref_obj_id.
>
> The last rule is needed to:
> - Ensure presence of read and precision marks in a state used for pruning.
> - Give a meaning to the precise scalars comparisons in regsafe() which
>   I think is currently missing for bpf_iter_*_next() .branches > 0 case.
>   Precise scalars comparison for visited states with .branches == 0
>   could be read as: "using any value from a specific range it is
>   possible to reach safe exit".
>   Precise scalars comparison for looping states from above
>   could be read as: "using any value from a specific range it is
>   possible to reach either safe exit or get back to this instruction".
>
> This algorithm should be able to handle simple iteration patterns like:
>
>     bpf_iter_*_new(&it1, ...)
>     while (bpf_iter_*_next(&it1)) { ... }
>     bpf_iter_*_destroy(&it1)
>
> And also handle nested iteration:
>
>     bpf_iter_*_new(&it1, ...)
>     while (bpf_iter_*_next(&it1)) {
>       bpf_iter_*_new(&it2, ...)
>       while (bpf_iter_*_next(&it2)) { ... }
>       bpf_iter_*_destroy(&it2)
>     }
>     bpf_iter_*_destroy(&it1)
>
> But it would fail to handle more complex patterns, e.g. interleaved
> iteration:
>
>     for (;;) {
>       if (!bpf_iter_*_next(&it1))  // a
>         break;
>       if (!bpf_iter_*_next(&it2))  // b
>         break;
>       ...
>     }
>
> For any state originating in (a) there would always be a state in (b)
> pending verification and vice versa. It would actually bail out even
> if it1 == it2.

Not working for this intermixed iterator case would be ok, I think;
most practical use cases would be properly nested loops.

But the compiler can actually do something like this even for a proper
loop. E.g., something like the loop below

for (; bpf_iter_num_next(&it); ) {
   ....
}

in assembly could be laid out as


bpf_iter_num_next(&it);
...
again:
r0 = bpf_iter_num_next(&it)
...
if (r0) goto again


Or something along those lines. So the assumption that the
iterator's next() call always happens at the same instruction is
problematic.



>
> Option B. "Widening"
> --------------------
>
> The trivial fix for current .branches > 0 states comparison is to
> force "exact" states comparisons for is_iter_next_insn() case:
> 1. Ignore liveness marks, as those are not finalized yet;
> 2. Ignore precision marks, as those are not finalized yet;
> 3. Use regs_exact() for scalars comparison.
>
> This, however, is very restrictive as it fails to verify trivial
> programs like (iters_looping/simplest_loop):
>
>     sum = 0;
>     bpf_iter_num_new(&it, 0, 10);
>     while ((r0 = bpf_iter_num_next(&it)))
>       sum += *(u32 *)r0;
>     bpf_iter_num_destroy(&it);
>
> Here the ever-increasing bounds for "sum" prevent regs_exact() from ever
> returning true.
>
> One way to lift this limitation is to borrow something like the
> "widening" trick from the abstract interpretation papers mentioned
> earlier:
> - suppose there is current state C and a visited is_iter_next_insn()
>   state V with .branches > 0;
> - suppose states_equal(V, C) returns true but exact states comparison
>   returns false because there are scalar values not marked as precise
>   that differ between V and C.
> - copy C as C' and mark these scalars as __mark_reg_unbounded() in C';
> - run verification for C':
>   - if verification succeeds -- problem solved;
>   - if verification fails -- discard C' and proceed from C.
>
> Such algorithm should succeed for programs that don't use widened
> values in "precise" context.
>
> Looking at testcases failing with trivial fix I think that such
> limitations would be similar to those already present (e.g. the big
> comment in progs/iters.c:iter_obfuscate_counter would still apply).
>

This makes sense. I was originally thinking along those lines (and
rejected it for myself), but in a more eager (and thus restrictive)
way: for any scalar register where old and new register states are
equivalent thanks to read mark or precision bit (or rather lack of
precision bit), force that precision/mark in old state. But that is
too pessimistic for cases where we truly needed to simulate N+1 due to
state differences, while keeping the old state intact.

What you propose with temporary C -> C' seems to be along similar
lines, but will give "a second chance" if something doesn't work out:
we'll go on N+1 instead of just failing.


But let's think about this case. Let's say in V R1 is set to 7, but
imprecise. In C we have R1 as 14, also imprecise. According to your
algorithm, we'll create C' with R1 set to <any num>. Now I have two
questions:

1. How do we terminate C' validation? What happens when we get back to
the next() call again and do states_equal() between V and C'? We just
assume it's correct because C' has R1 as unbounded?

2. Assuming yes. What if later on, V's R1 is actually forced to be
precise, so only if V's R1=P7, then it is safe. But we already
concluded that C' is safe based on R1 being unbounded, right? Isn't
that a violation?


> ---
>
> Option B appears to be simpler to implement so I'm proceeding with it.
> Will share progress on Monday. Please let me know if there are any
> obvious flaws with what I shared above.
>
> Thanks,
> Eduard

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-26  0:33                                       ` Andrii Nakryiko
@ 2023-09-26 15:55                                         ` Eduard Zingerman
  2023-09-26 16:25                                           ` Andrii Nakryiko
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-26 15:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Mon, 2023-09-25 at 17:33 -0700, Andrii Nakryiko wrote:
[...]
> not working for this intermixed iterator case would be ok, I think
> most practical use cases would be a properly nested loops.
> 
> But, compiler can actually do something like even for a proper loop.
> E.g., something like below
> 
> for (; bpf_iter_num_next(&it); ) {
>    ....
> }
> 
> in assembly could be laid out as
> 
> 
> bpf_iter_num_next(&it);
> ...
> again:
> r0 = bpf_iter_num_next(&it)
> ...
> if (r0) goto again
> 
> 
> Or something along those lines. So these assumptions that the
> iterator's next() call is happening at the same instruction is
> problematic.

For the specific example above I think it would not be an issue
because there is still only one next() call in the loop =>
there would be no stalled next() transition states.

I checked the disassembly for progs/iters.c and it appears that for
each program the compiler preserves the nested loop structure such that:
- each loop is driven by a single next() call of a dedicated iterator;
- each nested loop exit goes through destroy() for the corresponding
  iterator, meaning that the outer loop's next() will never see the
  inner loop's iterator as active => no stalled states due to inner
  loop processing.

Of course, I'm not a robot and might have missed something that would
break a real implementation.
 
> > Option B. "Widening"
> > --------------------
> > 
> > The trivial fix for current .branches > 0 states comparison is to
> > force "exact" states comparisons for is_iter_next_insn() case:
> > 1. Ignore liveness marks, as those are not finalized yet;
> > 2. Ignore precision marks, as those are not finalized yet;
> > 3. Use regs_exact() for scalars comparison.
> > 
> > This, however, is very restrictive as it fails to verify trivial
> > programs like (iters_looping/simplest_loop):
> > 
> >     sum = 0;
> >     bpf_iter_num_new(&it, 0, 10);
> >     while ((r0 = bpf_iter_num_next(&it)))
> >       sum += *(u32 *)r0;
> >     bpf_iter_num_destroy(&it);
> > 
> > Here ever increasing bounds for "sum" prevent regs_exact() from ever
> > returning true.
> > 
> > One way to lift this limitation is to borrow something like the
> > "widening" trick from the abstract interpretation papers mentioned
> > earlier:
> > - suppose there is current state C and a visited is_iter_next_insn()
> >   state V with .branches > 0;
> > - suppose states_equal(V, C) returns true but exact states comparison
> >   returns false because there are scalar values not marked as precise
> >   that differ between V and C.
> > - copy C as C' and mark these scalars as __mark_reg_unbounded() in C';
> > - run verification for C':
> >   - if verification succeeds -- problem solved;
> >   - if verification fails -- discard C' and proceed from C.
> > 
> > Such algorithm should succeed for programs that don't use widened
> > values in "precise" context.
> > 
> > Looking at testcases failing with trivial fix I think that such
> > limitations would be similar to those already present (e.g. the big
> > comment in progs/iters.c:iter_obfuscate_counter would still apply).
> > 
> 
> This makes sense. I was originally thinking along those lines (and
> rejected it for myself), but in a more eager (and thus restrictive)
> way: for any scalar register where old and new register states are
> equivalent thanks to read mark or precision bit (or rather lack of
> precision bit), force that precision/mark in old state. But that is
> too pessimistic for cases where we truly needed to simulate N+1 due to
> state differences, while keeping the old state intact.

In other words there is a function states_equal' for comparison of
states when old{.branches > 0}, which differs from states_equal in
the following way:
- considers all registers read;
- considers all scalars precise.

> What you propose with temporary C -> C' seems to be along similar
> lines, but will give "a second chance" if something doesn't work out:
> we'll go on N+1 instead of just failing.

Right.

> But let's think about this case. Let's say in V R1 is set to 7, but
> imprecise. In C we have R1 as 14, also imprecise. According to your
> algorithm, we'll create C1 with R1 set to <any num>. Now I have two
> questions:
> 
> 1. How do we terminate C' validation? What happens when we get back to
> the next() call again and do states_equal() between V and C'? We just
> assume it's correct because C' has R1 as unbounded?

By the definition above, states_equal'(V, C') is false, but because we
are at a checkpoint on the next iteration we would get to
states_equal'(C', C''), where C'' is derived from C', and the same
rules would apply. If we are lucky nothing would change and there would
be no point in scheduling another traversal.

> 2. Assuming yes. What if later on, V's R1 is actually forced to be
> precise, so only if V's R1=P7, then it is safe. But we already
> concluded that C' is safe based on R1 being unbounded, right? Isn't
> that a violation?

My base assumption is that:
1. any instruction reachable from V should also be reachable from C';
2. and that it is also guaranteed to be visited from C'.
I cannot give a formal reasoning for (2) at the moment.

If this assumption holds, then when verification proceeds from C'
it would eventually get to instruction for which R1 value is only safe
when R1=7, if so:
- verification of C' conjecture would fail;
- all states derived from C' would need to be removed from
  states stack / explored states;
- verification should proceed from C.
(And there is also a question of delaying read/precision propagation
 from unproven conjecture states to regular states).

(Also I'm still not sure whether to use regs_exact() for scalars
 comparison instead of precision logic in states_equal').


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-26 15:55                                         ` Eduard Zingerman
@ 2023-09-26 16:25                                           ` Andrii Nakryiko
  2023-09-28  1:09                                             ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-26 16:25 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, Sep 26, 2023 at 8:55 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-09-25 at 17:33 -0700, Andrii Nakryiko wrote:
> [...]
> > not working for this intermixed iterator case would be ok, I think
> > most practical use cases would be a properly nested loops.
> >
> > But, compiler can actually do something like even for a proper loop.
> > E.g., something like below
> >
> > for (; bpf_iter_num_next(&it); ) {
> >    ....
> > }
> >
> > in assembly could be laid out as
> >
> >
> > bpf_iter_num_next(&it);
> > ...
> > again:
> > r0 = bpf_iter_num_next(&it)
> > ...
> > if (r0) goto again
> >
> >
> > Or something along those lines. So these assumptions that the
> > iterator's next() call is happening at the same instruction is
> > problematic.
>
> For the specific example above I think it would not be an issue
> because there is still only one next() call in the loop =>
> there would be no stalled next() transition states.
>
> I checked disassembly for progs/iters.c and it appears that for each
> program there compiler preserves nested loops structure such that:
> - each loop is driven by a single next() call of a dedicated iterator;
> - each nested loop exit goes through destroy() for corresponding
>   iterator, meaning that outer loop's next will never see inner's loop
>   iterator as active => no stalled states due to inner loop
>   processing.
>
> Of-course, I'm not a robot and might have missed something that would
> break real implementation.
>
> > > Option B. "Widening"
> > > --------------------
> > >
> > > The trivial fix for current .branches > 0 states comparison is to
> > > force "exact" states comparisons for is_iter_next_insn() case:
> > > 1. Ignore liveness marks, as those are not finalized yet;
> > > 2. Ignore precision marks, as those are not finalized yet;
> > > 3. Use regs_exact() for scalars comparison.
> > >
> > > This, however, is very restrictive as it fails to verify trivial
> > > programs like (iters_looping/simplest_loop):
> > >
> > >     sum = 0;
> > >     bpf_iter_num_new(&it, 0, 10);
> > >     while ((r0 = bpf_iter_num_next(&it)))
> > >       sum += *(u32 *)r0;
> > >     bpf_iter_num_destroy(&it);
> > >
> > > Here ever increasing bounds for "sum" prevent regs_exact() from ever
> > > returning true.
> > >
> > > One way to lift this limitation is to borrow something like the
> > > "widening" trick from the abstract interpretation papers mentioned
> > > earlier:
> > > - suppose there is current state C and a visited is_iter_next_insn()
> > >   state V with .branches > 0;
> > > - suppose states_equal(V, C) returns true but exact states comparison
> > >   returns false because there are scalar values not marked as precise
> > >   that differ between V and C.
> > > - copy C as C' and mark these scalars as __mark_reg_unbounded() in C';
> > > - run verification for C':
> > >   - if verification succeeds -- problem solved;
> > >   - if verification fails -- discard C' and proceed from C.
> > >
> > > Such algorithm should succeed for programs that don't use widened
> > > values in "precise" context.
> > >
> > > Looking at testcases failing with trivial fix I think that such
> > > limitations would be similar to those already present (e.g. the big
> > > comment in progs/iters.c:iter_obfuscate_counter would still apply).
> > >
> >
> > This makes sense. I was originally thinking along those lines (and
> > rejected it for myself), but in a more eager (and thus restrictive)
> > way: for any scalar register where old and new register states are
> > equivalent thanks to read mark or precision bit (or rather lack of
> > precision bit), force that precision/mark in old state. But that is
> > too pessimistic for cases where we truly needed to simulate N+1 due to
> > state differences, while keeping the old state intact.
>
> In other words there is a function states_equal' for comparison of
> states when old{.branches > 0}, which differs from states_equal in
> the following way:
> - considers all registers read;
> - considers all scalars precise.
>

Not really. The important aspect is to mark registers that were
required to be imprecise in old state as "required to be imprecise",
and if later we decide that this register has to be precise, too bad,
too late, game over (which is why I didn't propose it, this seems too
restrictive).

> > What you propose with temporary C -> C' seems to be along similar
> > lines, but will give "a second chance" if something doesn't work out:
> > we'll go on N+1 instead of just failing.
>
> Right.
>
> > But let's think about this case. Let's say in V R1 is set to 7, but
> > imprecise. In C we have R1 as 14, also imprecise. According to your
> > algorithm, we'll create C1 with R1 set to <any num>. Now I have two
> > questions:
> >
> > 1. How do we terminate C' validation? What happens when we get back to
> > the next() call again and do states_equal() between V and C'? We just
> > assume it's correct because C' has R1 as unbounded?
>
> For the definition above states_equal'(V, C') is false, but because we
> are at checkpoint on the next iteration we would get to
> states_equal'(C', C'') where C'' is derived from C' and same rules
> would apply. If we are lucky nothing would change and there would no
> point in scheduling another traversal.

Ah, I see, "fixed point" state convergence is the hope, ok.

>
> > 2. Assuming yes. What if later on, V's R1 is actually forced to be
> > precise, so only if V's R1=P7, then it is safe. But we already
> > concluded that C' is safe based on R1 being unbounded, right? Isn't
> > that a violation?
>
> My base assumption is that:
> 1. any instruction reachable from V should also be reachable from C';
> 2. and that it is also guaranteed to be visited from C'.
> I cannot give a formal reasoning for (2) at the moment.
>
> In case if this assumption holds, when verification proceeds from C'
> it would eventually get to instruction for which R1 value is only safe
> when R1=7, if so:
> - verification of C' conjecture would fail;
> - all states derived from C' would need to be removed from
>   states stack / explored states;
> - verification should proceed from C.
> (And there is also a question of delaying read/precision propagation
>  from unproven conjecture states to regular states).
>
> (Also I'm still not sure whether to use regs_exact() for scalars
>  comparison instead of precision logic in states_equal').

Yeah, it's getting complicated, eh? :)

But aside from this approach, I was thinking back to my proposal to
combine BFS and DFS approaches. Let's give it another look. Quoting
from my earlier email:


The idea is to let verifier explore all code paths starting from
iteration #N, except the code paths that lead to looping into
iteration #N+1. I tried to do that by validating the NULL case first and
exiting from the loop on each iteration, but clearly that doesn't
capture all the cases, as Eduard has shown.

So what if we delay convergence state checks (and then further
exploration at iteration #N+1) using BFS instead of DFS? That is, when
we are about to start new iteration and check state convergence, we
enqueue current state to be processed later after all the states that
"belong" to iteration #N are processed.


I still think that the whole "let's explore all states except the
looping back ones" idea is the right direction. But the above is a bit
imprecise about the "iteration #N and iteration #N+1" parts. We
interpreted it as: when we get back to *any* next() call for a given
iterator, we postpone the checks.

And you've shown with your counter examples how we actually don't
explore all possible code paths. Let's see if we can actually fix that
problem and make sure that we do explore everything that is
terminatable. How about the following clarifications/refinements to
the above:

1. If V and C (using your terminology from earlier, where V is the old
parent state at some next() call instruction, and C is the current one
on the same instruction) are different -- we just keep going. So
always try to explore different input states for the loop.

2. But if V and C are equivalent, it's too early to conclude anything.
So enqueue C for later in a separate BFS queue (and perhaps that queue
is per-instruction, actually; or maybe even per-state, not sure), and
keep exploring all the other pending queues from the (global) DFS
queue, until we get back to state V again. At that point we need to
start looking at postponed states for that V state. But this time we
should be sure that precision and read marks are propagated from all
those terminatable code paths.

Basically, this tries to make sure that we do mark every register that
is important for all the branching decision making, memory
dereferences, etc. And just avoids going into endless loops with the
same input conditions.

Give it some fresh thought and let's see if we are missing something
again. Thanks!


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-26 16:25                                           ` Andrii Nakryiko
@ 2023-09-28  1:09                                             ` Eduard Zingerman
  2023-09-28 18:30                                               ` Andrii Nakryiko
  2023-09-30  0:41                                               ` Alexei Starovoitov
  0 siblings, 2 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-09-28  1:09 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, 2023-09-26 at 09:25 -0700, Andrii Nakryiko wrote:
[...]
> > In other words there is a function states_equal' for comparison of
> > states when old{.branches > 0}, which differs from states_equal in
> > the following way:
> > - considers all registers read;
> > - considers all scalars precise.
> >
> 
> Not really. The important aspect is to mark registers that were
> required to be imprecise in old state as "required to be imprecise",
> and if later we decide that this register has to be precise, too bad,
> too late, game over (which is why I didn't propose it, this seems too
> restrictive).

Could you please elaborate a bit? What's wrong with the following:
Suppose I see a register R that differs between V and C and is not
precise in both. I fork C as C', mark R unbound in C' and proceed with
C' verification. At some point during that verification I see that
some precise R's value is necessary, thus C' verification fails.
If that happens verification resumes from C, otherwise C is discarded.
I also postpone read and precision marks propagation from C' to its
parent until C' verification succeeds (if it succeeds).

[...]
> 1. If V and C (using your terminology from earlier, where V is the old
> parent state at some next() call instruction, and C is the current one
> on the same instruction) are different -- we just keep going. So
> always try to explore different input states for the loop.
> 
> 2. But if V and C are equivalent, it's too early to conclude anything.
> So enqueue C for later in a separate BFS queue (and perhaps that queue
> is per-instruction, actually; or maybe even per-state, not sure), and
> keep exploring all the other pending queues from the (global) DFS
> queue, until we get back to state V again. At that point we need to
> start looking at postponed states for that V state. But this time we
> should be sure that precision and read marks are propagated from all
> those terminatable code paths.
> 
> Basically, this tries to make sure that we do mark every register that
> is important for all the branching decision making, memory
> dereferences, etc. And just avoids going into endless loops with the
> same input conditions.
> 
> Give it some fresh thought and let's see if we are missing something
> again. Thanks!

This should work for the examples we've seen so far.
Why do you think a separate per-instruction queue is necessary?
The way I read it the following algorithm should suffice:
- add a field bpf_verifier_env::iter_head similar to 'head' but for
  postponed looping states;
- add functions push_iter_stack(), pop_iter_stack() similar to
  push_stack() and pop_stack();
- modify is_state_visited() as follows:

 static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 {
     ...
     while (sl) {
         ...
         if (sl->state.branches) {
             ...
             if (is_iter_next_insn(env, insn_idx)) {
                 if (states_equal(env, &sl->state, cur)) {
                     ...
                     iter_state = &func(env, iter_reg)->stack[spi].spilled_ptr;
                     if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE) {
+                        // Don't want to proceed with 'cur' verification,
+                        // push it to iters queue to check again if states
+                        // are still equal after env->head is exhausted.
+                        if (env->stack_size != 0)
+                            push_iter_stack(env, cur, ...);
                         goto hit;
                     }
                 }
                 goto skip_inf_loop_check;
             }
     ...
 }
 
- modify do_check() to do pop_iter_stack() if pop_stack() is
  exhausted; the popped state would get into is_state_visited() and be
  checked against the old state, which at that moment should have all
  the read/precision marks that env->head could have provided.

After working on "widening conjectures" implementation a bit this
approach seems to be much simpler. Need to think harder if I can break it.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-28  1:09                                             ` Eduard Zingerman
@ 2023-09-28 18:30                                               ` Andrii Nakryiko
  2023-10-02  3:26                                                 ` Eduard Zingerman
  2023-09-30  0:41                                               ` Alexei Starovoitov
  1 sibling, 1 reply; 52+ messages in thread
From: Andrii Nakryiko @ 2023-09-28 18:30 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Wed, Sep 27, 2023 at 6:09 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2023-09-26 at 09:25 -0700, Andrii Nakryiko wrote:
> [...]
> > > In other words there is a function states_equal' for comparison of
> > > states when old{.branches > 0}, which differs from states_equal in
> > > the following way:
> > > - considers all registers read;
> > > - considers all scalars precise.
> > >
> >
> > Not really. The important aspect is to mark registers that were
> > required to be imprecise in old state as "required to be imprecise",
> > and if later we decide that this register has to be precise, too bad,
> > too late, game over (which is why I didn't propose it, this seems too
> > restrictive).
>
> Could you please elaborate a bit? What's wrong with the following:
> Suppose I see a register R that differs between V and C and is not
> precise in both. I fork C as C', mark R unbound in C' and proceed with
> C' verification. At some point during that verification I see that
> some precise R's value is necessary, thus C' verification fails.
> If that happens verification resumes from C, otherwise C is discarded.
> I also postpone read and precision marks propagation from C' to its
> parent until C' verification succeeds (if it succeeds).

Nothing wrong, I'm just saying your C' derivation gives a chance to
undo things, while my eager idea wouldn't allow that. And that's
bad, so I didn't propose it.

Aside from that, I'd be curious to see how to prevent read/precision
mark propagation to V; that seems like yet another hack and special
case, i.e. yet another complication.

>
> [...]
> > 1. If V and C (using your terminology from earlier, where V is the old
> > parent state at some next() call instruction, and C is the current one
> > on the same instruction) are different -- we just keep going. So
> > always try to explore different input states for the loop.
> >
> > 2. But if V and C are equivalent, it's too early to conclude anything.
> > So enqueue C for later in a separate BFS queue (and perhaps that queue
> > is per-instruction, actually; or maybe even per-state, not sure), and
> > keep exploring all the other pending queues from the (global) DFS
> > queue, until we get back to state V again. At that point we need to
> > start looking at postponed states for that V state. But this time we
> > should be sure that precision and read marks are propagated from all
> > those terminatable code paths.
> >
> > Basically, this tries to make sure that we do mark every register that
> > is important for all the branching decision making, memory
> > dereferences, etc. And just avoids going into endless loops with the
> > same input conditions.
> >
> > Give it some fresh thought and let's see if we are missing something
> > again. Thanks!
>
> This should work for examples we've seen so far.
> Why do you think a separate per-instruction queue is necessary?

I'm not positive, it was just a possibility that it might matter. I'd
try with a global queue first and try to break the approach.

> The way I read it the following algorithm should suffice:
> - add a field bpf_verifier_env::iter_head similar to 'head' but for
>   postponed looping states;
> - add functions push_iter_stack(), pop_iter_stack() similar to
>   push_stack() and pop_stack();

I don't like the suggested naming, it's too iter-centric, and it's
actually a queue, not a stack, etc. But that's something to figure out
later.

> - modify is_state_visited() as follows:
>
>  static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>  {
>      ...
>      while (sl) {
>          ...
>          if (sl->state.branches) {
>              ...
>              if (is_iter_next_insn(env, insn_idx)) {
>                  if (states_equal(env, &sl->state, cur)) {
>                      ...
>                      iter_state = &func(env, iter_reg)->stack[spi].spilled_ptr;
>                      if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE) {
> +                        // Don't want to proceed with 'cur' verification,
> +                        // push it to iters queue to check again if states
> +                        // are still equal after env->head is exhausted.
> +                        if (env->stack_size != 0)
> +                            push_iter_stack(env, cur, ...);
>                          goto hit;
>                      }
>                  }
>                  goto skip_inf_loop_check;
>              }
>      ...
>  }
>
> - modify do_check() to do pop_iter_stack() if pop_stack() is
>   exhausted, the popped state would get into is_state_visited() and
>   checked against old state, which at that moment should have all
>   read/precision masks that env->head could have provided.

Yeah, something like that.

>
> After working on "widening conjectures" implementation a bit this
> approach seems to be much simpler. Need to think harder if I can break it.

yeah, I think this C' derivation, while conceptually straightforward,
probably has tons of non-obvious complications.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-28  1:09                                             ` Eduard Zingerman
  2023-09-28 18:30                                               ` Andrii Nakryiko
@ 2023-09-30  0:41                                               ` Alexei Starovoitov
  2023-10-02  1:40                                                 ` Eduard Zingerman
  1 sibling, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2023-09-30  0:41 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Wed, Sep 27, 2023 at 6:09 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
>  static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>  {
>      ...
>      while (sl) {
>          ...
>          if (sl->state.branches) {
>              ...
>              if (is_iter_next_insn(env, insn_idx)) {
>                  if (states_equal(env, &sl->state, cur)) {
>                      ...
>                      iter_state = &func(env, iter_reg)->stack[spi].spilled_ptr;
>                      if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE) {
> +                        // Don't want to proceed with 'cur' verification,
> +                        // push it to iters queue to check again if states
> +                        // are still equal after env->head is exhausted.
> +                        if (env->stack_size != 0)
> +                            push_iter_stack(env, cur, ...);
>                          goto hit;

I suspect that neither option A "Exit or Loop" nor option B "widening"
is correct.
In both cases we will do states_equal() with states that have
not reached exit and don't have live/precision marks.

The key aspect of the verifier state pruning is that safe states
in sl list explored all possible paths.
Say, there is an 'if' block after iter_destroy.
The push_stack/pop_stack logic is done as a stack to make sure
that the last 'if' has to be explored before earlier 'if' blocks are checked.
The existing bpf iter logic violated that.
To fix that we need to logically do:

if (is_iter_next_insn(env, insn_idx))
  if (states_equal(env, &sl->state, cur)) {
   if (iter_state->iter.state == BPF_ITER_STATE_DRAINED)
     goto hit;

instead of BPF_ITER_STATE_ACTIVE.
In other words we need to process loop iter == 0 case first
all the way till bpf_exit (just like we do right now),
then process loop iter == 1 and let it exit the loop and
go till bpf_exit (hopefully state prune will find equal states
right after exit from the loop).
then process loop iter == 2 and so on.
If there was an 'if' pushed to stack during loop iter == 1
it has to be popped and explored all the way till bpf_exit.

We cannot just replace BPF_ITER_STATE_ACTIVE with DRAINED.
It would still be ACTIVE with sl->state.branches==0 after that
state explored all paths to bpf_exit.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-30  0:41                                               ` Alexei Starovoitov
@ 2023-10-02  1:40                                                 ` Eduard Zingerman
  2023-10-02 16:29                                                   ` Alexei Starovoitov
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-02  1:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Fri, 2023-09-29 at 17:41 -0700, Alexei Starovoitov wrote:
[...]
> I suspect that neither option A "Exit or Loop" nor option B "widening"
> is correct.
> In both cases we will do states_equal() with states that have
> not reached exit and don't have live/precision marks.

I'd like to argue about B "widening" for a bit, as I think it might be
interesting in general, and put A aside for now. The algorithm for
widening looks as follows:
- In is_states_equal() for (sl->state.branches > 0 && is_iter_next_insn()) case:
  - Check if states are equal exactly:
    - ignore liveness marks on old state;
    - demand same type for registers and stack slots;
    - ignore precision marks, instead compare scalars using
      regs_exact() [this differs from my previous emails, I'm now sure
      that for this scheme to be correct regs_exact() is needed].
  - If there is an exact match then follow "hit" branch. The idea
    being that visiting exactly the same state can't produce new
    execution paths (like with graph traversal).
  - If there is no exact match, but there is some state V for which
    states_equal(V, C) holds and V and C differ only in inexact
    scalars:
    - copy C to state C';
    - mark range for above mentioned inexact scalars as unbound;
    - continue verification from C';
    - if C' verification fails discard it and resume verification from C.

The hope here is that the guess for "wide" scalars would be correct and
the range for those would not matter. In that case C' would be added to
the explored states right after its creation (as it is a checkpoint),
and later verification would get to the explored C' copy again, so
that the exact comparison would prune further traversals.

Unfortunately, this is riddled with a number of technical
complications on the implementation side.
What Andrii suggests might be simpler.

> The key aspect of the verifier state pruning is that safe states
> in sl list explored all possible paths.
> Say, there is an 'if' block after iter_destroy.
> The push_stack/pop_stack logic is done as a stack to make sure
> that the last 'if' has to be explored before earlier 'if' blocks are checked.
> The existing bpf iter logic violated that.
> To fix that we need to logically do:
> 
> if (is_iter_next_insn(env, insn_idx))
>   if (states_equal(env, &sl->state, cur)) {
>    if (iter_state->iter.state == BPF_ITER_STATE_DRAINED)
>      goto hit;
> 
> instead of BPF_ITER_STATE_ACTIVE.
> In other words we need to process loop iter == 0 case first
> all the way till bpf_exit (just like we do right now),
> then process loop iter == 1 and let it exit the loop and
> go till bpf_exit (hopefully state prune will find equal states
> right after exit from the loop).
> then process loop iter == 2 and so on.
> If there was an 'if' pushed to stack during loop iter == 1
> it has to be popped and explored all the way till bpf_exit.
> 
> We cannot just replace BPF_ITER_STATE_ACTIVE with DRAINED.
> It would still be ACTIVE with sl->state.branches==0 after that
> state explored all paths to bpf_exit.

This is similar to what Andrii suggests, please correct me if I'm wrong:

> 1. If V and C (using your terminology from earlier, where V is the old
> parent state at some next() call instruction, and C is the current one
> on the same instruction) are different -- we just keep going. So
> always try to explore different input states for the loop.

> 2. But if V and C are equivalent, it's too early to conclude anything.
> So enqueue C for later in a separate BFS queue (and perhaps that queue
> is per-instruction, actually; or maybe even per-state, not sure), and
> keep exploring all the other pending queues from the (global) DFS
> queue, until we get back to state V again. At that point we need to
> start looking at postponed states for that V state. But this time we
> should be sure that precision and read marks are propagated from all
> those terminatable code paths.

More formally, before pruning potential looping states we need to
make sure that all precision and read marks are in place.
To achieve this:
- Process states from env->head while those are available; whenever a
  potentially looping state (is_states_equal()) is reached, put it in a
  separate queue.
- Once all env->head states are processed, the only source of new read
  and precision marks is the postponed looping states, some of which
  might not be is_states_equal() anymore. Submit each such state for
  verification until a fixed point is reached (repeating the steps for
  env->head processing).


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-09-28 18:30                                               ` Andrii Nakryiko
@ 2023-10-02  3:26                                                 ` Eduard Zingerman
  0 siblings, 0 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-02  3:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Thu, 2023-09-28 at 11:30 -0700, Andrii Nakryiko wrote:
[...]
> > The way I read it the following algorithm should suffice:
> > - add a field bpf_verifier_env::iter_head similar to 'head' but for
> >   postponed looping states;
> > - add functions push_iter_stack(), pop_iter_stack() similar to
> >   push_stack() and pop_stack();
> 
> I don't like the suggested naming, it's too iter-centric, and it's
> actually a queue, not a stack, etc. But that's something to figure out
> later.

After spending some time I think I figured out an example that shows
that postponed looping states do not form a queue. Please bear with
me, as it is quite large:

    1.  j = iter_new();             // fp[-16]
    2.  a = 0;                      // r6
    3.  b = 0;                      // r7
    4.  c = -24;                    // r8
    5.  while (iter_next(j)) {
    6.    i = iter_new();
    7.    a = 0;
    8.    b = 0;
    9.    while (iter_next(i)) {
    10.      if (a == 1) {
    11.        a = 0;
    12.        b = 1;
    13.      } else if (a == 0) {
    14.        a = 1;
    15.        if (random() == 42)
    16.          continue;
    17.        if (b == 1) {
    18.          *(r10 + c) = 7;
    19.          iter_destroy(i);
    20.          iter_destroy(j);
    21.          return;
    22.        }
    23.      }
    24.    }
    25.    iter_destroy(i);
    26.    a = 0;
    27.    b = 0;
    28.    c = -25;
    29.  }
    30.  iter_destroy(j);
    31.  return;

This example is unsafe, because on the second iteration of loop 5-29 the
value of 'c' is -25 (assigned at 28) and 'c' is used for a stack
access at 18; an offset of -25 is misaligned.
The basic idea is to:
(a) hide the fact that 'c' is precise behind a postponed state;
(b) hide the unsafe 'c' value behind a postponed state.

The (b) is achieved on the first iteration of loop 5-29: we enter 5 in a
state with c=-24 not marked precise; 'i' is initially considered drained,
thus the verification path is 5-9,25-29,5. Upon the second visit to 5 the
state is postponed because c=-24 is considered equal to c=-25
(neither is precise).

State at first visit of 5:
  R0=scalar() R1_rw=fp-16 R6=0 R7=0 R8=-24 R10=fp0
  fp-16_r=iter_num(ref_id=1,state=active,depth=0) refs=1

State at second visit of 5 (postponed, let's call it P1):
  R1_w=fp-16 R6=0 R7=0 R8=-25 R10=fp0
  fp-16=iter_num(ref_id=1,state=active,depth=1) refs=1

The (a) is achieved by a series of jumps in the inner loop 9-24:
- first visit of 9 is in state {a=0,b=0,c=XX}
  (here XX is the value of 'c', either -24 or -25, depending on the
   outer loop iteration number);
- second visit of 9 is in state {a=1P,b=0,c=XX}
  (path 9,14-16,9; at this point 'a' is marked precise because of prediction at 13);
- third visit of 9 is in state {a=0P,b=1,c=XX}
  (path 9-12,9; 'b' is not yet marked as precise and thus this state
   is equal to the first one and is postponed).
- after this the verifier visits 17, predicts the condition as false and
  marks 'b' as precise, but {a=1P,b=1,c=XX} is already postponed.
  
E.g. the following state is postponed at first iteration, let's call it P2:
  R0=... R1_w=fp-8 R6=0 R7=1 R8=-24 R10=fp0
  fp-8=iter_num(ref_id=3,state=active,depth=2)
  fp-16=iter_num(ref_id=1,state=active,depth=1) refs=1,3

If verification were to proceed from state P2 {a=0P,b=1P,c=XX} at 9
the path would be 9,13-14,17-21 and 'c' would be marked as precise.

But if we process postponed states as a queue P1 would be processed
first, 'c' would not have a precision mark yet and P1 would be
discarded => P2 with R8=-25 would never be investigated.

I converted this test case to assembly [2] and tried with a
prototype [1] to make sure that I don't miss anything.

---

In general, states form a tree during verification, and we want to stop
the tree growth when we see a would-be back-edge to a similar state.
Consider the following state graph:

    .---> S1 <---.     Here there is a chain of derived states:
    |     |      |       S1 -> S2 -> S3
    |     v      |
    |     S2 -> S2'    And there are potentially looping states:
    |     |            - S2' (derived from S2, states_equal to S1)
    |     v            - S3' (derived from S3, states_equal to S1)
   S3' <- S3

If we find that S2' is no longer states_equal to S1, it might add a
derivation that would eventually propagate a read or precision mark to
some register in S1. The same is true for S3'. Thus there is no clear
order in which S2' and S3' should be processed.

It is possible to imagine more complex state graphs. Unfortunately,
preparing real BPF examples for these graphs is very time consuming
(at least for me).

Hence, I think that a fixed point computation is necessary as I write
in a sibling email to Alexei:
- Process states from env->head while those are available; whenever a
  potentially looping state (is_states_equal()) is reached, put it in a
  separate queue.
- Once all env->head states are processed, the only source of new read
  and precision marks is the postponed looping states, some of which
  might not be is_states_equal() anymore. Submit each such state for
  verification until a fixed point is reached (repeating the steps for
  env->head processing).

The work-in-progress implementation for this algorithm is in [1].

---

While working on this I found an unexpected interaction with infinite
loop detection for the following test case:

    // any branch
    if (random() > 42)
      a = 0;
    else
      a = 1;
    // iterator loop
    i = iter_new()
    while (iter_next(i))
      // do something

When the first branch of the 'if' is processed, a looping state for the
'while' is added to the postponed states queue. Then the second branch of
the 'if' is processed and reaches 'iter_new()', at which point the
infinite loop detection logic sounds an alarm:

    if (states_maybe_looping(&sl->state, cur) &&
        states_equal(env, &sl->state, cur) &&
        !iter_active_depths_differ(&sl->state, cur) && false) {
        verbose_linfo(env, insn_idx, "; ");
        verbose(env, "infinite loop detected at insn %d\n", insn_idx);
        return -EINVAL;
    }

The state with iter_new() is a parent of a looping state and thus has
a non-zero .branches counter, and the iterator depth is 0 for both
arms of the 'if'.

Adding such states to the looping states queue is not safe, as there
is no guarantee that such an arbitrary state would terminate (as opposed
to iter_next(), for which we are guaranteed to have a 'drained' branch).
Thus, the only fix that I see now is to disable infinite loop
detection while there are active iterators.

(We might do some trick with remembering this state and printing it if
 the instruction limit counter is reached, but I'm not sure the added
 complexity is worth it, e.g. what to do if there are several such
 states?)

---

[1] https://github.com/eddyz87/bpf/tree/iters-bug-delayed-traversal
[2] https://github.com/eddyz87/bpf/blob/iters-bug-delayed-traversal/tools/testing/selftests/bpf/progs/iters.c#L860

Thanks,
Eduard


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-02  1:40                                                 ` Eduard Zingerman
@ 2023-10-02 16:29                                                   ` Alexei Starovoitov
  2023-10-02 17:18                                                     ` Eduard Zingerman
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2023-10-02 16:29 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Sun, Oct 1, 2023 at 6:40 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2023-09-29 at 17:41 -0700, Alexei Starovoitov wrote:
> [...]
> > I suspect that neither option A "Exit or Loop" nor option B "widening"
> > is correct.
> > In both cases we will do states_equal() with states that have
> > not reached exit and don't have live/precision marks.
>
> I'd like to argue about B "widening" for a bit, as I think it might be
> interesting in general, and put A aside for now. The algorithm for
> widening looks as follows:
> - In is_states_equal() for (sl->state.branches > 0 && is_iter_next_insn()) case:
>   - Check if states are equal exactly:
>     - ignore liveness marks on old state;
>     - demand same type for registers and stack slots;
>     - ignore precision marks, instead compare scalars using
>       regs_exact() [this differs from my previous emails, I'm now sure
>       that for this scheme to be correct regs_exact() is needed].
>   - If there is an exact match then follow "hit" branch. The idea
>     being that visiting exactly the same state can't produce new
>     execution paths (like with graph traversal).

Right. Exactly the same C state won't produce new paths
as seen in visited state V, but
if C==V at the same insn idx it means we're in an infinite loop.

>   - If there is no exact match, but there is some state V for which
>     states_equal(V, C) holds and V and C differ only in inexact
>     scalars:
>     - copy C to state C';
>     - mark range for above mentioned inexact scalars as unbound;
>     - continue verification from C';
>     - if C' verification fails discard it and resume verification from C.
>
> The hope here is that the guess for "wide" scalars would be correct and
> the range for those would not matter. In that case C' would be added to
> the explored states right after its creation (as it is a checkpoint),
> and later verification would get to the explored C' copy again, so
> that the exact comparison would prune further traversals.
>
> Unfortunately, this is riddled with a number of technical
> complications on implementation side.
> What Andrii suggests might be simpler.
>
> > The key aspect of the verifier state pruning is that safe states
> > in sl list explored all possible paths.
> > Say, there is an 'if' block after iter_destroy.
> > The push_stack/pop_stack logic is done as a stack to make sure
> > that the last 'if' has to be explored before earlier 'if' blocks are checked.
> > The existing bpf iter logic violated that.
> > To fix that we need to logically do:
> >
> > if (is_iter_next_insn(env, insn_idx))
> >   if (states_equal(env, &sl->state, cur)) {
> >    if (iter_state->iter.state == BPF_ITER_STATE_DRAINED)
> >      goto hit;
> >
> > instead of BPF_ITER_STATE_ACTIVE.
> > In other words we need to process loop iter == 0 case first
> > all the way till bpf_exit (just like we do right now),
> > then process loop iter == 1 and let it exit the loop and
> > go till bpf_exit (hopefully state prune will find equal states
> > right after exit from the loop).
> > then process loop iter == 2 and so on.
> > If there was an 'if' pushed to stack during loop iter == 1
> > it has to be popped and explored all the way till bpf_exit.
> >
> > We cannot just replace BPF_ITER_STATE_ACTIVE with DRAINED.
> > It would still be ACTIVE with sl->state.branches==0 after that
> > state explored all paths to bpf_exit.
>
> This is similar to what Andrii suggests, please correct me if I'm wrong:
>
> > 1. If V and C (using your terminology from earlier, where V is the old
> > parent state at some next() call instruction, and C is the current one
> > on the same instruction) are different -- we just keep going. So
> > always try to explore different input states for the loop.
>
> > 2. But if V and C are equivalent, it's too early to conclude anything.
> > So enqueue C for later in a separate BFS queue (and perhaps that queue
> > is per-instruction, actually; or maybe even per-state, not sure), and
> > keep exploring all the other pending queues from the (global) DFS
> > queue, until we get back to state V again. At that point we need to
> > start looking at postponed states for that V state. But this time we
> > should be sure that precision and read marks are propagated from all
> > those terminatable code paths.
>
> More formally, before pruning potential looping states we need to
> make sure that all precision and read marks are in place.
> To achieve this:
> - Process states from env->head while those are available, in case if
>   potential looping state (is_states_equal()) is reached put it to a
>   separate queue.
> - Once all env->head states are processed the only source for new read
>   and precision marks is in postponed looping states, some of which
>   might not be is_states_equal() anymore. Submit each such state for
>   verification until fixed point is reached (repeating steps for
>   env->head processing).

Comparing if (sl->state.branches) makes sense to find an infinite loop.
It's a waste for the verifier to consider visited state V with branches > 0
for pruning.
The safety of V is unknown. The lack of liveness and precision
is just one part. The verifier didn't conclude that V is safe yet.
The current state C being equivalent to V doesn't tell us anything.

If infinite loop detection logic trips us, let's disable it.
I feel the fix should be in process_iter_next_call() to somehow
make it stop doing push_stack() when states_equal(N-1, N-2).


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-02 16:29                                                   ` Alexei Starovoitov
@ 2023-10-02 17:18                                                     ` Eduard Zingerman
  2023-10-03  0:05                                                       ` Alexei Starovoitov
  0 siblings, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-02 17:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Mon, 2023-10-02 at 09:29 -0700, Alexei Starovoitov wrote:
[...]
> > I'd like to argue about B "widening" for a bit, as I think it might be
> > interesting in general, and put A aside for now. The algorithm for
> > widening looks as follows:
> > - In is_states_equal() for (sl->state.branches > 0 && is_iter_next_insn()) case:
> >   - Check if states are equal exactly:
> >     - ignore liveness marks on old state;
> >     - demand same type for registers and stack slots;
> >     - ignore precision marks, instead compare scalars using
> >       regs_exact() [this differs from my previous emails, I'm now sure
> >       that for this scheme to be correct regs_exact() is needed].
> >   - If there is an exact match then follow "hit" branch. The idea
> >     being that visiting exactly the same state can't produce new
> >     execution paths (like with graph traversal).
> 
> Right. Exactly the same C state won't produce new paths
> as seen in visited state V, but
> if C==V at the same insn idx it means we're in an infinite loop.

This is true in general, but for bpf_iter_next() we have a guarantee
that the iteration will end eventually.

> > More formally, before pruning potential looping states we need to
> > make sure that all precision and read marks are in place.
> > To achieve this:
> > - Process states from env->head while those are available, in case if
> >   potential looping state (is_states_equal()) is reached put it to a
> >   separate queue.
> > - Once all env->head states are processed the only source for new read
> >   and precision marks is in postponed looping states, some of which
> >   might not be is_states_equal() anymore. Submit each such state for
> >   verification until fixed point is reached (repeating steps for
> >   env->head processing).
> 
> Comparing if (sl->state.branches) makes sense to find infinite loop.
> It's waste for the verifier to consider visited state V with branches > 0
> for pruning.
> The safety of V is unknown. The lack of liveness and precision
> is just one part. The verifier didn't conclude that V is safe yet.
> The current state C being equivalent to V doesn't tell us anything.
> 
> If infinite loop detection logic trips us, let's disable it.
> I feel the fix should be in process_iter_next_call() to somehow
> make it stop doing push_stack() when states_equal(N-1, N-2).

Consider that we get to an environment state where:
- all env->head states are exhausted;
- all potentially looping states (stored as a separate set of
  states instead of on env->head) are states_equal() to some already
  explored state.

I argue that if such an environment state is reached, the program should
be safe, because:
- Each looping state L is a sub-state of some explored state V, and
  every path from V leads either to a safe exit or to another loop.
- Iterator loops are guaranteed to exit eventually.

Achieving this steady state is the mechanism that tells the verifier
that there is no need to schedule exploration of the N+1 iteration level
for any iterator in the program.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-02 17:18                                                     ` Eduard Zingerman
@ 2023-10-03  0:05                                                       ` Alexei Starovoitov
  2023-10-03  2:00                                                         ` Alexei Starovoitov
  2023-10-03 15:33                                                         ` Eduard Zingerman
  0 siblings, 2 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2023-10-03  0:05 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Mon, Oct 2, 2023 at 10:18 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-02 at 09:29 -0700, Alexei Starovoitov wrote:
> [...]
> > > I'd like to argue about B "widening" for a bit, as I think it might be
> > > interesting in general, and put A aside for now. The algorithm for
> > > widening looks as follows:
> > > - In is_states_equal() for (sl->state.branches > 0 && is_iter_next_insn()) case:
> > >   - Check if states are equal exactly:
> > >     - ignore liveness marks on old state;
> > >     - demand same type for registers and stack slots;
> > >     - ignore precision marks, instead compare scalars using
> > >       regs_exact() [this differs from my previous emails, I'm now sure
> > >       that for this scheme to be correct regs_exact() is needed].
> > >   - If there is an exact match then follow "hit" branch. The idea
> > >     being that visiting exactly the same state can't produce new
> > >     execution paths (like with graph traversal).
> >
> > Right. Exactly the same C state won't produce new paths
> > as seen in visited state V, but
> > if C==V at the same insn idx it means we're in an infinite loop.
>
> This is true in general, but for bpf_iter_next() we have a guarantee
> that iteration would end eventually.
>
> > > More formally, before pruning potential looping states we need to
> > > make sure that all precision and read marks are in place.
> > > To achieve this:
> > > - Process states from env->head while those are available, in case if
> > >   potential looping state (is_states_equal()) is reached put it to a
> > >   separate queue.
> > > - Once all env->head states are processed the only source for new read
> > >   and precision marks is in postponed looping states, some of which
> > >   might not be is_states_equal() anymore. Submit each such state for
> > >   verification until fixed point is reached (repeating steps for
> > >   env->head processing).
> >
> > Comparing if (sl->state.branches) makes sense to find infinite loop.
> > It's waste for the verifier to consider visited state V with branches > 0
> > for pruning.
> > The safety of V is unknown. The lack of liveness and precision
> > is just one part. The verifier didn't conclude that V is safe yet.
> > The current state C being equivalent to V doesn't tell us anything.
> >
> > If infinite loop detection logic trips us, let's disable it.
> > I feel the fix should be in process_iter_next_call() to somehow
> > make it stop doing push_stack() when states_equal(N-1, N-2).
>
> Consider that we get to an environment state where:
> - all env->head states are exhausted;
> - all potentially looping states (stored as a separate set of
>   states instead of on env->head) are states_equal() to some already
>   explored state.
>
> I argue that if such environment state is reached the program should
> be safe, because:
> - Each looping state L is a sub-state of some explored state V and
>   every path from V leads to either safe exit or another loop.
> - Iterator loops are guaranteed to exit eventually.

It sounds correct, but I don't like that the new mechanism
with two stacks of states completely changes the way the verifier works.
The check you proposed:
if (env->stack_size != 0)
      push_iter_stack()
rings alarm bells.

env->stack_size == 0 (same as env->head exhausted) means we're done
with verification (ignoring bpf_exit in callbacks and subprogs).
So the above check looks like a hack for something that I don't understand yet.
Also there could be branches in the code before and after iter loop.
With "opportunistic" states_equal() for looping states and a delayed
reschedule_loop_states() to throw states back at the verifier,
the whole verification model is incomprehensible (at least to me).
The stack + iter_stack + reschedule_loop_states means that in the following:
foo()
{
  br1 // some if() {...} block
  loop {
    br2
  }
  br3
}

the normal verifier pop_stack logic will check br3 and br1,
but br2 may or may not be checked, depending on the "luck" of states_equal
and which looping states end up in iter_stack.
Then the verifier will restart from checking the looping states.
If they somehow go past the end of the loop, all kinds of things go crazy.
update_branch_counts() might warn, and propagate_liveness and
propagate_precision will do nonsensical things.
This out-of-order state processing distorts the existing model so
much that I don't see how we can reason about this two-stack verification.


I think the cleaner solution is to keep the current single-stack model.
In the above example the verifier would reach the end, then check br3,
then check br2,
then we need to split the branches counter somehow, so that we can
compare loop iter states with previously visited states that are known
to be safe.
In visited states we explored everything in br3 and in br2,
so there is no concern that some path inside the loop or after the loop
missed precision or liveness marks.
Maybe we can split the branches counter into branches due to 'if' blocks
and branches due to process_iter_next_call().
If there are pending process_iter_next_call-caused states, it's still
ok to call states_equal on such visited states.

I could be missing something, of course.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03  0:05                                                       ` Alexei Starovoitov
@ 2023-10-03  2:00                                                         ` Alexei Starovoitov
  2023-10-03 15:33                                                         ` Eduard Zingerman
  1 sibling, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2023-10-03  2:00 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi


On Mon, Oct 2, 2023 at 5:05 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Oct 2, 2023 at 10:18 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Mon, 2023-10-02 at 09:29 -0700, Alexei Starovoitov wrote:
> > [...]
> > > > I'd like to argue about B "widening" for a bit, as I think it might be
> > > > interesting in general, and put A aside for now. The algorithm for
> > > > widening looks as follows:
> > > > - In is_states_equal() for (sl->state.branches > 0 && is_iter_next_insn()) case:
> > > >   - Check if states are equal exactly:
> > > >     - ignore liveness marks on old state;
> > > >     - demand same type for registers and stack slots;
> > > >     - ignore precision marks, instead compare scalars using
> > > >       regs_exact() [this differs from my previous emails, I'm now sure
> > > >       that for this scheme to be correct regs_exact() is needed].
> > > >   - If there is an exact match then follow "hit" branch. The idea
> > > >     being that visiting exactly the same state can't produce new
> > > >     execution paths (like with graph traversal).
> > >
> > > Right. Exactly the same C state won't produce new paths
> > > as seen in visited state V, but
> > > if C==V at the same insn indx it means we're in the infinite loop.
> >
> > This is true in general, but for bpf_iter_next() we have a guarantee
> > that iteration would end eventually.
> >
> > > > More formally, before pruning potential looping states we need to
> > > > make sure that all precision and read marks are in place.
> > > > To achieve this:
> > > > - Process states from env->head while those are available, in case if
> > > >   potential looping state (is_states_equal()) is reached put it to a
> > > >   separate queue.
> > > > - Once all env->head states are processed the only source for new read
> > > >   and precision marks is in postponed looping states, some of which
> > > >   might not be is_states_equal() anymore. Submit each such state for
> > > >   verification until fixed point is reached (repeating steps for
> > > >   env->head processing).
> > >
> > > Comparing if (sl->state.branches) makes sense to find infinite loop.
> > > It's waste for the verifier to consider visited state V with branches > 0
> > > for pruning.
> > > The safety of V is unknown. The lack of liveness and precision
> > > is just one part. The verifier didn't conclude that V is safe yet.
> > > The current state C being equivalent to V doesn't tell us anything.
> > >
> > > If infinite loop detection logic trips us, let's disable it.
> > > I feel the fix should be in process_iter_next_call() to somehow
> > > make it stop doing push_stack() when states_equal(N-1, N-2).
> >
> > Consider that we get to the environment state where:
> > - all env->head states are exhausted;
> > - all potentially looping states (stored in a separate set of
> >   states instead of env->head) are states_equal() to some already
> >   explored state.
> >
> > I argue that if such environment state is reached the program should
> > be safe, because:
> > - Each looping state L is a sub-state of some explored state V and
> >   every path from V leads to either safe exit or another loop.
> > - Iterator loops are guaranteed to exit eventually.
>
> It sounds correct, but I don't like that the new mechanism
> with two stacks of states completely changes the way the verifier works.
> The check you proposed:
> if (env->stack_size != 0)
>       push_iter_stack()
> rings alarm bells.
>
> env->stack_size == 0 (same as env->head exhausted) means we're done
> with verification (ignoring bpf_exit in callbacks and subprogs).
> So above check looks like a hack for something that I don't understand yet.
> Also there could be branches in the code before and after iter loop.
> With "opportunistic" states_equal() for looping states and delayed
> reschedule_loop_states() to throw states back at the verifier
> the whole verification model is non comprehensible (at least to me).
> The stack + iter_stack + reschedule_loop_states means that in the following:
> foo()
> {
>   br1 // some if() {...} block
>   loop {
>     br2
>   }
>   br3
> }
>
> the normal verifier pop_stack logic will check br3 and br1,
> but br2 may or may not be checked depending on "luck" of states_equal
> and looping states that will be in iter_stack.
> Then the verifier will restart from checking the looping states.
> If they somehow go past the end of the loop, all kinds of things go crazy:
> update_branch_counts() might warn; propagate_liveness() and propagate_precision()
> will do nonsensical things.
> This out-of-order state processing distorts the existing model so
> much that I don't see how we can reason about verification with these two stacks.
>
>
> I think the cleaner solution is to keep current single stack model.
> In the above example the verifier would reach the end, then check br3,
> then check br2,
> then we need to split branches counter somehow, so that we can
> compare loop iter states with previous visited states that are known
> to be safe.
> In visited states we explored everything in br3 and in br2,
> so no concerns that some path inside the loop or after the loop
> missed precision or liveness.
> Maybe we can split branches counter into branches due to 'if' blocks
> and branches due to process_iter_next_call().
> If there are pending process_iter_next_call-caused states it's still
> ok to call states_equal on such visited states.
>
> I could be missing something, of course.

Attached patch is what I meant.
It needs more work, but gives an idea.
simple iterators load fine and it correctly
detects unsafe code in num_iter_bug() example where you have
0xdeadbeef in R1.

[-- Attachment #2: 0001-iter-hack.patch --]
[-- Type: application/octet-stream, Size: 3532 bytes --]

From f194cb540cd3a87e3d745a43c3c82dbcd629d769 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Mon, 2 Oct 2023 18:30:23 -0700
Subject: [PATCH] iter hack

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_verifier.h |  2 ++
 kernel/bpf/verifier.c        | 13 ++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 94ec766432f5..3b2131c2ddcd 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -367,12 +367,14 @@ struct bpf_verifier_state {
 	 * In such cases BPF_COMPLEXITY_LIMIT_INSNS limit kicks in.
 	 */
 	u32 branches;
+	u32 looping_states;
 	u32 insn_idx;
 	u32 curframe;
 
 	struct bpf_active_lock active_lock;
 	bool speculative;
 	bool active_rcu_lock;
+	bool looping_state;
 
 	/* first and last insn idx of this verifier state */
 	u32 first_insn_idx;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index eed7350e15f4..7ef41c357575 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1762,6 +1762,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->active_lock.ptr = src->active_lock.ptr;
 	dst_state->active_lock.id = src->active_lock.id;
 	dst_state->branches = src->branches;
+	dst_state->looping_states = src->looping_states;
+	dst_state->looping_state = src->looping_state;
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
 	dst_state->last_insn_idx = src->last_insn_idx;
@@ -1785,6 +1787,8 @@ static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifi
 	while (st) {
 		u32 br = --st->branches;
 
+		if (st->looping_state && st->parent)
+			st->parent->looping_states--;
 		/* WARN_ON(br > 1) technically makes sense here,
 		 * but see comment in push_stack(), hence:
 		 */
@@ -1847,6 +1851,7 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
 	err = copy_verifier_state(&elem->st, cur);
 	if (err)
 		goto err;
+	elem->st.looping_state = false;
 	elem->st.speculative |= speculative;
 	if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
 		verbose(env, "The sequence of %d jumps is too complex.\n",
@@ -7733,7 +7738,9 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
 		queued_st = push_stack(env, insn_idx + 1, insn_idx, false);
 		if (!queued_st)
 			return -ENOMEM;
-
+	        if (queued_st->parent)
+	                queued_st->parent->looping_states++;
+		queued_st->looping_state = true;
 		queued_iter = &queued_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
 		queued_iter->iter.state = BPF_ITER_STATE_ACTIVE;
 		queued_iter->iter.depth++;
@@ -16451,7 +16458,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 		if (sl->state.insn_idx != insn_idx)
 			goto next;
 
-		if (sl->state.branches) {
+		if (sl->state.branches && sl->state.branches != sl->state.looping_states) {
 			struct bpf_func_state *frame = sl->state.frame[sl->state.curframe];
 
 			if (frame->in_async_callback_fn &&
@@ -16481,7 +16488,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * account iter_next() contract of eventually returning
 			 * sticky NULL result.
 			 */
-			if (is_iter_next_insn(env, insn_idx)) {
+			if (0 && is_iter_next_insn(env, insn_idx)) {
 				if (states_equal(env, &sl->state, cur)) {
 					struct bpf_func_state *cur_frame;
 					struct bpf_reg_state *iter_state, *iter_reg;
-- 
2.34.1



* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03  0:05                                                       ` Alexei Starovoitov
  2023-10-03  2:00                                                         ` Alexei Starovoitov
@ 2023-10-03 15:33                                                         ` Eduard Zingerman
  2023-10-03 16:07                                                           ` Alexei Starovoitov
  2023-10-03 18:50                                                           ` Alexei Starovoitov
  1 sibling, 2 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-03 15:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Mon, 2023-10-02 at 17:05 -0700, Alexei Starovoitov wrote:
[...]
> > Consider that we get to the environment state where:
> > - all env->head states are exhausted;
> > - all potentially looping states (stored in a separate set of
> >   states instead of env->head) are states_equal() to some already
> >   explored state.
> > 
> > I argue that if such environment state is reached the program should
> > be safe, because:
> > - Each looping state L is a sub-state of some explored state V and
> >   every path from V leads to either safe exit or another loop.
> > - Iterator loops are guaranteed to exit eventually.
> 
> It sounds correct, but I don't like that the new mechanism
> with two stacks of states completely changes the way the verifier works.
> The check you proposed:
> if (env->stack_size != 0)
>       push_iter_stack()
> rings alarm bells.
> 
> env->stack_size == 0 (same as env->head exhausted) means we're done
> with verification (ignoring bpf_exit in callbacks and subprogs).
> So above check looks like a hack for something that I don't understand yet.
> Also there could be branches in the code before and after iter loop.
> With "opportunistic" states_equal() for looping states and delayed
> reschedule_loop_states() to throw states back at the verifier
> the whole verification model is non comprehensible (at least to me).
> The stack + iter_stack + reschedule_loop_states means that in the following:
> foo()
> {
>   br1 // some if() {...} block
>   loop {
>     br2
>   }
>   br3
> }
> 
> the normal verifier pop_stack logic will check br3 and br1,
> but br2 may or may not be checked depending on "luck" of states_equal
> and looping states that will be in iter_stack.
> Then the verifier will restart from checking the looping states.
>
> If they somehow go past the end of the loop, all kinds of things go crazy:
> update_branch_counts() might warn; propagate_liveness() and propagate_precision()
> will do nonsensical things.

When I put states to the loop stack I do copy_verifier_state() and
increase .branches counter for each state parent, so this should not
trigger warnings with update_branch_counts(). Logically, any state
that has a loop state as its grandchild is not verified to be safe
until the loop's steady state is achieved, thus such states could not be
used for states pruning and it is correct to keep their branch
counters > 0.
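
To make the counter discipline concrete, here is a tiny userspace model of
it (the toy_* names are mine, not kernel fields; it only mirrors the shape
of update_branch_counts() and the "pin every ancestor until all forked
paths finish" invariant):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of bpf_verifier_state.branches: a per-state count of paths
 * still being explored below it. A state is safe to use for pruning only
 * once its counter drops to 0. */
struct toy_state {
	struct toy_state *parent;
	unsigned int branches;
};

/* Forking a new child path (push_stack() in the kernel) pins the parent. */
static void toy_fork(struct toy_state *child, struct toy_state *parent)
{
	child->parent = parent;
	child->branches = 1;	/* the child's own pending path */
	parent->branches++;
}

/* A path terminated: mirror update_branch_counts() and walk up the
 * parent chain, stopping at the first state that still has pending work. */
static void toy_path_done(struct toy_state *st)
{
	while (st) {
		if (--st->branches)
			break;
		st = st->parent;
	}
}
```

The counter never goes back up once it reaches 0, which is exactly the
refcount-like property discussed below.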

propagate_liveness() and propagate_precision() should not be called
for the delayed states (work-in-progress I'll update the patch).
Behavior for non-delayed states should not be altered by these changes.

> This out-of-order state processing distorts the existing model so
> much that I don't see how we can reason about verification with these two stacks.

Iterators (and callbacks) try to diverge from regular verification
logic of visiting all possible paths by making some general
assumptions about loop bodies. There is no way to verify safety of
such bodies w/o computing steady states.

> I think the cleaner solution is to keep current single stack model.
> In the above example the verifier would reach the end, then check br3,
> then check br2,
> then we need to split branches counter somehow, so that we can
> compare loop iter states with previous visited states that are known
> to be safe.
> In visited states we explored everything in br3 and in br2,
> so no concerns that some path inside the loop or after the loop
> missed precision or liveness.
>
> Maybe we can split branches counter into branches due to 'if' blocks
> and branches due to process_iter_next_call().
> If there are pending process_iter_next_call-caused states it's still
> ok to call states_equal on such visited states.

Using a stack forces DFS state traversal; if there is a loop state
with branching, this is prone to infinite nesting, e.g.:

  0. // Complete assembly code is at the end of the email.
  1. while (next(i)) {
  2.   if (random())
  3.     continue;
  4.   r0 += 0;
  5. }

This would lead to an infinite loop when using the patch shared in [1].
At (2) the verifier would always create a state with {.looping_state=false},
thus the checkpoint state at (1) would always have
`sl->state.branches != sl->state.looping_states`:
- the looping state is the current one,
- the non-looping state is the "else" branch scheduled in (2).
Therefore the checkpoint at (1) is not eligible for pruning and the verifier
would eagerly descend via path 1,2,3,1,2,3,...

I don't think this is a quirk of the patch: with only a single stack
there is no way to postpone exploration.
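
As a rough userspace illustration of that scheduling point (purely
illustrative, none of this is verifier code, and the 1000-step budget is
an arbitrary stand-in for "never converges"): with a single LIFO stack a
looping state must be re-pushed immediately and is re-explored depth-first
forever, while a second queue lets it wait until all other work drains:

```c
#include <assert.h>
#include <stdbool.h>

enum kind { WORK, LOOP };

struct item { enum kind kind; };

/* Drain a work list seeded with one plain branch and one looping state.
 * Returns the number of pops until the stack empties, or -1 if the step
 * budget is exhausted (exploration never converges). */
static int drain(bool delay_loops)
{
	struct item stack[64], delayed[64];
	int top = 0, ndelayed = 0, steps;

	stack[top++] = (struct item){ LOOP };
	stack[top++] = (struct item){ WORK };
	for (steps = 0; steps < 1000 && top > 0; steps++) {
		struct item it = stack[--top];

		if (it.kind == WORK)
			continue;	/* plain branch: fully explored */
		if (delay_loops)
			delayed[ndelayed++] = it;	/* park the loop state */
		else
			stack[top++] = (struct item){ LOOP };	/* DFS: descend now */
	}
	return top ? -1 : steps;
}
```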

Note that removal of `elem->st.looping_state = false;` from
push_stack() is not safe either: precision marks and unsafe values
could be concealed in nested iteration states, as shown in example [2],
so we risk pruning some states too early.

[1] https://lore.kernel.org/bpf/CAADnVQJ3=x8hfv7d29FQ-ckzh9=MXo54cnFShFp=eG0fJjdDow@mail.gmail.com/T/#m8fc0fc3e338f57845f9fb65e0c3798a2ef5fb2e7
[2] https://lore.kernel.org/bpf/CAADnVQJ3=x8hfv7d29FQ-ckzh9=MXo54cnFShFp=eG0fJjdDow@mail.gmail.com/T/#m6014e44a00ab7732890c13b83b5497f8d856fc81

---

In theory, the loops' steady state does not have to be tracked globally;
it can be tracked per key state, e.g. entry states for top-level loops:
- at the entry state E for a top-level loop:
  - establish separate regular and loop state stacks: E.stack, E.loop_stack;
  - add E to explored states as a checkpoint and continue verification
    from E', a copy of E (at this point E's branches counter is > 0);
- in is_state_visited() / .branches > 0 / is_bpf_next_insn():
  - only consider current state C for states_equal(V, C) if V and C
    have a common grandparent E;
  - use E.loop_stack to delay C;
- use same logic for steady state as before:
  - E.stack should be exhausted;
  - states in E.loop_stack should all be states_equal() to some
    explored state V that has E as its grandparent.
- if such a steady state is achieved, E's loop states can be dropped
  and the branches counter for E can be set to 0, thus opening it up for
  regular states pruning.
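
A compile-time sketch of the steady-state predicate above (the struct and
all names here are hypothetical bookkeeping, just to pin the condition
down, not a proposed kernel layout):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for a top-level loop entry E. */
struct toy_loop_entry {
	int stack_size;		/* states still queued under E (E.stack) */
	int loop_states;	/* delayed looping states (E.loop_stack) */
	int subsumed;		/* delayed states that are states_equal() to
				 * some explored state with E as grandparent */
	unsigned int branches;
};

/* Steady state: E's work stack is exhausted and every delayed looping
 * state is subsumed by an already-explored state under E. */
static bool steady_state(const struct toy_loop_entry *e)
{
	return e->stack_size == 0 && e->subsumed == e->loop_states;
}

/* Once steady state holds, E's delayed states can be dropped and E's
 * branches counter zeroed, opening E up for regular state pruning. */
static bool try_declare_safe(struct toy_loop_entry *e)
{
	if (!steady_state(e))
		return false;
	e->loop_states = e->subsumed = 0;
	e->branches = 0;
	return true;
}
```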
  
Such logic would make verification of loops more similar to
verification of regular conditionals, e.g.:

  foo() {
    if (random()) {
      r0 = 0;  // branch B1
    } else {
      r0 = 42; // branch B2
    }
    i = iter_new();
    // loop entry E
    while (iter_next()) {
      // loop body LB
      do_something();
    }
  }

Here verifier would first visit path B1,E,LB, second visit to LB would
be delayed and steady state for E would be declared, E.branches would
be set to 0. Then path B1,E would be visited and declared safe as E is
eligible for states pruning.

This gives more opportunities for states pruning and makes user's
reasoning about iterators verification similar to user's reasoning
about regular loops verification.
But it adds some more complexity to the implementation
(albeit, I think it's not much, env->head becomes a stack of stacks
 and a few other tweaks).

---

SEC("?raw_tp")
__success
__naked int loop1(void)
{
 	asm volatile (
		"r1 = r10;"
		"r1 += -8;"
		"r2 = 0;"
		"r3 = 10;"
		"call %[bpf_iter_num_new];"
	"loop_%=:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_next];"
		"if r0 == 0 goto loop_end_%=;"
		"call %[bpf_get_prandom_u32];"
		"if r0 != 42 goto loop_%=;"
		"r0 += 0;"
		"goto loop_%=;"
	"loop_end_%=:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_destroy];"
		"r0 = 0;"
		"exit;"
		:
		: __imm(bpf_get_prandom_u32),
		  __imm(bpf_iter_num_new),
		  __imm(bpf_iter_num_next),
		  __imm(bpf_iter_num_destroy)
		: __clobber_all
	);
}


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03 15:33                                                         ` Eduard Zingerman
@ 2023-10-03 16:07                                                           ` Alexei Starovoitov
  2023-10-03 18:50                                                           ` Alexei Starovoitov
  1 sibling, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2023-10-03 16:07 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

[-- Attachment #1: Type: text/plain, Size: 2728 bytes --]

On Tue, Oct 3, 2023 at 8:33 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
>
> This would lead to an infinite loop when using the patch shared in [1].

yeah. the first hacky patch was too buggy.
Attached is a better version, but the same idea.
It passes all existing tests and accurately detects num_iter_bug.
Then I took 5 tests from:
https://github.com/kernel-patches/bpf/commit/379c865e6b99302d69392d4f224b5c540247bcb2

It does the right thing for all of them,
including iter_precision_fixed_point[12],
except loop_state_deps1.

>
> Note that removal of `elem->st.looping_state = false;` from
> push_stack() is not safe either: precision marks and unsafe values
> could be concealed in nested iteration states, as shown in example [2],
> so we risk pruning some states too early.

it's loop_state_deps1, right?
fp-25 is a tricky one.

> SEC("?raw_tp")
> __success
> __naked int loop1(void)
> {
>         asm volatile (
>                 "r1 = r10;"
>                 "r1 += -8;"
>                 "r2 = 0;"
>                 "r3 = 10;"
>                 "call %[bpf_iter_num_new];"
>         "loop_%=:"
>                 "r1 = r10;"
>                 "r1 += -8;"
>                 "call %[bpf_iter_num_next];"
>                 "if r0 == 0 goto loop_end_%=;"
>                 "call %[bpf_get_prandom_u32];"
>                 "if r0 != 42 goto loop_%=;"
>                 "r0 += 0;"
>                 "goto loop_%=;"
>         "loop_end_%=:"
>                 "r1 = r10;"
>                 "r1 += -8;"
>                 "call %[bpf_iter_num_destroy];"
>                 "r0 = 0;"
>                 "exit;"
>                 :
>                 : __imm(bpf_get_prandom_u32),
>                   __imm(bpf_iter_num_new),
>                   __imm(bpf_iter_num_next),
>                   __imm(bpf_iter_num_destroy)
>                 : __clobber_all
>         );
> }

this one loads fine with the attached patch.

I still prefer to avoid a two-stack hack if one stack is enough.

The key idea of the patch is:
@@ -7733,7 +7743,8 @@ static int process_iter_next_call(struct
bpf_verifier_env *env, int insn_idx,
                queued_st = push_stack(env, insn_idx + 1, insn_idx, false);
                if (!queued_st)
                        return -ENOMEM;
-
+               queued_st->looping_states++;
+               queued_st->branches--;

so deferred looping states from process_iter_next_call() are not
considered branches in _that_ state.
push_stack() still did branches++ in the parent.
So the verifier will process everything after process_iter_next_call()
before reducing its branches to zero and making it eligible
for state equality comparison.
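
In toy form, the eligibility rule this accounting buys (my reading of the
patch, with simplified names; nothing here is actual verifier code):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified counters from the patch idea: states pushed by 'if' blocks
 * count as branches; states pushed by process_iter_next_call() count as
 * looping_states instead. */
struct toy_state {
	unsigned int branches;
	unsigned int looping_states;
};

/* A visited state may be compared with states_equal() once all of its
 * ordinary branches are explored, even while loop iterations are pending. */
static bool eligible_for_states_equal(const struct toy_state *st)
{
	return st->branches == 0;
}

/* Pruning additionally requires the pending iterations to be resolved. */
static bool fully_explored(const struct toy_state *st)
{
	return st->branches == 0 && st->looping_states == 0;
}
```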

[-- Attachment #2: 0001-iter-hack-2.patch --]
[-- Type: application/octet-stream, Size: 4904 bytes --]

From 0d5fd67af67bc158394fe96742ab4ad94085ea08 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Mon, 2 Oct 2023 18:30:23 -0700
Subject: [PATCH] iter hack 2

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 34 +++++++++++++++++++++++++++-------
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 94ec766432f5..34f7de583aae 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -367,6 +367,7 @@ struct bpf_verifier_state {
 	 * In such cases BPF_COMPLEXITY_LIMIT_INSNS limit kicks in.
 	 */
 	u32 branches;
+	u32 looping_states;
 	u32 insn_idx;
 	u32 curframe;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index eed7350e15f4..cf0a5b779149 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1762,6 +1762,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->active_lock.ptr = src->active_lock.ptr;
 	dst_state->active_lock.id = src->active_lock.id;
 	dst_state->branches = src->branches;
+	dst_state->looping_states = src->looping_states;
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
 	dst_state->last_insn_idx = src->last_insn_idx;
@@ -1783,7 +1784,12 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifier_state *st)
 {
 	while (st) {
-		u32 br = --st->branches;
+		u32 br = 0;
+
+		if (st->branches)
+			br = --st->branches;
+		else if (st->looping_states)
+			st->looping_states--;
 
 		/* WARN_ON(br > 1) technically makes sense here,
 		 * but see comment in push_stack(), hence:
@@ -1791,7 +1797,7 @@ static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifi
 		WARN_ONCE((int)br < 0,
 			  "BUG update_branch_counts:branches_to_explore=%d\n",
 			  br);
-		if (br)
+		if (br || st->looping_states)
 			break;
 		st = st->parent;
 	}
@@ -1847,6 +1853,10 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
 	err = copy_verifier_state(&elem->st, cur);
 	if (err)
 		goto err;
+	if (elem->st.looping_states) {
+		elem->st.looping_states--;
+		elem->st.branches++;
+	}
 	elem->st.speculative |= speculative;
 	if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
 		verbose(env, "The sequence of %d jumps is too complex.\n",
@@ -7733,7 +7743,8 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
 		queued_st = push_stack(env, insn_idx + 1, insn_idx, false);
 		if (!queued_st)
 			return -ENOMEM;
-
+		queued_st->looping_states++;
+		queued_st->branches--;
 		queued_iter = &queued_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
 		queued_iter->iter.state = BPF_ITER_STATE_ACTIVE;
 		queued_iter->iter.depth++;
@@ -15824,7 +15835,7 @@ static void clean_live_states(struct bpf_verifier_env *env, int insn,
 
 	sl = *explored_state(env, insn);
 	while (sl) {
-		if (sl->state.branches)
+		if (sl->state.branches || sl->state.looping_states)
 			goto next;
 		if (sl->state.insn_idx != insn ||
 		    sl->state.curframe != cur->curframe)
@@ -16451,6 +16462,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 		if (sl->state.insn_idx != insn_idx)
 			goto next;
 
+		verbose(env, "branches=%d/%d ", sl->state.branches, sl->state.looping_states);
 		if (sl->state.branches) {
 			struct bpf_func_state *frame = sl->state.frame[sl->state.curframe];
 
@@ -16481,7 +16493,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * account iter_next() contract of eventually returning
 			 * sticky NULL result.
 			 */
-			if (is_iter_next_insn(env, insn_idx)) {
+			if (0 && is_iter_next_insn(env, insn_idx)) {
 				if (states_equal(env, &sl->state, cur)) {
 					struct bpf_func_state *cur_frame;
 					struct bpf_reg_state *iter_state, *iter_reg;
@@ -16638,8 +16650,16 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 		kfree(new_sl);
 		return err;
 	}
+	if (new->looping_states && 0) {
+		new->looping_states--;
+		new->branches++;
+	}
+	if (cur->looping_states && 0) {
+		cur->looping_states--;
+		cur->branches++;
+	}
 	new->insn_idx = insn_idx;
-	WARN_ONCE(new->branches != 1,
+	WARN_ONCE(new->branches + new->looping_states != 1,
 		  "BUG is_state_visited:branches_to_explore=%d insn %d\n", new->branches, insn_idx);
 
 	cur->parent = new;
@@ -16779,7 +16799,7 @@ static int do_check(struct bpf_verifier_env *env)
 		insn = &insns[env->insn_idx];
 		class = BPF_CLASS(insn->code);
 
-		if (++env->insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
+		if (++env->insn_processed > 1000) {//BPF_COMPLEXITY_LIMIT_INSNS) {
 			verbose(env,
 				"BPF program is too large. Processed %d insn\n",
 				env->insn_processed);
-- 
2.34.1



* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03 15:33                                                         ` Eduard Zingerman
  2023-10-03 16:07                                                           ` Alexei Starovoitov
@ 2023-10-03 18:50                                                           ` Alexei Starovoitov
  2023-10-03 21:52                                                             ` Eduard Zingerman
  1 sibling, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2023-10-03 18:50 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, Oct 3, 2023 at 8:33 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
>
> When I put states to the loop stack I do copy_verifier_state() and
> increase .branches counter for each state parent, so this should not
> trigger warnings with update_branch_counts().

No warning doesn't mean it's correct.
I suspect that your reschedule_loop_states() will increase
branches from 0 to 1,
which is equivalent to increasing a refcnt from zero. Not good.

> Logically, any state
> that has a loop state as its grandchild is not verified to be safe
> until the loop's steady state is achieved, thus such states could not be
> used for states pruning and it is correct to keep their branch
> counters > 0.

Correct.

>
> propagate_liveness() and propagate_precision() should not be called
> for the delayed states (work-in-progress I'll update the patch).
> Behavior for non-delayed states should not be altered by these changes.

I disagree. Delayed states should not have different propagation rules.

> Iterators (and callbacks) try to diverge from regular verification
> logic of visiting all possible paths by making some general
> assumptions about loop bodies. There is no way to verify safety of
> such bodies w/o computing steady states.

Iterators and callbacks should not be special.
They are no different than bounded loops
and their verification shouldn't be much different.
Bounded loops have constant upper bound while
iterators do not, so they have to converge based on state
equivalence, but the verifier still has to do DFS and discover
all possible paths, propagate precision and liveness before
considering two states equivalent.
It does this today for bounded loops and should do the same
for iterators.
The question is how to do it.
Delaying states and breaking DFS assumptions is a non-starter.

I see now that my 2nd hack is still buggy.
Differentiating states with branches vs looping_states counters
doesn't seem to work in all cases.
Your loop_state_deps1 demonstrates that. It's a great test.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03 18:50                                                           ` Alexei Starovoitov
@ 2023-10-03 21:52                                                             ` Eduard Zingerman
  2023-10-03 22:03                                                               ` Eduard Zingerman
  2023-10-03 23:08                                                               ` Alexei Starovoitov
  0 siblings, 2 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-03 21:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, 2023-10-03 at 11:50 -0700, Alexei Starovoitov wrote:
> On Tue, Oct 3, 2023 at 8:33 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > When I put states to the loop stack I do copy_verifier_state() and
> > increase .branches counter for each state parent, so this should not
> > trigger warnings with update_branch_counts().
> 
> No warn doesn't mean it's correct.
> I suspect what your reschedule_loop_states() will increase
> branches from 0 to 1.
> Which is equivalent to increasing refcnt from zero. Not good.

Not really, here is how I modified bpf_verifier_env:

    struct bpf_verifier_state_stack {
        struct bpf_verifier_stack_elem *head; /* stack of verifier states to be processed */
        int size;                             /* number of states to be processed */
    };
    
    struct bpf_verifier_env {
        ...
        struct bpf_verifier_state_stack stack;
        struct bpf_verifier_state_stack loop_stack;
        ...
    }

Here env->stack is used for DFS traversal and env->loop_stack is used
to delay verification of the loop states.

When bpf_iter_next() is reached in state C and states_equal() shows
that there is a potentially equivalent state V:
- a copy C' of C is created using copy_verifier_state(); this updates all
  branch counters up the ownership chain, as with any other state;
- C' is put on env->loop_stack.

The reschedule_loop_states() [1] loops over states in loop_stack to
see if there are states that no longer have an equivalent state; such
states are removed from env->loop_stack and put on env->stack.
This is done without updating branch counters, so at any point in time
parent states have the loop state accounted for in their branch counters.
A branch counter *never* transitions from 1 to 0 and then to 1 again
during this process.

[1] https://github.com/kernel-patches/bpf/compare/bpf-next_base...eddyz87:bpf:iters-bug-delayed-traversal#diff-edbb57adf10d1ce1fbb830a34fa92712fd01db1fbd9b6f2504001eb7bcc7b9d0R16823
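
A rough userspace shape of that rescheduling step (the stack layout and
the states_equal() stand-in are simplified; toy_* names are mine):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_state {
	unsigned int branches;
	bool has_equal_explored;	/* stand-in for states_equal() against
					 * the explored set */
	struct toy_state *next;
};

struct toy_stack {
	struct toy_state *head;
	int size;
};

static void toy_push(struct toy_stack *s, struct toy_state *st)
{
	st->next = s->head;
	s->head = st;
	s->size++;
}

static struct toy_state *toy_pop(struct toy_stack *s)
{
	struct toy_state *st = s->head;

	if (st) {
		s->head = st->next;
		s->size--;
	}
	return st;
}

/* Move delayed loop states that lost their equivalent explored state back
 * to the main work stack. Branch counters are never touched here: delayed
 * states stay accounted for the whole time, so no counter goes 0 -> 1. */
static int toy_reschedule_loop_states(struct toy_stack *stack,
				      struct toy_stack *loop_stack)
{
	struct toy_stack keep = { NULL, 0 };
	struct toy_state *st;
	int moved = 0;

	while ((st = toy_pop(loop_stack))) {
		if (st->has_equal_explored) {
			toy_push(&keep, st);	/* still subsumed: stay delayed */
		} else {
			toy_push(stack, st);	/* re-activate for verification */
			moved++;
		}
	}
	*loop_stack = keep;
	return moved;
}
```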

> > Logically, any state
> > that has a loop state as its grandchild is not verified to be safe
> > until the loop's steady state is achieved, thus such states could not be
> > used for states pruning and it is correct to keep their branch
> > counters > 0.
> 
> Correct.
>
> > propagate_liveness() and propagate_precision() should not be called
> > for the delayed states (work-in-progress I'll update the patch).
> > Behavior for non-delayed states should not be altered by these changes.
> 
> I disagree. Delayed states should not have different propagation rules.

Upon first discovery the loop state C is states_equal to some state V,
precision and read marks are not yet finalized and thus states_equal(V, C)
might become false at some point. Also, C should not be used for states pruning.
Thus, there is no need to propagate liveness and precision from V to C
at this point.

It is possible and correct to propagate liveness and precision from V
to C when loop steady state is achieved, as at that point we know for
sure that C is a sub-state of V. However, currently loop states are
tracked globally and no states are processed after loops steady state
is reached, hence I don't do liveness and precision propagation.

On the other hand, if loop steady state would be tracked non-globally
e.g. for top level loop headers as I wrote in the other email today,
propagation of liveness and precision information would make sense,
as these states could be used for state pruning later.

> Iterators and callbacks should not be special.
> They are no different than bounded loops
> and their verification shouldn't be much different.
> Bounded loops have constant upper bound while
> iterators do not, so they have to converge based on state
> equivalence, but the verifier still has to do DFS and discover
> all possible paths, propagate precision and liveness before
> considering two states equivalent.
> It does this today for bounded loops and should do the same
> for iterators.
> The question is how to do it.
> Delaying states and breaking DFS assumptions is a non starter.

I think that the absence of a bound is a fundamental difference.

On each verification path of a bounded loop there is always a precise
value that at some point triggers loop termination. There is no need
for states equivalence checks (except for verification performance).

Each verification path of iterator or callback based loop is logically
an infinite series of states. In order to terminate verification of
such path it is necessary to:
- either find some prior state V identical to current state C;
- or find some prior state V such that current state C is a sub-state of V.

Terminating the infinite series by an identical state turns out to be very
limiting in practice.

On the other hand, sub-state check requires finalization of read and
precision marks in V. We've seen that such marks could be concealed at
an unknown depth. In addition, state V might be a parent state for
multiple grandchildren contributing to precision and read marks.
These grandchildren states are created when conditional jumps are
processed on possibly different iteration depths.

The necessity to visit all grandchildren states of V leads to the
necessity of a mixed DFS and BFS traversal of the loop states.

(In fact, this is not a special case but a generalization:
 bounded loop processing could be done within the same framework).

> I see now that my 2nd hack is still buggy.
> Differentiating state with branches vs looping_states counters
> doesn't seem to work in all cases.
> Your loop_state_deps1 demonstrates that. It's a great test.

I'm glad this test is useful.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03 21:52                                                             ` Eduard Zingerman
@ 2023-10-03 22:03                                                               ` Eduard Zingerman
  2023-10-03 23:08                                                               ` Alexei Starovoitov
  1 sibling, 0 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-03 22:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

I'm sorry, here are a few typos corrected:

  When bpf_iter_next() is reached in state C and states_equal() shows
  that there is potentially equivalent state V:
- - copy C' of C is created sing copy_verifier_state(), it updates all
+ - copy C' of C is created using copy_verifier_state(), it updates all
    branch counters up the ownership chain as with any other state;

  Upon first discovery the loop state C is states_equal to some state V,
  precision and read marks are not yet finalized and thus states_equal(V, C)
- become false at some point. Also, C should not be used for states pruning.
+ might become false at some point. Also, C should not be used for states pruning.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03 21:52                                                             ` Eduard Zingerman
  2023-10-03 22:03                                                               ` Eduard Zingerman
@ 2023-10-03 23:08                                                               ` Alexei Starovoitov
  2023-10-03 23:14                                                                 ` Eduard Zingerman
  2023-10-04  1:05                                                                 ` Eduard Zingerman
  1 sibling, 2 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2023-10-03 23:08 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Wed, Oct 04, 2023 at 12:52:07AM +0300, Eduard Zingerman wrote:
> On Tue, 2023-10-03 at 11:50 -0700, Alexei Starovoitov wrote:
> > On Tue, Oct 3, 2023 at 8:33 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > 
> > > When I put states to the loop stack I do copy_verifier_state() and
> > > increase .branches counter for each state parent, so this should not
> > > trigger warnings with update_branch_counts().
> > 
> > No warn doesn't mean it's correct.
> > I suspect what your reschedule_loop_states() will increase
> > branches from 0 to 1.
> > Which is equivalent to increasing refcnt from zero. Not good.
> 
> Not really, here is how I modified bpf_verifier_env:
> 
>     struct bpf_verifier_state_stack {
>         struct bpf_verifier_stack_elem *head; /* stack of verifier states to be processed */
>         int size;                             /* number of states to be processed */
>     };
>     
>     struct bpf_verifier_env {
>         ...
>         struct bpf_verifier_state_stack stack;
>         struct bpf_verifier_state_stack loop_stack;
>         ...
>     }
> 
> Here env->stack is used for DFS traversal and env->loop_stack is used
> to delay verification of the loop states.
> 
> When bpf_iter_next() is reached in state C and states_equal() shows
> that there is potentially equivalent state V:
> - copy C' of C is created using copy_verifier_state(), it updates all
>   branch counters up the ownership chain as with any other state;
> - C' is put to env->loop_stack.

and C' points to normal parent and increments its branches as part of __push_stack().

By delaying all looping states that look to be equal to V states
due to broken precision marks, the verifier will walk almost everything
after the loop, converge all branches to 0, then do clean_live_states() and REG_LIVE_DONE
in the bottom part;
then it will proceed to verify all branches before the loop, but
since their branch counters are stuck due to states in loop_stack,
the upper part of the program won't get the same treatment.
update_branch_counts() will never go past the point where C' was put on loop_stack,
so every state before the loop will have branches > 0 and
verification of branches before the loop will not do any state pruning at all.

void foo()
{
  if (...) { .. } // no pruning
  if (...) { .. } // no pruning

  bpf_for(...)
  if (...) { .. } // all ok
}

Essentially, any complex code before the loop has a high chance of
verifier state explosion.

> It is possible and correct to propagate liveness and precision from V
> to C when loop steady state is achieved, as at that point we know for
> sure that C is a sub-state of V. However, currently loop states are
> tracked globally and no states are processed after loops steady state
> is reached, hence I don't do liveness and precision propagation.

Hmm. I think the code is doing it:
if (is_iter_next_insn(env, insn_idx)) {
  if (states_equal(env, &sl->state, cur)) {
    push_loop_stack(env, insn_idx, env->prev_insn_idx);
    goto hit;
hit:
  propagate_liveness()

> The necessity to visit all grandchildren states of V leads to
> necessity of mixed DFS and BFS traversal of the loop states.

DFS + BFS traversal doesn't guarantee safety.
We delayed looping states in loop_stack, but it doesn't mean
that the loop body reached a steady state.
Your own num_iter_bug() is an example.
The verifier will miss exploring path with r7 = 0xdead.
When C' is popped from loop_stack there is chance it will explore them,
but there is no guarantee.
For this particular num_iter_bug() case the loop processing
will eventually propagate precision marks, and when retrying with C' later,
C' will be correctly rejected, since the precision marks are there.
But I think that is more due to luck and the shape of the test.

if (is_iter_next_insn(env, insn_idx)) {
  if (states_equal(env, &sl->state, cur)) {

is a foot gun. It finds broken equivalence and potentially skips
whole blocks of code.
What guarantees that C' from loop_stack will explore them?
I think avoiding states_equal(V, cur) when V state didn't converge
is a mandatory part of the fix.

I can see that a little bit of out-of-order state processing is probably
still correct, but delaying loop_stack all the way until
env->stack.size == 0 is just broken.
For example we can push looping state in process_iter_next_call()
not the top of the stack, but place it after branches with
insn_idx >= iter_next's insn_idx.
This way the delayed looping states will still be processed
around loop processing and every code block before the loop
will converge to branches = 0.
To do that we don't need another loop_stack.

I think we need a call to converge. Office hours on Thu at 9am?
Or tomorrow at 9am?


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03 23:08                                                               ` Alexei Starovoitov
@ 2023-10-03 23:14                                                                 ` Eduard Zingerman
  2023-10-04  0:22                                                                   ` Andrii Nakryiko
  2023-10-04  1:05                                                                 ` Eduard Zingerman
  1 sibling, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-03 23:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, 2023-10-03 at 16:08 -0700, Alexei Starovoitov wrote:
> ...

I'll reply a bit later.
Keeping all states before the loop in limbo might indeed be an issue
performance-wise.

> I think we need a call to converge. Office hours on Thu at 9am?
> Or tomorrow at 9am?

Both are fine by me.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03 23:14                                                                 ` Eduard Zingerman
@ 2023-10-04  0:22                                                                   ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2023-10-04  0:22 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, Oct 3, 2023 at 4:14 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2023-10-03 at 16:08 -0700, Alexei Starovoitov wrote:
> > ...
>
> I'll reply a bit later.
> Keeping all states before loop in a limbo might indeed be an issue
> performance wise.
>
> > I think we need a call to converge. Office hours on Thu at 9am?
> > Or tomorrow at 9am?
>
> Both are fine by me.

Ditto.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-03 23:08                                                               ` Alexei Starovoitov
  2023-10-03 23:14                                                                 ` Eduard Zingerman
@ 2023-10-04  1:05                                                                 ` Eduard Zingerman
  2023-10-04  2:57                                                                   ` Alexei Starovoitov
  1 sibling, 1 reply; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-04  1:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, 2023-10-03 at 16:08 -0700, Alexei Starovoitov wrote:
[...]
> > When bpf_iter_next() is reached in state C and states_equal() shows
> > that there is potentially equivalent state V:
> > - copy C' of C is created using copy_verifier_state(), it updates all
> >   branch counters up the ownership chain as with any other state;
> > - C' is put to env->loop_stack.
> 
> and C' points to normal parent and increments its branches as part of __push_stack().

Yes.

> void foo()
> {
>   if (...) { .. } // no pruning
>   if (...) { .. } // no pruning
> 
>   bpf_for(...)
>   if (...) { .. } // all ok
> }
> 
> Essentially any complex code before the loop has a high chance of the verifier
> state explosion.

Yes. This may be an issue. I'll try to hack up the layered variant I
talked about before to see what the underlying issues are. The idea is to verify
every child state of the loop entry before moving to any branches of
the loop's parent states. This would allow computing a local "steady loop"
and marking everything after the loop entry as safe (.branches == 0).

> > It is possible and correct to propagate liveness and precision from V
> > to C when loop steady state is achieved, as at that point we know for
> > sure that C is a sub-state of V. However, currently loop states are
> > tracked globally and no states are processed after loops steady state
> > is reached, hence I don't do liveness and precision propagation.
> 
> Hmm. I think the code is doing it:
> if (is_iter_next_insn(env, insn_idx)) {
>   if (states_equal(env, &sl->state, cur)) {
>     push_loop_stack(env, insn_idx, env->prev_insn_idx);
>     goto hit;
> hit:
>   propagate_liveness()

Yes and this should be changed. It should not produce incorrect
results as additional precision and read marks are conservative,
but will hinder state pruning in some scenarios.

> DFS + BFS traversal doesn't guarantee safety.
> We delayed looping states in loop_stack, but it doesn't mean
> that the loop body reached a steady state.
> Your own num_iter_bug() is an example.
> The verifier will miss exploring path with r7 = 0xdead.
> When C' is popped from loop_stack there is chance it will explore them,
> but there is no guarantee.
> For this particular num_iter_bug() case the loop processing
> will eventually propagate precision marks and retrying with C' later,
> the C' will be correctly rejected, since precision marks are there.
> But I think that is more due to luck and the shape of the test.
> 
> if (is_iter_next_insn(env, insn_idx)) {
>   if (states_equal(env, &sl->state, cur)) {
> 
> is a foot gun. It finds broken equivalence and potentially skipping
> whole blocks of code.
> What guarantees that C' from loop_stack will explore them?

The idea is that when `env->stack.size == 0` all V's non-delayed
children states are explored and relevant precision marks are
propagated to V. If at this point some states_equal(V, C') is false it
is necessary to schedule C' for one more exploration round as it might
visit some code blocks that were not visited on the path from V to C'
(as different prediction decisions could be made).
If a point is reached when for all loop states C' there are some
states V such that states_equal(V, C'), there are no more
configurations that can visit code blocks not visited on the path from
V to C', as prediction decisions would be the same.

> I think avoiding states_equal(V, cur) when V state didn't converge
> is a mandatory part of the fix.

But there are no other heuristics that suggest that exploration of the
infinite series of loop iterations could be stopped for the current path.

> I can see that little bit of out-of-order state processing probably
> is still correct, but delaying loop_stack all the way until 
> env->stack.size == 0 is just broken.
> For example we can push looping state in process_iter_next_call()
> not the top of the stack, but place it after branches with
> insn_idx >= iter_next's insn_idx.

This should be based on state parentage chain as code blocks could be
non-linearly structured.

> This way the delayed looping states will still be processed
> around loop processing and every code block before the loop
> will converge to branches = 0.
> To do that we don't need another loop_stack.

Will comment in the morning.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-04  1:05                                                                 ` Eduard Zingerman
@ 2023-10-04  2:57                                                                   ` Alexei Starovoitov
  2023-10-04  5:50                                                                     ` Alexei Starovoitov
  2023-10-04 11:52                                                                     ` Eduard Zingerman
  0 siblings, 2 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2023-10-04  2:57 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Wed, Oct 04, 2023 at 04:05:30AM +0300, Eduard Zingerman wrote:
> On Tue, 2023-10-03 at 16:08 -0700, Alexei Starovoitov wrote:
> [...]
> > > When bpf_iter_next() is reached in state C and states_equal() shows
> > > that there is potentially equivalent state V:
> > > - copy C' of C is created using copy_verifier_state(), it updates all
> > >   branch counters up the ownership chain as with any other state;
> > > - C' is put to env->loop_stack.
> > 
> > and C' points to normal parent and increments its branches as part of __push_stack().
> 
> Yes.
> 
> > void foo()
> > {
> >   if (...) { .. } // no pruning
> >   if (...) { .. } // no pruning
> > 
> >   bpf_for(...)
> >   if (...) { .. } // all ok
> > }
> > 
> > Essentially any complex code before the loop has a high chance of the verifier
> > state explosion.
> 
> Yes. This may be an issue. I'll try to hack a layered variant I talked
> before to see what are the underlying issues. The idea is to verify
> every child state of the loop entry before moving to any branches of
> loop's parent states. 

Isn't that what current DFS logic does?

> This would allow to compute local "steady loop"
> and mark everything after loop entry as safe (.branches == 0).
> 
> > > It is possible and correct to propagate liveness and precision from V
> > > to C when loop steady state is achieved, as at that point we know for
> > > sure that C is a sub-state of V. However, currently loop states are
> > > tracked globally and no states are processed after loops steady state
> > > is reached, hence I don't do liveness and precision propagation.
> > 
> > Hmm. I think the code is doing it:
> > if (is_iter_next_insn(env, insn_idx)) {
> >   if (states_equal(env, &sl->state, cur)) {
> >     push_loop_stack(env, insn_idx, env->prev_insn_idx);
> >     goto hit;
> > hit:
> >   propagate_liveness()
> 
> Yes and this should be changed. It should not produce incorrect
> results as additional precision and read marks are conservative,
> but will hinder state pruning in some scenarios.
> 
> > DFS + BFS traversal doesn't guarantee safety.
> > We delayed looping states in loop_stack, but it doesn't mean
> > that the loop body reached a steady state.
> > Your own num_iter_bug() is an example.
> > The verifier will miss exploring path with r7 = 0xdead.
> > When C' is popped from loop_stack there is chance it will explore them,
> > but there is no guarantee.
> > For this particular num_iter_bug() case the loop processing
> > will eventually propagate precision marks and retrying with C' later,
> > the C' will be correctly rejected, since precision marks are there.
> > But I think that is more due to luck and the shape of the test.
> > 
> > if (is_iter_next_insn(env, insn_idx)) {
> >   if (states_equal(env, &sl->state, cur)) {
> > 
> > is a foot gun. It finds broken equivalence and potentially skipping
> > whole blocks of code.
> > What guarantees that C' from loop_stack will explore them?
> 
> The idea is that when `env->stack.size == 0` all V's non-delayed
> children states are explored and relevant precision marks are
> propagated to V. 

But that is not true.
if (is_iter_next_insn(env, insn_idx)) {
  if (states_equal(env, &sl->state, cur)) {
with sl->state.branches > 0
will prevent exploration of all children.
Hence I still believe that fine-tuning this equivalence check is
the first step in any fix.

> If at this point some states_equal(V, C') is false it
> is necessary to schedule C' for one more exploration round as it might
> visit some code blocks that were not visited on the path from V to C'
> (as different prediction decisions could be made)

Exactly my point above,
but because of the broken double 'if' above the 2nd pass might hit
the same issues as the first: because some path wasn't explored,
the precision marks might still be wrong.

More loop states can be created and delayed again into loop_stack?

> If a point is reached when for all loop states C' there are some
> states V such that states_equal(V, C'), there are no more
> configurations that can visit code blocks not visited on the path from
> V to C', as prediction decisions would be the same.

and if not, we can repeat loop_stack->stack->loop_stack potentially forever?

> > I think avoiding states_equal(V, cur) when V state didn't converge
> > is a mandatory part of the fix.
> 
> But there are no other heuristics that suggest that exploration of the
> infinite series of loop iterations could be stopped for current path.

Did you look at my broken patch? Yes, it's buggy, but it demonstrates
the idea: it removes the problematic double 'if' above and uses
states_equal() on states where st->branches==0 && st->looping_states==1,
meaning that in those states all paths were explored except the one
forked by the iter_next kfunc.
So precision and liveness are correct there,
while doing states_equal on a V with branches=2+ is broken: it's a guess.
It could just be random true/false, and the algorithm of delaying looping
states would probably converge the same way on the tests we have so far.

> > I can see that little bit of out-of-order state processing probably
> > is still correct, but delaying loop_stack all the way until 
> > env->stack.size == 0 is just broken.
> > For example we can push looping state in process_iter_next_call()
> > not the top of the stack, but place it after branches with
> > insn_idx >= iter_next's insn_idx.
> 
> This should be based on state parentage chain as code blocks could be
> non-linearly structured.

ok. discard that idea.

> > This way the delayed looping states will still be processed
> > around loop processing and every code block before the loop
> > will converge to branches = 0.
> > To do that we don't need another loop_stack.
> 
> Will comment in the morning.


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-04  2:57                                                                   ` Alexei Starovoitov
@ 2023-10-04  5:50                                                                     ` Alexei Starovoitov
  2023-10-04  9:49                                                                       ` Eduard Zingerman
  2023-10-04 11:52                                                                     ` Eduard Zingerman
  1 sibling, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2023-10-04  5:50 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]

On Tue, Oct 3, 2023 at 7:57 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> ok. discard that idea.

Attached is a 3rd version of the same idea I argued earlier.
Let normal DFS go as normal,
do states_equal() on V which has 1 looping branch remaining
and all others explored.
To achieve that when iter_next() is seen do parent->looping_states += 2;

then when processing any children do parent->looping_states++;
in the correct parent.
Since there could be many intermediate states, we have to walk back the
parentage chain to increment the correct parent.
When the state reaches bpf_exit or safety, walk back
the parentage chain and do looping_states--.
The state is ok to use in states_equal() if looping_states==1.

With this patch all existing iter tests still pass,
and all Ed's special tests pass or fail as needed.
Ex: loop_state_deps1 is rejected with misaligned stack,
loop1 loads with success, num_iter_bug fails with bad pointer.

Please review.
I could just be lucky with the way the tests are constructed,
but I feel this is a better path to fix this issue than the
DFS/BFS combo that I have doubts about.

[-- Attachment #2: 0001-iter-hack-3.patch --]
[-- Type: application/octet-stream, Size: 6161 bytes --]

From 4e01fce2666188b2f203c8b8047a40a9116900c6 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Mon, 2 Oct 2023 18:30:23 -0700
Subject: [PATCH] iter hack 3

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 48 +++++++++++++++++++++++++++++++++---
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 94ec766432f5..34f7de583aae 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -367,6 +367,7 @@ struct bpf_verifier_state {
 	 * In such cases BPF_COMPLEXITY_LIMIT_INSNS limit kicks in.
 	 */
 	u32 branches;
+	u32 looping_states;
 	u32 insn_idx;
 	u32 curframe;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index eed7350e15f4..fb52d41b70f3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1762,6 +1762,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->active_lock.ptr = src->active_lock.ptr;
 	dst_state->active_lock.id = src->active_lock.id;
 	dst_state->branches = src->branches;
+	dst_state->looping_states = src->looping_states;
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
 	dst_state->last_insn_idx = src->last_insn_idx;
@@ -1782,9 +1783,23 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 
 static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifier_state *st)
 {
+	struct bpf_verifier_state *save = st;
+
+	while (st) {
+		if (st->looping_states) {
+			st->looping_states--;
+			verbose(env, "update_br1 %lx branches=%d/%d\n",
+				((long)st) >> 3 & 0xFFFF, st->branches, st->looping_states);
+		}
+		st = st->parent;
+	}
+	st = save;
 	while (st) {
 		u32 br = --st->branches;
 
+		verbose(env, "update_br2 %lx branches=%d/%d\n",
+			((long)st) >> 3 & 0xFFFF, st->branches, st->looping_states);
+
 		/* WARN_ON(br > 1) technically makes sense here,
 		 * but see comment in push_stack(), hence:
 		 */
@@ -1811,6 +1826,8 @@ static int pop_stack(struct bpf_verifier_env *env, int *prev_insn_idx,
 		err = copy_verifier_state(cur, &head->st);
 		if (err)
 			return err;
+		verbose(env, "%d: pop_stack %lx branches=%d/%d\n", head->insn_idx,
+			((long)&head->st) >> 3 & 0xFFFF, cur->branches, cur->looping_states);
 	}
 	if (pop_log)
 		bpf_vlog_reset(&env->log, head->log_pos);
@@ -1826,6 +1843,16 @@ static int pop_stack(struct bpf_verifier_env *env, int *prev_insn_idx,
 	return 0;
 }
 
+static void increase_looping(struct bpf_verifier_state *st)
+{
+	while (st) {
+		if (st->looping_states) {
+			st->looping_states++;
+			break;
+		}
+		st = st->parent;
+	}
+}
 static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
 					     int insn_idx, int prev_insn_idx,
 					     bool speculative)
@@ -1847,6 +1874,9 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
 	err = copy_verifier_state(&elem->st, cur);
 	if (err)
 		goto err;
+	verbose(env, "%d: push_stack %lx branches=%d/%d parent %lx\n", insn_idx,
+		((long)&elem->st) >> 3 & 0xFFFF, cur->branches, cur->looping_states,
+		((long)elem->st.parent) >> 3 & 0xFFFF);
 	elem->st.speculative |= speculative;
 	if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
 		verbose(env, "The sequence of %d jumps is too complex.\n",
@@ -1855,6 +1885,7 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
 	}
 	if (elem->st.parent) {
 		++elem->st.parent->branches;
+		increase_looping(elem->st.parent);
 		/* WARN_ON(branches > 2) technically makes sense here,
 		 * but
 		 * 1. speculative states will bump 'branches' for non-branch
@@ -7734,6 +7765,11 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
 		if (!queued_st)
 			return -ENOMEM;
 
+		queued_st->parent->looping_states += 2;
+		verbose(env, "process_iter_next_call %lx branches=%d/%d\n",
+			((long)queued_st) >> 3 & 0xFFFF, queued_st->branches, queued_st->looping_states);
+		verbose(env, "process_iter_next_call parent %lx branches=%d/%d\n",
+			((long)queued_st->parent) >> 3 & 0xFFFF, queued_st->parent->branches, queued_st->parent->looping_states);
 		queued_iter = &queued_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
 		queued_iter->iter.state = BPF_ITER_STATE_ACTIVE;
 		queued_iter->iter.depth++;
@@ -16451,7 +16487,11 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 		if (sl->state.insn_idx != insn_idx)
 			goto next;
 
-		if (sl->state.branches) {
+		verbose(env, "search %lx branches=%d/%d ", ((long)&sl->state) >> 3 & 0xFFFF,
+			sl->state.branches, sl->state.looping_states);
+		if (sl->state.branches &&
+		    !(sl->state.looping_states == 1 &&
+		      is_iter_next_insn(env, insn_idx))) {
 			struct bpf_func_state *frame = sl->state.frame[sl->state.curframe];
 
 			if (frame->in_async_callback_fn &&
@@ -16481,7 +16521,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * account iter_next() contract of eventually returning
 			 * sticky NULL result.
 			 */
-			if (is_iter_next_insn(env, insn_idx)) {
+			if (0 && is_iter_next_insn(env, insn_idx)) {
 				if (states_equal(env, &sl->state, cur)) {
 					struct bpf_func_state *cur_frame;
 					struct bpf_reg_state *iter_state, *iter_reg;
@@ -16638,6 +16678,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 		kfree(new_sl);
 		return err;
 	}
+	verbose(env, "%d: add_state %lx branches=%d/%d\n", insn_idx, ((long)new) >> 3 & 0xFFFF, new->branches, new->looping_states);
+
 	new->insn_idx = insn_idx;
 	WARN_ONCE(new->branches != 1,
 		  "BUG is_state_visited:branches_to_explore=%d insn %d\n", new->branches, insn_idx);
@@ -16779,7 +16821,7 @@ static int do_check(struct bpf_verifier_env *env)
 		insn = &insns[env->insn_idx];
 		class = BPF_CLASS(insn->code);
 
-		if (++env->insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
+		if (++env->insn_processed > 1000) {//BPF_COMPLEXITY_LIMIT_INSNS) {
 			verbose(env,
 				"BPF program is too large. Processed %d insn\n",
 				env->insn_processed);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-04  5:50                                                                     ` Alexei Starovoitov
@ 2023-10-04  9:49                                                                       ` Eduard Zingerman
  0 siblings, 0 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-04  9:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

[-- Attachment #1: Type: text/plain, Size: 2256 bytes --]

On Tue, 2023-10-03 at 22:50 -0700, Alexei Starovoitov wrote:
> On Tue, Oct 3, 2023 at 7:57 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > 
> > ok. discard that idea.
> 
> Attached is a 3rd version of the same idea I argued earlier.
> Let the normal DFS proceed as usual, but only do states_equal() on a
> state V once it has exactly one looping branch remaining and all
> others explored.
> To achieve that, when iter_next() is seen do parent->looping_states += 2;
> 
> then when processing any children do parent->looping_states++
> in the correct parent.
> Since there can be many intermediate states, we have to walk back the
> parentage chain to increment the correct parent.
> When a state reaches bpf_exit or safety, walk back
> the parentage chain and do looping_states--.
> The state is ok to use in states_equal() if looping_states == 1.

But what if each next iteration spawns more than two looping states?
E.g. for the following example:

  0. // full test attached as a patch
  1. while (next(i)) {
  2.   if (random())
  3.     continue;
  4.   if (random())
  5.     continue;
  6.   if (random())
  7.     continue;
  8.   r0 += 0;
  9. }

For me it bails out with the following message:

    run_subtest:FAIL:unexpected_load_failure unexpected error: -28
    ....
    The sequence of 8193 jumps is too complex.
    processed 49161 insns (limit 1000000) max_states_per_insn 4 total_states 2735 peak_states 2735 mark_read 2

(I bumped insn complexity limit back to 1,000,000).

> With this patch all existing iter tests still pass,
> and all Ed's special tests pass or fail as needed.
> Ex: loop_state_deps1 is rejected with misaligned stack,
> loop1 loads with success, num_iter_bug fails with bad pointer.

Are you sure that correct version of the patch was shared?
I get the following log for loop_state_deps1:

    run_subtest:FAIL:unexpected_load_success unexpected success: 0
    from 21 to 22: safe
    ...
    update_br2 80c0 branches=0/0
    processed 75 insns (limit 1000000) max_states_per_insn 2 total_states 29 peak_states 29 mark_read 9
    =============
    #104/26  iters/loop_state_deps1:FAIL

The test case is marked as safe.
iter_precision_fixed_point{1,2} and num_iter_bug work as expected.

[-- Attachment #2: hydra1.patch --]
[-- Type: text/x-patch, Size: 1646 bytes --]

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 019c8dde5fa2..cf9041fac272 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16821,7 +16821,7 @@ static int do_check(struct bpf_verifier_env *env)
 		insn = &insns[env->insn_idx];
 		class = BPF_CLASS(insn->code);
 
-		if (++env->insn_processed > 1000) {//BPF_COMPLEXITY_LIMIT_INSNS) {
+		if (++env->insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
 			verbose(env,
 				"BPF program is too large. Processed %d insn\n",
 				env->insn_processed);
diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index 9add71d79a3a..f08288c0b74a 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -991,4 +991,42 @@ int num_iter_bug(const void *ctx) {
      return 0;
 }
 
+SEC("?raw_tp")
+__success
+__naked int hydra1(void)
+{
+	asm volatile (
+		"r1 = r10;"
+		"r1 += -8;"
+		"r2 = 0;"
+		"r3 = 10;"
+		"call %[bpf_iter_num_new];"
+	"loop_%=:"
+		"r1 = r10;"
+		"r1 += -8;"
+		"call %[bpf_iter_num_next];"
+		"if r0 == 0 goto loop_end_%=;"
+		"call %[bpf_get_prandom_u32];"
+		"if r0 != 42 goto loop_%=;"
+		"call %[bpf_get_prandom_u32];"
+		"if r0 != 42 goto loop_%=;"
+		"call %[bpf_get_prandom_u32];"
+		"if r0 != 42 goto loop_%=;"
+		"r0 += 0;"
+		"goto loop_%=;"
+	"loop_end_%=:"
+		"r1 = r10;"
+		"r1 += -8;"
+		"call %[bpf_iter_num_destroy];"
+		"r0 = 0;"
+		"exit;"
+		:
+		: __imm(bpf_get_prandom_u32),
+		  __imm(bpf_iter_num_new),
+		  __imm(bpf_iter_num_next),
+		  __imm(bpf_iter_num_destroy)
+		: __clobber_all
+	);
+}
+
 char _license[] SEC("license") = "GPL";


* Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
  2023-10-04  2:57                                                                   ` Alexei Starovoitov
  2023-10-04  5:50                                                                     ` Alexei Starovoitov
@ 2023-10-04 11:52                                                                     ` Eduard Zingerman
  1 sibling, 0 replies; 52+ messages in thread
From: Eduard Zingerman @ 2023-10-04 11:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrew Werner, bpf, Andrei Matei,
	Tamir Duberstein, Joanne Koong, kernel-team, Song Liu,
	Kumar Kartikeya Dwivedi

On Tue, 2023-10-03 at 19:57 -0700, Alexei Starovoitov wrote:
> > Yes. This may be an issue. I'll try to hack up the layered variant I
> > talked about before to see what the underlying issues are. The idea
> > is to verify every child state of the loop entry before moving to
> > any branches of the loop's parent states.
> 
> Isn't that what current DFS logic does?

With a twist: it should wait for local loop convergence before
declaring the loop entry state safe. I described it in [1].
Upon entry to a top level loop state E:
1. do DFS for E, disallow switching to any branches that are not
   children of E, collect all loop states generated by E's DFS;
2. reschedule loop states;
3. do DFS for rescheduled states, collect all loop states generated in
   the process;
4. repeat from 2 until no loop state can be rescheduled;
5. mark E safe, proceed from the regular states stack.

[1] https://lore.kernel.org/bpf/8b75e01d27696cd6661890e49bdc06b1e96092c7.camel@gmail.com/

> > The idea is that when `env->stack.size == 0` all V's non-delayed
> > children states are explored and relevant precision marks are
> > propagated to V. 
> 
> But that is not true.
> if (is_iter_next_insn(env, insn_idx)) {
>   if (states_equal(env, &sl->state, cur)) {
> with sl->state.branches > 0
> will prevent exploration of all children.

Yes, it would postpone exploration of some children, but they would be
given a chance to be rescheduled.

> Hence I still believe that fine-tuning this equivalence check is
> the first step in any fix.

Maybe, but so far states_equal(V, C) seems ok:
1. values read by V are defined in C;
2. non-precise scalars are equivalent unconditionally;
3. precise scalars in C are sub-ranges of precise scalars in V;
4. non-scalar values are compared exactly (modulo idmap).

Which of these points are overly optimistic in your opinion?

> > If at this point some states_equal(V, C') is false it
> > is necessary to schedule C' for one more exploration round as it might
> > visit some code blocks that were not visited on the path from V to C'
> > (as different prediction decisions could be made).
> 
> exactly my point above,
> but because of the broken double 'if' above the 2nd pass might be
> hitting the same issues as the first. Because some path wasn't
> explored, the precision marks might still be wrong.
> 
> More loop states can be created and delayed again into loop_stack?

Yes, and a new reschedule round would be done, and so on.

> > If a point is reached when for all loop states C' there are some
> > states V such that states_equal(V, C'), there are no more
> > configurations that can visit code blocks not visited on the path from
> > V to C', as prediction decisions would be the same.
> 
> and if not, we can repeat loop_stack->stack->loop_stack potentially forever?

It might be the case that such logic would not converge, e.g. as in
the following simple example:

  it = iter_new();
  i = 0;
  while (iter_next(it)) {
    if (i & 0x1 == 0)
      printk("'i' is precise and different in each state");
    ++i;
  }
  iter_destroy(it);

However:
- we have such a limitation already;
- examples like this don't play well with the current states pruning
  logic in general (I think that doing some opportunistic widening of
  precise values is an interesting idea (losing or worsening precision
  with the ability to backtrack on failure), but this is a different
  topic).

If the loop logic does not converge, the verifier reports an error:
either the instruction or the jump limit is reached.

For the cases where this logic does converge, we end up in a state
where no new read or precision marks can be gained; thus it is safe to
accept states_equal() == true verdicts.

> Did you look at my broken patch? yes. it's buggy, but it demonstrates
> the idea where it removes above problematic double 'if' and uses
> states_equal() on states where st->branches==0 && st->looping_states==1

<replied in a different email>

> meaning that in those states all paths were explored except the one
> forked by iter_next kfunc.
> So precision and liveness are correct,

That's probably true.

> while doing states_equal on V with branches=2+ is broken.

Not that it is broken, but its meaning is "maybe a sub-state" instead
of "definitely a sub-state".


end of thread, other threads:[~2023-10-04 11:52 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-07 14:04 [BUG] verifier escape with iteration helpers (bpf_loop, ...) Andrew Werner
2023-07-07 16:44 ` Alexei Starovoitov
2023-07-07 18:08   ` Andrii Nakryiko
2023-07-07 18:21     ` Andrew Werner
2023-09-17 21:37     ` Eduard Zingerman
2023-09-17 22:09       ` Kumar Kartikeya Dwivedi
2023-09-18 13:06         ` Eduard Zingerman
2023-09-19 16:28           ` Eduard Zingerman
2023-09-19 23:02             ` Andrii Nakryiko
2023-09-20  0:19               ` Eduard Zingerman
2023-09-20 16:20                 ` Eduard Zingerman
2023-09-20 16:57                   ` Andrii Nakryiko
2023-09-21  9:14                 ` Alexei Starovoitov
2023-09-21 11:03                   ` Eduard Zingerman
2023-09-21 12:56                     ` Alexei Starovoitov
2023-09-21 16:23                       ` Eduard Zingerman
2023-09-21 16:35                         ` Andrii Nakryiko
2023-09-21 18:16                           ` Eduard Zingerman
2023-09-22  1:01                             ` Eduard Zingerman
2023-09-22  2:48                               ` Andrii Nakryiko
2023-09-22 18:36                                 ` Eduard Zingerman
2023-09-22 20:52                                   ` Andrii Nakryiko
2023-09-25  1:01                                     ` Eduard Zingerman
2023-09-26  0:33                                       ` Andrii Nakryiko
2023-09-26 15:55                                         ` Eduard Zingerman
2023-09-26 16:25                                           ` Andrii Nakryiko
2023-09-28  1:09                                             ` Eduard Zingerman
2023-09-28 18:30                                               ` Andrii Nakryiko
2023-10-02  3:26                                                 ` Eduard Zingerman
2023-09-30  0:41                                               ` Alexei Starovoitov
2023-10-02  1:40                                                 ` Eduard Zingerman
2023-10-02 16:29                                                   ` Alexei Starovoitov
2023-10-02 17:18                                                     ` Eduard Zingerman
2023-10-03  0:05                                                       ` Alexei Starovoitov
2023-10-03  2:00                                                         ` Alexei Starovoitov
2023-10-03 15:33                                                         ` Eduard Zingerman
2023-10-03 16:07                                                           ` Alexei Starovoitov
2023-10-03 18:50                                                           ` Alexei Starovoitov
2023-10-03 21:52                                                             ` Eduard Zingerman
2023-10-03 22:03                                                               ` Eduard Zingerman
2023-10-03 23:08                                                               ` Alexei Starovoitov
2023-10-03 23:14                                                                 ` Eduard Zingerman
2023-10-04  0:22                                                                   ` Andrii Nakryiko
2023-10-04  1:05                                                                 ` Eduard Zingerman
2023-10-04  2:57                                                                   ` Alexei Starovoitov
2023-10-04  5:50                                                                     ` Alexei Starovoitov
2023-10-04  9:49                                                                       ` Eduard Zingerman
2023-10-04 11:52                                                                     ` Eduard Zingerman
2023-09-19 23:14       ` Andrii Nakryiko
2023-09-20  0:06         ` Eduard Zingerman
2023-09-20 16:37           ` Andrii Nakryiko
2023-09-20 17:13             ` Eduard Zingerman
