* [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to non-owning ref semantics
@ 2022-12-30  1:07 Alexei Starovoitov
  2022-12-30  1:07 ` [PATCH bpf-next 2/2] selftests/bpf: Update linked_list tests for " Alexei Starovoitov
  2022-12-30  1:13 ` [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to " Alexei Starovoitov
  0 siblings, 2 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2022-12-30  1:07 UTC (permalink / raw)
  To: davem
  Cc: daniel, andrii, martin.lau, void, memxor, davemarchevsky, bpf,
	kernel-team

From: Alexei Starovoitov <ast@kernel.org>

This patch introduces non-owning reference semantics to the verifier,
specifically to the linked_list API kfunc handling. The
release_on_unlock logic for refs is refactored - with small functional
changes - to implement these semantics, and bpf_list_push_{front,back}
are migrated to use them.

When a list node is pushed to a list, the program still has a pointer to
the node:

  n = bpf_obj_new(typeof(*n));

  bpf_spin_lock(&l);
  bpf_list_push_back(&l, n);
  /* n still points to the just-added node */
  bpf_spin_unlock(&l);

This patch changes what the verifier considers n to be after the push,
and thus what can be done with n.

Properties common to both before and after this patch:
  * After push, n is only a valid reference to the node until end of
    critical section
  * After push, n cannot be pushed to any list
  * After push, the program can read the node's fields using n

Before:
  * After push, n retains the ref_obj_id which it received on
    bpf_obj_new, but the associated bpf_reference_state's
    release_on_unlock field is set to true
    * The release_on_unlock field and associated logic are used to
      implement "n is only a valid ref until end of critical section"
  * After push, n cannot be written to; the node must be removed from
    the list before writing to its fields
  * After push, n is marked PTR_UNTRUSTED

After:
  * After push, n's ref is released and its ref_obj_id is set to 0. The
    bpf_reg_state's active_lock_id field is set to the identity of the
    currently active lock
    * active_lock_id and the associated logic are used to implement "n
      is only a valid ref until end of critical section"
  * n can be written to (except for special fields, e.g. bpf_list_node,
    timer, ...), as the sketch below shows
  * No special type flag is added to n after push
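
Extending the example above under the new semantics (n->data is a
hypothetical normal field; each line marked "rejected" would fail
verification on its own):

  n = bpf_obj_new(typeof(*n));
  if (!n)
      return 0;

  bpf_spin_lock(&l);
  bpf_list_push_back(&l, n);
  n->data = 42;              /* now allowed: non-owning ref is writable */
  bpf_list_push_back(&l, n); /* rejected: n is no longer an owning ref */
  bpf_spin_unlock(&l);
  n->data = 7;               /* rejected: non-owning ref is gone after unlock */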

Summary of specific implementation changes to achieve the above:

  * release_on_unlock field, ref_set_release_on_unlock helper, and logic
    to "release on unlock" based on that field are removed

  * The pair { map | btf, id } identifying the active lock is converted
    into a single u32

  * u32 active_lock_id field is added to bpf_reg_state's PTR_TO_BTF_ID union

  * Helpers are added to implement non-owning ref semantics as described above
    * invalidate_non_owning_refs - helper to clobber all non-owning refs
      matching a particular bpf_active_lock identity. Replaces
      release_on_unlock logic in process_spin_lock.
    * ref_convert_owning_non_owning - convert an owning reference w/ the
      specified ref_obj_id to non-owning references. Set up
      active_lock_id for each reg with that ref_obj_id, clear each reg's
      ref_obj_id, and remove the ref_obj_id from state->acquired_refs

After these changes, linked_list's "release on unlock" logic continues
to function as before, except for the semantic differences noted above.
The patch immediately following this one makes minor changes to
linked_list selftests to account for the differing behavior.

Co-developed-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_verifier.h |  6 +--
 kernel/bpf/verifier.c        | 92 ++++++++++++++++++------------------
 2 files changed, 47 insertions(+), 51 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 127058cfec47..3fc41edff267 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -68,6 +68,7 @@ struct bpf_reg_state {
 		struct {
 			struct btf *btf;
 			u32 btf_id;
+			u32 active_lock_id;
 		};
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
@@ -223,11 +224,6 @@ struct bpf_reference_state {
 	 * exiting a callback function.
 	 */
 	int callback_ref;
-	/* Mark the reference state to release the registers sharing the same id
-	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
-	 * safe to access inside the critical section).
-	 */
-	bool release_on_unlock;
 };
 
 /* state of the program:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4a25375ebb0d..3e7fd564132c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -190,6 +190,7 @@ struct bpf_verifier_stack_elem {
 
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env, u32 lock_id);
 
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
@@ -931,6 +932,8 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 				verbose_a("id=%d", reg->id);
 			if (reg->ref_obj_id)
 				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
+			if (reg->active_lock_id)
+				verbose_a("lock_id=%d", reg->active_lock_id);
 			if (t != SCALAR_VALUE)
 				verbose_a("off=%d", reg->off);
 			if (type_is_pkt_pointer(t))
@@ -5782,9 +5785,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			cur->active_lock.ptr = btf;
 		cur->active_lock.id = reg->id;
 	} else {
-		struct bpf_func_state *fstate = cur_func(env);
 		void *ptr;
-		int i;
 
 		if (map)
 			ptr = map;
@@ -5800,25 +5801,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
-		cur->active_lock.ptr = NULL;
-		cur->active_lock.id = 0;
 
-		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
-			int err;
+		invalidate_non_owning_refs(env, cur->active_lock.id);
 
-			/* Complain on error because this reference state cannot
-			 * be freed before this point, as bpf_spin_lock critical
-			 * section does not allow functions that release the
-			 * allocated object immediately.
-			 */
-			if (!fstate->refs[i].release_on_unlock)
-				continue;
-			err = release_reference(env, fstate->refs[i].id);
-			if (err) {
-				verbose(env, "failed to release release_on_unlock reference");
-				return err;
-			}
-		}
+		cur->active_lock.ptr = NULL;
+		cur->active_lock.id = 0;
 	}
 	return 0;
 }
@@ -7081,6 +7068,17 @@ static int release_reference(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env, u32 lock_id)
+{
+	struct bpf_func_state *unused;
+	struct bpf_reg_state *reg;
+
+	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+		if (base_type(reg->type) == PTR_TO_BTF_ID && reg->active_lock_id == lock_id)
+			__mark_reg_unknown(env, reg);
+	}));
+}
+
 static void clear_caller_saved_regs(struct bpf_verifier_env *env,
 				    struct bpf_reg_state *regs)
 {
@@ -8583,38 +8581,38 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
+static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
 {
-	struct bpf_func_state *state = cur_func(env);
+	struct bpf_func_state *state, *unused;
 	struct bpf_reg_state *reg;
 	int i;
 
-	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
-	 * subprogs, no global functions. This means that the references would
-	 * not be released inside the critical section but they may be added to
-	 * the reference state, and the acquired_refs are never copied out for a
-	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
-	 * critical sections.
-	 */
+	state = cur_func(env);
+
 	if (!ref_obj_id) {
-		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
+		verbose(env, "verifier internal error: ref_obj_id is zero for "
+			     "owning -> non-owning conversion\n");
 		return -EFAULT;
 	}
+
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].id == ref_obj_id) {
-			if (state->refs[i].release_on_unlock) {
-				verbose(env, "verifier internal error: expected false release_on_unlock");
-				return -EFAULT;
+		if (state->refs[i].id != ref_obj_id)
+			continue;
+
+		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+			if (reg->ref_obj_id == ref_obj_id) {
+				/* Clear ref_obj_id only. The rest of PTR_TO_BTF_ID stays as-is */
+				reg->ref_obj_id = 0;
+				reg->active_lock_id = env->cur_state->active_lock.id;
 			}
-			state->refs[i].release_on_unlock = true;
-			/* Now mark everyone sharing same ref_obj_id as untrusted */
-			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
-				if (reg->ref_obj_id == ref_obj_id)
-					reg->type |= PTR_UNTRUSTED;
-			}));
-			return 0;
-		}
+		}));
+
+		/* No referenced regs with this ref_obj_id remain in the current
+		 * state. Remove ref_obj_id from acquired_refs; this should never fail.
+		 */
+		return release_reference_state(state, ref_obj_id);
 	}
+
 	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
 	return -EFAULT;
 }
@@ -8794,8 +8792,7 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 			btf_name_by_offset(field->graph_root.btf, et->name_off));
 		return -EINVAL;
 	}
-	/* Set arg#1 for expiration after unlock */
-	return ref_set_release_on_unlock(env, reg->ref_obj_id);
+	return ref_convert_owning_non_owning(env, reg->ref_obj_id);
 }
 
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
@@ -8997,7 +8994,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return -EINVAL;
 			}
 			if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC) && !reg->ref_obj_id) {
-				verbose(env, "allocated object must be referenced\n");
+				verbose(env, "Allocated list_head must be referenced\n");
 				return -EINVAL;
 			}
 			ret = process_kf_arg_ptr_to_list_head(env, reg, regno, meta);
@@ -9010,7 +9007,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return -EINVAL;
 			}
 			if (!reg->ref_obj_id) {
-				verbose(env, "allocated object must be referenced\n");
+				verbose(env, "Allocated list_node must be referenced\n");
 				return -EINVAL;
 			}
 			ret = process_kf_arg_ptr_to_list_node(env, reg, regno, meta);
@@ -11959,7 +11956,10 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		dst_reg->type = PTR_TO_MAP_VALUE;
 		dst_reg->off = aux->map_off;
 		WARN_ON_ONCE(map->max_entries != 1);
-		/* We want reg->id to be same (0) as map_value is not distinct */
+		/* map->id is positive s32. Negative map->id will not collide with env->id_gen.
+		 * This lets us track active_lock state as single u32 instead of pair { map|btf, id }
+		 */
+		dst_reg->id = -map->id;
 	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
 		   insn->src_reg == BPF_PSEUDO_MAP_IDX) {
 		dst_reg->type = CONST_PTR_TO_MAP;
-- 
2.30.2



* [PATCH bpf-next 2/2] selftests/bpf: Update linked_list tests for non-owning ref semantics
  2022-12-30  1:07 [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to non-owning ref semantics Alexei Starovoitov
@ 2022-12-30  1:07 ` Alexei Starovoitov
  2022-12-30  1:13 ` [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to " Alexei Starovoitov
  1 sibling, 0 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2022-12-30  1:07 UTC (permalink / raw)
  To: davem
  Cc: daniel, andrii, martin.lau, void, memxor, davemarchevsky, bpf,
	kernel-team

From: Dave Marchevsky <davemarchevsky@fb.com>

Current linked_list semantics for release_on_unlock node refs are
almost exactly the same as the newly-introduced "non-owning reference"
concept. The only difference: writes to a release_on_unlock node ref
are not allowed, while writes to non-owning reference pointees are.

As a result the linked_list "write after push" failure tests are no
longer scenarios that should fail.

The test##_missing_lock_##op and test##_incorrect_lock_##op
macro-generated failure tests need to have a valid node argument in
order to have the same error output as before. Otherwise verification
will fail early and the expected error output won't be seen.
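
For instance, with the updated macro, CHECK(global, push_front, &ghead, f)
expands to roughly the following (INIT sets up the locals, including f,
elsewhere in linked_list_fail.c):

  SEC("?tc")
  int global_missing_lock_push_front(void *ctx)
  {
      INIT;
      void (*p)(void *, void *) = (void *)&bpf_list_push_front;
      p(&ghead, f); /* valid node arg, but no lock held */
      return 0;
  }

so verification gets far enough to emit the same error output as before,
instead of failing early on the node argument.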

Some other tests have minor changes in error output, but fail for the
same reason.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20221217082506.1570898-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 .../selftests/bpf/prog_tests/linked_list.c    |   6 +-
 .../selftests/bpf/prog_tests/spin_lock.c      |   2 +-
 .../testing/selftests/bpf/progs/linked_list.c |   2 +-
 .../selftests/bpf/progs/linked_list_fail.c    | 100 +++++++++++-------
 4 files changed, 66 insertions(+), 44 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 9a7d4c47af63..a2ed99cf2417 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -78,12 +78,10 @@ static struct {
 	{ "direct_write_head", "direct access to bpf_list_head is disallowed" },
 	{ "direct_read_node", "direct access to bpf_list_node is disallowed" },
 	{ "direct_write_node", "direct access to bpf_list_node is disallowed" },
-	{ "write_after_push_front", "only read is supported" },
-	{ "write_after_push_back", "only read is supported" },
 	{ "use_after_unlock_push_front", "invalid mem access 'scalar'" },
 	{ "use_after_unlock_push_back", "invalid mem access 'scalar'" },
-	{ "double_push_front", "arg#1 expected pointer to allocated object" },
-	{ "double_push_back", "arg#1 expected pointer to allocated object" },
+	{ "double_push_front", "Allocated list_node must be referenced" },
+	{ "double_push_back", "Allocated list_node must be referenced" },
 	{ "no_node_value_type", "bpf_list_node not found at offset=0" },
 	{ "incorrect_value_type",
 	  "operation on bpf_list_head expects arg#1 bpf_list_node at offset=0 in struct foo, "
diff --git a/tools/testing/selftests/bpf/prog_tests/spin_lock.c b/tools/testing/selftests/bpf/prog_tests/spin_lock.c
index d9270bd3d920..31ae22bb70e0 100644
--- a/tools/testing/selftests/bpf/prog_tests/spin_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/spin_lock.c
@@ -16,7 +16,7 @@ static struct {
 	  "R1_w=ptr_foo(id=2,ref_obj_id=2,off=0,imm=0) refs=2\n6: (85) call bpf_this_cpu_ptr#154\n"
 	  "R1 type=ptr_ expected=percpu_ptr_" },
 	{ "lock_id_global_zero",
-	  "; R1_w=map_value(off=0,ks=4,vs=4,imm=0)\n2: (85) call bpf_this_cpu_ptr#154\n"
+	  "off=0,ks=4,vs=4,imm=0)\n2: (85) call bpf_this_cpu_ptr#154\n"
 	  "R1 type=map_value expected=percpu_ptr_" },
 	{ "lock_id_mapval_preserve",
 	  "8: (bf) r1 = r0                       ; R0_w=map_value(id=1,off=0,ks=4,vs=8,imm=0) "
diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c
index 4ad88da5cda2..4fa4a9b01bde 100644
--- a/tools/testing/selftests/bpf/progs/linked_list.c
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -260,7 +260,7 @@ int test_list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head
 {
 	int ret;
 
-	ret = list_push_pop_multiple(lock ,head, false);
+	ret = list_push_pop_multiple(lock, head, false);
 	if (ret)
 		return ret;
 	return list_push_pop_multiple(lock, head, true);
diff --git a/tools/testing/selftests/bpf/progs/linked_list_fail.c b/tools/testing/selftests/bpf/progs/linked_list_fail.c
index 1d9017240e19..69cdc07cba13 100644
--- a/tools/testing/selftests/bpf/progs/linked_list_fail.c
+++ b/tools/testing/selftests/bpf/progs/linked_list_fail.c
@@ -54,28 +54,44 @@
 		return 0;                                   \
 	}
 
-CHECK(kptr, push_front, &f->head);
-CHECK(kptr, push_back, &f->head);
 CHECK(kptr, pop_front, &f->head);
 CHECK(kptr, pop_back, &f->head);
 
-CHECK(global, push_front, &ghead);
-CHECK(global, push_back, &ghead);
 CHECK(global, pop_front, &ghead);
 CHECK(global, pop_back, &ghead);
 
-CHECK(map, push_front, &v->head);
-CHECK(map, push_back, &v->head);
 CHECK(map, pop_front, &v->head);
 CHECK(map, pop_back, &v->head);
 
-CHECK(inner_map, push_front, &iv->head);
-CHECK(inner_map, push_back, &iv->head);
 CHECK(inner_map, pop_front, &iv->head);
 CHECK(inner_map, pop_back, &iv->head);
 
 #undef CHECK
 
+#define CHECK(test, op, hexpr, nexpr)					\
+	SEC("?tc")							\
+	int test##_missing_lock_##op(void *ctx)				\
+	{								\
+		INIT;							\
+		void (*p)(void *, void *) = (void *)&bpf_list_##op;	\
+		p(hexpr, nexpr);					\
+		return 0;						\
+	}
+
+CHECK(kptr, push_front, &f->head, b);
+CHECK(kptr, push_back, &f->head, b);
+
+CHECK(global, push_front, &ghead, f);
+CHECK(global, push_back, &ghead, f);
+
+CHECK(map, push_front, &v->head, f);
+CHECK(map, push_back, &v->head, f);
+
+CHECK(inner_map, push_front, &iv->head, f);
+CHECK(inner_map, push_back, &iv->head, f);
+
+#undef CHECK
+
 #define CHECK(test, op, lexpr, hexpr)                       \
 	SEC("?tc")                                          \
 	int test##_incorrect_lock_##op(void *ctx)           \
@@ -108,11 +124,47 @@ CHECK(inner_map, pop_back, &iv->head);
 	CHECK(inner_map_global, op, &iv->lock, &ghead);        \
 	CHECK(inner_map_map, op, &iv->lock, &v->head);
 
-CHECK_OP(push_front);
-CHECK_OP(push_back);
 CHECK_OP(pop_front);
 CHECK_OP(pop_back);
 
+#undef CHECK
+#undef CHECK_OP
+
+#define CHECK(test, op, lexpr, hexpr, nexpr)				\
+	SEC("?tc")							\
+	int test##_incorrect_lock_##op(void *ctx)			\
+	{								\
+		INIT;							\
+		void (*p)(void *, void *) = (void *)&bpf_list_##op;	\
+		bpf_spin_lock(lexpr);					\
+		p(hexpr, nexpr);					\
+		return 0;						\
+	}
+
+#define CHECK_OP(op)							\
+	CHECK(kptr_kptr, op, &f1->lock, &f2->head, b);			\
+	CHECK(kptr_global, op, &f1->lock, &ghead, f);			\
+	CHECK(kptr_map, op, &f1->lock, &v->head, f);			\
+	CHECK(kptr_inner_map, op, &f1->lock, &iv->head, f);		\
+									\
+	CHECK(global_global, op, &glock2, &ghead, f);			\
+	CHECK(global_kptr, op, &glock, &f1->head, b);			\
+	CHECK(global_map, op, &glock, &v->head, f);			\
+	CHECK(global_inner_map, op, &glock, &iv->head, f);		\
+									\
+	CHECK(map_map, op, &v->lock, &v2->head, f);			\
+	CHECK(map_kptr, op, &v->lock, &f2->head, b);			\
+	CHECK(map_global, op, &v->lock, &ghead, f);			\
+	CHECK(map_inner_map, op, &v->lock, &iv->head, f);		\
+									\
+	CHECK(inner_map_inner_map, op, &iv->lock, &iv2->head, f);	\
+	CHECK(inner_map_kptr, op, &iv->lock, &f2->head, b);		\
+	CHECK(inner_map_global, op, &iv->lock, &ghead, f);		\
+	CHECK(inner_map_map, op, &iv->lock, &v->head, f);
+
+CHECK_OP(push_front);
+CHECK_OP(push_back);
+
 #undef CHECK
 #undef CHECK_OP
 #undef INIT
@@ -303,34 +355,6 @@ int direct_write_node(void *ctx)
 	return 0;
 }
 
-static __always_inline
-int write_after_op(void (*push_op)(void *head, void *node))
-{
-	struct foo *f;
-
-	f = bpf_obj_new(typeof(*f));
-	if (!f)
-		return 0;
-	bpf_spin_lock(&glock);
-	push_op(&ghead, &f->node);
-	f->data = 42;
-	bpf_spin_unlock(&glock);
-
-	return 0;
-}
-
-SEC("?tc")
-int write_after_push_front(void *ctx)
-{
-	return write_after_op((void *)bpf_list_push_front);
-}
-
-SEC("?tc")
-int write_after_push_back(void *ctx)
-{
-	return write_after_op((void *)bpf_list_push_back);
-}
-
 static __always_inline
 int use_after_unlock(void (*op)(void *head, void *node))
 {
-- 
2.30.2



* Re: [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-30  1:07 [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to non-owning ref semantics Alexei Starovoitov
  2022-12-30  1:07 ` [PATCH bpf-next 2/2] selftests/bpf: Update linked_list tests for " Alexei Starovoitov
@ 2022-12-30  1:13 ` Alexei Starovoitov
  2023-01-17 22:42   ` Dave Marchevsky
  1 sibling, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2022-12-30  1:13 UTC (permalink / raw)
  To: David S. Miller
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, David Vernet,
	Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf, Kernel Team

On Thu, Dec 29, 2022 at 5:07 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> @@ -11959,7 +11956,10 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>                 dst_reg->type = PTR_TO_MAP_VALUE;
>                 dst_reg->off = aux->map_off;
>                 WARN_ON_ONCE(map->max_entries != 1);
> -               /* We want reg->id to be same (0) as map_value is not distinct */
> +               /* map->id is positive s32. Negative map->id will not collide with env->id_gen.
> +                * This lets us track active_lock state as single u32 instead of pair { map|btf, id }
> +                */
> +               dst_reg->id = -map->id;

Here is what I meant in my earlier reply to Dave's patches 1 and 2.
imo this is a simpler implementation of the same logic.
The only tricky part is the bit above, which is necessary
to use a single u32 in bpf_reg_state to track active_lock.
We can follow up with a cleanup of the active_lock comparison
in other places of the verifier.
Instead of:
                if (map)
                        cur->active_lock.ptr = map;
                else
                        cur->active_lock.ptr = btf;
                cur->active_lock.id = reg->id;
it will be:
  cur->active_lock_id = reg->id;

Another cleanup is needed to compare active_lock_id properly
in regsafe().

Thoughts?


* Re: [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-30  1:13 ` [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to " Alexei Starovoitov
@ 2023-01-17 22:42   ` Dave Marchevsky
  2023-01-18  0:41     ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Marchevsky @ 2023-01-17 22:42 UTC (permalink / raw)
  To: Alexei Starovoitov, David S. Miller
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, David Vernet,
	Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf, Kernel Team

On 12/29/22 8:13 PM, Alexei Starovoitov wrote:
> On Thu, Dec 29, 2022 at 5:07 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> @@ -11959,7 +11956,10 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>>                 dst_reg->type = PTR_TO_MAP_VALUE;
>>                 dst_reg->off = aux->map_off;
>>                 WARN_ON_ONCE(map->max_entries != 1);
>> -               /* We want reg->id to be same (0) as map_value is not distinct */
>> +               /* map->id is positive s32. Negative map->id will not collide with env->id_gen.
>> +                * This lets us track active_lock state as single u32 instead of pair { map|btf, id }
>> +                */
>> +               dst_reg->id = -map->id;
> 
> Here is what I meant in my earlier reply to Dave's patches 1 and 2.
> imo this is a simpler implementation of the same logic.
> The only tricky part is the bit above, which is necessary
> to use a single u32 in bpf_reg_state to track active_lock.
> We can follow up with a cleanup of the active_lock comparison
> in other places of the verifier.
> Instead of:
>                 if (map)
>                         cur->active_lock.ptr = map;
>                 else
>                         cur->active_lock.ptr = btf;
>                 cur->active_lock.id = reg->id;
> it will be:
>   cur->active_lock_id = reg->id;
> 
> Another cleanup is needed to compare active_lock_id properly
> in regsafe().
> 
> Thoughts?

Before Kumar's commit d0d78c1df9b1d ("bpf: Allow locking bpf_spin_lock global
variables"), setting of dst_reg->id in that code path was guarded by "does
map val have spin_lock check":

  if (btf_record_has_field(map->record, BPF_SPIN_LOCK))
    dst_reg->id = ++env->id_gen;

Should we do that here as well? Not sure what the implications of overzealous
setting of dst_reg->id are.


More generally: I see why doing -map->id will not overlap with env->id_gen
in practice. For PTR_TO_BTF_ID, I'm not sure that we'll always have a nonzero
reg->id here. check_kfunc_call has

  if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
    regs[BPF_REG_0].id = ++env->id_gen;

when checking retval, but there's no such equivalent in check_helper_call,
which instead does

  if (type_may_be_null(regs[BPF_REG_0].type))
    regs[BPF_REG_0].id = ++env->id_gen;

and similar in a few paths.

Should we ensure that "any PTR_TO_BTF_ID reg that has a spin_lock must
have a nonzero id"? If we don't, a helper which returns
PTR_TO_BTF_ID w/ spin_lock that isn't MAYBE_NULL will break this whole scheme.
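
Concretely, such a guard could mirror the check_kfunc_call logic quoted
above (exact placement in check_helper_call's retval handling is
hypothetical):

  if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
    regs[BPF_REG_0].id = ++env->id_gen;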


Some future-proofing concerns:

Kumar's commit above mentions this in the summary:

  Note that this can be extended in the future to also remember offset
  used for locking, so that we can introduce multiple bpf_spin_lock fields
  in the same allocation.

When the above happens we'll need to go back to a struct - or some packed
int - to describe "lock identity" anyways.

IIRC in the long term we wanted to stop overloading reg->id's meaning,
with the ideal end state being reg->id used only for "null id" purposes. If so,
this is moving us back towards the overloaded state.

Happy to take this over and respin once we're done discussing, unless you
want to do it.


* Re: [PATCH bpf-next 1/2] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2023-01-17 22:42   ` Dave Marchevsky
@ 2023-01-18  0:41     ` Alexei Starovoitov
  0 siblings, 0 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2023-01-18  0:41 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, David Vernet, Kumar Kartikeya Dwivedi,
	Dave Marchevsky, bpf, Kernel Team

On Tue, Jan 17, 2023 at 05:42:39PM -0500, Dave Marchevsky wrote:
> On 12/29/22 8:13 PM, Alexei Starovoitov wrote:
> > On Thu, Dec 29, 2022 at 5:07 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >> @@ -11959,7 +11956,10 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >>                 dst_reg->type = PTR_TO_MAP_VALUE;
> >>                 dst_reg->off = aux->map_off;
> >>                 WARN_ON_ONCE(map->max_entries != 1);
> >> -               /* We want reg->id to be same (0) as map_value is not distinct */
> >> +               /* map->id is positive s32. Negative map->id will not collide with env->id_gen.
> >> +                * This lets us track active_lock state as single u32 instead of pair { map|btf, id }
> >> +                */
> >> +               dst_reg->id = -map->id;
> > 
> > Here is what I meant in my earlier reply to Dave's patches 1 and 2.
> > imo this is a simpler implementation of the same logic.
> > The only tricky part is the bit above, which is necessary
> > to use a single u32 in bpf_reg_state to track active_lock.
> > We can follow up with a cleanup of the active_lock comparison
> > in other places of the verifier.
> > Instead of:
> >                 if (map)
> >                         cur->active_lock.ptr = map;
> >                 else
> >                         cur->active_lock.ptr = btf;
> >                 cur->active_lock.id = reg->id;
> > it will be:
> >   cur->active_lock_id = reg->id;
> > 
> > Another cleanup is needed to compare active_lock_id properly
> > in regsafe().
> > 
> > Thoughts?
> 
> Before Kumar's commit d0d78c1df9b1d ("bpf: Allow locking bpf_spin_lock global
> variables"), setting of dst_reg->id in that code path was guarded by "does
> map val have spin_lock check":
> 
>   if (btf_record_has_field(map->record, BPF_SPIN_LOCK))
>     dst_reg->id = ++env->id_gen;
> 
> Should we do that here as well? Not sure what the implications of overzealous
> setting of dst_reg->id are.

That won't work, since for spin_locks in global data that 'id' has to be stable.
Hence I went with -map->id.

> More generally: I see why doing -map->id will not overlap with env->id_gen
> in practice. For PTR_TO_BTF_ID, I'm not sure that we'll always have a nonzero
> reg->id here. check_kfunc_call has
> 
>   if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
>     regs[BPF_REG_0].id = ++env->id_gen;

correct.

> when checking retval, but there's no such equivalent in check_helper_call,
> which instead does
> 
>   if (type_may_be_null(regs[BPF_REG_0].type))
>     regs[BPF_REG_0].id = ++env->id_gen;
> 
> and similar in a few paths.
> 
> Should we ensure that "any PTR_TO_BTF_ID reg that has a spin_lock must
> have a nonzero id"? If we don't, a helper which returns
> PTR_TO_BTF_ID w/ spin_lock that isn't MAYBE_NULL will break this whole scheme.

correct. it's not a problem in practice, because there are few helpers
that return PTR_TO_BTF_ID and none of them point to spin_lock.

> Some future-proofing concerns:
> 
> Kumar's commit above mentions this in the summary:
> 
>   Note that this can be extended in the future to also remember offset
>   used for locking, so that we can introduce multiple bpf_spin_lock fields
>   in the same allocation.

Yeah. I forgot about this 'offset' idea.

> When the above happens we'll need to go back to a struct - or some packed
> int - to describe "lock identity" anyways.

yeah.
I was hoping we could represent all of 'active_lock' as a single id,
but 'offset' is hard to encode into an integer.
We won't be able to add it in the top 32 bits, since the resulting 64-bit 'id'
would need to go through check_ids(). Even if we extend idmap_scratch to be 64-bit
it won't be correct. Two different 32-bit IDs are ok to see and the states
might still be equivalent, but two different offsets are not ok.
The offset in the 'old' state and the offset in the 'cur' state have to be the
same for states to be equivalent. So the future 'offset' would need to be
compared in regsafe() manually anyway. Since we'd have to do that, there is
little point in combining the active_lock.ptr comparison into a 32-bit id.
Sigh.
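
For reference, check_ids() is roughly (simplified):

  /* old_id and cur_id are equivalent as long as the old->cur mapping
   * is consistent across the whole state
   */
  static bool check_ids(u32 old_id, u32 cur_id, struct bpf_id_pair *idmap)
  {
      unsigned int i;

      for (i = 0; i < BPF_ID_MAP_SIZE; i++) {
          if (!idmap[i].old) {
              /* first time seeing old_id: record the mapping */
              idmap[i].old = old_id;
              idmap[i].cur = cur_id;
              return true;
          }
          if (idmap[i].old == old_id)
              return idmap[i].cur == cur_id;
      }
      return false;
  }

Two differing ids can still be "equivalent" through that mapping, whereas
an offset packed into the id would wrongly be allowed to differ between
'old' and 'cur' as well.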

> IIRC in the long term we wanted to stop overloading reg->id's meaning,
> with the ideal end state being reg->id used only for "null id" purposes. If so,
> this is moving us back towards the overloaded state.
> 
> Happy to take this over and respin once we're done discussing, unless you
> want to do it.

Please go ahead.
Let's scratch my -map->id idea for now.
Let's go with embedding 'struct bpf_active_lock {void *ptr; u32 id;}' into bpf_reg_state,
but please make sure that bpf_reg_state grows by 8 bytes only.
Simply adding bpf_active_lock to the btf_id part of it would grow it by 16.

Note that the other idea you had, keeping a pointer to active_lock in
bpf_reg_state, is too dangerous and probably incorrect. That pointer might
become dangling after copy_verifier_state(). All pointers in bpf_reg_state
need to point to stable objects.
The only exception is 'parent' which is very tricky on its own.

Maybe we can do
struct bpf_active_lock {
   u32 map_btf_id;  /* instead of void *ptr */
   u32 id;
};
and init map_btf_id with -map->id or btf->id;
It's guaranteed not to overlap and != 0.
Don't know whether other pointers will be there. Maybe premature optimization.
