* [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
@ 2023-08-01 20:36 Dave Marchevsky
  2023-08-01 20:36 ` [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-01 20:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

This series is the third of three (or more) followups to address issues
in the bpf_refcount shared ownership implementation discovered by Kumar.
This series addresses the use-after-free scenario described in [0]. The
first followup series ([1]) also attempted to address the same
use-after-free, but only got rid of the splat without addressing the
underlying issue. After this series the underlying issue is fixed and
bpf_refcount_acquire can be re-enabled.

The main fix here is migration of bpf_obj_drop to use
bpf_mem_free_rcu. To understand why this fixes the issue, let us consider
the example interleaving provided by Kumar in [0]:

CPU 0                                   CPU 1
n = bpf_obj_new
lock(lock1)
bpf_rbtree_add(rbtree1, n)
m = bpf_rbtree_acquire(n)
unlock(lock1)

kptr_xchg(map, m) // move to map
// at this point, refcount = 2
					m = kptr_xchg(map, NULL)
					lock(lock2)
lock(lock1)				bpf_rbtree_add(rbtree2, m)
p = bpf_rbtree_first(rbtree1)			if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
bpf_rbtree_remove(rbtree1, p)
unlock(lock1)
bpf_obj_drop(p) // B
					bpf_refcount_acquire(m) // use-after-free
					...
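
For concreteness, CPU 1's side of this interleaving corresponds roughly to a
program shaped like the sketch below. All names here (map, tree, struct,
section) are made up for illustration, and it is not meant to be a
verifier-accepted reproducer; it only shows where the problematic
bpf_refcount_acquire call sits:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
#include "bpf_experimental.h"

struct node_data {
	long key;
	struct bpf_rb_node r;
	struct bpf_refcount refcount;
};

struct map_value {
	struct node_data __kptr *node;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);
	__type(value, struct map_value);
	__uint(max_entries, 1);
} stash SEC(".maps");

#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
private(A) struct bpf_spin_lock lock2;
private(A) struct bpf_rb_root rbtree2 __contains(node_data, r);

static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
{
	struct node_data *na = container_of(a, struct node_data, r);
	struct node_data *nb = container_of(b, struct node_data, r);

	return na->key < nb->key;
}

SEC("tc")
long cpu1_side(void *ctx)
{
	struct node_data *m, *m2;
	struct map_value *v;
	int key = 0;

	v = bpf_map_lookup_elem(&stash, &key);
	if (!v)
		return 0;

	m = bpf_kptr_xchg(&v->node, NULL); /* m = kptr_xchg(map, NULL) */
	if (!m)
		return 0;

	bpf_spin_lock(&lock2);
	/* m becomes non-owning; if the node already sits in another tree,
	 * the kernel drops the extra ref internally (point A above)
	 */
	bpf_rbtree_add(&rbtree2, &m->r, less);
	m2 = bpf_refcount_acquire(m); /* the acquire that can hit freed memory */
	bpf_spin_unlock(&lock2);

	if (m2)
		bpf_obj_drop(m2);
	return 0;
}

char _license[] SEC("license") = "GPL";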

Before this series, bpf_obj_drop returns memory to the allocator using
bpf_mem_free. At this point (B in the example) there might be some
non-owning references to that memory which the verifier believes are valid,
but where the underlying memory was reused for some other allocation.
Commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
non-owning refs") attempted to fix this by doing refcount_inc_not_zero
in bpf_refcount_acquire instead of refcount_inc, under the assumption that
preventing erroneous incr-on-0 would be sufficient. This isn't true,
though: refcount_inc_not_zero must *check* whether the refcount is zero, and
the memory it's checking could have been reused, so the check may read
and increment random reused bytes.
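
To make that concrete, here is a rough kernel-side sketch of the shape of the
not-zero check (simplified; the real implementation lives in
include/linux/refcount.h and differs in detail):

static bool refcount_inc_not_zero_sketch(refcount_t *r)
{
	/* This load already touches memory that may have been freed
	 * and handed out to an unrelated allocation.
	 */
	int old = atomic_read(&r->refs);

	do {
		if (!old)
			return false; /* the "not zero" check itself */
		/* ...and a successful increment scribbles on whatever
		 * now lives at this address.
		 */
	} while (!atomic_try_cmpxchg(&r->refs, &old, old + 1));

	return true;
}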

If we wait to reuse this memory until all non-owning refs that could
point to it are gone, there is no possibility of this scenario
happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted
nodes accomplishes this.
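
At the code level this is a small, targeted change. The sketch below shows the
resulting drop path and is essentially what patch 3 of this series does in
__bpf_obj_drop_impl (see that patch's diff for the exact context):

void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
{
	if (rec)
		bpf_obj_free_fields(rec, p);

	if (rec && rec->refcount_off >= 0)
		/* refcounted node: defer reuse until an RCU (Tasks Trace) GP elapses */
		bpf_mem_free_rcu(&bpf_global_ma, p);
	else
		bpf_mem_free(&bpf_global_ma, p);
}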

For such nodes, the validity of their underlying memory is now tied to
RCU Tasks Trace critical section. This matches MEM_RCU trustedness
expectations, so the series takes the opportunity to more explicitly 
mark this trustedness state.

The functional effects of trustedness changes here are rather small.
This is largely due to local kptrs having separate verifier handling -
with implicit trustedness assumptions - from arbitrary kptrs.
Regardless, let's take the opportunity to move towards a world where
trustedness is more explicitly handled.

Summary of patch contents, with sub-bullets being leading questions and
comments I think are worth reviewer attention:

  * Patches 1 and 2 are mostly documentation - and
    enforcement, in patch 1's case - of existing semantics / expectations

  * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that
    their underlying memory is not reused until RCU grace period elapses
    * Perhaps it makes sense to move to mem_free_rcu for _all_
      non-owning refs in the future, not just refcounted. This might
      allow custom non-owning ref lifetime + invalidation logic to be
      entirely subsumed by MEM_RCU handling. IMO this needs a bit more
      thought and should be tackled outside of a fix series, so it's not
      attempted here.

  * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix
    the remaining use-after-free
    * One might expect this patch to be last in the series, or last
      before selftest changes. Patches 5 and 6 don't change
      verification or runtime behavior for existing BPF progs, though.

  * Patch 5 brings the verifier's understanding of refcounted node
    trustedness in line with Patch 4's changes

  * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable
    progs. Marked RFC for a few reasons:
    * bpf_spin_{lock,unlock} haven't been usable in sleepable progs
      since before the introduction of bpf linked list and rbtree. As
      such this feels more like a new feature that may not belong in
      this fixes series.
    * If we do want to allow bpf_spin_{lock,unlock} in sleepable progs,
      Alexei has expressed a preference for that to be done as part of a
      broader set of changes to verifier determination of where those
      helpers can be called, and what can be called inside the spin_lock
      CS.
    * I'm unsure whether preemption needs to be disabled in this patch
      as well.

  * Patch 7 adds tests

  [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
  [1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/

Dave Marchevsky (7):
  bpf: Ensure kptr_struct_meta is non-NULL for collection insert and
    refcount_acquire
  bpf: Consider non-owning refs trusted
  bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  bpf: Reenable bpf_refcount_acquire
  bpf: Consider non-owning refs to refcounted nodes RCU protected
  [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS
  selftests/bpf: Add tests for rbtree API interaction in sleepable progs

 include/linux/bpf.h                           |  3 +-
 include/linux/bpf_verifier.h                  |  2 +-
 kernel/bpf/helpers.c                          |  6 ++-
 kernel/bpf/verifier.c                         | 45 ++++++++++++-----
 .../bpf/prog_tests/refcounted_kptr.c          | 26 ++++++++++
 .../selftests/bpf/progs/refcounted_kptr.c     | 37 ++++++++++++++
 .../bpf/progs/refcounted_kptr_fail.c          | 48 +++++++++++++++++++
 7 files changed, 153 insertions(+), 14 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
@ 2023-08-01 20:36 ` Dave Marchevsky
  2023-08-02  3:57   ` Yonghong Song
  2023-08-01 20:36 ` [PATCH v1 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-01 20:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

It's straightforward to prove that kptr_struct_meta must be non-NULL for
any valid call to these kfuncs:

  * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
    struct in user BTF with a special field (e.g. bpf_refcount,
    {rb,list}_node). These are stored in that BTF's struct_meta_tab.

  * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
    have {rb,list}_node field and that it's at the correct offset.
    Similarly, check_kfunc_args ensures bpf_refcount field existence for
    node param to bpf_refcount_acquire.

  * So a btf_struct_meta must have been created for the struct type of
    node param to these kfuncs

  * That BTF and its struct_meta_tab are guaranteed to still be around.
    Any arbitrary {rb,list} node the BPF program interacts with either:
    came from bpf_obj_new or a collection removal kfunc in the same
    program, in which case the BTF is associated with the program and
    still around; or came from bpf_kptr_xchg, in which case the BTF was
    associated with the map and is still around

Instead of silently continuing with NULL struct_meta, which caused
confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
kptr_struct_meta for node param to list and rbtree insert funcs"), let's
error out. Then, at runtime, we can confidently say that the
implementations of these kfuncs were given a non-NULL kptr_struct_meta,
meaning that special-field-specific functionality like
bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
series are guaranteed to execute.

This patch doesn't change functionality, just makes it easier to reason
about existing functionality.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e7b1af016841..ec37e84a11c6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
 
+		if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
+		    !kptr_struct_meta) {
+			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		insn_buf[0] = addr[0];
 		insn_buf[1] = addr[1];
 		insn_buf[2] = *insn;
@@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		int struct_meta_reg = BPF_REG_3;
 		int node_offset_reg = BPF_REG_4;
 
@@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			node_offset_reg = BPF_REG_5;
 		}
 
+		if (!kptr_struct_meta) {
+			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		__fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
 						node_offset_reg, insn, insn_buf, cnt);
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 bpf-next 2/7] bpf: Consider non-owning refs trusted
  2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
  2023-08-01 20:36 ` [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
@ 2023-08-01 20:36 ` Dave Marchevsky
  2023-08-02  4:11   ` Yonghong Song
  2023-08-01 20:36 ` [PATCH v1 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-01 20:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

Recent discussions around default kptr "trustedness" led to changes such
as commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the
verifier."). One of the conclusions of those discussions, as expressed
in code and comments in that patch, is that we'd like to move away from
'raw' PTR_TO_BTF_ID without some type flag or other register state
indicating trustedness. Although PTR_TRUSTED and PTR_UNTRUSTED flags mark
this state explicitly, the verifier currently considers trustedness
implied by other register state. For example, owning refs to graph
collection nodes must have a nonzero ref_obj_id, so they pass the
is_trusted_reg check despite having no explicit PTR_{UN}TRUSTED flag.
This patch makes trustedness of non-owning refs to graph collection
nodes explicit as well.

By definition, non-owning refs are currently trusted. Although the ref
has no control over pointee lifetime, due to non-owning ref clobbering
rules (see invalidate_non_owning_refs) dereferencing a non-owning ref is
safe in the critical section controlled by bpf_spin_lock associated with
its owning collection.

Note that the previous statement does not hold true for nodes with shared
ownership due to the use-after-free issue that this series is
addressing. True shared ownership was disabled by commit 7deca5eae833
("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"),
though, so the statement holds for now. Further patches in the series will change
the trustedness state of non-owning refs before re-enabling
bpf_refcount_acquire.

Let's add NON_OWN_REF type flag to BPF_REG_TRUSTED_MODIFIERS such that a
non-owning ref reg state would pass is_trusted_reg check. Somewhat
surprisingly, this doesn't result in any change to user-visible
functionality elsewhere in the verifier: graph collection nodes are all
marked MEM_ALLOC, which tends to be handled in separate codepaths from
"raw" PTR_TO_BTF_ID. Regardless, let's be explicit here and document the
current state of things before changing it elsewhere in the series.
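
For reference, a simplified sketch of the check this flag feeds into (not the
verifier's exact code; is_trusted_reg handles more cases than shown here):

static bool is_trusted_reg_sketch(const struct bpf_reg_state *reg)
{
	/* Owning refs pass via their nonzero ref_obj_id. */
	if (reg->ref_obj_id)
		return true;

	/* Otherwise trust is derived from type flags: only flags within
	 * BPF_REG_TRUSTED_MODIFIERS (which now includes NON_OWN_REF) and
	 * nothing unsafe like PTR_UNTRUSTED or PTR_MAYBE_NULL.
	 */
	return type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS &&
	       !bpf_type_has_unsafe_modifiers(reg->type);
}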

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf_verifier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f70f9ac884d2..b6e58dab8e27 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -745,7 +745,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
 	}
 }
 
-#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
+#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED | NON_OWN_REF)
 
 static inline bool bpf_type_has_unsafe_modifiers(u32 type)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
  2023-08-01 20:36 ` [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
  2023-08-01 20:36 ` [PATCH v1 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
@ 2023-08-01 20:36 ` Dave Marchevsky
  2023-08-02  4:15   ` Yonghong Song
  2023-08-01 20:36 ` [PATCH v1 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-01 20:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

This is the final fix for the use-after-free scenario described in
commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
non-owning refs"). That commit, by virtue of changing
bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
the "refcount incr on 0" splat. The not_zero check in
refcount_inc_not_zero, though, still occurs on memory that could have
been free'd and reused, so the commit didn't properly fix the root
cause.

This patch actually fixes the issue by freeing with the recently-added
bpf_mem_free_rcu, which ensures that the memory is not reused until an
RCU Tasks Trace grace period has elapsed. If that has happened, then
there are no non-owning references alive that point to the
recently-freed memory, so it can be safely reused.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/helpers.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 56ce5008aedd..7702f657fa3f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
 
 	if (rec)
 		bpf_obj_free_fields(rec, p);
-	bpf_mem_free(&bpf_global_ma, p);
+
+	if (rec && rec->refcount_off >= 0)
+		bpf_mem_free_rcu(&bpf_global_ma, p);
+	else
+		bpf_mem_free(&bpf_global_ma, p);
 }
 
 __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire
  2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (2 preceding siblings ...)
  2023-08-01 20:36 ` [PATCH v1 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
@ 2023-08-01 20:36 ` Dave Marchevsky
  2023-08-02  5:21   ` Yonghong Song
  2023-08-01 20:36 ` [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-01 20:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

Now that all reported issues are fixed, bpf_refcount_acquire can be
turned back on. Also reenable all bpf_refcount-related tests which were
disabled.

This is a revert of:
 * commit f3514a5d6740 ("selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled")
 * commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed")

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c                         |  5 +---
 .../bpf/prog_tests/refcounted_kptr.c          | 26 +++++++++++++++++++
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ec37e84a11c6..9014b469dd9d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11222,10 +11222,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "arg#%d doesn't point to a type with bpf_refcount field\n", i);
 				return -EINVAL;
 			}
-			if (rec->refcount_off >= 0) {
-				verbose(env, "bpf_refcount_acquire calls are disabled for now\n");
-				return -EINVAL;
-			}
+
 			meta->arg_btf = reg->btf;
 			meta->arg_btf_id = reg->btf_id;
 			break;
diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index 7423983472c7..d6bd5e16e637 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -9,12 +9,38 @@
 
 void test_refcounted_kptr(void)
 {
+	RUN_TESTS(refcounted_kptr);
 }
 
 void test_refcounted_kptr_fail(void)
 {
+	RUN_TESTS(refcounted_kptr_fail);
 }
 
 void test_refcounted_kptr_wrong_owner(void)
 {
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct refcounted_kptr *skel;
+	int ret;
+
+	skel = refcounted_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_a1), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_a1");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_a1 retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_b), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_b");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_b retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_a2), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_a2");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_a2 retval");
+	refcounted_kptr__destroy(skel);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (3 preceding siblings ...)
  2023-08-01 20:36 ` [PATCH v1 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
@ 2023-08-01 20:36 ` Dave Marchevsky
  2023-08-02  5:59   ` Yonghong Song
  2023-08-02 22:50   ` Alexei Starovoitov
  2023-08-01 20:36 ` [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS Dave Marchevsky
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-01 20:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

The previous patch in the series ensures that the underlying memory of
nodes with bpf_refcount - which can have multiple owners - is not reused
until an RCU Tasks Trace grace period has elapsed. This prevents
use-after-free with non-owning references that may point to
recently-freed memory. While the RCU read lock is held, it's safe to
dereference such a non-owning ref, as by definition the RCU GP couldn't have
elapsed and therefore the underlying memory couldn't have been reused.

From the perspective of verifier "trustedness" non-owning refs to
refcounted nodes are now trusted only in RCU CS and therefore should no
longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
MEM_RCU in order to reflect this new state.

Similarly to bpf_spin_unlock being a non-owning ref invalidation point,
where non-owning ref reg states are clobbered so that they cannot be
used outside of the critical section, currently all MEM_RCU regs are
marked untrusted after bpf_rcu_read_unlock. This patch makes
bpf_rcu_read_unlock a non-owning ref invalidation point as well,
clobbering the non-owning refs instead of marking untrusted. In the
future we may want to allow untrusted non-owning refs in which case we
can remove this custom logic without breaking BPF programs as it's more
restrictive than the default. That's a big change in semantics, though,
and this series is focused on fixing the use-after-free in the most
straightforward way.
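
From a BPF program's point of view the resulting rules look roughly like the
fragment below. It is abbreviated and illustrative only, in the spirit of the
sleepable selftest added later in this series, with the surrounding program
omitted:

	struct node_data *n, *m = NULL;
	struct bpf_rb_node *rb;

	bpf_rcu_read_lock();
	bpf_spin_lock(&lock);
	rb = bpf_rbtree_first(&root); /* non-owning ref: NON_OWN_REF | MEM_RCU */
	if (rb) {
		n = container_of(rb, struct node_data, r);
		m = bpf_refcount_acquire(n); /* ok while RCU CS + lock are held */
	}
	bpf_spin_unlock(&lock);  /* non-owning refs clobbered here, as before */
	bpf_rcu_read_unlock();   /* ...and this is now an invalidation point too */
	if (m)
		bpf_obj_drop(m);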

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h   |  3 ++-
 kernel/bpf/verifier.c | 17 +++++++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ceaa8c23287f..37fba01b061a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -653,7 +653,8 @@ enum bpf_type_flag {
 	MEM_RCU			= BIT(13 + BPF_BASE_TYPE_BITS),
 
 	/* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
-	 * Currently only valid for linked-list and rbtree nodes.
+	 * Currently only valid for linked-list and rbtree nodes. If the nodes
+	 * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
 	 */
 	NON_OWN_REF		= BIT(14 + BPF_BASE_TYPE_BITS),
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9014b469dd9d..4bda365000d3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -469,7 +469,8 @@ static bool type_is_ptr_alloc_obj(u32 type)
 
 static bool type_is_non_owning_ref(u32 type)
 {
-	return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF;
+	return type_is_ptr_alloc_obj(type) &&
+		type_flag(type) & NON_OWN_REF;
 }
 
 static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
@@ -8012,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	case PTR_TO_BTF_ID | PTR_TRUSTED:
 	case PTR_TO_BTF_ID | MEM_RCU:
 	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
+	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * its fixed offset must be 0. In the other cases, fixed offset
 		 * can be non-zero. This was already checked above. So pass
@@ -10478,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
 static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
 	struct bpf_verifier_state *state = env->cur_state;
+	struct btf_record *rec = reg_btf_record(reg);
 
 	if (!state->active_lock.ptr) {
 		verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
@@ -10490,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
 	}
 
 	reg->type |= NON_OWN_REF;
+	if (rec->refcount_off >= 0)
+		reg->type |= MEM_RCU;
+
 	return 0;
 }
 
@@ -11327,10 +11333,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		struct bpf_func_state *state;
 		struct bpf_reg_state *reg;
 
+		if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
+			verbose(env, "can't rcu read {lock,unlock} in rbtree cb\n");
+			return -EACCES;
+		}
+
 		if (rcu_lock) {
 			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
 			return -EINVAL;
 		} else if (rcu_unlock) {
+			invalidate_non_owning_refs(env);
 			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
 				if (reg->type & MEM_RCU) {
 					reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
@@ -16679,7 +16691,8 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_rcu_lock) {
+				if (env->cur_state->active_rcu_lock &&
+				    !in_rbtree_lock_required_cb(env)) {
 					verbose(env, "bpf_rcu_read_unlock is missing\n");
 					return -EINVAL;
 				}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS
  2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (4 preceding siblings ...)
  2023-08-01 20:36 ` [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
@ 2023-08-01 20:36 ` Dave Marchevsky
  2023-08-02  6:33   ` Yonghong Song
  2023-08-02 22:55   ` Alexei Starovoitov
  2023-08-01 20:36 ` [PATCH v1 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs Dave Marchevsky
  2023-08-02  3:07 ` [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Yonghong Song
  7 siblings, 2 replies; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-01 20:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks")
disabled bpf_spin_lock usage in sleepable progs, stating:

 Sleepable LSM programs can be preempted which means that allowng spin
 locks will need more work (disabling preemption and the verifier
 ensuring that no sleepable helpers are called when a spin lock is
 held).

It seems that some of this 'ensuring that no sleepable helpers are
called' was done for RCU critical sections in commit 9bb00b2895cb ("bpf:
Add kfunc bpf_rcu_read_lock/unlock()"), specifically the check which
fails with the verbose "sleepable helper %s#%d in rcu_read_lock region"
message in check_helper_call, and a similar one in check_kfunc_call. These
checks prevent sleepable helper and kfunc calls in RCU critical
sections. Accordingly, it should be safe to allow bpf_spin_{lock,unlock}
in an RCU CS. This patch does so, replacing the broad "sleepable progs
cannot use bpf_spin_lock yet" check with a more targeted !in_rcu_cs check.

[
  RFC: Does preemption still need to be disabled here?
]

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4bda365000d3..d1b8e8964aec 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8270,6 +8270,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "can't spin_{lock,unlock} in rbtree cb\n");
 			return -EACCES;
 		}
+		if (!in_rcu_cs(env)) {
+			verbose(env, "sleepable progs may only spin_{lock,unlock} in RCU CS\n");
+			return -EACCES;
+		}
 		if (meta->func_id == BPF_FUNC_spin_lock) {
 			err = process_spin_lock(env, regno, true);
 			if (err)
@@ -16972,11 +16976,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
 			return -EINVAL;
 		}
-
-		if (prog->aux->sleepable) {
-			verbose(env, "sleepable progs cannot use bpf_spin_lock yet\n");
-			return -EINVAL;
-		}
 	}
 
 	if (btf_record_has_field(map->record, BPF_TIMER)) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs
  2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (5 preceding siblings ...)
  2023-08-01 20:36 ` [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS Dave Marchevsky
@ 2023-08-01 20:36 ` Dave Marchevsky
  2023-08-02 23:07   ` Alexei Starovoitov
  2023-08-02  3:07 ` [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Yonghong Song
  7 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-01 20:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

Confirm that the following patterns in sleepable progs fail verification:
  * bpf_rcu_read_unlock before bpf_spin_unlock
     * Breaks "spin_{lock,unlock} only allowed in RCU CS". Even if
       bpf_rcu_read_lock is called again before bpf_spin_unlock this
       should fail.
  * bpf_spin_lock outside RCU critical section

Also confirm that correct usage passes verification.

None of the selftest progs actually attach to bpf_testmod's
bpf_testmod_test_read.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../selftests/bpf/progs/refcounted_kptr.c     | 37 ++++++++++++++
 .../bpf/progs/refcounted_kptr_fail.c          | 48 +++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
index c55652fdc63a..f7ab2711fea8 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
@@ -8,6 +8,9 @@
 #include "bpf_misc.h"
 #include "bpf_experimental.h"
 
+extern void bpf_rcu_read_lock(void) __ksym;
+extern void bpf_rcu_read_unlock(void) __ksym;
+
 struct node_data {
 	long key;
 	long list_data;
@@ -497,4 +500,38 @@ long rbtree_wrong_owner_remove_fail_a2(void *ctx)
 	return 0;
 }
 
+SEC("?fentry.s/bpf_testmod_test_read")
+__success
+int BPF_PROG(rbtree_sleepable_rcu,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct bpf_rb_node *rb;
+	struct node_data *n, *m = NULL;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	bpf_rcu_read_lock();
+	bpf_spin_lock(&lock);
+	bpf_rbtree_add(&root, &n->r, less);
+	rb = bpf_rbtree_first(&root);
+	if (!rb)
+		goto err_out;
+
+	rb = bpf_rbtree_remove(&root, rb);
+	if (!rb)
+		goto err_out;
+
+	m = container_of(rb, struct node_data, r);
+
+err_out:
+	bpf_spin_unlock(&lock);
+	bpf_rcu_read_unlock();
+	if (m)
+		bpf_obj_drop(m);
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
index 0b09e5c915b1..0a75d914e0f9 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
@@ -13,6 +13,9 @@ struct node_acquire {
 	struct bpf_refcount refcount;
 };
 
+extern void bpf_rcu_read_lock(void) __ksym;
+extern void bpf_rcu_read_unlock(void) __ksym;
+
 #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
 private(A) struct bpf_spin_lock glock;
 private(A) struct bpf_rb_root groot __contains(node_acquire, node);
@@ -71,4 +74,49 @@ long rbtree_refcounted_node_ref_escapes_owning_input(void *ctx)
 	return 0;
 }
 
+SEC("?fentry.s/bpf_testmod_test_read")
+__failure __msg("sleepable progs may only spin_{lock,unlock} in RCU CS")
+int BPF_PROG(rbtree_fail_sleepable_lock_no_rcu,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct node_acquire *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	/* no bpf_rcu_read_{lock,unlock} */
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?fentry.s/bpf_testmod_test_read")
+__failure __msg("function calls are not allowed while holding a lock")
+int BPF_PROG(rbtree_fail_sleepable_lock_across_rcu,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct node_acquire *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	/* spin_{lock,unlock} are in different RCU CS */
+	bpf_rcu_read_lock();
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rcu_read_unlock();
+
+	bpf_rcu_read_lock();
+	bpf_spin_unlock(&glock);
+	bpf_rcu_read_unlock();
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
  2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (6 preceding siblings ...)
  2023-08-01 20:36 ` [PATCH v1 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs Dave Marchevsky
@ 2023-08-02  3:07 ` Yonghong Song
  7 siblings, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2023-08-02  3:07 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> This series is the third of three (or more) followups to address issues
> in the bpf_refcount shared ownership implementation discovered by Kumar.
> This series addresses the use-after-free scenario described in [0]. The
> first followup series ([1]) also attempted to address the same
> use-after-free, but only got rid of the splat without addressing the
> underlying issue. After this series the underlying issue is fixed and
> bpf_refcount_acquire can be re-enabled.
> 
> The main fix here is migration of bpf_obj_drop to use
> bpf_mem_free_rcu. To understand why this fixes the issue, let us consider
> the example interleaving provided by Kumar in [0]:
> 
> CPU 0                                   CPU 1
> n = bpf_obj_new
> lock(lock1)
> bpf_rbtree_add(rbtree1, n)
> m = bpf_rbtree_acquire(n)
> unlock(lock1)
> 
> kptr_xchg(map, m) // move to map
> // at this point, refcount = 2
> 					m = kptr_xchg(map, NULL)
> 					lock(lock2)
> lock(lock1)				bpf_rbtree_add(rbtree2, m)

On the right column: bpf_rbtree_add(rbtree1, m) ?

> p = bpf_rbtree_first(rbtree1)			if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
> bpf_rbtree_remove(rbtree1, p)
> unlock(lock1)
> bpf_obj_drop(p) // B
> 					bpf_refcount_acquire(m) // use-after-free
> 					...
> 
> Before this series, bpf_obj_drop returns memory to the allocator using
> bpf_mem_free. At this point (B in the example) there might be some
> non-owning references to that memory which the verifier believes are valid,
> but where the underlying memory was reused for some other allocation.
> Commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> non-owning refs") attempted to fix this by doing refcount_inc_not_zero
> in bpf_refcount_acquire instead of refcount_inc, under the assumption that
> preventing erroneous incr-on-0 would be sufficient. This isn't true,
> though: refcount_inc_not_zero must *check* whether the refcount is zero, and
> the memory it's checking could have been reused, so the check may read
> and increment random reused bytes.
> 
> If we wait to reuse this memory until all non-owning refs that could
> point to it are gone, there is no possibility of this scenario
> happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted
> nodes accomplishes this.
> 
> For such nodes, the validity of their underlying memory is now tied to
> RCU Tasks Trace critical section. This matches MEM_RCU trustedness
> expectations, so the series takes the opportunity to more explicitly
> mark this trustedness state.
> 
> The functional effects of trustedness changes here are rather small.
> This is largely due to local kptrs having separate verifier handling -
> with implicit trustedness assumptions - from arbitrary kptrs.
> Regardless, let's take the opportunity to move towards a world where
> trustedness is more explicitly handled.
> 
> Summary of patch contents, with sub-bullets being leading questions and
> comments I think are worth reviewer attention:
> 
>    * Patches 1 and 2 are mostly documentation - and
>      enforcement, in patch 1's case - of existing semantics / expectations
> 
>    * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that
>      their underlying memory is not reused until RCU grace period elapses
>      * Perhaps it makes sense to move to mem_free_rcu for _all_
>        non-owning refs in the future, not just refcounted. This might
>        allow custom non-owning ref lifetime + invalidation logic to be
>        entirely subsumed by MEM_RCU handling. IMO this needs a bit more
>        thought and should be tackled outside of a fix series, so it's not
>        attempted here.
> 
>    * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix
>      the remaining use-after-free
>      * One might expect this patch to be last in the series, or last
>        before selftest changes. Patches 5 and 6 don't change
>        verification or runtime behavior for existing BPF progs, though.
> 
>    * Patch 5 brings the verifier's understanding of refcounted node
>      trustedness in line with Patch 4's changes
> 
>    * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable
>      progs. Marked RFC for a few reasons:
>      * bpf_spin_{lock,unlock} haven't been usable in sleepable progs
>        since before the introduction of bpf linked list and rbtree. As
>        such this feels more like a new feature that may not belong in
>        this fixes series.
>      * If we do want to allow bpf_spin_{lock,unlock} in sleepable progs,
>        Alexei has expressed a preference for that to be done as part of a
>        broader set of changes to verifier determination of where those
>        helpers can be called, and what can be called inside the spin_lock
>        CS.
>      * I'm unsure whether preemption needs to be disabled in this patch
>        as well.
> 
>    * Patch 7 adds tests
> 
>    [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
>    [1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/
> 
> Dave Marchevsky (7):
>    bpf: Ensure kptr_struct_meta is non-NULL for collection insert and
>      refcount_acquire
>    bpf: Consider non-owning refs trusted
>    bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
>    bpf: Reenable bpf_refcount_acquire
>    bpf: Consider non-owning refs to refcounted nodes RCU protected
>    [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS
>    selftests/bpf: Add tests for rbtree API interaction in sleepable progs
> 
>   include/linux/bpf.h                           |  3 +-
>   include/linux/bpf_verifier.h                  |  2 +-
>   kernel/bpf/helpers.c                          |  6 ++-
>   kernel/bpf/verifier.c                         | 45 ++++++++++++-----
>   .../bpf/prog_tests/refcounted_kptr.c          | 26 ++++++++++
>   .../selftests/bpf/progs/refcounted_kptr.c     | 37 ++++++++++++++
>   .../bpf/progs/refcounted_kptr_fail.c          | 48 +++++++++++++++++++
>   7 files changed, 153 insertions(+), 14 deletions(-)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  2023-08-01 20:36 ` [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
@ 2023-08-02  3:57   ` Yonghong Song
  2023-08-02 19:23     ` Dave Marchevsky
  0 siblings, 1 reply; 25+ messages in thread
From: Yonghong Song @ 2023-08-02  3:57 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> It's straightforward to prove that kptr_struct_meta must be non-NULL for
> any valid call to these kfuncs:
> 
>    * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>      struct in user BTF with a special field (e.g. bpf_refcount,
>      {rb,list}_node). These are stored in that BTF's struct_meta_tab.
> 
>    * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>      have {rb,list}_node field and that it's at the correct offset.
>      Similarly, check_kfunc_args ensures bpf_refcount field existence for
>      node param to bpf_refcount_acquire.
> 
>    * So a btf_struct_meta must have been created for the struct type of
>      node param to these kfuncs
> 
>    * That BTF and its struct_meta_tab are guaranteed to still be around.
>      Any arbitrary {rb,list} node the BPF program interacts with either:
>      came from bpf_obj_new or a collection removal kfunc in the same
>      program, in which case the BTF is associated with the program and
>      still around; or came from bpf_kptr_xchg, in which case the BTF was
>      associated with the map and is still around
> 
> Instead of silently continuing with NULL struct_meta, which caused
> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
> error out. Then, at runtime, we can confidently say that the
> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
> meaning that special-field-specific functionality like
> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
> series are guaranteed to execute.

The subject says '... for collection insert and refcount_acquire'.
Why pick these? We could check for all kptr_struct_meta use cases?

> 
> This patch doesn't change functionality, just makes it easier to reason
> about existing functionality.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   kernel/bpf/verifier.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e7b1af016841..ec37e84a11c6 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>   		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>   
> +		if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&

Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in 
this 'if' branch, right?

> +		    !kptr_struct_meta) {
> +			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
> +				insn_idx);
> +			return -EFAULT;
> +		}
> +
>   		insn_buf[0] = addr[0];
>   		insn_buf[1] = addr[1];
>   		insn_buf[2] = *insn;
> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   	} else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
>   		   desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
>   		   desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
> +		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>   		int struct_meta_reg = BPF_REG_3;
>   		int node_offset_reg = BPF_REG_4;
>   
> @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   			node_offset_reg = BPF_REG_5;
>   		}
>   
> +		if (!kptr_struct_meta) {
> +			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
> +				insn_idx);
> +			return -EFAULT;
> +		}
> +
>   		__fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
>   						node_offset_reg, insn, insn_buf, cnt);
>   	} else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||

In my opinion, such selective defensive programming is not necessary. By 
searching for kptr_struct_meta in the code, it is reasonably easy to find
whether we have any mismatch or not. Also, selftest coverage should
cover these cases (probably already), right?

If the defensive programming here is still desirable to warn at 
verification time, I think we should just check all uses of
kptr_struct_meta.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 2/7] bpf: Consider non-owning refs trusted
  2023-08-01 20:36 ` [PATCH v1 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
@ 2023-08-02  4:11   ` Yonghong Song
  0 siblings, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2023-08-02  4:11 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> Recent discussions around default kptr "trustedness" led to changes such
> as commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the
> verifier."). One of the conclusions of those discussions, as expressed
> in code and comments in that patch, is that we'd like to move away from
> 'raw' PTR_TO_BTF_ID without some type flag or other register state
> indicating trustedness. Although PTR_TRUSTED and PTR_UNTRUSTED flags mark
> this state explicitly, the verifier currently considers trustedness
> implied by other register state. For example, owning refs to graph
> collection nodes must have a nonzero ref_obj_id, so they pass the
> is_trusted_reg check despite having no explicit PTR_{UN}TRUSTED flag.
> This patch makes trustedness of non-owning refs to graph collection
> nodes explicit as well.
> 
> By definition, non-owning refs are currently trusted. Although the ref
> has no control over pointee lifetime, due to non-owning ref clobbering
> rules (see invalidate_non_owning_refs) dereferencing a non-owning ref is
> safe in the critical section controlled by bpf_spin_lock associated with
> its owning collection.
> 
> Note that the previous statement does not hold true for nodes with shared
> ownership due to the use-after-free issue that this series is
> addressing. True shared ownership was disabled by commit 7deca5eae833
> ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"),
> though, so the statement holds for now. Further patches in the series will change
> the trustedness state of non-owning refs before re-enabling
> bpf_refcount_acquire.
> 
> Let's add NON_OWN_REF type flag to BPF_REG_TRUSTED_MODIFIERS such that a
> non-owning ref reg state would pass is_trusted_reg check. Somewhat
> surprisingly, this doesn't result in any change to user-visible
> functionality elsewhere in the verifier: graph collection nodes are all
> marked MEM_ALLOC, which tends to be handled in separate codepaths from
> "raw" PTR_TO_BTF_ID. Regardless, let's be explicit here and document the
> current state of things before changing it elsewhere in the series.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Acked-by: Yonghong Song <yonghong.song@linux.dev>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-01 20:36 ` [PATCH v1 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
@ 2023-08-02  4:15   ` Yonghong Song
  0 siblings, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2023-08-02  4:15 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> This is the final fix for the use-after-free scenario described in
> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> non-owning refs"). That commit, by virtue of changing
> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
> the "refcount incr on 0" splat. The not_zero check in
> refcount_inc_not_zero, though, still occurs on memory that could have
> been free'd and reused, so the commit didn't properly fix the root
> cause.
> 
> This patch actually fixes the issue by free'ing using the recently-added
> bpf_mem_free_rcu, which ensures that the memory is not reused until
> RCU Tasks Trace grace period has elapsed. If that has happened then
> there are no non-owning references alive that point to the
> recently-free'd memory, so it can be safely reused.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Acked-by: Yonghong Song <yonghong.song@linux.dev>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire
  2023-08-01 20:36 ` [PATCH v1 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
@ 2023-08-02  5:21   ` Yonghong Song
  0 siblings, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2023-08-02  5:21 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> Now that all reported issues are fixed, bpf_refcount_acquire can be
> turned back on. Also reenable all bpf_refcount-related tests which were
> disabled.
> 
> This a revert of:
>   * commit f3514a5d6740 ("selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled")
>   * commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed")
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Acked-by: Yonghong Song <yonghong.song@linux.dev>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-01 20:36 ` [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
@ 2023-08-02  5:59   ` Yonghong Song
  2023-08-04  6:47     ` David Marchevsky
  2023-08-02 22:50   ` Alexei Starovoitov
  1 sibling, 1 reply; 25+ messages in thread
From: Yonghong Song @ 2023-08-02  5:59 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> The previous patch in the series ensures that the underlying memory of
> nodes with bpf_refcount - which can have multiple owners - is not reused
> until RCU Tasks Trace grace period has elapsed. This prevents
> use-after-free with non-owning references that may point to
> recently-freed memory. While RCU read lock is held, it's safe to
> dereference such a non-owning ref, as by definition RCU GP couldn't have
> elapsed and therefore underlying memory couldn't have been reused.
> 
>  From the perspective of verifier "trustedness" non-owning refs to
> refcounted nodes are now trusted only in RCU CS and therefore should no
> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
> MEM_RCU in order to reflect this new state.
> 
> Similarly to bpf_spin_unlock being a non-owning ref invalidation point,
> where non-owning ref reg states are clobbered so that they cannot be
> used outside of the critical section, currently all MEM_RCU regs are
> marked untrusted after bpf_rcu_read_unlock. This patch makes
> bpf_rcu_read_unlock a non-owning ref invalidation point as well,
> clobbering the non-owning refs instead of marking untrusted. In the
> future we may want to allow untrusted non-owning refs in which case we
> can remove this custom logic without breaking BPF programs as it's more
> restrictive than the default. That's a big change in semantics, though,
> and this series is focused on fixing the use-after-free in most
> straightforward way.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   include/linux/bpf.h   |  3 ++-
>   kernel/bpf/verifier.c | 17 +++++++++++++++--
>   2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index ceaa8c23287f..37fba01b061a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>   	MEM_RCU			= BIT(13 + BPF_BASE_TYPE_BITS),
>   
>   	/* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
> -	 * Currently only valid for linked-list and rbtree nodes.
> +	 * Currently only valid for linked-list and rbtree nodes. If the nodes
> +	 * have a bpf_refcount_field, they must be tagged MEM_RCU as well.

What does 'must' here mean?

>   	 */
>   	NON_OWN_REF		= BIT(14 + BPF_BASE_TYPE_BITS),
>   
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9014b469dd9d..4bda365000d3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -469,7 +469,8 @@ static bool type_is_ptr_alloc_obj(u32 type)
>   
>   static bool type_is_non_owning_ref(u32 type)
>   {
> -	return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF;
> +	return type_is_ptr_alloc_obj(type) &&
> +		type_flag(type) & NON_OWN_REF;

There is no code change here.

>   }
>   
>   static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
> @@ -8012,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>   	case PTR_TO_BTF_ID | PTR_TRUSTED:
>   	case PTR_TO_BTF_ID | MEM_RCU:
>   	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
> +	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>   		/* When referenced PTR_TO_BTF_ID is passed to release function,
>   		 * its fixed offset must be 0. In the other cases, fixed offset
>   		 * can be non-zero. This was already checked above. So pass
> @@ -10478,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>   static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>   {
>   	struct bpf_verifier_state *state = env->cur_state;
> +	struct btf_record *rec = reg_btf_record(reg);
>   
>   	if (!state->active_lock.ptr) {
>   		verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
> @@ -10490,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>   	}
>   
>   	reg->type |= NON_OWN_REF;
> +	if (rec->refcount_off >= 0)
> +		reg->type |= MEM_RCU;

Should we check whether the state is in rcu cs before marking MEM_RCU?

> +
>   	return 0;
>   }
>   
> @@ -11327,10 +11333,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   		struct bpf_func_state *state;
>   		struct bpf_reg_state *reg;
>   
> +		if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
> +			verbose(env, "can't rcu read {lock,unlock} in rbtree cb\n");
> +			return -EACCES;
> +		}
> +
>   		if (rcu_lock) {
>   			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>   			return -EINVAL;
>   		} else if (rcu_unlock) {
> +			invalidate_non_owning_refs(env);

If we have both spin lock and rcu like

      bpf_rcu_read_lock()
      ...
      bpf_spin_lock()
      ...
      bpf_spin_unlock()  <=== invalidate all non_owning_refs
      ...                <=== MEM_RCU type is gone
      bpf_rcu_read_unlock()

Maybe we could fine tune here to preserve MEM_RCU after bpf_spin_unlock()?

>   			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>   				if (reg->type & MEM_RCU) {
>   					reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
> @@ -16679,7 +16691,8 @@ static int do_check(struct bpf_verifier_env *env)
>   					return -EINVAL;
>   				}
>   
> -				if (env->cur_state->active_rcu_lock) {
> +				if (env->cur_state->active_rcu_lock &&
> +				    !in_rbtree_lock_required_cb(env)) {
>   					verbose(env, "bpf_rcu_read_unlock is missing\n");
>   					return -EINVAL;
>   				}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS
  2023-08-01 20:36 ` [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS Dave Marchevsky
@ 2023-08-02  6:33   ` Yonghong Song
  2023-08-02 22:55   ` Alexei Starovoitov
  1 sibling, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2023-08-02  6:33 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks")
> disabled bpf_spin_lock usage in sleepable progs, stating:
> 
>   Sleepable LSM programs can be preempted which means that allowng spin
>   locks will need more work (disabling preemption and the verifier
>   ensuring that no sleepable helpers are called when a spin lock is
>   held).
> 
> It seems that some of this 'ensuring that no sleepable helpers are
> called' was done for RCU critical section in commit 9bb00b2895cb ("bpf:
> Add kfunc bpf_rcu_read_lock/unlock()"), specifically the check which
> fails with verbose "sleepable helper %s#%d in rcu_read_lock region"
> message in check_helper_call and similar in check_kfunc_call. These
> checks prevent sleepable helper and kfunc calls in RCU critical
> sections. Accordingly, it should be safe to allow bpf_spin_{lock,unlock}
> in RCU CS. This patch does so, replacing the broad "sleepable progs cannot use
> bpf_spin_lock yet" check with a more targeted !in_rcu_cs.
> 
> [
>    RFC: Does preemption still need to be disabled here?

Good question. My hunch is that we do not need it since
   - spin lock is inside rcu cs, which is similar to a
     spin lock in a non-sleepable program.

I could be wrong, though, as preemption of a sleepable program
may allow explicit blocking.


> ]
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   kernel/bpf/verifier.c | 9 ++++-----
>   1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4bda365000d3..d1b8e8964aec 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8270,6 +8270,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>   			verbose(env, "can't spin_{lock,unlock} in rbtree cb\n");
>   			return -EACCES;
>   		}
> +		if (!in_rcu_cs(env)) {
> +			verbose(env, "sleepable progs may only spin_{lock,unlock} in RCU CS\n");
> +			return -EACCES;
> +		}
>   		if (meta->func_id == BPF_FUNC_spin_lock) {
>   			err = process_spin_lock(env, regno, true);
>   			if (err)
> @@ -16972,11 +16976,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>   			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
>   			return -EINVAL;
>   		}
> -
> -		if (prog->aux->sleepable) {
> -			verbose(env, "sleepable progs cannot use bpf_spin_lock yet\n");
> -			return -EINVAL;
> -		}
>   	}
>   
>   	if (btf_record_has_field(map->record, BPF_TIMER)) {

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  2023-08-02  3:57   ` Yonghong Song
@ 2023-08-02 19:23     ` Dave Marchevsky
  2023-08-02 21:41       ` Yonghong Song
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2023-08-02 19:23 UTC (permalink / raw)
  To: yonghong.song, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/1/23 11:57 PM, Yonghong Song wrote:
> 
> 
> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>> It's straightforward to prove that kptr_struct_meta must be non-NULL for
>> any valid call to these kfuncs:
>>
>>    * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>>      struct in user BTF with a special field (e.g. bpf_refcount,
>>      {rb,list}_node). These are stored in that BTF's struct_meta_tab.
>>
>>    * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>>      have {rb,list}_node field and that it's at the correct offset.
>>      Similarly, check_kfunc_args ensures bpf_refcount field existence for
>>      node param to bpf_refcount_acquire.
>>
>>    * So a btf_struct_meta must have been created for the struct type of
>>      node param to these kfuncs
>>
>>    * That BTF and its struct_meta_tab are guaranteed to still be around.
>>      Any arbitrary {rb,list} node the BPF program interacts with either:
>>      came from bpf_obj_new or a collection removal kfunc in the same
>>      program, in which case the BTF is associated with the program and
>>      still around; or came from bpf_kptr_xchg, in which case the BTF was
>>      associated with the map and is still around
>>
>> Instead of silently continuing with NULL struct_meta, which caused
>> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
>> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
>> error out. Then, at runtime, we can confidently say that the
>> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
>> meaning that special-field-specific functionality like
>> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
>> series are guaranteed to execute.
> 
> The subject says '... for collection insert and refcount_acquire'.
> Why picks these? We could check for all kptr_struct_meta use cases?
> 

fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:

  - bpf_obj_new_impl
  - bpf_obj_drop_impl
  - collection insert kfuncs
    - bpf_rbtree_add_impl
    - bpf_list_push_{front,back}_impl
  - bpf_refcount_acquire_impl

A btf_struct_meta is only created for a struct if it has a non-null btf_record,
which in turn only happens if the struct has any special fields (spin_lock,
refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
struct type without any special fields, the kptr_struct_meta arg can be
NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
the same holds for that kfunc.

By definition, rbtree and list nodes must be some struct type with a
struct bpf_{rb,list}_node field, and similar logic applies to refcounted nodes,
so if there's no kptr_struct_meta for their node arg, there was some
verifier-internal issue.
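
To make the NULL-vs-non-NULL cases concrete, here's a minimal sketch in
selftest-style BPF C (type and prog names are made up for illustration and
follow the bpf_experimental.h conventions used by the existing selftests):

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include "bpf_experimental.h"

  /* No special fields -> no btf_record, no btf_struct_meta, so the hidden
   * meta arg to bpf_obj_new_impl / bpf_obj_drop_impl is legitimately NULL.
   */
  struct plain { long x; };

  /* Special fields -> a btf_struct_meta must exist, so a NULL meta for the
   * rbtree add below would indicate a verifier-internal problem.
   */
  struct node_data {
          long key;
          struct bpf_refcount ref;
          struct bpf_rb_node node;
  };

  #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_rb_root groot __contains(node_data, node);

  static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
          return true; /* ordering doesn't matter for this illustration */
  }

  SEC("tc")
  int struct_meta_demo(void *ctx)
  {
          struct node_data *n;
          struct plain *p;

          p = bpf_obj_new(typeof(*p));    /* rewritten w/ NULL kptr_struct_meta */
          if (p)
                  bpf_obj_drop(p);        /* NULL kptr_struct_meta is fine here too */

          n = bpf_obj_new(typeof(*n));    /* rewritten w/ non-NULL kptr_struct_meta */
          if (!n)
                  return 0;

          bpf_spin_lock(&glock);
          bpf_rbtree_add(&groot, &n->node, less); /* meta must be non-NULL */
          bpf_spin_unlock(&glock);
          return 0;
  }

  char _license[] SEC("license") = "GPL";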


>>
>> This patch doesn't change functionality, just makes it easier to reason
>> about existing functionality.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>   kernel/bpf/verifier.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index e7b1af016841..ec37e84a11c6 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>           struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>           struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>>   +        if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
> 
> Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right?
> 

The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop,
for which NULL kptr_struct_meta is valid. 

>> +            !kptr_struct_meta) {
>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>> +                insn_idx);
>> +            return -EFAULT;
>> +        }
>> +
>>           insn_buf[0] = addr[0];
>>           insn_buf[1] = addr[1];
>>           insn_buf[2] = *insn;
>> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>       } else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
>>              desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
>>              desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
>> +        struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>           int struct_meta_reg = BPF_REG_3;
>>           int node_offset_reg = BPF_REG_4;
>>   @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>               node_offset_reg = BPF_REG_5;
>>           }
>>   +        if (!kptr_struct_meta) {
>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>> +                insn_idx);
>> +            return -EFAULT;
>> +        }
>> +
>>           __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
>>                           node_offset_reg, insn, insn_buf, cnt);
>>       } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
> 
> In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find
> whether we have any mismatch or not. Also self test coverage should
> cover these cases (probably already) right?
> 
> If the defensive programming here is still desirable to warn at verification time, I think we should just check all of uses for kptr_struct_meta.

Something like this patch probably should've been included with the series
containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
since that commit found that kptr_struct_meta wasn't being set for collection
insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
because, among other things, many of these kfunc impls check that
the btf_struct_meta is non-NULL before using it, with some fallback logic.
I don't like those unnecessary NULL checks either, and considered removing
them in this patch, but decided to leave them in since we already had
a case where struct_meta wasn't being set.

On second thought, maybe it's better to take the unnecessary runtime checks
out and leave these verification-time checks in. If, at runtime, those kfuncs
see a NULL btf_struct_meta, I'd rather they fail loudly with a NULL deref
splat than potentially leak memory or fail in similarly subtle ways. WDYT?

I don't feel particularly strongly about these verification-time checks,
but the level of 'selective defensive programming' here feels similar to
other 'verifier internal error' checks sprinkled throughout verifier.c,
so that argument doesn't feel very persuasive to me.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  2023-08-02 19:23     ` Dave Marchevsky
@ 2023-08-02 21:41       ` Yonghong Song
  2023-08-04  6:17         ` David Marchevsky
  0 siblings, 1 reply; 25+ messages in thread
From: Yonghong Song @ 2023-08-02 21:41 UTC (permalink / raw)
  To: Dave Marchevsky, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/2/23 12:23 PM, Dave Marchevsky wrote:
> 
> 
> On 8/1/23 11:57 PM, Yonghong Song wrote:
>>
>>
>> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>>> It's straightforward to prove that kptr_struct_meta must be non-NULL for
>>> any valid call to these kfuncs:
>>>
>>>     * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>>>       struct in user BTF with a special field (e.g. bpf_refcount,
>>>       {rb,list}_node). These are stored in that BTF's struct_meta_tab.
>>>
>>>     * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>>>       have {rb,list}_node field and that it's at the correct offset.
>>>       Similarly, check_kfunc_args ensures bpf_refcount field existence for
>>>       node param to bpf_refcount_acquire.
>>>
>>>     * So a btf_struct_meta must have been created for the struct type of
>>>       node param to these kfuncs
>>>
>>>     * That BTF and its struct_meta_tab are guaranteed to still be around.
>>>       Any arbitrary {rb,list} node the BPF program interacts with either:
>>>       came from bpf_obj_new or a collection removal kfunc in the same
>>>       program, in which case the BTF is associated with the program and
>>>       still around; or came from bpf_kptr_xchg, in which case the BTF was
>>>       associated with the map and is still around
>>>
>>> Instead of silently continuing with NULL struct_meta, which caused
>>> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
>>> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
>>> error out. Then, at runtime, we can confidently say that the
>>> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
>>> meaning that special-field-specific functionality like
>>> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
>>> series are guaranteed to execute.
>>
>> The subject says '... for collection insert and refcount_acquire'.
>> Why picks these? We could check for all kptr_struct_meta use cases?
>>
> 
> fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:
> 
>    - bpf_obj_new_impl
>    - bpf_obj_drop_impl
>    - collection insert kfuncs
>      - bpf_rbtree_add_impl
>      - bpf_list_push_{front,back}_impl
>    - bpf_refcount_acquire_impl
> 
> A btf_struct_meta is only created for a struct if it has a non-null btf_record,
> which in turn only happens if the struct has any special fields (spin_lock,
> refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
> struct type without any special fields, the kptr_struct_meta arg can be
> NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
> the same holds for that kfunc.
> 
> By definition rbtree and list nodes must be some struct type w/
> struct bpf_{rb,list}_node field, and similar logic for refcounted, so if there's
> no kptr_struct_meta for their node arg, there was some verifier-internal issue.
> 
> 
>>>
>>> This patch doesn't change functionality, just makes it easier to reason
>>> about existing functionality.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> ---
>>>    kernel/bpf/verifier.c | 14 ++++++++++++++
>>>    1 file changed, 14 insertions(+)
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index e7b1af016841..ec37e84a11c6 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>            struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>>            struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>>>    +        if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
>>
>> Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right?
>>
> 
> The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop,
> for which NULL kptr_struct_meta is valid.
> 
>>> +            !kptr_struct_meta) {
>>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>>> +                insn_idx);
>>> +            return -EFAULT;
>>> +        }
>>> +
>>>            insn_buf[0] = addr[0];
>>>            insn_buf[1] = addr[1];
>>>            insn_buf[2] = *insn;
>>> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>        } else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
>>>               desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
>>>               desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
>>> +        struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>>            int struct_meta_reg = BPF_REG_3;
>>>            int node_offset_reg = BPF_REG_4;
>>>    @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>                node_offset_reg = BPF_REG_5;
>>>            }
>>>    +        if (!kptr_struct_meta) {
>>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>>> +                insn_idx);
>>> +            return -EFAULT;
>>> +        }
>>> +
>>>            __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
>>>                            node_offset_reg, insn, insn_buf, cnt);
>>>        } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
>>
>> In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find
>> whether we have any mismatch or not. Also self test coverage should
>> cover these cases (probably already) right?
>>
>> If the defensive programming here is still desirable to warn at verification time, I think we should just check all of uses for kptr_struct_meta.
> 
> Something like this patch probably should've been included with the series
> containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
> since that commit found that kptr_struct_meta wasn't being set for collection
> insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
> because, among other things, many of these kfunc impls check that
> the btf_struct_meta is non-NULL before using it, with some fallback logic.
> I don't like those unnecessary NULL checks either, and considered removing
> them in this patch, but decided to leave them in since we already had
> a case where struct_meta wasn't being set.
> 
> On second thought, maybe it's better to take the unnecessary runtime checks
> out and leave these verification-time checks in. If, at runtime, those kfuncs
> see a NULL btf_struct_meta, I'd rather they fail loudly in the future
> with a NULL deref splat, than potentially leaking memory or similarly
> subtle failures. WDYT?

I certainly agree with you that failing at verification time is much better
than debugging at runtime.

Here we covered a few kfuncs which always require a non-NULL
kptr_struct_meta. But as you mentioned above, we also have cases where,
for a given kfunc, the kptr_struct_meta may be either NULL or non-NULL.

Take kfunc bpf_obj_new_impl: for some cases the kptr_struct_meta cannot
be NULL based on the bpf prog, but the verifier somehow passes a NULL ptr
anyway. Should we check this at fixup_kfunc_call() as well?

> 
> I don't feel particularly strongly about these verification-time checks,
> but the level of 'selective defensive programming' here feels similar to
> other 'verifier internal error' checks sprinkled throughout verifier.c,
> so that argument doesn't feel very persuasive to me.

I am okay with this patch but I wonder whether we can cover more
cases.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-01 20:36 ` [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
  2023-08-02  5:59   ` Yonghong Song
@ 2023-08-02 22:50   ` Alexei Starovoitov
  2023-08-04  6:55     ` David Marchevsky
  1 sibling, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2023-08-02 22:50 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team

On Tue, Aug 01, 2023 at 01:36:28PM -0700, Dave Marchevsky wrote:
> The previous patch in the series ensures that the underlying memory of
> nodes with bpf_refcount - which can have multiple owners - is not reused
> until RCU Tasks Trace grace period has elapsed. This prevents

Here and in the cover letter... the above should probably say "RCU grace period"
and not "RCU tasks trace grace period".
BPF progs can reuse objects after a normal RCU GP; we only wait for the
RCU tasks trace GP before freeing back into slab.

> use-after-free with non-owning references that may point to
> recently-freed memory. While RCU read lock is held, it's safe to
> dereference such a non-owning ref, as by definition RCU GP couldn't have
> elapsed and therefore underlying memory couldn't have been reused.
> 
> From the perspective of verifier "trustedness" non-owning refs to
> refcounted nodes are now trusted only in RCU CS and therefore should no
> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
> MEM_RCU in order to reflect this new state.
> 
> Similarly to bpf_spin_unlock being a non-owning ref invalidation point,
> where non-owning ref reg states are clobbered so that they cannot be
> used outside of the critical section, currently all MEM_RCU regs are
> marked untrusted after bpf_rcu_read_unlock. This patch makes
> bpf_rcu_read_unlock a non-owning ref invalidation point as well,
> clobbering the non-owning refs instead of marking untrusted. In the
> future we may want to allow untrusted non-owning refs in which case we
> can remove this custom logic without breaking BPF programs as it's more
> restrictive than the default. That's a big change in semantics, though,
> and this series is focused on fixing the use-after-free in most
> straightforward way.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/linux/bpf.h   |  3 ++-
>  kernel/bpf/verifier.c | 17 +++++++++++++++--
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index ceaa8c23287f..37fba01b061a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>  	MEM_RCU			= BIT(13 + BPF_BASE_TYPE_BITS),
>  
>  	/* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
> -	 * Currently only valid for linked-list and rbtree nodes.
> +	 * Currently only valid for linked-list and rbtree nodes. If the nodes
> +	 * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>  	 */
>  	NON_OWN_REF		= BIT(14 + BPF_BASE_TYPE_BITS),
>  
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9014b469dd9d..4bda365000d3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -469,7 +469,8 @@ static bool type_is_ptr_alloc_obj(u32 type)
>  
>  static bool type_is_non_owning_ref(u32 type)
>  {
> -	return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF;
> +	return type_is_ptr_alloc_obj(type) &&
> +		type_flag(type) & NON_OWN_REF;
>  }
>  
>  static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
> @@ -8012,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  	case PTR_TO_BTF_ID | PTR_TRUSTED:
>  	case PTR_TO_BTF_ID | MEM_RCU:
>  	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
> +	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>  		/* When referenced PTR_TO_BTF_ID is passed to release function,
>  		 * its fixed offset must be 0. In the other cases, fixed offset
>  		 * can be non-zero. This was already checked above. So pass
> @@ -10478,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>  static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>  {
>  	struct bpf_verifier_state *state = env->cur_state;
> +	struct btf_record *rec = reg_btf_record(reg);
>  
>  	if (!state->active_lock.ptr) {
>  		verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
> @@ -10490,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>  	}
>  
>  	reg->type |= NON_OWN_REF;
> +	if (rec->refcount_off >= 0)
> +		reg->type |= MEM_RCU;
> +
>  	return 0;
>  }
>  
> @@ -11327,10 +11333,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  		struct bpf_func_state *state;
>  		struct bpf_reg_state *reg;
>  
> +		if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
> +			verbose(env, "can't rcu read {lock,unlock} in rbtree cb\n");
> +			return -EACCES;
> +		}

I guess it's ok to prevent cb from calling bpf_rcu_read_lock(), since it's unnecessary,
but pls make the message more verbose. Like:
 verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");

so that users know why the verifier complains.
Technically it's ok to do so; being unnecessary is not a safety issue.

> +
>  		if (rcu_lock) {
>  			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>  			return -EINVAL;
>  		} else if (rcu_unlock) {
> +			invalidate_non_owning_refs(env);

I agree with Yonghong. It probably doesn't belong here.
rcu lock/unlock and spin_lock/unlock are separate critical sections.
Since ref_set_non_owning() adds the extra MEM_RCU flag, nothing extra needs to be done here.
The code below will make the pointers untrusted.

>  			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>  				if (reg->type & MEM_RCU) {
>  					reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
> @@ -16679,7 +16691,8 @@ static int do_check(struct bpf_verifier_env *env)
>  					return -EINVAL;
>  				}
>  
> -				if (env->cur_state->active_rcu_lock) {
> +				if (env->cur_state->active_rcu_lock &&
> +				    !in_rbtree_lock_required_cb(env)) {

I'm not following here.
Didn't you want to prevent bpf_rcu_read_lock/unlock inside cb? Why this change?

>  					verbose(env, "bpf_rcu_read_unlock is missing\n");
>  					return -EINVAL;
>  				}
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS
  2023-08-01 20:36 ` [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS Dave Marchevsky
  2023-08-02  6:33   ` Yonghong Song
@ 2023-08-02 22:55   ` Alexei Starovoitov
  1 sibling, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2023-08-02 22:55 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team

On Tue, Aug 01, 2023 at 01:36:29PM -0700, Dave Marchevsky wrote:
> Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks")
> disabled bpf_spin_lock usage in sleepable progs, stating:
> 
>  Sleepable LSM programs can be preempted which means that allowng spin
>  locks will need more work (disabling preemption and the verifier
>  ensuring that no sleepable helpers are called when a spin lock is
>  held).
> 
> It seems that some of this 'ensuring that no sleepable helpers are
> called' was done for RCU critical section in commit 9bb00b2895cb ("bpf:
> Add kfunc bpf_rcu_read_lock/unlock()"), specifically the check which
> fails with verbose "sleepable helper %s#%d in rcu_read_lock region"
> message in check_helper_call and similar in check_kfunc_call. These
> checks prevent sleepable helper and kfunc calls in RCU critical
> sections. Accordingly, it should be safe to allow bpf_spin_{lock,unlock}
> in RCU CS. This patch does so, replacing the broad "sleepable progs cannot use
> bpf_spin_lock yet" check with a more targeted !in_rcu_cs.
> 
> [
>   RFC: Does preemption still need to be disabled here?

Yes. __bpf_spin_lock() needs to disable it before arch_spin_lock.
Since some sleepable progs are reentrant, we need to make sure the bpf prog
isn't preempted while the spin_lock is held. Otherwise a deadlock is possible.
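
In other words (illustrative sketch of the needed shape only, not the
current helpers.c code), the lock helper has to pair the raw lock with
preempt_disable()/preempt_enable() so the lock holder can't be preempted
and then deadlocked against by another instance of the prog on the same CPU:

  static inline void __bpf_spin_lock_sketch(arch_spinlock_t *l)
  {
          preempt_disable();      /* holder must not be preempted while spinning/holding */
          arch_spin_lock(l);
  }

  static inline void __bpf_spin_unlock_sketch(arch_spinlock_t *l)
  {
          arch_spin_unlock(l);
          preempt_enable();
  }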

> ]
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  kernel/bpf/verifier.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4bda365000d3..d1b8e8964aec 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8270,6 +8270,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  			verbose(env, "can't spin_{lock,unlock} in rbtree cb\n");
>  			return -EACCES;
>  		}
> +		if (!in_rcu_cs(env)) {
> +			verbose(env, "sleepable progs may only spin_{lock,unlock} in RCU CS\n");
> +			return -EACCES;
> +		}

I don't see the point of requiring bpf_spin_lock only under an RCU CS.
It seems the !sleepable check below can be dropped without adding the hunk above.

>  		if (meta->func_id == BPF_FUNC_spin_lock) {
>  			err = process_spin_lock(env, regno, true);
>  			if (err)
> @@ -16972,11 +16976,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>  			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
>  			return -EINVAL;
>  		}
> -
> -		if (prog->aux->sleepable) {
> -			verbose(env, "sleepable progs cannot use bpf_spin_lock yet\n");
> -			return -EINVAL;
> -		}
>  	}
>  
>  	if (btf_record_has_field(map->record, BPF_TIMER)) {
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs
  2023-08-01 20:36 ` [PATCH v1 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs Dave Marchevsky
@ 2023-08-02 23:07   ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2023-08-02 23:07 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team

On Tue, Aug 01, 2023 at 01:36:30PM -0700, Dave Marchevsky wrote:
> +SEC("?fentry.s/bpf_testmod_test_read")
> +__failure __msg("sleepable progs may only spin_{lock,unlock} in RCU CS")
> +int BPF_PROG(rbtree_fail_sleepable_lock_no_rcu,
> +	     struct file *file, struct kobject *kobj,
> +	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
> +{
> +	struct node_acquire *n;
> +
> +	n = bpf_obj_new(typeof(*n));
> +	if (!n)
> +		return 0;
> +
> +	/* no bpf_rcu_read_{lock,unlock} */
> +	bpf_spin_lock(&glock);
> +	bpf_rbtree_add(&groot, &n->node, less);
> +	bpf_spin_unlock(&glock);

Continuing the discussion from the other patch...
I don't see anything wrong with the above.
bpf_spin_lock will disable preemption and then call rbtree_add.
Everything looks safe.

RCU/Design/Requirements/Requirements.rst
"
In addition, anything that disables
preemption also marks an RCU-sched read-side critical section,
including preempt_disable() and preempt_enable(), local_irq_save()
and local_irq_restore(), and so on.
"
In practice it's always better to enforce explicit bpf_rcu_read_lock/unlock,
but it seems overkill in this case. A sleepable prog holding a spin_lock will
not cause a UAF for non-owning references with refcnt == 0.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  2023-08-02 21:41       ` Yonghong Song
@ 2023-08-04  6:17         ` David Marchevsky
  2023-08-04 15:37           ` Yonghong Song
  0 siblings, 1 reply; 25+ messages in thread
From: David Marchevsky @ 2023-08-04  6:17 UTC (permalink / raw)
  To: yonghong.song, Dave Marchevsky, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team

On 8/2/23 5:41 PM, Yonghong Song wrote:
> 
> 
> On 8/2/23 12:23 PM, Dave Marchevsky wrote:
>>
>>
>> On 8/1/23 11:57 PM, Yonghong Song wrote:
>>>
>>>
>>> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>>>> It's straightforward to prove that kptr_struct_meta must be non-NULL for
>>>> any valid call to these kfuncs:
>>>>
>>>>     * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>>>>       struct in user BTF with a special field (e.g. bpf_refcount,
>>>>       {rb,list}_node). These are stored in that BTF's struct_meta_tab.
>>>>
>>>>     * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>>>>       have {rb,list}_node field and that it's at the correct offset.
>>>>       Similarly, check_kfunc_args ensures bpf_refcount field existence for
>>>>       node param to bpf_refcount_acquire.
>>>>
>>>>     * So a btf_struct_meta must have been created for the struct type of
>>>>       node param to these kfuncs
>>>>
>>>>     * That BTF and its struct_meta_tab are guaranteed to still be around.
>>>>       Any arbitrary {rb,list} node the BPF program interacts with either:
>>>>       came from bpf_obj_new or a collection removal kfunc in the same
>>>>       program, in which case the BTF is associated with the program and
>>>>       still around; or came from bpf_kptr_xchg, in which case the BTF was
>>>>       associated with the map and is still around
>>>>
>>>> Instead of silently continuing with NULL struct_meta, which caused
>>>> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
>>>> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
>>>> error out. Then, at runtime, we can confidently say that the
>>>> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
>>>> meaning that special-field-specific functionality like
>>>> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
>>>> series are guaranteed to execute.
>>>
>>> The subject says '... for collection insert and refcount_acquire'.
>>> Why picks these? We could check for all kptr_struct_meta use cases?
>>>
>>
>> fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:
>>
>>    - bpf_obj_new_impl
>>    - bpf_obj_drop_impl
>>    - collection insert kfuncs
>>      - bpf_rbtree_add_impl
>>      - bpf_list_push_{front,back}_impl
>>    - bpf_refcount_acquire_impl
>>
>> A btf_struct_meta is only created for a struct if it has a non-null btf_record,
>> which in turn only happens if the struct has any special fields (spin_lock,
>> refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
>> struct type without any special fields, the kptr_struct_meta arg can be
>> NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
>> the same holds for that kfunc.
>>
>> By definition rbtree and list nodes must be some struct type w/
>> struct bpf_{rb,list}_node field, and similar logic for refcounted, so if there's
>> no kptr_struct_meta for their node arg, there was some verifier-internal issue.
>>
>>
>>>>
>>>> This patch doesn't change functionality, just makes it easier to reason
>>>> about existing functionality.
>>>>
>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>> ---
>>>>    kernel/bpf/verifier.c | 14 ++++++++++++++
>>>>    1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index e7b1af016841..ec37e84a11c6 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>            struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>>>            struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>>>>    +        if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
>>>
>>> Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right?
>>>
>>
>> The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop,
>> for which NULL kptr_struct_meta is valid.
>>
>>>> +            !kptr_struct_meta) {
>>>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>>>> +                insn_idx);
>>>> +            return -EFAULT;
>>>> +        }
>>>> +
>>>>            insn_buf[0] = addr[0];
>>>>            insn_buf[1] = addr[1];
>>>>            insn_buf[2] = *insn;
>>>> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>        } else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
>>>>               desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
>>>>               desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
>>>> +        struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>>>            int struct_meta_reg = BPF_REG_3;
>>>>            int node_offset_reg = BPF_REG_4;
>>>>    @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>                node_offset_reg = BPF_REG_5;
>>>>            }
>>>>    +        if (!kptr_struct_meta) {
>>>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>>>> +                insn_idx);
>>>> +            return -EFAULT;
>>>> +        }
>>>> +
>>>>            __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
>>>>                            node_offset_reg, insn, insn_buf, cnt);
>>>>        } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
>>>
>>> In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find
>>> whether we have any mismatch or not. Also self test coverage should
>>> cover these cases (probably already) right?
>>>
>>> If the defensive programming here is still desirable to warn at verification time, I think we should just check all of uses for kptr_struct_meta.
>>
>> Something like this patch probably should've been included with the series
>> containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
>> since that commit found that kptr_struct_meta wasn't being set for collection
>> insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
>> because, among other things, many of these kfunc impls check that
>> the btf_struct_meta is non-NULL before using it, with some fallback logic.
>> I don't like those unnecessary NULL checks either, and considered removing
>> them in this patch, but decided to leave them in since we already had
>> a case where struct_meta wasn't being set.
>>
>> On second thought, maybe it's better to take the unnecessary runtime checks
>> out and leave these verification-time checks in. If, at runtime, those kfuncs
>> see a NULL btf_struct_meta, I'd rather they fail loudly in the future
>> with a NULL deref splat, than potentially leaking memory or similarly
>> subtle failures. WDYT?
> 
> Certainly I agree with you that verification failure is much better than
> debugging runtime.
> 
> Here, we covered a few kfunc which always requires non-NULL kptr_struct_meta. But as you mentioned in the above, we also have
> cases where for a kfunc, the kptr_struct_meta could be NULL or non-NULL.
> 
> Let us say, kfunc bpf_obj_new_impl, for some cases, the kptr_struct_meta
> cannot be NULL based on bpf prog, but somehow, the verifier passes
> a NULL ptr to the program. Should we check this at fixup_kfunc_call()
> as well?
> 

I see what you mean. Since btf_struct_meta is a (btf_id, btf_record) tuple
currently, and btf_record is a view of a BTF type optimized for quickly
answering questions about special fields (existence / location / type), we
could certainly go look at the kfunc arg BTF types to confirm, as they're the
source of truth. But this would be redoing the work necessary to generate the
btf_record in the first place, which feels like it defeats the purpose of
having this view of the data at verification time. We definitely don't want
to push this BTF digging to runtime either.

Another approach: we could focus on the specific issue that caused the
bug which the commit I mentioned earlier fixes, and enforce the following
invariants:

   * All bpf_obj_new_impl calls for a particular type must use the same
     btf_struct_meta
   * Any kfunc or helper taking a user-allocated param must use the same
     btf_struct_meta as the bpf_obj_new_impl call that allocated it

The whole point of 'special' fields is to take special-field-specific action
on them, or because of their existence. This would ensure that all helpers use
the same special field info. Then, e.g., if bpf_obj_new's bpf_obj_init_fields
does something to a special field, bpf_obj_drop will have the same
btf_struct_meta and thus the same field info for bpf_obj_free_fields.

Maybe instead of "bpf_obj_new_impl calls" above it's
more accurate to say "kfunc calls that acquire the
user-allocated type".

I think this second idea makes sense and is better than piecemeal checking.
Will think on it more and try to incorporate it into v2 of this series.
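
To illustrate the invariant with the usual refcounted-node flow (pseudo-C
fragment; node_data/glock/groot/less are hypothetical stand-ins, and the
meta_* names below just label the hidden kptr_struct_meta args that
fixup_kfunc_call wires up):

  n = bpf_obj_new(typeof(*n));              /* -> bpf_obj_new_impl(type_id, meta_new) */
  if (!n)
          return 0;
  m = bpf_refcount_acquire(n);              /* -> bpf_refcount_acquire_impl(n, meta_acq) */

  bpf_spin_lock(&glock);
  bpf_rbtree_add(&groot, &n->node, less);   /* -> bpf_rbtree_add_impl(..., meta_add, off) */
  bpf_spin_unlock(&glock);

  if (m)
          bpf_obj_drop(m);                  /* -> bpf_obj_drop_impl(m, meta_drop) */

The invariant would be that meta_new, meta_acq, meta_add and meta_drop are
all the same btf_struct_meta, derived from node_data's btf_record; a mismatch
(like the NULL struct_meta for insert kfuncs fixed by 2140a6e3422d) is exactly
what leads to the confusing special-field handling bugs mentioned above.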

>>
>> I don't feel particularly strongly about these verification-time checks,
>> but the level of 'selective defensive programming' here feels similar to
>> other 'verifier internal error' checks sprinkled throughout verifier.c,
>> so that argument doesn't feel very persuasive to me.
> 
> I am okay with this patch but I wonder whether we can cover more
> cases.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-02  5:59   ` Yonghong Song
@ 2023-08-04  6:47     ` David Marchevsky
  2023-08-04 15:43       ` Yonghong Song
  0 siblings, 1 reply; 25+ messages in thread
From: David Marchevsky @ 2023-08-04  6:47 UTC (permalink / raw)
  To: yonghong.song, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team

On 8/2/23 1:59 AM, Yonghong Song wrote:
> 
> 
> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>> The previous patch in the series ensures that the underlying memory of
>> nodes with bpf_refcount - which can have multiple owners - is not reused
>> until RCU Tasks Trace grace period has elapsed. This prevents
>> use-after-free with non-owning references that may point to
>> recently-freed memory. While RCU read lock is held, it's safe to
>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>> elapsed and therefore underlying memory couldn't have been reused.
>>
>>  From the perspective of verifier "trustedness" non-owning refs to
>> refcounted nodes are now trusted only in RCU CS and therefore should no
>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>> MEM_RCU in order to reflect this new state.
>>
>> Similarly to bpf_spin_unlock being a non-owning ref invalidation point,
>> where non-owning ref reg states are clobbered so that they cannot be
>> used outside of the critical section, currently all MEM_RCU regs are
>> marked untrusted after bpf_rcu_read_unlock. This patch makes
>> bpf_rcu_read_unlock a non-owning ref invalidation point as well,
>> clobbering the non-owning refs instead of marking untrusted. In the
>> future we may want to allow untrusted non-owning refs in which case we
>> can remove this custom logic without breaking BPF programs as it's more
>> restrictive than the default. That's a big change in semantics, though,
>> and this series is focused on fixing the use-after-free in most
>> straightforward way.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>   include/linux/bpf.h   |  3 ++-
>>   kernel/bpf/verifier.c | 17 +++++++++++++++--
>>   2 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index ceaa8c23287f..37fba01b061a 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>       MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
>>         /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>> -     * Currently only valid for linked-list and rbtree nodes.
>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
> 
> What does 'must' here mean?
> 

Meaning that if there's any NON_OWN_REF-flagged
PTR_TO_BTF_ID which points to a struct with a bpf_refcount field,
it should also be flagged with MEM_RCU. If it isn't, it's a
verifier error. 

>>        */
>>       NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
>>   diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 9014b469dd9d..4bda365000d3 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -469,7 +469,8 @@ static bool type_is_ptr_alloc_obj(u32 type)
>>     static bool type_is_non_owning_ref(u32 type)
>>   {
>> -    return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF;
>> +    return type_is_ptr_alloc_obj(type) &&
>> +        type_flag(type) & NON_OWN_REF;
> 
> There is no code change here.
> 

Yep, will undo in v2.

>>   }
>>     static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>> @@ -8012,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>       case PTR_TO_BTF_ID | PTR_TRUSTED:
>>       case PTR_TO_BTF_ID | MEM_RCU:
>>       case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>           /* When referenced PTR_TO_BTF_ID is passed to release function,
>>            * its fixed offset must be 0. In the other cases, fixed offset
>>            * can be non-zero. This was already checked above. So pass
>> @@ -10478,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>   static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>   {
>>       struct bpf_verifier_state *state = env->cur_state;
>> +    struct btf_record *rec = reg_btf_record(reg);
>>         if (!state->active_lock.ptr) {
>>           verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>> @@ -10490,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>       }
>>         reg->type |= NON_OWN_REF;
>> +    if (rec->refcount_off >= 0)
>> +        reg->type |= MEM_RCU;
> 
> Should we check whether the state is in rcu cs before marking MEM_RCU?
> 

I think this is implicitly being enforced.
Rbtree/list kfuncs must be called under bpf_spin_lock,
and this series requires the bpf_spin_{lock,unlock} helpers
to be called in an RCU CS if the BPF prog is sleepable (see the sketch below).
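
In a sleepable prog, the only way to reach ref_set_non_owning() is via a
shape like this (sketch reusing the node_acquire/glock/groot/less
definitions from the patch 7 selftests; only the added bpf_rcu_read_lock()
pair differs from the failure test quoted elsewhere in this thread):

  SEC("fentry.s/bpf_testmod_test_read")
  int BPF_PROG(rbtree_sleepable_add_in_rcu_sketch,
               struct file *file, struct kobject *kobj,
               struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
  {
          struct node_acquire *n;

          n = bpf_obj_new(typeof(*n));
          if (!n)
                  return 0;

          bpf_rcu_read_lock();            /* patch 6: sleepable spin_{lock,unlock} needs RCU CS */
          bpf_spin_lock(&glock);          /* rbtree kfuncs need the spin lock */
          bpf_rbtree_add(&groot, &n->node, less); /* ref_set_non_owning(): n becomes non-owning
                                                   * (and MEM_RCU if the type is refcounted) */
          bpf_spin_unlock(&glock);
          bpf_rcu_read_unlock();

          return 0;
  }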

>> +
>>       return 0;
>>   }
>>   @@ -11327,10 +11333,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>           struct bpf_func_state *state;
>>           struct bpf_reg_state *reg;
>>   +        if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>> +            verbose(env, "can't rcu read {lock,unlock} in rbtree cb\n");
>> +            return -EACCES;
>> +        }
>> +
>>           if (rcu_lock) {
>>               verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>>               return -EINVAL;
>>           } else if (rcu_unlock) {
>> +            invalidate_non_owning_refs(env);
> 
> If we have both spin lock and rcu like
> 
>      bpf_rcu_read_lock()
>      ...
>      bpf_spin_lock()
>      ...
>      bpf_spin_unlock()  <=== invalidate all non_owning_refs
>      ...                <=== MEM_RCU type is gone
>      bpf_rcu_read_unlock()
> 
> Maybe we could fine tune here to preserve MEM_RCU after bpf_spin_unlock()?
> 

IIUC, you're saying that we should no longer
have non-owning refs get clobbered after bpf_spin_unlock,
and instead just have rcu_read_unlock do its default
"MEM_RCU refs become PTR_UNTRUSTED" logic.

In the cover letter I mention that this is probably
the direction we want to go in the long term, in
the comments on patch 3:

  This might
  allow custom non-owning ref lifetime + invalidation logic to be
  entirely subsumed by MEM_RCU handling.

But I'm hesitant to do that in this fixes series,
as I'd like to minimize changes that could introduce
additional bugs. This series' current changes keep the
clobbering rules effectively unchanged - we can always
loosen them in the future. Also, I think we should
make this change for _all_ non-owning refs (w/ and w/o
a bpf_refcount field). Otherwise the verifier lifetime
of non-owning refs would change whenever a BPF program writer
adds a bpf_refcount field to their struct, or removes it.
 

>>               bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>>                   if (reg->type & MEM_RCU) {
>>                       reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
>> @@ -16679,7 +16691,8 @@ static int do_check(struct bpf_verifier_env *env)
>>                       return -EINVAL;
>>                   }
>>   -                if (env->cur_state->active_rcu_lock) {
>> +                if (env->cur_state->active_rcu_lock &&
>> +                    !in_rbtree_lock_required_cb(env)) {
>>                       verbose(env, "bpf_rcu_read_unlock is missing\n");
>>                       return -EINVAL;
>>                   }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-02 22:50   ` Alexei Starovoitov
@ 2023-08-04  6:55     ` David Marchevsky
  0 siblings, 0 replies; 25+ messages in thread
From: David Marchevsky @ 2023-08-04  6:55 UTC (permalink / raw)
  To: Alexei Starovoitov, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team

On 8/2/23 6:50 PM, Alexei Starovoitov wrote:
> On Tue, Aug 01, 2023 at 01:36:28PM -0700, Dave Marchevsky wrote:
>> The previous patch in the series ensures that the underlying memory of
>> nodes with bpf_refcount - which can have multiple owners - is not reused
>> until RCU Tasks Trace grace period has elapsed. This prevents
> 
> Here and in the cover letter... above should probably be "RCU grace period"
> and not "RCU tasks trace grace period".
> bpf progs will reuse objects after normal RCU.
> We're waiting for RCU tasks trace GP to free into slab.
> 

Will fix.

>> use-after-free with non-owning references that may point to
>> recently-freed memory. While RCU read lock is held, it's safe to
>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>> elapsed and therefore underlying memory couldn't have been reused.
>>
>> From the perspective of verifier "trustedness" non-owning refs to
>> refcounted nodes are now trusted only in RCU CS and therefore should no
>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>> MEM_RCU in order to reflect this new state.
>>
>> Similarly to bpf_spin_unlock being a non-owning ref invalidation point,
>> where non-owning ref reg states are clobbered so that they cannot be
>> used outside of the critical section, currently all MEM_RCU regs are
>> marked untrusted after bpf_rcu_read_unlock. This patch makes
>> bpf_rcu_read_unlock a non-owning ref invalidation point as well,
>> clobbering the non-owning refs instead of marking untrusted. In the
>> future we may want to allow untrusted non-owning refs in which case we
>> can remove this custom logic without breaking BPF programs as it's more
>> restrictive than the default. That's a big change in semantics, though,
>> and this series is focused on fixing the use-after-free in most
>> straightforward way.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  include/linux/bpf.h   |  3 ++-
>>  kernel/bpf/verifier.c | 17 +++++++++++++++--
>>  2 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index ceaa8c23287f..37fba01b061a 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>  	MEM_RCU			= BIT(13 + BPF_BASE_TYPE_BITS),
>>  
>>  	/* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>> -	 * Currently only valid for linked-list and rbtree nodes.
>> +	 * Currently only valid for linked-list and rbtree nodes. If the nodes
>> +	 * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>  	 */
>>  	NON_OWN_REF		= BIT(14 + BPF_BASE_TYPE_BITS),
>>  
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 9014b469dd9d..4bda365000d3 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -469,7 +469,8 @@ static bool type_is_ptr_alloc_obj(u32 type)
>>  
>>  static bool type_is_non_owning_ref(u32 type)
>>  {
>> -	return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF;
>> +	return type_is_ptr_alloc_obj(type) &&
>> +		type_flag(type) & NON_OWN_REF;
>>  }
>>  
>>  static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>> @@ -8012,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>  	case PTR_TO_BTF_ID | PTR_TRUSTED:
>>  	case PTR_TO_BTF_ID | MEM_RCU:
>>  	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>> +	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>  		/* When referenced PTR_TO_BTF_ID is passed to release function,
>>  		 * its fixed offset must be 0. In the other cases, fixed offset
>>  		 * can be non-zero. This was already checked above. So pass
>> @@ -10478,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>  static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>  {
>>  	struct bpf_verifier_state *state = env->cur_state;
>> +	struct btf_record *rec = reg_btf_record(reg);
>>  
>>  	if (!state->active_lock.ptr) {
>>  		verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>> @@ -10490,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>  	}
>>  
>>  	reg->type |= NON_OWN_REF;
>> +	if (rec->refcount_off >= 0)
>> +		reg->type |= MEM_RCU;
>> +
>>  	return 0;
>>  }
>>  
>> @@ -11327,10 +11333,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  		struct bpf_func_state *state;
>>  		struct bpf_reg_state *reg;
>>  
>> +		if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>> +			verbose(env, "can't rcu read {lock,unlock} in rbtree cb\n");
>> +			return -EACCES;
>> +		}
> 
> I guess it's ok to prevent cb from calling bpf_rcu_read_lock(), since it's unnecessary,
> but pls make the message more verbose. Like:
>  verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
> 
> so that users know why the verifier complains.
> Technically it's ok to do so. Unnecessary is not a safety issue.
> 

Well, for rcu_read_unlock it would be a safety issue, no?
It feels easier to reason about if we can just say "the RCU lock is
held for the duration of the callback".

>> +
>>  		if (rcu_lock) {
>>  			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>>  			return -EINVAL;
>>  		} else if (rcu_unlock) {
>> +			invalidate_non_owning_refs(env);
> 
> I agree with Yonghong. It probably doesn't belong here.
> rcu lock/unlock and spin_lock/unlock are separate critical sections.
> Since ref_set_non_owning() adds extra MEM_RCU flag nothing extra needs to be done here.
> Below code will make the pointers untrusted.
> 

Will change. The desire here was to not loosen constraints
on non-owning ref lifetime in this series. As I mention in
my thoughts on Patch 3 in the cover letter, I do think we can
loosen that in the future, but would like to avoid doing so in this
fixes series. Regardless, because bpf_spin_unlock will
happen before this executes, this line can be removed.

>>  			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>>  				if (reg->type & MEM_RCU) {
>>  					reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
>> @@ -16679,7 +16691,8 @@ static int do_check(struct bpf_verifier_env *env)
>>  					return -EINVAL;
>>  				}
>>  
>> -				if (env->cur_state->active_rcu_lock) {
>> +				if (env->cur_state->active_rcu_lock &&
>> +				    !in_rbtree_lock_required_cb(env)) {
> 
> I'm not following here.
> Didn't you want to prevent bpf_rcu_read_lock/unlock inside cb? Why this change?
> >>  					verbose(env, "bpf_rcu_read_unlock is missing\n");
>>  					return -EINVAL;
>>  				}
>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  2023-08-04  6:17         ` David Marchevsky
@ 2023-08-04 15:37           ` Yonghong Song
  0 siblings, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2023-08-04 15:37 UTC (permalink / raw)
  To: David Marchevsky, Dave Marchevsky, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/3/23 11:17 PM, David Marchevsky wrote:
> On 8/2/23 5:41 PM, Yonghong Song wrote:
>>
>>
>> On 8/2/23 12:23 PM, Dave Marchevsky wrote:
>>>
>>>
>>> On 8/1/23 11:57 PM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>>>>> It's straightforward to prove that kptr_struct_meta must be non-NULL for
>>>>> any valid call to these kfuncs:
>>>>>
>>>>>      * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>>>>>        struct in user BTF with a special field (e.g. bpf_refcount,
>>>>>        {rb,list}_node). These are stored in that BTF's struct_meta_tab.
>>>>>
>>>>>      * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>>>>>        have {rb,list}_node field and that it's at the correct offset.
>>>>>        Similarly, check_kfunc_args ensures bpf_refcount field existence for
>>>>>        node param to bpf_refcount_acquire.
>>>>>
>>>>>      * So a btf_struct_meta must have been created for the struct type of
>>>>>        node param to these kfuncs
>>>>>
>>>>>      * That BTF and its struct_meta_tab are guaranteed to still be around.
>>>>>        Any arbitrary {rb,list} node the BPF program interacts with either:
>>>>>        came from bpf_obj_new or a collection removal kfunc in the same
>>>>>        program, in which case the BTF is associated with the program and
>>>>>        still around; or came from bpf_kptr_xchg, in which case the BTF was
>>>>>        associated with the map and is still around
>>>>>
>>>>> Instead of silently continuing with NULL struct_meta, which caused
>>>>> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
>>>>> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
>>>>> error out. Then, at runtime, we can confidently say that the
>>>>> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
>>>>> meaning that special-field-specific functionality like
>>>>> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
>>>>> series are guaranteed to execute.
>>>>
>>>> The subject says '... for collection insert and refcount_acquire'.
>>>> Why pick these? We could check for all kptr_struct_meta use cases?
>>>>
>>>
>>> fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:
>>>
>>>     - bpf_obj_new_impl
>>>     - bpf_obj_drop_impl
>>>     - collection insert kfuncs
>>>       - bpf_rbtree_add_impl
>>>       - bpf_list_push_{front,back}_impl
>>>     - bpf_refcount_acquire_impl
>>>
>>> A btf_struct_meta is only created for a struct if it has a non-null btf_record,
>>> which in turn only happens if the struct has any special fields (spin_lock,
>>> refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
>>> struct type without any special fields, the kptr_struct_meta arg can be
>>> NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
>>> the same holds for that kfunc.
>>>
>>> By definition rbtree and list nodes must be some struct type w/
>>> struct bpf_{rb,list}_node field, and similar logic for refcounted, so if there's
>>> no kptr_struct_meta for their node arg, there was some verifier-internal issue.
>>>
>>>
>>>>>
>>>>> This patch doesn't change functionality, just makes it easier to reason
>>>>> about existing functionality.
>>>>>
>>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>>> ---
>>>>>     kernel/bpf/verifier.c | 14 ++++++++++++++
>>>>>     1 file changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>>> index e7b1af016841..ec37e84a11c6 100644
>>>>> --- a/kernel/bpf/verifier.c
>>>>> +++ b/kernel/bpf/verifier.c
>>>>> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>>             struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>>>>             struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>>>>>     +        if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
>>>>
>>>> Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right?
>>>>
>>>
>>> The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop,
>>> for which NULL kptr_struct_meta is valid.
>>>
>>>>> +            !kptr_struct_meta) {
>>>>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>>>>> +                insn_idx);
>>>>> +            return -EFAULT;
>>>>> +        }
>>>>> +
>>>>>             insn_buf[0] = addr[0];
>>>>>             insn_buf[1] = addr[1];
>>>>>             insn_buf[2] = *insn;
>>>>> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>>         } else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
>>>>>                desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
>>>>>                desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
>>>>> +        struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>>>>             int struct_meta_reg = BPF_REG_3;
>>>>>             int node_offset_reg = BPF_REG_4;
>>>>>     @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>>                 node_offset_reg = BPF_REG_5;
>>>>>             }
>>>>>     +        if (!kptr_struct_meta) {
>>>>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>>>>> +                insn_idx);
>>>>> +            return -EFAULT;
>>>>> +        }
>>>>> +
>>>>>             __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
>>>>>                             node_offset_reg, insn, insn_buf, cnt);
>>>>>         } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
>>>>
>>>> In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find
>>>> whether we have any mismatch or not. Also self test coverage should
>>>> cover these cases (probably already) right?
>>>>
>>>> If the defensive programming here is still desirable to warn at verification time, I think we should just check all uses of kptr_struct_meta.
>>>
>>> Something like this patch probably should've been included with the series
>>> containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
>>> since that commit found that kptr_struct_meta wasn't being set for collection
>>> insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
>>> because, among other things, many of these kfunc impls check that
>>> the btf_struct_meta is non-NULL before using it, with some fallback logic.
>>> I don't like those unnecessary NULL checks either, and considered removing
>>> them in this patch, but decided to leave them in since we already had
>>> a case where struct_meta wasn't being set.
>>>
>>> On second thought, maybe it's better to take the unnecessary runtime checks
>>> out and leave these verification-time checks in. If, at runtime, those kfuncs
>>> see a NULL btf_struct_meta, I'd rather they fail loudly in the future
>>> with a NULL deref splat, than potentially leaking memory or similarly
>>> subtle failures. WDYT?
>>
>> Certainly I agree with you that verification failure is much better than
>> debugging runtime.
>>
>> Here, we covered a few kfuncs which always require a non-NULL
>> kptr_struct_meta. But as you mentioned above, we also have cases where,
>> for a given kfunc, the kptr_struct_meta could be NULL or non-NULL.
>>
>> Take kfunc bpf_obj_new_impl: for some cases the kptr_struct_meta cannot
>> be NULL based on the bpf prog, but somehow the verifier passes a NULL
>> ptr anyway. Should we check this at fixup_kfunc_call() as well?
>>
> 
> I see what you mean. Since btf_struct_meta is a (btf_id, btf_record) tuple
> currently, and btf_record is a view of a BTF type optimized for quickly
> answering questions about special fields (existence / location / type), we
> could certainly go look at the kfunc arg BTF types to confirm, as they're the
> source of truth. But this would be redoing the work necessary to generate the
> btf_record in the first place, which feels like it defeats the purpose of
> having this view of the data at verification time. We definitely don't want
> to push this BTF digging to runtime either.
> 
> Another approach: we could focus on the specific issue that caused the
> bug which the commit I mentioned earlier fixes, and enforce the following
> invariants:
> 
>     * All bpf_obj_new_impl calls for a particular type must use the same
>       btf_struct_meta
>     * Any kfunc or helper taking a user-allocated param must use the same
>       btf_struct_meta as the bpf_obj_new_impl call that allocated it
> 
> Whole point of 'special' fields is to take special-field-specific action on
> them, or due to their existence. This would ensure that all helpers use same
> special field info. Then e.g. if bpf_obj_new's bpf_obj_init_fields does
> something to a special field, bpf_obj_drop will have the same btf_struct_meta
> and will have same field info for bpf_obj_free_fields.
> 
> Maybe instead of "bpf_obj_new_impl calls" above it's
> more accurate to say "kfunc calls that acquire the
> user-allocated type".
> 
> I think this second idea makes sense and is better than piecemeal checking.
> Will think on it more and try to incorporate it into v2 of this series.

This sounds better as it is generic, and it should also cover the two
cases in this patch, since they fall under the second bullet above
(any kfunc or helper taking a user-allocated param must use the same
 btf_struct_meta as the bpf_obj_new_impl call that allocated it).
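
For reference, a minimal sketch of the NULL vs. non-NULL
kptr_struct_meta cases discussed earlier in the thread (invented
struct/prog names, written against the selftests' bpf_experimental.h
helpers):

  struct plain_data {
          long payload;   /* no special fields: no btf_struct_meta is
                           * created, so a NULL kptr_struct_meta arg to
                           * bpf_obj_new_impl/bpf_obj_drop_impl is fine */
  };

  struct node_data {
          struct bpf_rb_node node;  /* special field: a btf_struct_meta
                                     * must exist for this type */
          long key;
  };

  SEC("tc")
  int struct_meta_example(void *ctx)
  {
          struct plain_data *p = bpf_obj_new(typeof(*p));
          struct node_data *n = bpf_obj_new(typeof(*n));

          if (p)
                  bpf_obj_drop(p);
          if (n)
                  bpf_obj_drop(n);
          return 0;
  }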

> 
>>>
>>> I don't feel particularly strongly about these verification-time checks,
>>> but the level of 'selective defensive programming' here feels similar to
>>> other 'verifier internal error' checks sprinkled throughout verifier.c,
>>> so that argument doesn't feel very persuasive to me.
>>
>> I am okay with this patch but I wonder whether we can cover more
>> cases.


* Re: [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-04  6:47     ` David Marchevsky
@ 2023-08-04 15:43       ` Yonghong Song
  0 siblings, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2023-08-04 15:43 UTC (permalink / raw)
  To: David Marchevsky, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/3/23 11:47 PM, David Marchevsky wrote:
> On 8/2/23 1:59 AM, Yonghong Song wrote:
>>
>>
>> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>>> The previous patch in the series ensures that the underlying memory of
>>> nodes with bpf_refcount - which can have multiple owners - is not reused
>>> until RCU Tasks Trace grace period has elapsed. This prevents
>>> use-after-free with non-owning references that may point to
>>> recently-freed memory. While RCU read lock is held, it's safe to
>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>>> elapsed and therefore underlying memory couldn't have been reused.
>>>
>>>   From the perspective of verifier "trustedness" non-owning refs to
>>> refcounted nodes are now trusted only in RCU CS and therefore should no
>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>>> MEM_RCU in order to reflect this new state.
>>>
>>> Similarly to bpf_spin_unlock being a non-owning ref invalidation point,
>>> where non-owning ref reg states are clobbered so that they cannot be
>>> used outside of the critical section, currently all MEM_RCU regs are
>>> marked untrusted after bpf_rcu_read_unlock. This patch makes
>>> bpf_rcu_read_unlock a non-owning ref invalidation point as well,
>>> clobbering the non-owning refs instead of marking untrusted. In the
>>> future we may want to allow untrusted non-owning refs in which case we
>>> can remove this custom logic without breaking BPF programs as it's more
>>> restrictive than the default. That's a big change in semantics, though,
>>> and this series is focused on fixing the use-after-free in most
>>> straightforward way.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> ---
>>>    include/linux/bpf.h   |  3 ++-
>>>    kernel/bpf/verifier.c | 17 +++++++++++++++--
>>>    2 files changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index ceaa8c23287f..37fba01b061a 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>>        MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
>>>          /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>>> -     * Currently only valid for linked-list and rbtree nodes.
>>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
>>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>
>> What does 'must' here mean?
>>
> 
> Meaning that if there's any NON_OWN_REF-flagged
> PTR_TO_BTF_ID which points to a struct with a bpf_refcount field,
> it should also be flagged with MEM_RCU. If it isn't, it's a
> verifier error.
> 
>>>         */
>>>        NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
>>>    diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 9014b469dd9d..4bda365000d3 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -469,7 +469,8 @@ static bool type_is_ptr_alloc_obj(u32 type)
>>>      static bool type_is_non_owning_ref(u32 type)
>>>    {
>>> -    return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF;
>>> +    return type_is_ptr_alloc_obj(type) &&
>>> +        type_flag(type) & NON_OWN_REF;
>>
>> There is no code change here.
>>
> 
> Yep, will undo in v2.
> 
>>>    }
>>>      static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>>> @@ -8012,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>        case PTR_TO_BTF_ID | PTR_TRUSTED:
>>>        case PTR_TO_BTF_ID | MEM_RCU:
>>>        case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>>            /* When referenced PTR_TO_BTF_ID is passed to release function,
>>>             * its fixed offset must be 0. In the other cases, fixed offset
>>>             * can be non-zero. This was already checked above. So pass
>>> @@ -10478,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>>    static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>>    {
>>>        struct bpf_verifier_state *state = env->cur_state;
>>> +    struct btf_record *rec = reg_btf_record(reg);
>>>          if (!state->active_lock.ptr) {
>>>            verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>>> @@ -10490,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>>        }
>>>          reg->type |= NON_OWN_REF;
>>> +    if (rec->refcount_off >= 0)
>>> +        reg->type |= MEM_RCU;
>>
>> Should we check whether the state is in rcu cs before marking MEM_RCU?
>>
> 
> I think this is implicitly being enforced.
> Rbtree/list kfuncs must be called under bpf_spin_lock,
> and this series requires bpf_spin_{lock,unlock} helpers
> to be called in RCU CS if the BPF prog is sleepable.

I see.

Alexei mentioned earlier that there is no need to put
bpf_spin_lock inside an RCU CS if, for sleepable programs,
we do preempt_disable before the real arch_spin_lock().
This is similar to regular spin_lock/raw_spin_lock, which
does preempt_disable, so the spin_lock/spin_unlock region
becomes an ATOMIC region which prevents any blocking.

Not sure whether you want to implement this in this
patch set or in a later one.
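
If it helps, a very rough sketch of the shape of that suggestion (not
actual helper code; names are made up):

  /* Sketch only: preempt_disable() around the arch spinlock makes the
   * locked region non-blocking even in a sleepable prog, similar to
   * regular spin_lock()/raw_spin_lock() on a preemptible kernel, so an
   * enclosing RCU CS wouldn't be required for safety.
   */
  static inline void bpf_spin_lock_sleepable_sketch(struct bpf_spin_lock *lock)
  {
          preempt_disable();
          arch_spin_lock((arch_spinlock_t *)lock);
  }

  static inline void bpf_spin_unlock_sleepable_sketch(struct bpf_spin_lock *lock)
  {
          arch_spin_unlock((arch_spinlock_t *)lock);
          preempt_enable();
  }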

> 
>>> +
>>>        return 0;
>>>    }
>>>    @@ -11327,10 +11333,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>            struct bpf_func_state *state;
>>>            struct bpf_reg_state *reg;
>>>    +        if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>>> +            verbose(env, "can't rcu read {lock,unlock} in rbtree cb\n");
>>> +            return -EACCES;
>>> +        }
>>> +
>>>            if (rcu_lock) {
>>>                verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>>>                return -EINVAL;
>>>            } else if (rcu_unlock) {
>>> +            invalidate_non_owning_refs(env);
>>
>> If we have both spin lock and rcu like
>>
>>       bpf_rcu_read_lock()
>>       ...
>>       bpf_spin_lock()
>>       ...
>>       bpf_spin_unlock()  <=== invalidate all non_owning_refs
>>       ...                <=== MEM_RCU type is gone
>>       bpf_rcu_read_unlock()
>>
>> Maybe we could fine tune here to preserve MEM_RCU after bpf_spin_unlock()?
>>
> 
> IIUC, you're saying that we should no longer
> have non-owning refs get clobbered after bpf_spin_unlock,
> and instead just have rcu_read_unlock do its default
> "MEM_RCU refs become PTR_UNTRUSTED" logic.
> 
> In the cover letter I mention that this is probably
> the direction we want to go in in the long term, on
> the comments on patch 3:
> 
>    This might
>    allow custom non-owning ref lifetime + invalidation logic to be
>    entirely subsumed by MEM_RCU handling.
> 
> But I'm hesitant to do that in this fixes series
> as I'd like to minimize changes that could introduce
> additional bugs. This series' current changes keep the
> clobbering rules effectively unchanged - can always
> loosen them in the future. Also, I think we should
> make this change for _all_ non-owning refs (w/ and w/o
> bpf_refcount field). Otherwise the verifier lifetime
> of non-owning refs would change if BPF program writer
> adds bpf_refcount field to their struct, or removes it.
>   
> 
>>>                bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>>>                    if (reg->type & MEM_RCU) {
>>>                        reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
>>> @@ -16679,7 +16691,8 @@ static int do_check(struct bpf_verifier_env *env)
>>>                        return -EINVAL;
>>>                    }
>>>    -                if (env->cur_state->active_rcu_lock) {
>>> +                if (env->cur_state->active_rcu_lock &&
>>> +                    !in_rbtree_lock_required_cb(env)) {
>>>                        verbose(env, "bpf_rcu_read_unlock is missing\n");
>>>                        return -EINVAL;
>>>                    }



Thread overview: 25+ messages
2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
2023-08-01 20:36 ` [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
2023-08-02  3:57   ` Yonghong Song
2023-08-02 19:23     ` Dave Marchevsky
2023-08-02 21:41       ` Yonghong Song
2023-08-04  6:17         ` David Marchevsky
2023-08-04 15:37           ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
2023-08-02  4:11   ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
2023-08-02  4:15   ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
2023-08-02  5:21   ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
2023-08-02  5:59   ` Yonghong Song
2023-08-04  6:47     ` David Marchevsky
2023-08-04 15:43       ` Yonghong Song
2023-08-02 22:50   ` Alexei Starovoitov
2023-08-04  6:55     ` David Marchevsky
2023-08-01 20:36 ` [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS Dave Marchevsky
2023-08-02  6:33   ` Yonghong Song
2023-08-02 22:55   ` Alexei Starovoitov
2023-08-01 20:36 ` [PATCH v1 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs Dave Marchevsky
2023-08-02 23:07   ` Alexei Starovoitov
2023-08-02  3:07 ` [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Yonghong Song
