netdev.vger.kernel.org archive mirror
* [PATCH v3 bpf-next 0/5] bpf: Introduce kptr_rcu.
@ 2023-02-28  4:01 Alexei Starovoitov
  2023-02-28  4:01 ` [PATCH v3 bpf-next 1/5] bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted Alexei Starovoitov
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2023-02-28  4:01 UTC (permalink / raw)
  To: davem
  Cc: daniel, andrii, martin.lau, void, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

v2->v3:
- Instead of requiring bpf progs to tag fields with __kptr_rcu,
teach the verifier to infer RCU properties based on the type.
BPF_KPTR_RCU becomes a kernel-internal type of struct btf_field.
- Add patch 2 to tag cgroups and dfl_cgrp as trusted.
That bug was spotted by BPF CI on clang-compiled kernels,
since patch 3 is doing:
static bool in_rcu_cs(struct bpf_verifier_env *env)
{
        return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
}
which makes all non-sleepable programs behave as if they have an implicit
rcu_read_lock around them. Which is the case in practice.
It was fine on gcc-compiled kernels, where task->cgroups dereference was producing
PTR_TO_BTF_ID, but on clang-compiled kernels task->cgroups dereference was
producing PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL, which is more correct,
but selftests were failing. Patch 2 fixes this discrepancy.
With a few more patches like patch 2 we can make KF_TRUSTED_ARGS the default
for kfuncs and helpers.
- Add a comment in selftest patch 5 that it's a verifier-only check.

v1->v2:
Instead of aggressively allowing dereferenced kptr_rcu pointers into KF_TRUSTED_ARGS
kfuncs, only allow them into KF_RCU kfuncs.
The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
that the objects are valid and there is no use-after-free, but the pointers
may be NULL and the pointee object's reference count could have reached zero, hence
kfuncs must do a != NULL check and consider the refcnt==0 case when accessing such
arguments.
No changes in patch 1.
Patches 2,3,4 adjusted with above behavior.
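The KF_RCU contract above (argument may be NULL, and the object's refcount may already have hit zero) can be modeled in plain userspace C. This is a hedged sketch, not kernel code: struct obj, obj_tryget() and kf_rcu_acquire() are made-up names standing in for struct cgroup, cgroup_tryget() and a KF_RCU-flagged acquire kfunc.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
	atomic_int refcnt; /* 0 means dying: still RCU-readable, not acquirable */
};

/* Model of cgroup_tryget(): only take a reference if refcnt is still > 0. */
static bool obj_tryget(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
		/* CAS failure refreshed 'old'; loop re-checks it */
	}
	return false;
}

/* Shape of a KF_RCU kfunc body: NULL check first, then tryget, never a plain get. */
static struct obj *kf_rcu_acquire(struct obj *o)
{
	if (!o)			/* the MEM_RCU pointer may be NULL */
		return NULL;
	if (!obj_tryget(o))	/* the refcount may already be zero */
		return NULL;
	return o;
}
```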

v1:
The __kptr_ref tag turned out to be too limited, since any "trusted" pointer access
requires bpf_kptr_xchg(), which is impractical when the same pointer needs
to be dereferenced by multiple cpus.
The untrusted-only access provided by __kptr isn't very useful in practice.
Rename __kptr to __kptr_untrusted with the eventual goal of deprecating it,
and rename __kptr_ref to __kptr, since that looks to be the more common use of kptrs.
Introduce __kptr_rcu that can be directly dereferenced and used similarly
to native kernel C code.
Once bpf_cpumask and task_struct kfuncs are converted to observe an RCU GP
when refcnt goes to zero, both __kptr and __kptr_untrusted can be deprecated
and __kptr_rcu can become the only __kptr tag.
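A pseudocode-style sketch (not compile-tested) of the end state described above, where an RCU-protected map-value kptr field can be dereferenced directly instead of going through bpf_kptr_xchg(); the map name cgrp_map and the program shape are illustrative only:

```
/* Sketch only: assumes the kernel support added by this series. */
struct map_value {
	struct cgroup __kptr *cgrp;	/* RCU-protected type => direct deref */
};

SEC("tp_btf/cgroup_mkdir")
int read_cgrp(struct cgroup *cgrp_arg, const char *path)
{
	struct map_value *v = bpf_map_lookup_elem(&cgrp_map, &(int){0});
	struct cgroup *cg;

	if (!v)
		return 0;
	cg = v->cgrp;		/* MEM_RCU | PTR_MAYBE_NULL: no xchg needed */
	if (cg)
		bpf_printk("cgroup level %d", cg->level);
	return 0;
}
```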

Alexei Starovoitov (5):
  bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.
  bpf: Mark cgroups and dfl_cgrp fields as trusted.
  bpf: Introduce kptr_rcu.
  selftests/bpf: Add a test case for kptr_rcu.
  selftests/bpf: Tweak cgroup kfunc test.

 Documentation/bpf/bpf_design_QA.rst           |  4 +-
 Documentation/bpf/cpumasks.rst                |  4 +-
 Documentation/bpf/kfuncs.rst                  | 13 ++++---
 include/linux/bpf.h                           | 15 +++++---
 include/linux/btf.h                           |  2 +-
 kernel/bpf/btf.c                              | 20 +++++++++-
 kernel/bpf/helpers.c                          |  7 ++--
 kernel/bpf/syscall.c                          |  4 ++
 kernel/bpf/verifier.c                         | 38 +++++++++++++------
 net/bpf/test_run.c                            |  3 +-
 tools/lib/bpf/bpf_helpers.h                   |  2 +-
 tools/testing/selftests/bpf/progs/cb_refs.c   |  2 +-
 .../selftests/bpf/progs/cgrp_kfunc_common.h   |  2 +-
 .../selftests/bpf/progs/cgrp_kfunc_failure.c  |  2 +-
 .../selftests/bpf/progs/cgrp_kfunc_success.c  |  7 +++-
 .../selftests/bpf/progs/cpumask_common.h      |  2 +-
 .../selftests/bpf/progs/jit_probe_mem.c       |  2 +-
 tools/testing/selftests/bpf/progs/lru_bug.c   |  2 +-
 tools/testing/selftests/bpf/progs/map_kptr.c  | 18 ++++++++-
 .../selftests/bpf/progs/map_kptr_fail.c       | 10 ++---
 .../selftests/bpf/progs/task_kfunc_common.h   |  2 +-
 tools/testing/selftests/bpf/test_verifier.c   | 22 +++++------
 tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
 24 files changed, 125 insertions(+), 62 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 1/5] bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.
  2023-02-28  4:01 [PATCH v3 bpf-next 0/5] bpf: Introduce kptr_rcu Alexei Starovoitov
@ 2023-02-28  4:01 ` Alexei Starovoitov
  2023-02-28 15:16   ` David Vernet
  2023-02-28  4:01 ` [PATCH v3 bpf-next 2/5] bpf: Mark cgroups and dfl_cgrp fields as trusted Alexei Starovoitov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2023-02-28  4:01 UTC (permalink / raw)
  To: davem
  Cc: daniel, andrii, martin.lau, void, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

__kptr was meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
The concept felt useful, but didn't get much traction,
since bpf_rdonly_cast() was added soon after and bpf programs received
a simpler way to access PTR_UNTRUSTED kernel pointers
without going through the restrictive __kptr usage.

Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
their intended usage.
The main goal of __kptr_untrusted was to read/write such pointers
directly, while bpf_kptr_xchg was the mechanism to access refcounted
kernel pointers. The next patch will allow RCU-protected __kptr access
with direct reads. At that point __kptr_untrusted will be deprecated.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 Documentation/bpf/bpf_design_QA.rst           |  4 ++--
 Documentation/bpf/cpumasks.rst                |  4 ++--
 Documentation/bpf/kfuncs.rst                  |  2 +-
 kernel/bpf/btf.c                              |  4 ++--
 tools/lib/bpf/bpf_helpers.h                   |  2 +-
 tools/testing/selftests/bpf/progs/cb_refs.c   |  2 +-
 .../selftests/bpf/progs/cgrp_kfunc_common.h   |  2 +-
 .../selftests/bpf/progs/cpumask_common.h      |  2 +-
 .../selftests/bpf/progs/jit_probe_mem.c       |  2 +-
 tools/testing/selftests/bpf/progs/lru_bug.c   |  2 +-
 tools/testing/selftests/bpf/progs/map_kptr.c  |  4 ++--
 .../selftests/bpf/progs/map_kptr_fail.c       |  6 ++---
 .../selftests/bpf/progs/task_kfunc_common.h   |  2 +-
 tools/testing/selftests/bpf/test_verifier.c   | 22 +++++++++----------
 14 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index bfff0e7e37c2..38372a956d65 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -314,7 +314,7 @@ Q: What is the compatibility story for special BPF types in map values?
 Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map
 values (when using BTF support for BPF maps). This allows to use helpers for
 such objects on these fields inside map values. Users are also allowed to embed
-pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the
+pointers to some kernel types (with __kptr_untrusted and __kptr BTF tags). Will the
 kernel preserve backwards compatibility for these features?
 
 A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else:
@@ -324,7 +324,7 @@ For struct types that have been added already, like bpf_spin_lock and bpf_timer,
 the kernel will preserve backwards compatibility, as they are part of UAPI.
 
 For kptrs, they are also part of UAPI, but only with respect to the kptr
-mechanism. The types that you can use with a __kptr and __kptr_ref tagged
+mechanism. The types that you can use with a __kptr_untrusted and __kptr tagged
 pointer in your struct are NOT part of the UAPI contract. The supported types can
 and will change across kernel releases. However, operations like accessing kptr
 fields and bpf_kptr_xchg() helper will continue to be supported across kernel
diff --git a/Documentation/bpf/cpumasks.rst b/Documentation/bpf/cpumasks.rst
index 24bef9cbbeee..75344cd230e5 100644
--- a/Documentation/bpf/cpumasks.rst
+++ b/Documentation/bpf/cpumasks.rst
@@ -51,7 +51,7 @@ A ``struct bpf_cpumask *`` is allocated, acquired, and released, using the
 .. code-block:: c
 
         struct cpumask_map_value {
-                struct bpf_cpumask __kptr_ref * cpumask;
+                struct bpf_cpumask __kptr * cpumask;
         };
 
         struct array_map {
@@ -128,7 +128,7 @@ a map, the reference can be removed from the map with bpf_kptr_xchg(), or
 
 	/* struct containing the struct bpf_cpumask kptr which is stored in the map. */
 	struct cpumasks_kfunc_map_value {
-		struct bpf_cpumask __kptr_ref * bpf_cpumask;
+		struct bpf_cpumask __kptr * bpf_cpumask;
 	};
 
 	/* The map containing struct cpumasks_kfunc_map_value entries. */
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 226313747be5..7d7c1144372a 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -527,7 +527,7 @@ You may also acquire a reference to a ``struct cgroup`` kptr that's already
 
 	/* struct containing the struct task_struct kptr which is actually stored in the map. */
 	struct __cgroups_kfunc_map_value {
-		struct cgroup __kptr_ref * cgroup;
+		struct cgroup __kptr * cgroup;
 	};
 
 	/* The map containing struct __cgroups_kfunc_map_value entries. */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index fa22ec79ac0e..01dee7d48e6d 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3283,9 +3283,9 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 	/* Reject extra tags */
 	if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
 		return -EINVAL;
-	if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+	if (!strcmp("kptr_untrusted", __btf_name_by_offset(btf, t->name_off)))
 		type = BPF_KPTR_UNREF;
-	else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off)))
+	else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
 		type = BPF_KPTR_REF;
 	else
 		return -EINVAL;
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 5ec1871acb2f..7d12d3e620cc 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -174,8 +174,8 @@ enum libbpf_tristate {
 
 #define __kconfig __attribute__((section(".kconfig")))
 #define __ksym __attribute__((section(".ksyms")))
+#define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted")))
 #define __kptr __attribute__((btf_type_tag("kptr")))
-#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
 
 #ifndef ___bpf_concat
 #define ___bpf_concat(a, b) a ## b
diff --git a/tools/testing/selftests/bpf/progs/cb_refs.c b/tools/testing/selftests/bpf/progs/cb_refs.c
index 7653df1bc787..ce96b33e38d6 100644
--- a/tools/testing/selftests/bpf/progs/cb_refs.c
+++ b/tools/testing/selftests/bpf/progs/cb_refs.c
@@ -4,7 +4,7 @@
 #include <bpf/bpf_helpers.h>
 
 struct map_value {
-	struct prog_test_ref_kfunc __kptr_ref *ptr;
+	struct prog_test_ref_kfunc __kptr *ptr;
 };
 
 struct {
diff --git a/tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h b/tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h
index 2f8de933b957..d0b7cd0d09d7 100644
--- a/tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h
+++ b/tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h
@@ -10,7 +10,7 @@
 #include <bpf/bpf_tracing.h>
 
 struct __cgrps_kfunc_map_value {
-	struct cgroup __kptr_ref * cgrp;
+	struct cgroup __kptr * cgrp;
 };
 
 struct hash_map {
diff --git a/tools/testing/selftests/bpf/progs/cpumask_common.h b/tools/testing/selftests/bpf/progs/cpumask_common.h
index ad34f3b602be..65e5496ca1b2 100644
--- a/tools/testing/selftests/bpf/progs/cpumask_common.h
+++ b/tools/testing/selftests/bpf/progs/cpumask_common.h
@@ -10,7 +10,7 @@
 int err;
 
 struct __cpumask_map_value {
-	struct bpf_cpumask __kptr_ref * cpumask;
+	struct bpf_cpumask __kptr * cpumask;
 };
 
 struct array_map {
diff --git a/tools/testing/selftests/bpf/progs/jit_probe_mem.c b/tools/testing/selftests/bpf/progs/jit_probe_mem.c
index 2d2e61470794..13f00ca2ed0a 100644
--- a/tools/testing/selftests/bpf/progs/jit_probe_mem.c
+++ b/tools/testing/selftests/bpf/progs/jit_probe_mem.c
@@ -4,7 +4,7 @@
 #include <bpf/bpf_tracing.h>
 #include <bpf/bpf_helpers.h>
 
-static struct prog_test_ref_kfunc __kptr_ref *v;
+static struct prog_test_ref_kfunc __kptr *v;
 long total_sum = -1;
 
 extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
diff --git a/tools/testing/selftests/bpf/progs/lru_bug.c b/tools/testing/selftests/bpf/progs/lru_bug.c
index 687081a724b3..ad73029cb1e3 100644
--- a/tools/testing/selftests/bpf/progs/lru_bug.c
+++ b/tools/testing/selftests/bpf/progs/lru_bug.c
@@ -4,7 +4,7 @@
 #include <bpf/bpf_helpers.h>
 
 struct map_value {
-	struct task_struct __kptr *ptr;
+	struct task_struct __kptr_untrusted *ptr;
 };
 
 struct {
diff --git a/tools/testing/selftests/bpf/progs/map_kptr.c b/tools/testing/selftests/bpf/progs/map_kptr.c
index 228ec45365a8..4a7da6cb5800 100644
--- a/tools/testing/selftests/bpf/progs/map_kptr.c
+++ b/tools/testing/selftests/bpf/progs/map_kptr.c
@@ -4,8 +4,8 @@
 #include <bpf/bpf_helpers.h>
 
 struct map_value {
-	struct prog_test_ref_kfunc __kptr *unref_ptr;
-	struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
+	struct prog_test_ref_kfunc __kptr_untrusted *unref_ptr;
+	struct prog_test_ref_kfunc __kptr *ref_ptr;
 };
 
 struct array_map {
diff --git a/tools/testing/selftests/bpf/progs/map_kptr_fail.c b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
index 760e41e1a632..e19e2a5f38cf 100644
--- a/tools/testing/selftests/bpf/progs/map_kptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
@@ -7,9 +7,9 @@
 
 struct map_value {
 	char buf[8];
-	struct prog_test_ref_kfunc __kptr *unref_ptr;
-	struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
-	struct prog_test_member __kptr_ref *ref_memb_ptr;
+	struct prog_test_ref_kfunc __kptr_untrusted *unref_ptr;
+	struct prog_test_ref_kfunc __kptr *ref_ptr;
+	struct prog_test_member __kptr *ref_memb_ptr;
 };
 
 struct array_map {
diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
index c0ffd171743e..4c2a4b0e3a25 100644
--- a/tools/testing/selftests/bpf/progs/task_kfunc_common.h
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
@@ -10,7 +10,7 @@
 #include <bpf/bpf_tracing.h>
 
 struct __tasks_kfunc_map_value {
-	struct task_struct __kptr_ref * task;
+	struct task_struct __kptr * task;
 };
 
 struct hash_map {
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 8b9949bb833d..49a70d9beb0b 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -699,13 +699,13 @@ static int create_cgroup_storage(bool percpu)
  *   struct bpf_timer t;
  * };
  * struct btf_ptr {
+ *   struct prog_test_ref_kfunc __kptr_untrusted *ptr;
  *   struct prog_test_ref_kfunc __kptr *ptr;
- *   struct prog_test_ref_kfunc __kptr_ref *ptr;
- *   struct prog_test_member __kptr_ref *ptr;
+ *   struct prog_test_member __kptr *ptr;
  * }
  */
 static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t"
-				  "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_ref"
+				  "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_untrusted"
 				  "\0prog_test_member";
 static __u32 btf_raw_types[] = {
 	/* int */
@@ -724,20 +724,20 @@ static __u32 btf_raw_types[] = {
 	BTF_MEMBER_ENC(41, 4, 0), /* struct bpf_timer t; */
 	/* struct prog_test_ref_kfunc */		/* [6] */
 	BTF_STRUCT_ENC(51, 0, 0),
-	BTF_STRUCT_ENC(89, 0, 0),			/* [7] */
+	BTF_STRUCT_ENC(95, 0, 0),			/* [7] */
+	/* type tag "kptr_untrusted" */
+	BTF_TYPE_TAG_ENC(80, 6),			/* [8] */
 	/* type tag "kptr" */
-	BTF_TYPE_TAG_ENC(75, 6),			/* [8] */
-	/* type tag "kptr_ref" */
-	BTF_TYPE_TAG_ENC(80, 6),			/* [9] */
-	BTF_TYPE_TAG_ENC(80, 7),			/* [10] */
+	BTF_TYPE_TAG_ENC(75, 6),			/* [9] */
+	BTF_TYPE_TAG_ENC(75, 7),			/* [10] */
 	BTF_PTR_ENC(8),					/* [11] */
 	BTF_PTR_ENC(9),					/* [12] */
 	BTF_PTR_ENC(10),				/* [13] */
 	/* struct btf_ptr */				/* [14] */
 	BTF_STRUCT_ENC(43, 3, 24),
-	BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr *ptr; */
-	BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr_ref *ptr; */
-	BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr_ref *ptr; */
+	BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr_untrusted *ptr; */
+	BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr *ptr; */
+	BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr *ptr; */
 };
 
 static char bpf_vlog[UINT_MAX >> 8];
-- 
2.30.2



* [PATCH v3 bpf-next 2/5] bpf: Mark cgroups and dfl_cgrp fields as trusted.
  2023-02-28  4:01 [PATCH v3 bpf-next 0/5] bpf: Introduce kptr_rcu Alexei Starovoitov
  2023-02-28  4:01 ` [PATCH v3 bpf-next 1/5] bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted Alexei Starovoitov
@ 2023-02-28  4:01 ` Alexei Starovoitov
  2023-02-28 16:45   ` David Vernet
  2023-02-28 19:32   ` Tejun Heo
  2023-02-28  4:01 ` [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu Alexei Starovoitov
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2023-02-28  4:01 UTC (permalink / raw)
  To: davem
  Cc: daniel, andrii, martin.lau, void, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

bpf programs sometimes do:
bpf_cgrp_storage_get(&map, task->cgroups->dfl_cgrp, ...);
It is safe to do, because the cgroups->dfl_cgrp pointer is set during init and
never changes. The task->cgroups pointer is also never NULL. It is likewise set
during init and will change when the task switches cgroups. For any trusted task
pointer, dereference of cgroups and dfl_cgrp should yield trusted pointers. The
verifier wasn't aware of this. Hence in gcc-compiled kernels task->cgroups
dereference was producing PTR_TO_BTF_ID without modifiers, while in
clang-compiled kernels the verifier recognizes the __rcu tag on the cgroups
field and produces PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL.
Tag cgroups and dfl_cgrp as trusted to equalize clang and gcc behavior.
When GCC supports btf_type_tag, such tagging will be done directly in the type.
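A sketch (not compile-tested) of the access pattern this patch makes behave uniformly across compilers; the tracepoint, map name and flags are illustrative:

```
SEC("tp_btf/task_newtask")
int on_newtask(struct task_struct *task, u64 clone_flags)
{
	/* task is PTR_TRUSTED; after this patch both loads below
	 * yield trusted pointers on gcc- and clang-built kernels. */
	struct cgroup *cgrp = task->cgroups->dfl_cgrp;

	bpf_cgrp_storage_get(&map, cgrp, 0, BPF_LOCAL_STORAGE_GET_F_CREATE);
	return 0;
}
```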

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5cb8b623f639..e4234266e76d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5011,6 +5011,10 @@ static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
 
 BTF_TYPE_SAFE_NESTED(struct task_struct) {
 	const cpumask_t *cpus_ptr;
+	struct css_set *cgroups;
+};
+BTF_TYPE_SAFE_NESTED(struct css_set) {
+	struct cgroup *dfl_cgrp;
 };
 
 static bool nested_ptr_is_trusted(struct bpf_verifier_env *env,
@@ -5022,6 +5026,7 @@ static bool nested_ptr_is_trusted(struct bpf_verifier_env *env,
 		return false;
 
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_NESTED(struct task_struct));
+	BTF_TYPE_EMIT(BTF_TYPE_SAFE_NESTED(struct css_set));
 
 	return btf_nested_type_is_trusted(&env->log, reg, off);
 }
-- 
2.30.2



* [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu.
  2023-02-28  4:01 [PATCH v3 bpf-next 0/5] bpf: Introduce kptr_rcu Alexei Starovoitov
  2023-02-28  4:01 ` [PATCH v3 bpf-next 1/5] bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted Alexei Starovoitov
  2023-02-28  4:01 ` [PATCH v3 bpf-next 2/5] bpf: Mark cgroups and dfl_cgrp fields as trusted Alexei Starovoitov
@ 2023-02-28  4:01 ` Alexei Starovoitov
  2023-02-28 16:45   ` David Vernet
  2023-02-28 19:39   ` Tejun Heo
  2023-02-28  4:01 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add a test case for kptr_rcu Alexei Starovoitov
  2023-02-28  4:01 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Tweak cgroup kfunc test Alexei Starovoitov
  4 siblings, 2 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2023-02-28  4:01 UTC (permalink / raw)
  To: davem
  Cc: daniel, andrii, martin.lau, void, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

The lifetime of certain kernel structures like 'struct cgroup' is protected by RCU.
Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
Dereference of other kptrs returns PTR_UNTRUSTED.

For example:
struct map_value {
   struct cgroup __kptr *cgrp;
};

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
{
  struct cgroup *cg, *cg2;

  cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
  bpf_kptr_xchg(&v->cgrp, cg);

  cg2 = v->cgrp; // This is the new feature introduced by this patch.
  // cg2 is PTR_MAYBE_NULL | MEM_RCU.
  // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could have reached zero

  bpf_cgroup_ancestor(cg2, level); // safe to do.
}

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 Documentation/bpf/kfuncs.rst                  | 11 ++++---
 include/linux/bpf.h                           | 15 ++++++---
 include/linux/btf.h                           |  2 +-
 kernel/bpf/btf.c                              | 16 +++++++++
 kernel/bpf/helpers.c                          |  7 ++--
 kernel/bpf/syscall.c                          |  4 +++
 kernel/bpf/verifier.c                         | 33 ++++++++++++-------
 net/bpf/test_run.c                            |  3 +-
 .../selftests/bpf/progs/map_kptr_fail.c       |  4 +--
 tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
 11 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 7d7c1144372a..49c5cb6f46e7 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -232,11 +232,12 @@ added later.
 2.4.8 KF_RCU flag
 -----------------
 
-The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument.
-When used together with KF_ACQUIRE, it indicates the kfunc should have a
-single argument which must be a trusted argument or a MEM_RCU pointer.
-The argument may have reference count of 0 and the kfunc must take this
-into consideration.
+The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
+KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
+that the objects are valid and there is no use-after-free, but the pointers
+may be NULL and the pointee object's reference count could have reached zero, hence
+kfuncs must do a != NULL check and consider the refcnt==0 case when accessing such
+arguments.
 
 .. _KF_deprecated_flag:
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 520b238abd5a..d4b5faa0a777 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -178,11 +178,12 @@ enum btf_field_type {
 	BPF_TIMER      = (1 << 1),
 	BPF_KPTR_UNREF = (1 << 2),
 	BPF_KPTR_REF   = (1 << 3),
-	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
-	BPF_LIST_HEAD  = (1 << 4),
-	BPF_LIST_NODE  = (1 << 5),
-	BPF_RB_ROOT    = (1 << 6),
-	BPF_RB_NODE    = (1 << 7),
+	BPF_KPTR_RCU   = (1 << 4), /* kernel internal. not exposed to bpf prog */
+	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_RCU,
+	BPF_LIST_HEAD  = (1 << 5),
+	BPF_LIST_NODE  = (1 << 6),
+	BPF_RB_ROOT    = (1 << 7),
+	BPF_RB_NODE    = (1 << 8),
 	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
 				 BPF_RB_NODE | BPF_RB_ROOT,
 };
@@ -284,6 +285,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return "kptr";
+	case BPF_KPTR_RCU:
+		return "kptr_rcu";
 	case BPF_LIST_HEAD:
 		return "bpf_list_head";
 	case BPF_LIST_NODE:
@@ -307,6 +310,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_timer);
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
+	case BPF_KPTR_RCU:
 		return sizeof(u64);
 	case BPF_LIST_HEAD:
 		return sizeof(struct bpf_list_head);
@@ -331,6 +335,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_timer);
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
+	case BPF_KPTR_RCU:
 		return __alignof__(u64);
 	case BPF_LIST_HEAD:
 		return __alignof__(struct bpf_list_head);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 49e0fe6d8274..556b3e2e7471 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -70,7 +70,7 @@
 #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
 #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
 #define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
-#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
+#define KF_RCU          (1 << 7) /* kfunc takes either rcu or trusted pointer arguments */
 
 /*
  * Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 01dee7d48e6d..a44ea1f6164b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3552,6 +3552,18 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
 	return -EINVAL;
 }
 
+BTF_SET_START(rcu_protected_types)
+BTF_ID(struct, prog_test_ref_kfunc)
+BTF_ID(struct, cgroup)
+BTF_SET_END(rcu_protected_types)
+
+static bool rcu_protected_object(const struct btf *btf, u32 btf_id)
+{
+	if (!btf_is_kernel(btf))
+		return false;
+	return btf_id_set_contains(&rcu_protected_types, btf_id);
+}
+
 static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 			  struct btf_field_info *info)
 {
@@ -3615,6 +3627,10 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 		field->kptr.dtor = (void *)addr;
 	}
 
+	if (info->type == BPF_KPTR_REF && rcu_protected_object(kernel_btf, id))
+		/* rcu dereference of this field will return MEM_RCU instead of PTR_UNTRUSTED */
+		field->type = BPF_KPTR_RCU;
+
 	field->kptr.btf_id = id;
 	field->kptr.btf = kernel_btf;
 	field->kptr.module = mod;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a784be6f8bac..fed74afd45d1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2094,11 +2094,12 @@ __bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level)
 {
 	struct cgroup *ancestor;
 
-	if (level > cgrp->level || level < 0)
+	if (!cgrp || level > cgrp->level || level < 0)
 		return NULL;
 
 	ancestor = cgrp->ancestors[level];
-	cgroup_get(ancestor);
+	if (!cgroup_tryget(ancestor))
+		return NULL;
 	return ancestor;
 }
 
@@ -2183,7 +2184,7 @@ BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
 #endif
 BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e3fcdc9836a6..2e730918911c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -539,6 +539,7 @@ void btf_record_free(struct btf_record *rec)
 		switch (rec->fields[i].type) {
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_RCU:
 			if (rec->fields[i].kptr.module)
 				module_put(rec->fields[i].kptr.module);
 			btf_put(rec->fields[i].kptr.btf);
@@ -584,6 +585,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 		switch (fields[i].type) {
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_RCU:
 			btf_get(fields[i].kptr.btf);
 			if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
 				ret = -ENXIO;
@@ -669,6 +671,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 			WRITE_ONCE(*(u64 *)field_ptr, 0);
 			break;
 		case BPF_KPTR_REF:
+		case BPF_KPTR_RCU:
 			field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
 			break;
 		case BPF_LIST_HEAD:
@@ -1058,6 +1061,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 				break;
 			case BPF_KPTR_UNREF:
 			case BPF_KPTR_REF:
+			case BPF_KPTR_RCU:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
 				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
 				    map->map_type != BPF_MAP_TYPE_ARRAY &&
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e4234266e76d..0b728ce0dde9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4183,7 +4183,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 			       struct bpf_reg_state *reg, u32 regno)
 {
 	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
-	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED;
+	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED | MEM_RCU;
 	const char *reg_name = "";
 
 	/* Only unreferenced case accepts untrusted pointers */
@@ -4230,12 +4230,12 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 	 * In the kptr_ref case, check_func_arg_reg_off already ensures reg->off
 	 * is zero. We must also ensure that btf_struct_ids_match does not walk
 	 * the struct to match type against first member of struct, i.e. reject
-	 * second case from above. Hence, when type is BPF_KPTR_REF, we set
+	 * second case from above. Hence, when type is BPF_KPTR_REF | BPF_KPTR_RCU, we set
 	 * strict mode to true for type match.
 	 */
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
 				  kptr_field->kptr.btf, kptr_field->kptr.btf_id,
-				  kptr_field->type == BPF_KPTR_REF))
+				  kptr_field->type == BPF_KPTR_REF || kptr_field->type == BPF_KPTR_RCU))
 		goto bad_type;
 	return 0;
 bad_type:
@@ -4250,6 +4250,14 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 	return -EINVAL;
 }
 
+/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
+ * can dereference RCU protected pointers and result is PTR_TRUSTED.
+ */
+static bool in_rcu_cs(struct bpf_verifier_env *env)
+{
+	return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
+}
+
 static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 				 int value_regno, int insn_idx,
 				 struct btf_field *kptr_field)
@@ -4273,7 +4281,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	/* We only allow loading referenced kptr, since it will be marked as
 	 * untrusted, similar to unreferenced kptr.
 	 */
-	if (class != BPF_LDX && kptr_field->type == BPF_KPTR_REF) {
+	if (class != BPF_LDX && kptr_field->type != BPF_KPTR_UNREF) {
 		verbose(env, "store to referenced kptr disallowed\n");
 		return -EACCES;
 	}
@@ -4284,7 +4292,10 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
 		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf,
-				kptr_field->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
+				kptr_field->kptr.btf_id,
+				kptr_field->type == BPF_KPTR_RCU && in_rcu_cs(env) ?
+				PTR_MAYBE_NULL | MEM_RCU :
+				PTR_MAYBE_NULL | PTR_UNTRUSTED);
 		/* For mark_ptr_or_null_reg */
 		val_reg->id = ++env->id_gen;
 	} else if (class == BPF_STX) {
@@ -4338,6 +4349,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			switch (field->type) {
 			case BPF_KPTR_UNREF:
 			case BPF_KPTR_REF:
+			case BPF_KPTR_RCU:
 				if (src != ACCESS_DIRECT) {
 					verbose(env, "kptr cannot be accessed indirectly by helper\n");
 					return -EACCES;
@@ -5139,11 +5151,10 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		 * read lock region. Also mark rcu pointer as PTR_MAYBE_NULL since
 		 * it could be null in some cases.
 		 */
-		if (!env->cur_state->active_rcu_lock ||
-		    !(is_trusted_reg(reg) || is_rcu_reg(reg)))
-			flag &= ~MEM_RCU;
-		else
+		if (in_rcu_cs(env) && (is_trusted_reg(reg) || is_rcu_reg(reg)))
 			flag |= PTR_MAYBE_NULL;
+		else
+			flag &= ~MEM_RCU;
 	} else if (reg->type & MEM_RCU) {
 		/* ptr (reg) is marked as MEM_RCU, but the struct field is not tagged
 		 * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively.
@@ -6187,7 +6198,7 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
 		verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
 		return -EACCES;
 	}
-	if (kptr_field->type != BPF_KPTR_REF) {
+	if (kptr_field->type != BPF_KPTR_REF && kptr_field->type != BPF_KPTR_RCU) {
 		verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
 		return -EACCES;
 	}
@@ -9111,7 +9122,7 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
 	}
 
 	kptr_field = btf_record_find(reg->map_ptr->record, reg->off + reg->var_off.value, BPF_KPTR);
-	if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
+	if (!kptr_field || (kptr_field->type != BPF_KPTR_REF && kptr_field->type != BPF_KPTR_RCU)) {
 		verbose(env, "arg#0 no referenced kptr at map value offset=%llu\n",
 			reg->off + reg->var_off.value);
 		return -EINVAL;
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 6f3d654b3339..73e5029ab5c9 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -737,6 +737,7 @@ __bpf_kfunc void bpf_kfunc_call_test_mem_len_fail2(u64 *mem, int len)
 
 __bpf_kfunc void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p)
 {
+	/* p could be NULL and p->cnt could be 0 */
 }
 
 __bpf_kfunc void bpf_kfunc_call_test_destructive(void)
@@ -784,7 +785,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2)
-BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg)
 BTF_SET8_END(test_sk_check_kfunc_ids)
diff --git a/tools/testing/selftests/bpf/progs/map_kptr_fail.c b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
index e19e2a5f38cf..08f9ec18c345 100644
--- a/tools/testing/selftests/bpf/progs/map_kptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
@@ -281,7 +281,7 @@ int reject_kptr_get_bad_type_match(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_")
+__failure __msg("R1 type=rcu_ptr_or_null_ expected=percpu_ptr_")
 int mark_ref_as_untrusted_or_null(struct __sk_buff *ctx)
 {
 	struct map_value *v;
@@ -316,7 +316,7 @@ int reject_untrusted_store_to_ref(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("R2 type=untrusted_ptr_ expected=ptr_")
+__failure __msg("R2 must be referenced")
 int reject_untrusted_xchg(struct __sk_buff *ctx)
 {
 	struct prog_test_ref_kfunc *p;
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index 289ed202ec66..9a326a800e5c 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -243,7 +243,7 @@
 	},
 	.result_unpriv = REJECT,
 	.result = REJECT,
-	.errstr = "R1 must be referenced",
+	.errstr = "R1 must be",
 },
 {
 	"calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
index 6914904344c0..d775ccb01989 100644
--- a/tools/testing/selftests/bpf/verifier/map_kptr.c
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -336,7 +336,7 @@
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.fixup_map_kptr = { 1 },
 	.result = REJECT,
-	.errstr = "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_",
+	.errstr = "R1 type=rcu_ptr_or_null_ expected=percpu_ptr_",
 },
 {
 	"map_kptr: ref: reject off != 0",
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 bpf-next 4/5] selftests/bpf: Add a test case for kptr_rcu.
  2023-02-28  4:01 [PATCH v3 bpf-next 0/5] bpf: Introduce kptr_rcu Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2023-02-28  4:01 ` [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu Alexei Starovoitov
@ 2023-02-28  4:01 ` Alexei Starovoitov
  2023-02-28 16:52   ` David Vernet
  2023-02-28  4:01 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Tweak cgroup kfunc test Alexei Starovoitov
  4 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2023-02-28  4:01 UTC (permalink / raw)
  To: davem
  Cc: daniel, andrii, martin.lau, void, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Tweak existing map_kptr test to check kptr_rcu.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/progs/map_kptr.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/map_kptr.c b/tools/testing/selftests/bpf/progs/map_kptr.c
index 4a7da6cb5800..b041234ec68d 100644
--- a/tools/testing/selftests/bpf/progs/map_kptr.c
+++ b/tools/testing/selftests/bpf/progs/map_kptr.c
@@ -61,6 +61,7 @@ extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp
 extern struct prog_test_ref_kfunc *
 bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b) __ksym;
 extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;
+void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) __ksym;
 
 #define WRITE_ONCE(x, val) ((*(volatile typeof(x) *) &(x)) = (val))
 
@@ -90,12 +91,23 @@ static void test_kptr_ref(struct map_value *v)
 	WRITE_ONCE(v->unref_ptr, p);
 	if (!p)
 		return;
+	/*
+	 * p is rcu_ptr_prog_test_ref_kfunc,
+	 * because bpf prog is non-sleepable and runs in RCU CS.
+	 * p can be passed to kfunc that requires KF_RCU.
+	 */
+	bpf_kfunc_call_test_ref(p);
 	if (p->a + p->b > 100)
 		return;
 	/* store NULL */
 	p = bpf_kptr_xchg(&v->ref_ptr, NULL);
 	if (!p)
 		return;
+	/*
+	 * p is trusted_ptr_prog_test_ref_kfunc.
+	 * p can be passed to kfunc that requires KF_RCU.
+	 */
+	bpf_kfunc_call_test_ref(p);
 	if (p->a + p->b > 100) {
 		bpf_kfunc_call_test_release(p);
 		return;
@@ -288,6 +300,8 @@ int test_map_kptr_ref2(struct __sk_buff *ctx)
 	if (p_st->cnt.refs.counter != 2)
 		return 6;
 
+	/* p_st is MEM_RCU, because we're in RCU CS */
+	bpf_kfunc_call_test_ref(p_st);
 	return 0;
 }
 
-- 
2.30.2



* [PATCH v3 bpf-next 5/5] selftests/bpf: Tweak cgroup kfunc test.
  2023-02-28  4:01 [PATCH v3 bpf-next 0/5] bpf: Introduce kptr_rcu Alexei Starovoitov
                   ` (3 preceding siblings ...)
  2023-02-28  4:01 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add a test case for kptr_rcu Alexei Starovoitov
@ 2023-02-28  4:01 ` Alexei Starovoitov
  2023-02-28 17:07   ` David Vernet
  4 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2023-02-28  4:01 UTC (permalink / raw)
  To: davem
  Cc: daniel, andrii, martin.lau, void, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Adjust cgroup kfunc test to dereference RCU protected cgroup pointer
as PTR_TRUSTED and pass into KF_TRUSTED_ARGS kfunc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c | 2 +-
 tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c | 7 ++++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c b/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c
index 4ad7fe24966d..d5a53b5e708f 100644
--- a/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c
+++ b/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c
@@ -205,7 +205,7 @@ int BPF_PROG(cgrp_kfunc_get_unreleased, struct cgroup *cgrp, const char *path)
 }
 
 SEC("tp_btf/cgroup_mkdir")
-__failure __msg("arg#0 is untrusted_ptr_or_null_ expected ptr_ or socket")
+__failure __msg("bpf_cgroup_release expects refcounted")
 int BPF_PROG(cgrp_kfunc_release_untrusted, struct cgroup *cgrp, const char *path)
 {
 	struct __cgrps_kfunc_map_value *v;
diff --git a/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c b/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
index 42e13aebdd62..85becaa8573b 100644
--- a/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
+++ b/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
@@ -61,7 +61,7 @@ int BPF_PROG(test_cgrp_acquire_leave_in_map, struct cgroup *cgrp, const char *pa
 SEC("tp_btf/cgroup_mkdir")
 int BPF_PROG(test_cgrp_xchg_release, struct cgroup *cgrp, const char *path)
 {
-	struct cgroup *kptr;
+	struct cgroup *kptr, *cg;
 	struct __cgrps_kfunc_map_value *v;
 	long status;
 
@@ -80,6 +80,11 @@ int BPF_PROG(test_cgrp_xchg_release, struct cgroup *cgrp, const char *path)
 		return 0;
 	}
 
+	kptr = v->cgrp;
+	cg = bpf_cgroup_ancestor(kptr, 1);
+	if (cg)	/* verifier only check */
+		bpf_cgroup_release(cg);
+
 	kptr = bpf_kptr_xchg(&v->cgrp, NULL);
 	if (!kptr) {
 		err = 3;
-- 
2.30.2



* Re: [PATCH v3 bpf-next 1/5] bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.
  2023-02-28  4:01 ` [PATCH v3 bpf-next 1/5] bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted Alexei Starovoitov
@ 2023-02-28 15:16   ` David Vernet
  0 siblings, 0 replies; 17+ messages in thread
From: David Vernet @ 2023-02-28 15:16 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, andrii, martin.lau, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

On Mon, Feb 27, 2023 at 08:01:17PM -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> __kptr was meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
> The concept felt useful, but didn't get much traction,
> since bpf_rdonly_cast() was added soon after and bpf programs received
> a simpler way to access PTR_UNTRUSTED kernel pointers
> without going through restrictive __kptr usage.
> 
> Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
> its intended usage.
> The main goal of __kptr_untrusted was to read/write such pointers
> directly while bpf_kptr_xchg was a mechanism to access refcnted
> kernel pointers. The next patch will allow RCU protected __kptr access
> with direct read. At that point __kptr_untrusted will be deprecated.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: David Vernet <void@manifault.com>


* Re: [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu.
  2023-02-28  4:01 ` [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu Alexei Starovoitov
@ 2023-02-28 16:45   ` David Vernet
  2023-02-28 19:49     ` Alexei Starovoitov
  2023-02-28 19:39   ` Tejun Heo
  1 sibling, 1 reply; 17+ messages in thread
From: David Vernet @ 2023-02-28 16:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, andrii, martin.lau, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

On Mon, Feb 27, 2023 at 08:01:19PM -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> The lifetime of certain kernel structures like 'struct cgroup' is protected by RCU.
> Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
> The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
> Dereference of other kptrs returns PTR_UNTRUSTED.
> 
> For example:
> struct map_value {
>    struct cgroup __kptr *cgrp;
> };
> 
> SEC("tp_btf/cgroup_mkdir")
> int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
> {
>   struct cgroup *cg, *cg2;
> 
>   cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
>   bpf_kptr_xchg(&v->cgrp, cg);
> 
>   cg2 = v->cgrp; // This is new feature introduced by this patch.
>   // cg2 is PTR_MAYBE_NULL | MEM_RCU.
>   // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero
> 
>   bpf_cgroup_ancestor(cg2, level); // safe to do.
> }
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  Documentation/bpf/kfuncs.rst                  | 11 ++++---
>  include/linux/bpf.h                           | 15 ++++++---
>  include/linux/btf.h                           |  2 +-
>  kernel/bpf/btf.c                              | 16 +++++++++
>  kernel/bpf/helpers.c                          |  7 ++--
>  kernel/bpf/syscall.c                          |  4 +++
>  kernel/bpf/verifier.c                         | 33 ++++++++++++-------
>  net/bpf/test_run.c                            |  3 +-
>  .../selftests/bpf/progs/map_kptr_fail.c       |  4 +--
>  tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
>  .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
>  11 files changed, 69 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 7d7c1144372a..49c5cb6f46e7 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst
> @@ -232,11 +232,12 @@ added later.
>  2.4.8 KF_RCU flag
>  -----------------
>  
> -The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument.
> -When used together with KF_ACQUIRE, it indicates the kfunc should have a
> -single argument which must be a trusted argument or a MEM_RCU pointer.
> -The argument may have reference count of 0 and the kfunc must take this
> -into consideration.
> +The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
> +KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
> +that the objects are valid and there is no use-after-free, but the pointers
> +maybe NULL and pointee object's reference count could have reached zero, hence

s/maybe/may be

> +kfuncs must do != NULL check and consider refcnt==0 case when accessing such
> +arguments.

Hmmm, given that it's only necessary to check refcnt==0 if the kfunc is
KF_ACQUIRE, wdyt about addending this paragraph with something like the
following (note as well the addition of the KF_RET_NULL suggestion):

...the pointers may be NULL, and the object's refcount could have
reached zero. The kfuncs must therefore do a != NULL check, and if
returning a KF_ACQUIRE pointer, also check that refcnt != 0. Note as
well that a KF_ACQUIRE kfunc that is KF_RCU should **very** likely also
be KF_RET_NULL, for both of these reasons.

>  .. _KF_deprecated_flag:
>  
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 520b238abd5a..d4b5faa0a777 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -178,11 +178,12 @@ enum btf_field_type {
>  	BPF_TIMER      = (1 << 1),
>  	BPF_KPTR_UNREF = (1 << 2),
>  	BPF_KPTR_REF   = (1 << 3),
> -	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
> -	BPF_LIST_HEAD  = (1 << 4),
> -	BPF_LIST_NODE  = (1 << 5),
> -	BPF_RB_ROOT    = (1 << 6),
> -	BPF_RB_NODE    = (1 << 7),
> +	BPF_KPTR_RCU   = (1 << 4), /* kernel internal. not exposed to bpf prog */
> +	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_RCU,
> +	BPF_LIST_HEAD  = (1 << 5),
> +	BPF_LIST_NODE  = (1 << 6),
> +	BPF_RB_ROOT    = (1 << 7),
> +	BPF_RB_NODE    = (1 << 8),
>  	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
>  				 BPF_RB_NODE | BPF_RB_ROOT,
>  };
> @@ -284,6 +285,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
>  	case BPF_KPTR_UNREF:
>  	case BPF_KPTR_REF:
>  		return "kptr";
> +	case BPF_KPTR_RCU:
> +		return "kptr_rcu";
>  	case BPF_LIST_HEAD:
>  		return "bpf_list_head";
>  	case BPF_LIST_NODE:
> @@ -307,6 +310,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
>  		return sizeof(struct bpf_timer);
>  	case BPF_KPTR_UNREF:
>  	case BPF_KPTR_REF:
> +	case BPF_KPTR_RCU:
>  		return sizeof(u64);
>  	case BPF_LIST_HEAD:
>  		return sizeof(struct bpf_list_head);
> @@ -331,6 +335,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
>  		return __alignof__(struct bpf_timer);
>  	case BPF_KPTR_UNREF:
>  	case BPF_KPTR_REF:
> +	case BPF_KPTR_RCU:
>  		return __alignof__(u64);
>  	case BPF_LIST_HEAD:
>  		return __alignof__(struct bpf_list_head);
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 49e0fe6d8274..556b3e2e7471 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -70,7 +70,7 @@
>  #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
>  #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
>  #define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
> +#define KF_RCU          (1 << 7) /* kfunc takes either rcu or trusted pointer arguments */
>  
>  /*
>   * Tag marking a kernel function as a kfunc. This is meant to minimize the
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 01dee7d48e6d..a44ea1f6164b 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3552,6 +3552,18 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
>  	return -EINVAL;
>  }
>  

Could you please add a comment here that once gcc has tag support, we
can replace this mechanism with just checking the type's BTF tag? I like
this a lot in the interim though -- it's a very easy way to add kfuncs
for new RCU-protected types, and will be trivially easy to remove and
cleanup later.

> +BTF_SET_START(rcu_protected_types)
> +BTF_ID(struct, prog_test_ref_kfunc)
> +BTF_ID(struct, cgroup)
> +BTF_SET_END(rcu_protected_types)
> +
> +static bool rcu_protected_object(const struct btf *btf, u32 btf_id)
> +{
> +	if (!btf_is_kernel(btf))
> +		return false;
> +	return btf_id_set_contains(&rcu_protected_types, btf_id);
> +}
> +
>  static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
>  			  struct btf_field_info *info)
>  {
> @@ -3615,6 +3627,10 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
>  		field->kptr.dtor = (void *)addr;
>  	}
>  
> +	if (info->type == BPF_KPTR_REF && rcu_protected_object(kernel_btf, id))
> +		/* rcu dereference of this field will return MEM_RCU instead of PTR_UNTRUSTED */
> +		field->type = BPF_KPTR_RCU;

Can you move this into the if block above, and update the conditional to
just be:

if (rcu_protected_object(kernel_btf, id))

Also, outside the scope of your patch and subjective, but IMO it's a bit
confusing that we're looking at info->type, when field->type already ==
info->type. When reading the code it looks like field->type is unset
unless we set it to BPF_KPTR_RCU, but in reality we're just overwriting
it from being BPF_KPTR_REF. Might be worth tidying up at some point (I
can do that in a follow-on patch once this series lands).

>  	field->kptr.btf_id = id;
>  	field->kptr.btf = kernel_btf;
>  	field->kptr.module = mod;
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a784be6f8bac..fed74afd45d1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2094,11 +2094,12 @@ __bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level)
>  {
>  	struct cgroup *ancestor;
>  
> -	if (level > cgrp->level || level < 0)
> +	if (!cgrp || level > cgrp->level || level < 0)
>  		return NULL;
>  
>  	ancestor = cgrp->ancestors[level];
> -	cgroup_get(ancestor);
> +	if (!cgroup_tryget(ancestor))
> +		return NULL;
>  	return ancestor;
>  }
>  
> @@ -2183,7 +2184,7 @@ BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
>  BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
> -BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
>  #endif
>  BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index e3fcdc9836a6..2e730918911c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -539,6 +539,7 @@ void btf_record_free(struct btf_record *rec)
>  		switch (rec->fields[i].type) {
>  		case BPF_KPTR_UNREF:
>  		case BPF_KPTR_REF:
> +		case BPF_KPTR_RCU:
>  			if (rec->fields[i].kptr.module)
>  				module_put(rec->fields[i].kptr.module);
>  			btf_put(rec->fields[i].kptr.btf);
> @@ -584,6 +585,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
>  		switch (fields[i].type) {
>  		case BPF_KPTR_UNREF:
>  		case BPF_KPTR_REF:
> +		case BPF_KPTR_RCU:
>  			btf_get(fields[i].kptr.btf);
>  			if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
>  				ret = -ENXIO;
> @@ -669,6 +671,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>  			WRITE_ONCE(*(u64 *)field_ptr, 0);
>  			break;
>  		case BPF_KPTR_REF:
> +		case BPF_KPTR_RCU:

The fact that we're adding this case is IMO a sign that we're arguably
breaking abstractions a bit. BPF_KPTR_REF should really be the kptr type
that holds a reference and for which we should be firing the destructor,
and RCU protection should ideally be something we could just derive
later in the verifier. Not a huge problem given that this complexity is
completely hidden from the user, but I'm not fully understanding why
the extra complexity of BPF_KPTR_RCU is necessary. See below in another
comment in verifier.c.

>  			field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));

Also completely unrelated to your patch set, but we should probably only
invoke field->kptr.dtor() if the value in field_ptr ends up being
non-NULL after the xchg. Otherwise, all KF_RELEASE kfuncs have to check
for NULL, even though they expect inherently trusted args. I can also do
that in a follow-on patch.

>  			break;
>  		case BPF_LIST_HEAD:
> @@ -1058,6 +1061,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>  				break;
>  			case BPF_KPTR_UNREF:
>  			case BPF_KPTR_REF:
> +			case BPF_KPTR_RCU:
>  				if (map->map_type != BPF_MAP_TYPE_HASH &&
>  				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
>  				    map->map_type != BPF_MAP_TYPE_ARRAY &&
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e4234266e76d..0b728ce0dde9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4183,7 +4183,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>  			       struct bpf_reg_state *reg, u32 regno)
>  {
>  	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
> -	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED;
> +	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED | MEM_RCU;
>  	const char *reg_name = "";
>  
>  	/* Only unreferenced case accepts untrusted pointers */
> @@ -4230,12 +4230,12 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>  	 * In the kptr_ref case, check_func_arg_reg_off already ensures reg->off
>  	 * is zero. We must also ensure that btf_struct_ids_match does not walk
>  	 * the struct to match type against first member of struct, i.e. reject
> -	 * second case from above. Hence, when type is BPF_KPTR_REF, we set
> +	 * second case from above. Hence, when type is BPF_KPTR_REF | BPF_KPTR_RCU, we set
>  	 * strict mode to true for type match.
>  	 */
>  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
>  				  kptr_field->kptr.btf, kptr_field->kptr.btf_id,
> -				  kptr_field->type == BPF_KPTR_REF))
> +				  kptr_field->type == BPF_KPTR_REF || kptr_field->type == BPF_KPTR_RCU))
>  		goto bad_type;
>  	return 0;
>  bad_type:
> @@ -4250,6 +4250,14 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>  	return -EINVAL;
>  }
>  
> +/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
> + * can dereference RCU protected pointers and result is PTR_TRUSTED.
> + */
> +static bool in_rcu_cs(struct bpf_verifier_env *env)
> +{
> +	return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
> +}
> +
>  static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
>  				 int value_regno, int insn_idx,
>  				 struct btf_field *kptr_field)
> @@ -4273,7 +4281,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
>  	/* We only allow loading referenced kptr, since it will be marked as
>  	 * untrusted, similar to unreferenced kptr.
>  	 */
> -	if (class != BPF_LDX && kptr_field->type == BPF_KPTR_REF) {
> +	if (class != BPF_LDX && kptr_field->type != BPF_KPTR_UNREF) {
>  		verbose(env, "store to referenced kptr disallowed\n");
>  		return -EACCES;
>  	}
> @@ -4284,7 +4292,10 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
>  		 * value from map as PTR_TO_BTF_ID, with the correct type.
>  		 */
>  		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf,
> -				kptr_field->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
> +				kptr_field->kptr.btf_id,
> +				kptr_field->type == BPF_KPTR_RCU && in_rcu_cs(env) ?

If we replaced this kptr_field->type == BPF_KPTR_RCU check with
something like btf_rcu_safe_kptr(kptr_field), corresponding to:

bool btf_rcu_safe_kptr(const struct btf_field *field)
{
	const struct btf_field_kptr *kptr = &field->kptr;

	return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
}

Wouldn't that allow us to avoid having to define BPF_KPTR_RCU at all?
Given that BPF_KPTR_RCU is really just an instance of BPF_KPTR_REF which
may also derive safety from RCU protection, this seems both simpler and
more thematic. Or am I missing something?


* Re: [PATCH v3 bpf-next 2/5] bpf: Mark cgroups and dfl_cgrp fields as trusted.
  2023-02-28  4:01 ` [PATCH v3 bpf-next 2/5] bpf: Mark cgroups and dfl_cgrp fields as trusted Alexei Starovoitov
@ 2023-02-28 16:45   ` David Vernet
  2023-02-28 19:32   ` Tejun Heo
  1 sibling, 0 replies; 17+ messages in thread
From: David Vernet @ 2023-02-28 16:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, andrii, martin.lau, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

On Mon, Feb 27, 2023 at 08:01:18PM -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> bpf programs sometimes do:
> bpf_cgrp_storage_get(&map, task->cgroups->dfl_cgrp, ...);
> It is safe to do, because cgroups->dfl_cgrp pointer is set diring init and
> never changes. The task->cgroups is also never NULL. It is also set during init
> and will change when task switches cgroups. For any trusted task pointer
> dereference of cgroups and dfl_cgrp should yield trusted pointers. The verifier
> wasn't aware of this. Hence in gcc compiled kernels task->cgroups dereference
> was producing PTR_TO_BTF_ID without modifiers while in clang compiled kernels
> the verifier recognizes __rcu tag in cgroups field and produces
> PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL.
> Tag cgroups and dfl_cgrp as trusted to equalize clang and gcc behavior.
> When GCC supports btf_type_tag such tagging will done directly in the type.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: David Vernet <void@manifault.com>


* Re: [PATCH v3 bpf-next 4/5] selftests/bpf: Add a test case for kptr_rcu.
  2023-02-28  4:01 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add a test case for kptr_rcu Alexei Starovoitov
@ 2023-02-28 16:52   ` David Vernet
  0 siblings, 0 replies; 17+ messages in thread
From: David Vernet @ 2023-02-28 16:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, andrii, martin.lau, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

On Mon, Feb 27, 2023 at 08:01:20PM -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Tweak existing map_kptr test to check kptr_rcu.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: David Vernet <void@manifault.com>


* Re: [PATCH v3 bpf-next 5/5] selftests/bpf: Tweak cgroup kfunc test.
  2023-02-28  4:01 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Tweak cgroup kfunc test Alexei Starovoitov
@ 2023-02-28 17:07   ` David Vernet
  2023-03-01  0:29     ` Alexei Starovoitov
  0 siblings, 1 reply; 17+ messages in thread
From: David Vernet @ 2023-02-28 17:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, andrii, martin.lau, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

On Mon, Feb 27, 2023 at 08:01:21PM -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Adjust cgroup kfunc test to dereference RCU protected cgroup pointer
> as PTR_TRUSTED and pass into KF_TRUSTED_ARGS kfunc.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

This LGTM, but I noticed that another cgrp test was failing with this
patch set:

[root@archbig bpf]# ./test_progs -t cgrp_local_storage/recursion
test_cgrp_local_storage:PASS:join_cgroup /cgrp_local_storage 0 nsec
libbpf: prog 'on_lookup': BPF program load failed: Permission denied
libbpf: prog 'on_lookup': -- BEGIN PROG LOAD LOG --
reg type unsupported for arg#0 function on_lookup#16
0: R1=ctx(off=0,imm=0) R10=fp0
; struct task_struct *task = bpf_get_current_task_btf();
0: (85) call bpf_get_current_task_btf#158     ; R0_w=trusted_ptr_task_struct(off=0,imm=0)
1: (bf) r6 = r0                       ; R0_w=trusted_ptr_task_struct(off=0,imm=0) R6_w=trusted_ptr_task_struct(off=0,imm=0)
; bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
2: (79) r1 = *(u64 *)(r6 +2296)       ; R1_w=rcu_ptr_or_null_css_set(off=0,imm=0) R6_w=trusted_ptr_task_struct(off=0,imm=0)
; bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
3: (79) r2 = *(u64 *)(r1 +120)
R1 invalid mem access 'rcu_ptr_or_null_'
processed 4 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'on_lookup': failed to load: -13
libbpf: failed to load object 'cgrp_ls_recursion'
libbpf: failed to load BPF skeleton 'cgrp_ls_recursion': -13
test_recursion:FAIL:skel_open_and_load unexpected error: -13
#43/3    cgrp_local_storage/recursion:FAIL
#43      cgrp_local_storage:FAIL

All error logs:
test_cgrp_local_storage:PASS:join_cgroup /cgrp_local_storage 0 nsec
libbpf: prog 'on_lookup': BPF program load failed: Permission denied
libbpf: prog 'on_lookup': -- BEGIN PROG LOAD LOG --
reg type unsupported for arg#0 function on_lookup#16
0: R1=ctx(off=0,imm=0) R10=fp0
; struct task_struct *task = bpf_get_current_task_btf();
0: (85) call bpf_get_current_task_btf#158     ; R0_w=trusted_ptr_task_struct(off=0,imm=0)
1: (bf) r6 = r0                       ; R0_w=trusted_ptr_task_struct(off=0,imm=0) R6_w=trusted_ptr_task_struct(off=0,imm=0)
; bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
2: (79) r1 = *(u64 *)(r6 +2296)       ; R1_w=rcu_ptr_or_null_css_set(off=0,imm=0) R6_w=trusted_ptr_task_struct(off=0,imm=0)
; bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
3: (79) r2 = *(u64 *)(r1 +120)
R1 invalid mem access 'rcu_ptr_or_null_'
processed 4 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'on_lookup': failed to load: -13
libbpf: failed to load object 'cgrp_ls_recursion'
libbpf: failed to load BPF skeleton 'cgrp_ls_recursion': -13
test_recursion:FAIL:skel_open_and_load unexpected error: -13
#43/3    cgrp_local_storage/recursion:FAIL
#43      cgrp_local_storage:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
[root@archbig bpf]#

The ptr type looks correct, so I assumed that the arg type for the proto
needed to be updated to expect NULL. This doesn't seem to fix it though:

diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index 6cdf6d9ed91d..9d5d47c8e820 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -241,6 +241,6 @@ const struct bpf_func_proto bpf_cgrp_storage_delete_proto = {
        .gpl_only       = false,
        .ret_type       = RET_INTEGER,
        .arg1_type      = ARG_CONST_MAP_PTR,
-       .arg2_type      = ARG_PTR_TO_BTF_ID,
+       .arg2_type      = ARG_PTR_TO_BTF_ID_OR_NULL,
        .arg2_btf_id    = &bpf_cgroup_btf_id[0],
 };

> ---
>  tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c | 2 +-
>  tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c | 7 ++++++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c b/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c
> index 4ad7fe24966d..d5a53b5e708f 100644
> --- a/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c
> +++ b/tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c
> @@ -205,7 +205,7 @@ int BPF_PROG(cgrp_kfunc_get_unreleased, struct cgroup *cgrp, const char *path)
>  }
>  
>  SEC("tp_btf/cgroup_mkdir")
> -__failure __msg("arg#0 is untrusted_ptr_or_null_ expected ptr_ or socket")
> +__failure __msg("bpf_cgroup_release expects refcounted")
>  int BPF_PROG(cgrp_kfunc_release_untrusted, struct cgroup *cgrp, const char *path)
>  {
>  	struct __cgrps_kfunc_map_value *v;
> diff --git a/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c b/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
> index 42e13aebdd62..85becaa8573b 100644
> --- a/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
> +++ b/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
> @@ -61,7 +61,7 @@ int BPF_PROG(test_cgrp_acquire_leave_in_map, struct cgroup *cgrp, const char *pa
>  SEC("tp_btf/cgroup_mkdir")
>  int BPF_PROG(test_cgrp_xchg_release, struct cgroup *cgrp, const char *path)
>  {
> -	struct cgroup *kptr;
> +	struct cgroup *kptr, *cg;
>  	struct __cgrps_kfunc_map_value *v;
>  	long status;
>  
> @@ -80,6 +80,11 @@ int BPF_PROG(test_cgrp_xchg_release, struct cgroup *cgrp, const char *path)
>  		return 0;
>  	}
>  
> +	kptr = v->cgrp;
> +	cg = bpf_cgroup_ancestor(kptr, 1);
> +	if (cg)	/* verifier only check */
> +		bpf_cgroup_release(cg);
> +
>  	kptr = bpf_kptr_xchg(&v->cgrp, NULL);
>  	if (!kptr) {
>  		err = 3;
> -- 
> 2.30.2
> 


* Re: [PATCH v3 bpf-next 2/5] bpf: Mark cgroups and dfl_cgrp fields as trusted.
  2023-02-28  4:01 ` [PATCH v3 bpf-next 2/5] bpf: Mark cgroups and dfl_cgrp fields as trusted Alexei Starovoitov
  2023-02-28 16:45   ` David Vernet
@ 2023-02-28 19:32   ` Tejun Heo
  1 sibling, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2023-02-28 19:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, andrii, martin.lau, void, davemarchevsky, memxor,
	netdev, bpf, kernel-team

On Mon, Feb 27, 2023 at 08:01:18PM -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> bpf programs sometimes do:
> bpf_cgrp_storage_get(&map, task->cgroups->dfl_cgrp, ...);
> It is safe to do, because cgroups->dfl_cgrp pointer is set diring init and
                                                              ^
							      u

> never changes. The task->cgroups is also never NULL. It is also set during init
> and will change when task switches cgroups. For any trusted task pointer
> dereference of cgroups and dfl_cgrp should yield trusted pointers. The verifier
> wasn't aware of this. Hence in gcc compiled kernels task->cgroups dereference
> was producing PTR_TO_BTF_ID without modifiers while in clang compiled kernels
> the verifier recognizes __rcu tag in cgroups field and produces
> PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL.
> Tag cgroups and dfl_cgrp as trusted to equalize clang and gcc behavior.
> When GCC supports btf_type_tag such tagging will done directly in the type.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu.
  2023-02-28  4:01 ` [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu Alexei Starovoitov
  2023-02-28 16:45   ` David Vernet
@ 2023-02-28 19:39   ` Tejun Heo
  1 sibling, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2023-02-28 19:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, andrii, martin.lau, void, davemarchevsky, memxor,
	netdev, bpf, kernel-team

On Mon, Feb 27, 2023 at 08:01:19PM -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
> Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
> The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
> Derefrence of other kptr-s returns PTR_UNTRUSTED.
> 
> For example:
> struct map_value {
>    struct cgroup __kptr *cgrp;
> };
> 
> SEC("tp_btf/cgroup_mkdir")
> int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
> {
>   struct cgroup *cg, *cg2;
> 
>   cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
>   bpf_kptr_xchg(&v->cgrp, cg);
> 
>   cg2 = v->cgrp; // This is new feature introduced by this patch.
>   // cg2 is PTR_MAYBE_NULL | MEM_RCU.
>   // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero
> 
>   bpf_cgroup_ancestor(cg2, level); // safe to do.
> }
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

From cgroup POV:

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu.
  2023-02-28 16:45   ` David Vernet
@ 2023-02-28 19:49     ` Alexei Starovoitov
  2023-02-28 20:05       ` David Vernet
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2023-02-28 19:49 UTC (permalink / raw)
  To: David Vernet
  Cc: davem, daniel, andrii, martin.lau, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

On Tue, Feb 28, 2023 at 10:45:16AM -0600, David Vernet wrote:
> On Mon, Feb 27, 2023 at 08:01:19PM -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> > 
> > The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
> > Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
> > The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
> > Derefrence of other kptr-s returns PTR_UNTRUSTED.
> > 
> > For example:
> > struct map_value {
> >    struct cgroup __kptr *cgrp;
> > };
> > 
> > SEC("tp_btf/cgroup_mkdir")
> > int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
> > {
> >   struct cgroup *cg, *cg2;
> > 
> >   cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
> >   bpf_kptr_xchg(&v->cgrp, cg);
> > 
> >   cg2 = v->cgrp; // This is new feature introduced by this patch.
> >   // cg2 is PTR_MAYBE_NULL | MEM_RCU.
> >   // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero
> > 
> >   bpf_cgroup_ancestor(cg2, level); // safe to do.
> > }
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  Documentation/bpf/kfuncs.rst                  | 11 ++++---
> >  include/linux/bpf.h                           | 15 ++++++---
> >  include/linux/btf.h                           |  2 +-
> >  kernel/bpf/btf.c                              | 16 +++++++++
> >  kernel/bpf/helpers.c                          |  7 ++--
> >  kernel/bpf/syscall.c                          |  4 +++
> >  kernel/bpf/verifier.c                         | 33 ++++++++++++-------
> >  net/bpf/test_run.c                            |  3 +-
> >  .../selftests/bpf/progs/map_kptr_fail.c       |  4 +--
> >  tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
> >  .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
> >  11 files changed, 69 insertions(+), 30 deletions(-)
> > 
> > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > index 7d7c1144372a..49c5cb6f46e7 100644
> > --- a/Documentation/bpf/kfuncs.rst
> > +++ b/Documentation/bpf/kfuncs.rst
> > @@ -232,11 +232,12 @@ added later.
> >  2.4.8 KF_RCU flag
> >  -----------------
> >  
> > -The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument.
> > -When used together with KF_ACQUIRE, it indicates the kfunc should have a
> > -single argument which must be a trusted argument or a MEM_RCU pointer.
> > -The argument may have reference count of 0 and the kfunc must take this
> > -into consideration.
> > +The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
> > +KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
> > +that the objects are valid and there is no use-after-free, but the pointers
> > +maybe NULL and pointee object's reference count could have reached zero, hence
> 
> s/maybe/may be
> 
> > +kfuncs must do != NULL check and consider refcnt==0 case when accessing such
> > +arguments.
> 
> Hmmm, given that it's only necessary to check refcnt==0 if the kfunc is
> KF_ACQUIRE, wdyt about amending this paragraph with something like the
> following (note as well the addition of the KF_RET_NULL suggestion):
> 
> ...the pointers may be NULL, and the object's refcount could have
> reached zero. The kfuncs must therefore do a != NULL check, and if
> returning a KF_ACQUIRE pointer, also check that refcnt != 0. Note as
> well that a KF_ACQUIRE kfunc that is KF_RCU should **very** likely also
> be KF_RET_NULL, for both of these reasons.

Good suggestion.

> >  .. _KF_deprecated_flag:
> >  
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 520b238abd5a..d4b5faa0a777 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -178,11 +178,12 @@ enum btf_field_type {
> >  	BPF_TIMER      = (1 << 1),
> >  	BPF_KPTR_UNREF = (1 << 2),
> >  	BPF_KPTR_REF   = (1 << 3),
> > -	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
> > -	BPF_LIST_HEAD  = (1 << 4),
> > -	BPF_LIST_NODE  = (1 << 5),
> > -	BPF_RB_ROOT    = (1 << 6),
> > -	BPF_RB_NODE    = (1 << 7),
> > +	BPF_KPTR_RCU   = (1 << 4), /* kernel internal. not exposed to bpf prog */
> > +	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_RCU,
> > +	BPF_LIST_HEAD  = (1 << 5),
> > +	BPF_LIST_NODE  = (1 << 6),
> > +	BPF_RB_ROOT    = (1 << 7),
> > +	BPF_RB_NODE    = (1 << 8),
> >  	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
> >  				 BPF_RB_NODE | BPF_RB_ROOT,
> >  };
> > @@ -284,6 +285,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
> >  	case BPF_KPTR_UNREF:
> >  	case BPF_KPTR_REF:
> >  		return "kptr";
> > +	case BPF_KPTR_RCU:
> > +		return "kptr_rcu";
> >  	case BPF_LIST_HEAD:
> >  		return "bpf_list_head";
> >  	case BPF_LIST_NODE:
> > @@ -307,6 +310,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
> >  		return sizeof(struct bpf_timer);
> >  	case BPF_KPTR_UNREF:
> >  	case BPF_KPTR_REF:
> > +	case BPF_KPTR_RCU:
> >  		return sizeof(u64);
> >  	case BPF_LIST_HEAD:
> >  		return sizeof(struct bpf_list_head);
> > @@ -331,6 +335,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
> >  		return __alignof__(struct bpf_timer);
> >  	case BPF_KPTR_UNREF:
> >  	case BPF_KPTR_REF:
> > +	case BPF_KPTR_RCU:
> >  		return __alignof__(u64);
> >  	case BPF_LIST_HEAD:
> >  		return __alignof__(struct bpf_list_head);
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 49e0fe6d8274..556b3e2e7471 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -70,7 +70,7 @@
> >  #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> >  #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> >  #define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> > -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
> > +#define KF_RCU          (1 << 7) /* kfunc takes either rcu or trusted pointer arguments */
> >  
> >  /*
> >   * Tag marking a kernel function as a kfunc. This is meant to minimize the
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 01dee7d48e6d..a44ea1f6164b 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -3552,6 +3552,18 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> >  	return -EINVAL;
> >  }
> >  
> 
> Could you please add a comment here that once gcc has tag support, we
> can replace this mechanism with just checking the type's BTF tag? I like
> this a lot in the interim though -- it's a very easy way to add kfuncs
> for new RCU-protected types, and will be trivially easy to remove and
> cleanup later.

+1

> > +BTF_SET_START(rcu_protected_types)
> > +BTF_ID(struct, prog_test_ref_kfunc)
> > +BTF_ID(struct, cgroup)
> > +BTF_SET_END(rcu_protected_types)
> > +
> > +static bool rcu_protected_object(const struct btf *btf, u32 btf_id)
> > +{
> > +	if (!btf_is_kernel(btf))
> > +		return false;
> > +	return btf_id_set_contains(&rcu_protected_types, btf_id);
> > +}
> > +
> >  static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
> >  			  struct btf_field_info *info)
> >  {
> > @@ -3615,6 +3627,10 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
> >  		field->kptr.dtor = (void *)addr;
> >  	}
> >  
> > +	if (info->type == BPF_KPTR_REF && rcu_protected_object(kernel_btf, id))
> > +		/* rcu dereference of this field will return MEM_RCU instead of PTR_UNTRUSTED */
> > +		field->type = BPF_KPTR_RCU;
> 
> Can you move this into the if block above, and update the conditional to
> just be:
> 
> if (rcu_protected_object(kernel_btf, id))

good idea.

> Also, outside the scope of your patch and subjective, but IMO it's a bit
> confusing that we're looking at info->type, when field->type already ==
> info->type. When reading the code it looks like field->type is unset
> unless we set it to BPF_KPTR_RCU, but in reality we're just overwriting
> it from being BPF_KPTR_REF. Might be worth tidying up at some point (I
> can do that in a follow-on patch once this series lands).

The caller of btf_parse_kptr() provides a temporary btf_field_info array.
Since there is only one caller, it's easy to see. Not sure what cleanup you have in mind.

> >  	field->kptr.btf_id = id;
> >  	field->kptr.btf = kernel_btf;
> >  	field->kptr.module = mod;
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index a784be6f8bac..fed74afd45d1 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2094,11 +2094,12 @@ __bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level)
> >  {
> >  	struct cgroup *ancestor;
> >  
> > -	if (level > cgrp->level || level < 0)
> > +	if (!cgrp || level > cgrp->level || level < 0)
> >  		return NULL;
> >  
> >  	ancestor = cgrp->ancestors[level];
> > -	cgroup_get(ancestor);
> > +	if (!cgroup_tryget(ancestor))
> > +		return NULL;
> >  	return ancestor;
> >  }
> >  
> > @@ -2183,7 +2184,7 @@ BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
> >  BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
> > -BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
> > +BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
> >  #endif
> >  BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index e3fcdc9836a6..2e730918911c 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -539,6 +539,7 @@ void btf_record_free(struct btf_record *rec)
> >  		switch (rec->fields[i].type) {
> >  		case BPF_KPTR_UNREF:
> >  		case BPF_KPTR_REF:
> > +		case BPF_KPTR_RCU:
> >  			if (rec->fields[i].kptr.module)
> >  				module_put(rec->fields[i].kptr.module);
> >  			btf_put(rec->fields[i].kptr.btf);
> > @@ -584,6 +585,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
> >  		switch (fields[i].type) {
> >  		case BPF_KPTR_UNREF:
> >  		case BPF_KPTR_REF:
> > +		case BPF_KPTR_RCU:
> >  			btf_get(fields[i].kptr.btf);
> >  			if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
> >  				ret = -ENXIO;
> > @@ -669,6 +671,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> >  			WRITE_ONCE(*(u64 *)field_ptr, 0);
> >  			break;
> >  		case BPF_KPTR_REF:
> > +		case BPF_KPTR_RCU:
> 
> The fact that we're adding this case is IMO a sign that we're arguably
> breaking abstractions a bit. BPF_KPTR_REF should really be the kptr type
> that holds a reference and for which we should be firing the destructor,
> and RCU protection should ideally be something we could just derive
> later in the verifier. 

I considered keeping BPF_KPTR_REF as-is and just adding a "bool is_kptr_rcu;"
to indicate it's a BPF_KPTR_REF with extra RCU properties, but they are different
enough that it's cleaner to make them stand out.
With BPF_KPTR_RCU being a separate type, it's impossible for other bits
in the verifier to silently accept a BPF_KPTR_REF that shouldn't have the RCU property.

> Not a huge problem given that this complexity is
> completely hidden from the user, but I'm not fully understanding why
> the extra complexity of BPF_KPTR_RCU is necessary. See below in another
> comment in verifier.c.
> 
> >  			field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
> 
> Also completely unrelated to your patch set, but we should probably only
> invoke field->kptr.dtor() if the value in field_ptr ends up being
> non-NULL after the xchg. Otherwise, all KF_RELEASE kfuncs have to check
> for NULL, even though they expect inherently trusted args. I can also do
> that in a follow-on patch.

Good point. The verifier forces bpf progs to do if (ptr != NULL) bpf_..__release(ptr);
but we still have a duplicated != NULL check inside the dtor-s,
because both BPF_KPTR_RCU and BPF_KPTR_REF can be NULL here.
It would indeed be good to clean that up.

> 
> >  			break;
> >  		case BPF_LIST_HEAD:
> > @@ -1058,6 +1061,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >  				break;
> >  			case BPF_KPTR_UNREF:
> >  			case BPF_KPTR_REF:
> > +			case BPF_KPTR_RCU:
> >  				if (map->map_type != BPF_MAP_TYPE_HASH &&
> >  				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> >  				    map->map_type != BPF_MAP_TYPE_ARRAY &&
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index e4234266e76d..0b728ce0dde9 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -4183,7 +4183,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  			       struct bpf_reg_state *reg, u32 regno)
> >  {
> >  	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
> > -	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED;
> > +	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED | MEM_RCU;
> >  	const char *reg_name = "";
> >  
> >  	/* Only unreferenced case accepts untrusted pointers */
> > @@ -4230,12 +4230,12 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  	 * In the kptr_ref case, check_func_arg_reg_off already ensures reg->off
> >  	 * is zero. We must also ensure that btf_struct_ids_match does not walk
> >  	 * the struct to match type against first member of struct, i.e. reject
> > -	 * second case from above. Hence, when type is BPF_KPTR_REF, we set
> > +	 * second case from above. Hence, when type is BPF_KPTR_REF | BPF_KPTR_RCU, we set
> >  	 * strict mode to true for type match.
> >  	 */
> >  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> >  				  kptr_field->kptr.btf, kptr_field->kptr.btf_id,
> > -				  kptr_field->type == BPF_KPTR_REF))
> > +				  kptr_field->type == BPF_KPTR_REF || kptr_field->type == BPF_KPTR_RCU))
> >  		goto bad_type;
> >  	return 0;
> >  bad_type:
> > @@ -4250,6 +4250,14 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  	return -EINVAL;
> >  }
> >  
> > +/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
> > + * can dereference RCU protected pointers and result is PTR_TRUSTED.
> > + */
> > +static bool in_rcu_cs(struct bpf_verifier_env *env)
> > +{
> > +	return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
> > +}
> > +
> >  static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> >  				 int value_regno, int insn_idx,
> >  				 struct btf_field *kptr_field)
> > @@ -4273,7 +4281,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> >  	/* We only allow loading referenced kptr, since it will be marked as
> >  	 * untrusted, similar to unreferenced kptr.
> >  	 */
> > -	if (class != BPF_LDX && kptr_field->type == BPF_KPTR_REF) {
> > +	if (class != BPF_LDX && kptr_field->type != BPF_KPTR_UNREF) {
> >  		verbose(env, "store to referenced kptr disallowed\n");
> >  		return -EACCES;
> >  	}
> > @@ -4284,7 +4292,10 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> >  		 * value from map as PTR_TO_BTF_ID, with the correct type.
> >  		 */
> >  		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf,
> > -				kptr_field->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
> > +				kptr_field->kptr.btf_id,
> > +				kptr_field->type == BPF_KPTR_RCU && in_rcu_cs(env) ?
> 
> If we replaced this kptr_field->type == BPF_KPTR_RCU check with
> something like btf_rcu_safe_kptr(kptr_field), corresponding to:
> 
> bool btf_rcu_safe_kptr(const struct btf_field *field)
> {
> 	const struct btf_field_kptr *kptr = &field->kptr;
> 
> 	return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
> }
> 
> Wouldn't that allow us to avoid having to define BPF_KPTR_RCU at all?
> Given that BPF_KPTR_RCU is really just an instance of BPF_KPTR_REF which
> may also derive safety from RCU protection, this seems both simpler and
> more thematic. Or am I missing something?

See my earlier reply. It felt cleaner to keep them separate so that
BPF_KPTR_RCU won't be accepted in places where only BPF_KPTR_REF is ok.
I'm probably overthinking it.
Looking at the code again, every place that handled BPF_KPTR_REF was appended with BPF_KPTR_RCU,
so, yeah, let's go with your suggestion above. A lot less code to maintain, and
if it turns out to be an issue we can go back to separate types.


* Re: [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu.
  2023-02-28 19:49     ` Alexei Starovoitov
@ 2023-02-28 20:05       ` David Vernet
  0 siblings, 0 replies; 17+ messages in thread
From: David Vernet @ 2023-02-28 20:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, andrii, martin.lau, davemarchevsky, tj, memxor,
	netdev, bpf, kernel-team

On Tue, Feb 28, 2023 at 11:49:23AM -0800, Alexei Starovoitov wrote:
> On Tue, Feb 28, 2023 at 10:45:16AM -0600, David Vernet wrote:
> > On Mon, Feb 27, 2023 at 08:01:19PM -0800, Alexei Starovoitov wrote:
> > > From: Alexei Starovoitov <ast@kernel.org>
> > > 
> > > The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
> > > Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
> > > The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
> > > Derefrence of other kptr-s returns PTR_UNTRUSTED.
> > > 
> > > For example:
> > > struct map_value {
> > >    struct cgroup __kptr *cgrp;
> > > };
> > > 
> > > SEC("tp_btf/cgroup_mkdir")
> > > int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
> > > {
> > >   struct cgroup *cg, *cg2;
> > > 
> > >   cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
> > >   bpf_kptr_xchg(&v->cgrp, cg);
> > > 
> > >   cg2 = v->cgrp; // This is new feature introduced by this patch.
> > >   // cg2 is PTR_MAYBE_NULL | MEM_RCU.
> > >   // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero
> > > 
> > >   bpf_cgroup_ancestor(cg2, level); // safe to do.
> > > }
> > > 
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> > >  Documentation/bpf/kfuncs.rst                  | 11 ++++---
> > >  include/linux/bpf.h                           | 15 ++++++---
> > >  include/linux/btf.h                           |  2 +-
> > >  kernel/bpf/btf.c                              | 16 +++++++++
> > >  kernel/bpf/helpers.c                          |  7 ++--
> > >  kernel/bpf/syscall.c                          |  4 +++
> > >  kernel/bpf/verifier.c                         | 33 ++++++++++++-------
> > >  net/bpf/test_run.c                            |  3 +-
> > >  .../selftests/bpf/progs/map_kptr_fail.c       |  4 +--
> > >  tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
> > >  .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
> > >  11 files changed, 69 insertions(+), 30 deletions(-)
> > > 
> > > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > > index 7d7c1144372a..49c5cb6f46e7 100644
> > > --- a/Documentation/bpf/kfuncs.rst
> > > +++ b/Documentation/bpf/kfuncs.rst
> > > @@ -232,11 +232,12 @@ added later.
> > >  2.4.8 KF_RCU flag
> > >  -----------------
> > >  
> > > -The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument.
> > > -When used together with KF_ACQUIRE, it indicates the kfunc should have a
> > > -single argument which must be a trusted argument or a MEM_RCU pointer.
> > > -The argument may have reference count of 0 and the kfunc must take this
> > > -into consideration.
> > > +The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
> > > +KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
> > > +that the objects are valid and there is no use-after-free, but the pointers
> > > +maybe NULL and pointee object's reference count could have reached zero, hence
> > 
> > s/maybe/may be
> > 
> > > +kfuncs must do != NULL check and consider refcnt==0 case when accessing such
> > > +arguments.
> > 
> > Hmmm, given that it's only necessary to check refcnt==0 if the kfunc is
> > KF_ACQUIRE, wdyt about amending this paragraph with something like the
> > following (note as well the addition of the KF_RET_NULL suggestion):
> > 
> > ...the pointers may be NULL, and the object's refcount could have
> > reached zero. The kfuncs must therefore do a != NULL check, and if
> > returning a KF_ACQUIRE pointer, also check that refcnt != 0. Note as
> > well that a KF_ACQUIRE kfunc that is KF_RCU should **very** likely also
> > be KF_RET_NULL, for both of these reasons.
> 
> Good suggestion.
> 
> > >  .. _KF_deprecated_flag:
> > >  
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 520b238abd5a..d4b5faa0a777 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -178,11 +178,12 @@ enum btf_field_type {
> > >  	BPF_TIMER      = (1 << 1),
> > >  	BPF_KPTR_UNREF = (1 << 2),
> > >  	BPF_KPTR_REF   = (1 << 3),
> > > -	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
> > > -	BPF_LIST_HEAD  = (1 << 4),
> > > -	BPF_LIST_NODE  = (1 << 5),
> > > -	BPF_RB_ROOT    = (1 << 6),
> > > -	BPF_RB_NODE    = (1 << 7),
> > > +	BPF_KPTR_RCU   = (1 << 4), /* kernel internal. not exposed to bpf prog */
> > > +	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_RCU,
> > > +	BPF_LIST_HEAD  = (1 << 5),
> > > +	BPF_LIST_NODE  = (1 << 6),
> > > +	BPF_RB_ROOT    = (1 << 7),
> > > +	BPF_RB_NODE    = (1 << 8),
> > >  	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
> > >  				 BPF_RB_NODE | BPF_RB_ROOT,
> > >  };
> > > @@ -284,6 +285,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
> > >  	case BPF_KPTR_UNREF:
> > >  	case BPF_KPTR_REF:
> > >  		return "kptr";
> > > +	case BPF_KPTR_RCU:
> > > +		return "kptr_rcu";
> > >  	case BPF_LIST_HEAD:
> > >  		return "bpf_list_head";
> > >  	case BPF_LIST_NODE:
> > > @@ -307,6 +310,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
> > >  		return sizeof(struct bpf_timer);
> > >  	case BPF_KPTR_UNREF:
> > >  	case BPF_KPTR_REF:
> > > +	case BPF_KPTR_RCU:
> > >  		return sizeof(u64);
> > >  	case BPF_LIST_HEAD:
> > >  		return sizeof(struct bpf_list_head);
> > > @@ -331,6 +335,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
> > >  		return __alignof__(struct bpf_timer);
> > >  	case BPF_KPTR_UNREF:
> > >  	case BPF_KPTR_REF:
> > > +	case BPF_KPTR_RCU:
> > >  		return __alignof__(u64);
> > >  	case BPF_LIST_HEAD:
> > >  		return __alignof__(struct bpf_list_head);
> > > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > > index 49e0fe6d8274..556b3e2e7471 100644
> > > --- a/include/linux/btf.h
> > > +++ b/include/linux/btf.h
> > > @@ -70,7 +70,7 @@
> > >  #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> > >  #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> > >  #define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> > > -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
> > > +#define KF_RCU          (1 << 7) /* kfunc takes either rcu or trusted pointer arguments */
> > >  
> > >  /*
> > >   * Tag marking a kernel function as a kfunc. This is meant to minimize the
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index 01dee7d48e6d..a44ea1f6164b 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -3552,6 +3552,18 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> > >  	return -EINVAL;
> > >  }
> > >  
> > 
> > Could you please add a comment here that once gcc has tag support, we
> > can replace this mechanism with just checking the type's BTF tag? I like
> > this a lot in the interim though -- it's a very easy way to add kfuncs
> > for new RCU-protected types, and will be trivially easy to remove and
> > clean up later.
> 
> +1
> 
> > > +BTF_SET_START(rcu_protected_types)
> > > +BTF_ID(struct, prog_test_ref_kfunc)
> > > +BTF_ID(struct, cgroup)
> > > +BTF_SET_END(rcu_protected_types)
> > > +
> > > +static bool rcu_protected_object(const struct btf *btf, u32 btf_id)
> > > +{
> > > +	if (!btf_is_kernel(btf))
> > > +		return false;
> > > +	return btf_id_set_contains(&rcu_protected_types, btf_id);
> > > +}
> > > +
> > >  static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
> > >  			  struct btf_field_info *info)
> > >  {
> > > @@ -3615,6 +3627,10 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
> > >  		field->kptr.dtor = (void *)addr;
> > >  	}
> > >  
> > > +	if (info->type == BPF_KPTR_REF && rcu_protected_object(kernel_btf, id))
> > > +		/* rcu dereference of this field will return MEM_RCU instead of PTR_UNTRUSTED */
> > > +		field->type = BPF_KPTR_RCU;
> > 
> > Can you move this into the if block above, and update the conditional to
> > just be:
> > 
> > if (rcu_protected_object(kernel_btf, id))
> 
> good idea.
> 
> > Also, outside the scope of your patch and subjective, but IMO it's a bit
> > confusing that we're looking at info->type, when field->type already ==
> > info->type. When reading the code it looks like field->type is unset
> > unless we set it to BPF_KPTR_RCU, but in reality we're just overwriting
> > it from being BPF_KPTR_REF. Might be worth tidying up at some point (I
> > can do that in a follow-on patch once this series lands).
> 
> The caller of btf_parse_kptr() provided a temporary btf_field_info array.
> Since there is only one caller it's easy to see. Not sure what cleanup you have in mind.

I was thinking of checking field->type instead of info->type in the if
statements to make it more clear that field->type is already set. This
is really just a subjective point about readability. Probably wasn't
worth the additional noise on the patch.

> > >  	field->kptr.btf_id = id;
> > >  	field->kptr.btf = kernel_btf;
> > >  	field->kptr.module = mod;
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index a784be6f8bac..fed74afd45d1 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -2094,11 +2094,12 @@ __bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level)
> > >  {
> > >  	struct cgroup *ancestor;
> > >  
> > > -	if (level > cgrp->level || level < 0)
> > > +	if (!cgrp || level > cgrp->level || level < 0)
> > >  		return NULL;
> > >  
> > >  	ancestor = cgrp->ancestors[level];
> > > -	cgroup_get(ancestor);
> > > +	if (!cgroup_tryget(ancestor))
> > > +		return NULL;
> > >  	return ancestor;
> > >  }
> > >  
> > > @@ -2183,7 +2184,7 @@ BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
> > >  BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
> > >  BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
> > >  BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
> > > -BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
> > > +BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
> > >  BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
> > >  #endif
> > >  BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index e3fcdc9836a6..2e730918911c 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -539,6 +539,7 @@ void btf_record_free(struct btf_record *rec)
> > >  		switch (rec->fields[i].type) {
> > >  		case BPF_KPTR_UNREF:
> > >  		case BPF_KPTR_REF:
> > > +		case BPF_KPTR_RCU:
> > >  			if (rec->fields[i].kptr.module)
> > >  				module_put(rec->fields[i].kptr.module);
> > >  			btf_put(rec->fields[i].kptr.btf);
> > > @@ -584,6 +585,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
> > >  		switch (fields[i].type) {
> > >  		case BPF_KPTR_UNREF:
> > >  		case BPF_KPTR_REF:
> > > +		case BPF_KPTR_RCU:
> > >  			btf_get(fields[i].kptr.btf);
> > >  			if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
> > >  				ret = -ENXIO;
> > > @@ -669,6 +671,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> > >  			WRITE_ONCE(*(u64 *)field_ptr, 0);
> > >  			break;
> > >  		case BPF_KPTR_REF:
> > > +		case BPF_KPTR_RCU:
> > 
> > The fact that we're adding this case is IMO a sign that we're arguably
> > breaking abstractions a bit. BPF_KPTR_REF should really be the kptr type
> > that holds a reference and for which we should be firing the destructor,
> > and RCU protection should ideally be something we could just derive
> > later in the verifier. 
> 
> I've considered keeping BPF_KPTR_REF as-is and just adding a "bool is_kptr_rcu;"
> to indicate it's a BPF_KPTR_REF with extra RCU properties, but they are different
> enough. So it's cleaner to make them stand out.
> With BPF_KPTR_RCU being different type it's impossible for other bits
> in the verifier to silently accept BPF_KPTR_REF that shouldn't have RCU property.

Makes sense, will reply below.

> > Not a huge problem given that this complexity is
> > completely hidden from the user, but I'm not fully understanding why
> > the extra complexity of BPF_KPTR_RCU is necessary. See below in another
> > comment in verifier.c.
> > 
> > >  			field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
> > 
> > Also completely unrelated to your patch set, but we should probably only
> > invoke field->kptr.dtor() if the value in field_ptr ends up being
> > non-NULL after the xchg. Otherwise, all KF_RELEASE kfuncs have to check
> > for NULL, even though they expect inherently trusted args. I can also do
> > that in a follow-on patch.
> 
> Good point. The verifier forces bpf progs to do if (ptr != NULL) bpf_..__release(ptr);
> but we still have a duplicated != NULL check inside the dtors,
> because both BPF_KPTR_RCU and BPF_KPTR_REF can be NULL here.
> It would be good to clean that up indeed.

Great, I'll send a patch after this set lands.

> > 
> > >  			break;
> > >  		case BPF_LIST_HEAD:
> > > @@ -1058,6 +1061,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> > >  				break;
> > >  			case BPF_KPTR_UNREF:
> > >  			case BPF_KPTR_REF:
> > > +			case BPF_KPTR_RCU:
> > >  				if (map->map_type != BPF_MAP_TYPE_HASH &&
> > >  				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > >  				    map->map_type != BPF_MAP_TYPE_ARRAY &&
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index e4234266e76d..0b728ce0dde9 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -4183,7 +4183,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> > >  			       struct bpf_reg_state *reg, u32 regno)
> > >  {
> > >  	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
> > > -	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED;
> > > +	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED | MEM_RCU;
> > >  	const char *reg_name = "";
> > >  
> > >  	/* Only unreferenced case accepts untrusted pointers */
> > > @@ -4230,12 +4230,12 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> > >  	 * In the kptr_ref case, check_func_arg_reg_off already ensures reg->off
> > >  	 * is zero. We must also ensure that btf_struct_ids_match does not walk
> > >  	 * the struct to match type against first member of struct, i.e. reject
> > > -	 * second case from above. Hence, when type is BPF_KPTR_REF, we set
> > > +	 * second case from above. Hence, when type is BPF_KPTR_REF | BPF_KPTR_RCU, we set
> > >  	 * strict mode to true for type match.
> > >  	 */
> > >  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > >  				  kptr_field->kptr.btf, kptr_field->kptr.btf_id,
> > > -				  kptr_field->type == BPF_KPTR_REF))
> > > +				  kptr_field->type == BPF_KPTR_REF || kptr_field->type == BPF_KPTR_RCU))
> > >  		goto bad_type;
> > >  	return 0;
> > >  bad_type:
> > > @@ -4250,6 +4250,14 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> > >  	return -EINVAL;
> > >  }
> > >  
> > > +/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
> > > + * can dereference RCU protected pointers and result is PTR_TRUSTED.
> > > + */
> > > +static bool in_rcu_cs(struct bpf_verifier_env *env)
> > > +{
> > > +	return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
> > > +}
> > > +
> > >  static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> > >  				 int value_regno, int insn_idx,
> > >  				 struct btf_field *kptr_field)
> > > @@ -4273,7 +4281,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> > >  	/* We only allow loading referenced kptr, since it will be marked as
> > >  	 * untrusted, similar to unreferenced kptr.
> > >  	 */
> > > -	if (class != BPF_LDX && kptr_field->type == BPF_KPTR_REF) {
> > > +	if (class != BPF_LDX && kptr_field->type != BPF_KPTR_UNREF) {
> > >  		verbose(env, "store to referenced kptr disallowed\n");
> > >  		return -EACCES;
> > >  	}
> > > @@ -4284,7 +4292,10 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> > >  		 * value from map as PTR_TO_BTF_ID, with the correct type.
> > >  		 */
> > >  		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf,
> > > -				kptr_field->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
> > > +				kptr_field->kptr.btf_id,
> > > +				kptr_field->type == BPF_KPTR_RCU && in_rcu_cs(env) ?
> > 
> > If we replaced this kptr_field->type == BPF_KPTR_RCU check with
> > something like btf_rcu_safe_kptr(kptr_field), corresponding to:
> > 
> > bool btf_rcu_safe_kptr(const struct btf_field *field)
> > {
> > 	const struct btf_field_kptr *kptr = &field->kptr;
> > 
> > 	return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
> > }
> > 
> > Wouldn't that allow us to avoid having to define BPF_KPTR_RCU at all?
> > Given that BPF_KPTR_RCU is really just an instance of BPF_KPTR_REF which
> > may also derive safety from RCU protection, this seems both simpler and
> > more thematic. Or am I missing something?
> 
> See my earlier reply. It felt cleaner to keep them separate so that
> BPF_KPTR_RCU won't be accepted in places where only BPF_KPTR_REF is ok.
> I'm probably overthinking it.
> Looking at the code again, all the places that handle BPF_KPTR_REF were appended with BPF_KPTR_RCU.
> So, yeah, let's go with your suggestion above. A lot less code to maintain, and

Yep, that's what I was thinking. The fact that there are so many places
where we're appending BPF_KPTR_RCU to BPF_KPTR_REF suggests to me
they're more similar than they are different, so either a bool
rcu_protected field as you suggested above, or a dynamic check as I
suggested, seems like the preferable tradeoff.

> if it turns out to be an issue we can go back to separate types.

Sounds good to me.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 bpf-next 5/5] selftests/bpf: Tweak cgroup kfunc test.
  2023-02-28 17:07   ` David Vernet
@ 2023-03-01  0:29     ` Alexei Starovoitov
  2023-03-01  2:22       ` David Vernet
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2023-03-01  0:29 UTC (permalink / raw)
  To: David Vernet
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Dave Marchevsky, Tejun Heo,
	Kumar Kartikeya Dwivedi, Network Development, bpf, Kernel Team

On Tue, Feb 28, 2023 at 9:07 AM David Vernet <void@manifault.com> wrote:
> libbpf: prog 'on_lookup': failed to load: -13
> libbpf: failed to load object 'cgrp_ls_recursion'
> libbpf: failed to load BPF skeleton 'cgrp_ls_recursion': -13
> test_recursion:FAIL:skel_open_and_load unexpected error: -13
> #43/3    cgrp_local_storage/recursion:FAIL
> #43      cgrp_local_storage:FAIL
>
> All error logs:
> test_cgrp_local_storage:PASS:join_cgroup /cgrp_local_storage 0 nsec
> libbpf: prog 'on_lookup': BPF program load failed: Permission denied
> libbpf: prog 'on_lookup': -- BEGIN PROG LOAD LOG --
> reg type unsupported for arg#0 function on_lookup#16
> 0: R1=ctx(off=0,imm=0) R10=fp0
> ; struct task_struct *task = bpf_get_current_task_btf();
> 0: (85) call bpf_get_current_task_btf#158     ; R0_w=trusted_ptr_task_struct(off=0,imm=0)
> 1: (bf) r6 = r0                       ; R0_w=trusted_ptr_task_struct(off=0,imm=0) R6_w=trusted_ptr_task_struct(off=0,imm=0)
> ; bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
> 2: (79) r1 = *(u64 *)(r6 +2296)       ; R1_w=rcu_ptr_or_null_css_set(off=0,imm=0) R6_w=trusted_ptr_task_struct(off=0,imm=0)
> ; bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
> 3: (79) r2 = *(u64 *)(r1 +120)
> R1 invalid mem access 'rcu_ptr_or_null_'

This one was tricky :)
Turned out btf_nested_type_is_trusted() was able to find the
'cgroups' field in a gcc-compiled kernel, but was failing on a clang-compiled
kernel because patch 2 did:
BTF_TYPE_SAFE_NESTED(struct task_struct) {
        const cpumask_t *cpus_ptr;
        struct css_set *cgroups;
};
instead of
BTF_TYPE_SAFE_NESTED(struct task_struct) {
        const cpumask_t *cpus_ptr;
        struct css_set __rcu *cgroups;
};
The missing tag was causing a miscompare.
Something to keep in mind.
This ugliness will go away once GCC supports btf tag.


* Re: [PATCH v3 bpf-next 5/5] selftests/bpf: Tweak cgroup kfunc test.
  2023-03-01  0:29     ` Alexei Starovoitov
@ 2023-03-01  2:22       ` David Vernet
  0 siblings, 0 replies; 17+ messages in thread
From: David Vernet @ 2023-03-01  2:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Dave Marchevsky, Tejun Heo,
	Kumar Kartikeya Dwivedi, Network Development, bpf, Kernel Team

On Tue, Feb 28, 2023 at 04:29:18PM -0800, Alexei Starovoitov wrote:
> On Tue, Feb 28, 2023 at 9:07 AM David Vernet <void@manifault.com> wrote:
> > libbpf: prog 'on_lookup': failed to load: -13
> > libbpf: failed to load object 'cgrp_ls_recursion'
> > libbpf: failed to load BPF skeleton 'cgrp_ls_recursion': -13
> > test_recursion:FAIL:skel_open_and_load unexpected error: -13
> > #43/3    cgrp_local_storage/recursion:FAIL
> > #43      cgrp_local_storage:FAIL
> >
> > All error logs:
> > test_cgrp_local_storage:PASS:join_cgroup /cgrp_local_storage 0 nsec
> > libbpf: prog 'on_lookup': BPF program load failed: Permission denied
> > libbpf: prog 'on_lookup': -- BEGIN PROG LOAD LOG --
> > reg type unsupported for arg#0 function on_lookup#16
> > 0: R1=ctx(off=0,imm=0) R10=fp0
> > ; struct task_struct *task = bpf_get_current_task_btf();
> > 0: (85) call bpf_get_current_task_btf#158     ; R0_w=trusted_ptr_task_struct(off=0,imm=0)
> > 1: (bf) r6 = r0                       ; R0_w=trusted_ptr_task_struct(off=0,imm=0) R6_w=trusted_ptr_task_struct(off=0,imm=0)
> > ; bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
> > 2: (79) r1 = *(u64 *)(r6 +2296)       ; R1_w=rcu_ptr_or_null_css_set(off=0,imm=0) R6_w=trusted_ptr_task_struct(off=0,imm=0)
> > ; bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp);
> > 3: (79) r2 = *(u64 *)(r1 +120)
> > R1 invalid mem access 'rcu_ptr_or_null_'
> 
> This one was tricky :)
> Turned out btf_nested_type_is_trusted() was able to find the
> 'cgroups' field in a gcc-compiled kernel, but was failing on a clang-compiled
> kernel because patch 2 did:
> BTF_TYPE_SAFE_NESTED(struct task_struct) {
>         const cpumask_t *cpus_ptr;
>         struct css_set *cgroups;
> };
> instead of
> BTF_TYPE_SAFE_NESTED(struct task_struct) {
>         const cpumask_t *cpus_ptr;
>         struct css_set __rcu *cgroups;
> };
> The missing tag was causing a miscompare.

Ahh, sorry I missed that in review. Once your patch set lands I'll add a
very loud comment here so that it's not missed in the future.

> Something to keep in mind.
> This ugliness will go away once GCC supports btf tag.

Looking forward to that day.

Given that you'll apply that fix to [0], here's my stamp for this patch:

Acked-by: David Vernet <void@manifault.com>

[0]: https://lore.kernel.org/all/20230228040121.94253-3-alexei.starovoitov@gmail.com/


end of thread, other threads:[~2023-03-01  2:22 UTC | newest]

Thread overview: 17+ messages
2023-02-28  4:01 [PATCH v3 bpf-next 0/5] bpf: Introduce kptr_rcu Alexei Starovoitov
2023-02-28  4:01 ` [PATCH v3 bpf-next 1/5] bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted Alexei Starovoitov
2023-02-28 15:16   ` David Vernet
2023-02-28  4:01 ` [PATCH v3 bpf-next 2/5] bpf: Mark cgroups and dfl_cgrp fields as trusted Alexei Starovoitov
2023-02-28 16:45   ` David Vernet
2023-02-28 19:32   ` Tejun Heo
2023-02-28  4:01 ` [PATCH v3 bpf-next 3/5] bpf: Introduce kptr_rcu Alexei Starovoitov
2023-02-28 16:45   ` David Vernet
2023-02-28 19:49     ` Alexei Starovoitov
2023-02-28 20:05       ` David Vernet
2023-02-28 19:39   ` Tejun Heo
2023-02-28  4:01 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add a test case for kptr_rcu Alexei Starovoitov
2023-02-28 16:52   ` David Vernet
2023-02-28  4:01 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Tweak cgroup kfunc test Alexei Starovoitov
2023-02-28 17:07   ` David Vernet
2023-03-01  0:29     ` Alexei Starovoitov
2023-03-01  2:22       ` David Vernet
