* [PATCH bpf-next 0/2] Relax allowlist for open-coded css_task iter
@ 2023-10-22 15:45 Chuyi Zhou
2023-10-22 15:45 ` [PATCH bpf-next 1/2] bpf: Relax allowlist for " Chuyi Zhou
2023-10-22 15:45 ` [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter Chuyi Zhou
0 siblings, 2 replies; 13+ messages in thread
From: Chuyi Zhou @ 2023-10-22 15:45 UTC (permalink / raw)
To: bpf; +Cc: ast, daniel, andrii, martin.lau, Chuyi Zhou
Hi,
This patchset relaxes the allowlist for the open-coded css_task iter, as
suggested by Alexei [1].
Please see individual patches for more details. And comments are always
welcome.
Patch summary:
* Patch #1: Relax the allowlist so that the css_task iter can be used in
bpf iters and any sleepable progs.
* Patch #2: Add a test in cgroup_iter.c which demonstrates how
css_task iters can be combined with the cgroup iter.
[1] https://lore.kernel.org/lkml/CAADnVQKafk_junRyE=-FVAik4hjTRDtThymYGEL8hGTuYoOGpA@mail.gmail.com/
Chuyi Zhou (2):
bpf: Relax allowlist for css_task iter
selftests/bpf: Add test for css_task iter combining with cgroup iter
kernel/bpf/verifier.c | 15 ++++---
.../selftests/bpf/prog_tests/cgroup_iter.c | 33 +++++++++++++++
.../selftests/bpf/progs/iters_css_task.c | 41 +++++++++++++++++++
.../selftests/bpf/progs/iters_task_failure.c | 4 +-
4 files changed, 86 insertions(+), 7 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH bpf-next 1/2] bpf: Relax allowlist for css_task iter
2023-10-22 15:45 [PATCH bpf-next 0/2] Relax allowlist for open-coded css_task iter Chuyi Zhou
@ 2023-10-22 15:45 ` Chuyi Zhou
2023-10-22 15:58 ` Alexei Starovoitov
2023-10-22 15:45 ` [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter Chuyi Zhou
1 sibling, 1 reply; 13+ messages in thread
From: Chuyi Zhou @ 2023-10-22 15:45 UTC (permalink / raw)
To: bpf; +Cc: ast, daniel, andrii, martin.lau, Chuyi Zhou
The newly added open-coded css_task iter takes the global css_set_lock in
bpf_iter_css_task_new, so the BPF side has to be careful about where it
allows this iter to be used. The main concern is deadlocking on
css_set_lock. check_css_task_iter_allowlist() in the verifier enforced that
css_task could only be used in bpf_lsm hooks and sleepable bpf_iter progs.

This patch relaxes the allowlist for the css_task iter. Any LSM hook, any
iter (even non-sleepable) and any sleepable prog is safe, since none of
them hold css_set_lock before entering the BPF prog context.

This patch also fixes the misuse of BPF_TRACE_ITER in
check_css_task_iter_allowlist(), which compared a bpf_prog_type against a
bpf_attach_type.
Fixes: 9c66dc94b62ae ("bpf: Introduce css_task open-coded iterator kfuncs")
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
kernel/bpf/verifier.c | 15 ++++++++++-----
.../selftests/bpf/progs/iters_task_failure.c | 4 ++--
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e9bc5d4a25a1..cc79cd555337 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11088,17 +11088,22 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
&meta->arg_rbtree_root.field);
}
+/*
+ * The css_task iter allowlist is needed to avoid deadlocking on css_set_lock.
+ * LSM hooks and iters (both sleepable and non-sleepable) are safe.
+ * Any sleepable prog is also safe since bpf_check_attach_target() enforces
+ * that it can only be attached to certain hook points.
+ */
static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
{
enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
-
switch (prog_type) {
case BPF_PROG_TYPE_LSM:
return true;
- case BPF_TRACE_ITER:
- return env->prog->aux->sleepable;
+ case BPF_PROG_TYPE_TRACING:
+ return env->prog->expected_attach_type == BPF_TRACE_ITER;
default:
- return false;
+ return env->prog->aux->sleepable;
}
}
@@ -11357,7 +11362,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_ITER:
if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) {
if (!check_css_task_iter_allowlist(env)) {
- verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n");
+ verbose(env, "css_task_iter is only allowed in bpf_lsm, bpf_iter and sleepable progs\n");
return -EINVAL;
}
}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_failure.c b/tools/testing/selftests/bpf/progs/iters_task_failure.c
index c3bf96a67dba..6b1588d70652 100644
--- a/tools/testing/selftests/bpf/progs/iters_task_failure.c
+++ b/tools/testing/selftests/bpf/progs/iters_task_failure.c
@@ -84,8 +84,8 @@ int BPF_PROG(iter_css_lock_and_unlock)
return 0;
}
-SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
-__failure __msg("css_task_iter is only allowed in bpf_lsm and bpf iter-s")
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+__failure __msg("css_task_iter is only allowed in bpf_lsm, bpf_iter and sleepable progs")
int BPF_PROG(iter_css_task_for_each)
{
u64 cg_id = bpf_get_current_cgroup_id();
--
2.20.1
* [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-10-22 15:45 [PATCH bpf-next 0/2] Relax allowlist for open-coded css_task iter Chuyi Zhou
2023-10-22 15:45 ` [PATCH bpf-next 1/2] bpf: Relax allowlist for " Chuyi Zhou
@ 2023-10-22 15:45 ` Chuyi Zhou
2023-10-22 16:03 ` Alexei Starovoitov
1 sibling, 1 reply; 13+ messages in thread
From: Chuyi Zhou @ 2023-10-22 15:45 UTC (permalink / raw)
To: bpf; +Cc: ast, daniel, andrii, martin.lau, Chuyi Zhou
This patch adds a test which demonstrates how the css_task iter can be
combined with the cgroup iter without causing a deadlock, even though the
cgroup iter is not sleepable.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
.../selftests/bpf/prog_tests/cgroup_iter.c | 33 +++++++++++++++
.../selftests/bpf/progs/iters_css_task.c | 41 +++++++++++++++++++
2 files changed, 74 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c b/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
index e02feb5fae97..3679687a6927 100644
--- a/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
@@ -4,6 +4,7 @@
#include <test_progs.h>
#include <bpf/libbpf.h>
#include <bpf/btf.h>
+#include "iters_css_task.skel.h"
#include "cgroup_iter.skel.h"
#include "cgroup_helpers.h"
@@ -263,6 +264,35 @@ static void test_walk_dead_self_only(struct cgroup_iter *skel)
close(cgrp_fd);
}
+static void test_walk_self_only_css_task(void)
+{
+ struct iters_css_task *skel = NULL;
+ int err;
+
+ skel = iters_css_task__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ return;
+
+ bpf_program__set_autoload(skel->progs.cgroup_id_printer, true);
+
+ err = iters_css_task__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ err = join_cgroup(cg_path[CHILD2]);
+ if (!ASSERT_OK(err, "join_cgroup"))
+ goto cleanup;
+
+ skel->bss->target_pid = getpid();
+ snprintf(expected_output, sizeof(expected_output),
+ PROLOGUE "%8llu\n" EPILOGUE, cg_id[CHILD2]);
+ read_from_cgroup_iter(skel->progs.cgroup_id_printer, cg_fd[CHILD2],
+ BPF_CGROUP_ITER_SELF_ONLY, "test_walk_self_only_css_task");
+ ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt");
+cleanup:
+ iters_css_task__destroy(skel);
+}
+
void test_cgroup_iter(void)
{
struct cgroup_iter *skel = NULL;
@@ -293,6 +323,9 @@ void test_cgroup_iter(void)
test_walk_self_only(skel);
if (test__start_subtest("cgroup_iter__dead_self_only"))
test_walk_dead_self_only(skel);
+ if (test__start_subtest("cgroup_iter__self_only_css_task"))
+ test_walk_self_only_css_task();
+
out:
cgroup_iter__destroy(skel);
cleanup_cgroups();
diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c b/tools/testing/selftests/bpf/progs/iters_css_task.c
index 5089ce384a1c..0974e6f44328 100644
--- a/tools/testing/selftests/bpf/progs/iters_css_task.c
+++ b/tools/testing/selftests/bpf/progs/iters_css_task.c
@@ -10,6 +10,7 @@
char _license[] SEC("license") = "GPL";
+struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
void bpf_cgroup_release(struct cgroup *p) __ksym;
@@ -45,3 +46,43 @@ int BPF_PROG(iter_css_task_for_each, struct vm_area_struct *vma,
return -EPERM;
}
+
+static inline u64 cgroup_id(struct cgroup *cgrp)
+{
+ return cgrp->kn->id;
+}
+
+SEC("?iter/cgroup")
+int cgroup_id_printer(struct bpf_iter__cgroup *ctx)
+{
+ struct seq_file *seq = ctx->meta->seq;
+ struct cgroup *cgrp, *acquired;
+ struct cgroup_subsys_state *css;
+ struct task_struct *task;
+
+ cgrp = ctx->cgroup;
+
+ /* epilogue */
+ if (cgrp == NULL) {
+ BPF_SEQ_PRINTF(seq, "epilogue\n");
+ return 0;
+ }
+
+ /* prologue */
+ if (ctx->meta->seq_num == 0)
+ BPF_SEQ_PRINTF(seq, "prologue\n");
+
+ BPF_SEQ_PRINTF(seq, "%8llu\n", cgroup_id(cgrp));
+
+ acquired = bpf_cgroup_from_id(cgroup_id(cgrp));
+ if (!acquired)
+ return 0;
+ css = &acquired->self;
+ css_task_cnt = 0;
+ bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) {
+ if (task->pid == target_pid)
+ css_task_cnt++;
+ }
+ bpf_cgroup_release(acquired);
+ return 0;
+}
--
2.20.1
* Re: [PATCH bpf-next 1/2] bpf: Relax allowlist for css_task iter
2023-10-22 15:45 ` [PATCH bpf-next 1/2] bpf: Relax allowlist for " Chuyi Zhou
@ 2023-10-22 15:58 ` Alexei Starovoitov
0 siblings, 0 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2023-10-22 15:58 UTC (permalink / raw)
To: Chuyi Zhou
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau
On Sun, Oct 22, 2023 at 8:45 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> The newly added open-coded css_task iter takes the global css_set_lock in
> bpf_iter_css_task_new, so the BPF side has to be careful about where it
> allows this iter to be used. The main concern is deadlocking on
> css_set_lock. check_css_task_iter_allowlist() in the verifier enforced that
> css_task could only be used in bpf_lsm hooks and sleepable bpf_iter progs.
>
> This patch relaxes the allowlist for the css_task iter. Any LSM hook, any
> iter (even non-sleepable) and any sleepable prog is safe, since none of
> them hold css_set_lock before entering the BPF prog context.
>
> This patch also fixes the misuse of BPF_TRACE_ITER in
> check_css_task_iter_allowlist(), which compared a bpf_prog_type against a
> bpf_attach_type.
>
> Fixes: 9c66dc94b62ae ("bpf: Introduce css_task open-coded iterator kfuncs")
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
> kernel/bpf/verifier.c | 15 ++++++++++-----
> .../selftests/bpf/progs/iters_task_failure.c | 4 ++--
> 2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e9bc5d4a25a1..cc79cd555337 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -11088,17 +11088,22 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
> &meta->arg_rbtree_root.field);
> }
>
> +/*
> + * The css_task iter allowlist is needed to avoid deadlocking on css_set_lock.
> + * LSM hooks and iters (both sleepable and non-sleepable) are safe.
> + * Any sleepable prog is also safe since bpf_check_attach_target() enforces
> + * that it can only be attached to certain hook points.
> + */
> static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
> {
> enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
> -
> switch (prog_type) {
> case BPF_PROG_TYPE_LSM:
> return true;
> - case BPF_TRACE_ITER:
> - return env->prog->aux->sleepable;
> + case BPF_PROG_TYPE_TRACING:
> + return env->prog->expected_attach_type == BPF_TRACE_ITER;
I think it needs to be

	if (env->prog->expected_attach_type == BPF_TRACE_ITER)
		return true;
	/* else: fall through to check sleepable */
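[Editor's note] If I read the suggestion right, the resulting decision table is: LSM progs always allowed, iter progs always allowed, everything else allowed only when sleepable. A rough userspace model of that logic is below; the enum values and the bpf_prog layout are simplified stand-ins for illustration, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's enums and struct bpf_prog. */
enum bpf_prog_type { BPF_PROG_TYPE_LSM, BPF_PROG_TYPE_TRACING, BPF_PROG_TYPE_OTHER };
enum bpf_attach_type { BPF_TRACE_ITER, BPF_TRACE_FENTRY };

struct bpf_prog {
	enum bpf_prog_type type;
	enum bpf_attach_type expected_attach_type;
	bool sleepable;
};

/* Sketch of check_css_task_iter_allowlist() with the suggested
 * fall-through: iters are allowed regardless of sleepability, while
 * other tracing progs (e.g. fentry) fall through to the sleepable
 * check instead of being allowed unconditionally. */
static bool check_css_task_iter_allowlist(const struct bpf_prog *prog)
{
	switch (prog->type) {
	case BPF_PROG_TYPE_LSM:
		return true;
	case BPF_PROG_TYPE_TRACING:
		if (prog->expected_attach_type == BPF_TRACE_ITER)
			return true;
		/* fall through: a sleepable fentry.s prog is still allowed */
	default:
		return prog->sleepable;
	}
}
```

With this shape, a non-sleepable fentry prog is rejected while a sleepable one is accepted, which is the behavior the selftest change in this patch exercises.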
> default:
> - return false;
> + return env->prog->aux->sleepable;
> }
> }
>
> @@ -11357,7 +11362,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> case KF_ARG_PTR_TO_ITER:
> if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) {
> if (!check_css_task_iter_allowlist(env)) {
> - verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n");
> + verbose(env, "css_task_iter is only allowed in bpf_lsm, bpf_iter and sleepable progs\n");
> return -EINVAL;
> }
> }
> diff --git a/tools/testing/selftests/bpf/progs/iters_task_failure.c b/tools/testing/selftests/bpf/progs/iters_task_failure.c
> index c3bf96a67dba..6b1588d70652 100644
> --- a/tools/testing/selftests/bpf/progs/iters_task_failure.c
> +++ b/tools/testing/selftests/bpf/progs/iters_task_failure.c
> @@ -84,8 +84,8 @@ int BPF_PROG(iter_css_lock_and_unlock)
> return 0;
> }
>
> -SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> -__failure __msg("css_task_iter is only allowed in bpf_lsm and bpf iter-s")
> +SEC("?fentry/" SYS_PREFIX "sys_getpgid")
so that fentry/sys_foo is rejected, but fentry.s/sys_foo loads ok.
> +__failure __msg("css_task_iter is only allowed in bpf_lsm, bpf_iter and sleepable progs")
> int BPF_PROG(iter_css_task_for_each)
> {
> u64 cg_id = bpf_get_current_cgroup_id();
> --
> 2.20.1
>
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-10-22 15:45 ` [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter Chuyi Zhou
@ 2023-10-22 16:03 ` Alexei Starovoitov
2023-10-23 13:50 ` Chuyi Zhou
0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2023-10-22 16:03 UTC (permalink / raw)
To: Chuyi Zhou
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau
On Sun, Oct 22, 2023 at 8:45 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> This patch adds a test which demonstrates how the css_task iter can be
> combined with the cgroup iter without causing a deadlock, even though the
> cgroup iter is not sleepable.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
> .../selftests/bpf/prog_tests/cgroup_iter.c | 33 +++++++++++++++
> .../selftests/bpf/progs/iters_css_task.c | 41 +++++++++++++++++++
> 2 files changed, 74 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c b/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
> index e02feb5fae97..3679687a6927 100644
> --- a/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
> +++ b/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
> @@ -4,6 +4,7 @@
> #include <test_progs.h>
> #include <bpf/libbpf.h>
> #include <bpf/btf.h>
> +#include "iters_css_task.skel.h"
> #include "cgroup_iter.skel.h"
> #include "cgroup_helpers.h"
>
> @@ -263,6 +264,35 @@ static void test_walk_dead_self_only(struct cgroup_iter *skel)
> close(cgrp_fd);
> }
>
> +static void test_walk_self_only_css_task(void)
> +{
> + struct iters_css_task *skel = NULL;
> + int err;
> +
> + skel = iters_css_task__open();
> + if (!ASSERT_OK_PTR(skel, "skel_open"))
> + return;
> +
> + bpf_program__set_autoload(skel->progs.cgroup_id_printer, true);
> +
> + err = iters_css_task__load(skel);
> + if (!ASSERT_OK(err, "skel_load"))
> + goto cleanup;
> +
> + err = join_cgroup(cg_path[CHILD2]);
> + if (!ASSERT_OK(err, "join_cgroup"))
> + goto cleanup;
> +
> + skel->bss->target_pid = getpid();
> + snprintf(expected_output, sizeof(expected_output),
> + PROLOGUE "%8llu\n" EPILOGUE, cg_id[CHILD2]);
> + read_from_cgroup_iter(skel->progs.cgroup_id_printer, cg_fd[CHILD2],
> + BPF_CGROUP_ITER_SELF_ONLY, "test_walk_self_only_css_task");
> + ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt");
> +cleanup:
> + iters_css_task__destroy(skel);
> +}
> +
> void test_cgroup_iter(void)
> {
> struct cgroup_iter *skel = NULL;
> @@ -293,6 +323,9 @@ void test_cgroup_iter(void)
> test_walk_self_only(skel);
> if (test__start_subtest("cgroup_iter__dead_self_only"))
> test_walk_dead_self_only(skel);
> + if (test__start_subtest("cgroup_iter__self_only_css_task"))
> + test_walk_self_only_css_task();
> +
> out:
> cgroup_iter__destroy(skel);
> cleanup_cgroups();
> diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c b/tools/testing/selftests/bpf/progs/iters_css_task.c
> index 5089ce384a1c..0974e6f44328 100644
> --- a/tools/testing/selftests/bpf/progs/iters_css_task.c
> +++ b/tools/testing/selftests/bpf/progs/iters_css_task.c
> @@ -10,6 +10,7 @@
>
> char _license[] SEC("license") = "GPL";
>
> +struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
> struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
> void bpf_cgroup_release(struct cgroup *p) __ksym;
>
> @@ -45,3 +46,43 @@ int BPF_PROG(iter_css_task_for_each, struct vm_area_struct *vma,
>
> return -EPERM;
> }
> +
> +static inline u64 cgroup_id(struct cgroup *cgrp)
> +{
> + return cgrp->kn->id;
> +}
> +
> +SEC("?iter/cgroup")
> +int cgroup_id_printer(struct bpf_iter__cgroup *ctx)
> +{
> + struct seq_file *seq = ctx->meta->seq;
> + struct cgroup *cgrp, *acquired;
> + struct cgroup_subsys_state *css;
> + struct task_struct *task;
> +
> + cgrp = ctx->cgroup;
> +
> + /* epilogue */
> + if (cgrp == NULL) {
> + BPF_SEQ_PRINTF(seq, "epilogue\n");
> + return 0;
> + }
> +
> + /* prologue */
> + if (ctx->meta->seq_num == 0)
> + BPF_SEQ_PRINTF(seq, "prologue\n");
> +
> + BPF_SEQ_PRINTF(seq, "%8llu\n", cgroup_id(cgrp));
> +
> + acquired = bpf_cgroup_from_id(cgroup_id(cgrp));
You're doing this dance because a plain cgrp pointer is not trusted?
Maybe let's add
BTF_TYPE_SAFE_RCU_OR_NULL(struct bpf_iter__cgroup) {...}
to the verifier, similar to what we do for bpf_iter__task.
After the if (cgrp == NULL) check, the pointer is safe to iterate on.
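[Editor's note] For readers following along: the BTF_TYPE_SAFE_* annotations in verifier.c work by pasting a suffix onto a type name to declare a shadow struct whose members list the ctx fields the verifier may treat as safe. The snippet below sketches what the suggested annotation could look like, modeled as a standalone program; the stand-in types and the field list are assumptions for illustration, and the real annotation would live in kernel/bpf/verifier.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types involved. */
struct cgroup;
struct bpf_iter_meta;

struct bpf_iter__cgroup {
	struct bpf_iter_meta *meta;
	struct cgroup *cgroup;
};

/* The annotation macro pastes a suffix onto the type name; the members
 * of the resulting shadow struct mark which ctx fields the verifier may
 * treat as RCU-protected and possibly NULL. */
#define __PASTE(a, b) a##b
#define BTF_TYPE_SAFE_RCU_OR_NULL(__type) __PASTE(__type, __safe_rcu_or_null)

BTF_TYPE_SAFE_RCU_OR_NULL(struct bpf_iter__cgroup) {
	struct cgroup *cgroup; /* assumed field: mirrors ctx->cgroup */
};
```

If the pointer were marked this way, a NULL check like the one in cgroup_id_printer() should be enough to iterate on it, without the bpf_cgroup_from_id() round trip.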
> + if (!acquired)
> + return 0;
> + css = &acquired->self;
> + css_task_cnt = 0;
> + bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) {
> + if (task->pid == target_pid)
> + css_task_cnt++;
> + }
> + bpf_cgroup_release(acquired);
> + return 0;
> +}
> --
> 2.20.1
>
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-10-22 16:03 ` Alexei Starovoitov
@ 2023-10-23 13:50 ` Chuyi Zhou
2023-10-23 17:14 ` Alexei Starovoitov
0 siblings, 1 reply; 13+ messages in thread
From: Chuyi Zhou @ 2023-10-23 13:50 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau
On 2023/10/23 00:03, Alexei Starovoitov wrote:
> On Sun, Oct 22, 2023 at 8:45 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> This patch adds a test which demonstrates how the css_task iter can be
>> combined with the cgroup iter without causing a deadlock, even though the
>> cgroup iter is not sleepable.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>> .../selftests/bpf/prog_tests/cgroup_iter.c | 33 +++++++++++++++
>> .../selftests/bpf/progs/iters_css_task.c | 41 +++++++++++++++++++
>> 2 files changed, 74 insertions(+)
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c b/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
>> index e02feb5fae97..3679687a6927 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
>> @@ -4,6 +4,7 @@
>> #include <test_progs.h>
>> #include <bpf/libbpf.h>
>> #include <bpf/btf.h>
>> +#include "iters_css_task.skel.h"
>> #include "cgroup_iter.skel.h"
>> #include "cgroup_helpers.h"
>>
>> @@ -263,6 +264,35 @@ static void test_walk_dead_self_only(struct cgroup_iter *skel)
>> close(cgrp_fd);
>> }
>>
>> +static void test_walk_self_only_css_task(void)
>> +{
>> + struct iters_css_task *skel = NULL;
>> + int err;
>> +
>> + skel = iters_css_task__open();
>> + if (!ASSERT_OK_PTR(skel, "skel_open"))
>> + return;
>> +
>> + bpf_program__set_autoload(skel->progs.cgroup_id_printer, true);
>> +
>> + err = iters_css_task__load(skel);
>> + if (!ASSERT_OK(err, "skel_load"))
>> + goto cleanup;
>> +
>> + err = join_cgroup(cg_path[CHILD2]);
>> + if (!ASSERT_OK(err, "join_cgroup"))
>> + goto cleanup;
>> +
>> + skel->bss->target_pid = getpid();
>> + snprintf(expected_output, sizeof(expected_output),
>> + PROLOGUE "%8llu\n" EPILOGUE, cg_id[CHILD2]);
>> + read_from_cgroup_iter(skel->progs.cgroup_id_printer, cg_fd[CHILD2],
>> + BPF_CGROUP_ITER_SELF_ONLY, "test_walk_self_only_css_task");
>> + ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt");
>> +cleanup:
>> + iters_css_task__destroy(skel);
>> +}
>> +
>> void test_cgroup_iter(void)
>> {
>> struct cgroup_iter *skel = NULL;
>> @@ -293,6 +323,9 @@ void test_cgroup_iter(void)
>> test_walk_self_only(skel);
>> if (test__start_subtest("cgroup_iter__dead_self_only"))
>> test_walk_dead_self_only(skel);
>> + if (test__start_subtest("cgroup_iter__self_only_css_task"))
>> + test_walk_self_only_css_task();
>> +
>> out:
>> cgroup_iter__destroy(skel);
>> cleanup_cgroups();
>> diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c b/tools/testing/selftests/bpf/progs/iters_css_task.c
>> index 5089ce384a1c..0974e6f44328 100644
>> --- a/tools/testing/selftests/bpf/progs/iters_css_task.c
>> +++ b/tools/testing/selftests/bpf/progs/iters_css_task.c
>> @@ -10,6 +10,7 @@
>>
>> char _license[] SEC("license") = "GPL";
>>
>> +struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
>> struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
>> void bpf_cgroup_release(struct cgroup *p) __ksym;
>>
>> @@ -45,3 +46,43 @@ int BPF_PROG(iter_css_task_for_each, struct vm_area_struct *vma,
>>
>> return -EPERM;
>> }
>> +
>> +static inline u64 cgroup_id(struct cgroup *cgrp)
>> +{
>> + return cgrp->kn->id;
>> +}
>> +
>> +SEC("?iter/cgroup")
>> +int cgroup_id_printer(struct bpf_iter__cgroup *ctx)
>> +{
>> + struct seq_file *seq = ctx->meta->seq;
>> + struct cgroup *cgrp, *acquired;
>> + struct cgroup_subsys_state *css;
>> + struct task_struct *task;
>> +
>> + cgrp = ctx->cgroup;
>> +
>> + /* epilogue */
>> + if (cgrp == NULL) {
>> + BPF_SEQ_PRINTF(seq, "epilogue\n");
>> + return 0;
>> + }
>> +
>> + /* prologue */
>> + if (ctx->meta->seq_num == 0)
>> + BPF_SEQ_PRINTF(seq, "prologue\n");
>> +
>> + BPF_SEQ_PRINTF(seq, "%8llu\n", cgroup_id(cgrp));
>> +
>> + acquired = bpf_cgroup_from_id(cgroup_id(cgrp));
>
> You're doing this dance, because a plain cgrp pointer is not trusted?
> Maybe let's add
> BTF_TYPE_SAFE_RCU_OR_NULL(struct bpf_iter__cgroup) {...}
> to the verifier similar to what we do for bpf_iter__task.
>
Yes, thanks for the suggestion.
But it seems that, currently, bpf_iter__task->task doesn't work either:

SEC("iter/task")
int dump_task(struct bpf_iter__task *ctx)
{
	struct task_struct *task = ctx->task;

	/* task should be trusted here since
	 * BTF_TYPE_SAFE_TRUSTED(struct bpf_iter__task) {...} exists
	 */
	if (task == NULL) {
		return 0;
	}
	bpf_task_acquire(task);
	bpf_task_release(task);
	return 0;
}

The verifier log is:

VERIFIER LOG:
=============
reg type unsupported for arg#0 function dump_task#7
0: R1=ctx(off=0,imm=0) R10=fp0
; struct task_struct *task = ctx->task;
0: (79) r6 = *(u64 *)(r1 +8) ; R1=ctx(off=0,imm=0)
R6_w=ptr_or_null_task_struct(id=1,off=0,imm=0)
; if (task == NULL) {
1: (15) if r6 == 0x0 goto pc+4 ; R6_w=ptr_task_struct(off=0,imm=0)
; bpf_task_acquire(task);
2: (bf) r1 = r6 ;
R1_w=ptr_task_struct(off=0,imm=0) R6_w=ptr_task_struct(off=0,imm=0)
3: (85) call bpf_task_acquire#26990
R1 must be a rcu pointer
processed 4 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
From the above log, it seems 'task' is a normal pointer, not a trusted
pointer. Actually, it seems the BPF verifier didn't even call
check_ptr_to_btf_access()...
But bpf_iter__task->meta works well: the BPF verifier considers it a
trusted pointer.
SEC("iter/task")
int dump_task(struct bpf_iter__task *ctx)
{
	struct task_struct *task = ctx->task;
	struct bpf_iter_meta *meta = ctx->meta;
	struct seq_file *seq = meta->seq;

	if (task == NULL) {
		BPF_SEQ_PRINTF(seq, "%s\n", "NULL");
		return 0;
	}
	bpf_task_acquire(task);
	bpf_task_release(task);
	return 0;
}
VERIFIER LOG:
=============
reg type unsupported for arg#0 function dump_task#7
0: R1=ctx(off=0,imm=0) R10=fp0
; int dump_task(struct bpf_iter__task *ctx)
0: (bf) r8 = r1 ; R1=ctx(off=0,imm=0)
R8_w=ctx(off=0,imm=0)
; struct bpf_iter_meta *meta = ctx->meta;
1: (79) r1 = *(u64 *)(r8 +0)
func 'bpf_iter_task' arg0 has btf_id 22699 type STRUCT 'bpf_iter_meta'
2: R1_w=trusted_ptr_bpf_iter_meta(off=0,imm=0) R8_w=ctx(off=0,imm=0)
; struct seq_file *seq = meta->seq;
2: (79) r6 = *(u64 *)(r1 +0) ;
R1_w=trusted_ptr_bpf_iter_meta(off=0,imm=0)
R6_w=trusted_ptr_seq_file(off=0,imm=0)
; struct task_struct *task = ctx->task;
3: (79) r7 = *(u64 *)(r8 +8) ;
R7_w=ptr_or_null_task_struct(id=1,off=0,imm=0) R8_w=ctx(off=0,imm=0)
; if (task == (void *)0) {
4: (55) if r7 != 0x0 goto pc+12 17:
R1_w=trusted_ptr_bpf_iter_meta(off=0,imm=0)
R6_w=trusted_ptr_seq_file(off=0,imm=0) R7_w=ptr_task_struct(off=0,imm=0)
R8_w=ctx(off=0,imm=0) R10=fp0
; bpf_task_acquire(task);
17: (bf) r1 = r7 ;
R1_w=ptr_task_struct(off=0,imm=0) R7_w=ptr_task_struct(off=0,imm=0)
18: (85) call bpf_task_acquire#26990
R1 must be a rcu pointer
I will try to figure it out.
Thanks.
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-10-23 13:50 ` Chuyi Zhou
@ 2023-10-23 17:14 ` Alexei Starovoitov
2023-10-30 14:37 ` Chuyi Zhou
0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2023-10-23 17:14 UTC (permalink / raw)
To: Chuyi Zhou
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau
On Mon, Oct 23, 2023 at 6:50 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
>
> R1_w=ptr_task_struct(off=0,imm=0) R7_w=ptr_task_struct(off=0,imm=0)
> 18: (85) call bpf_task_acquire#26990
> R1 must be a rcu pointer
>
> I will try to figure it out.
Thanks. That would be great.
So far it looks like a regression.
I'm guessing __bpf_md_ptr wrapping is confusing the verifier.
Since it's more complicated than I thought, please respin
the current set with fixes to patch 1 and leave the patch 2 as-is.
That can be another follow up.
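[Editor's note] For context, the __bpf_md_ptr wrapping mentioned here is the union trick the kernel uses so that pointer fields in ctx structs occupy a fixed, 8-byte-aligned slot regardless of the userspace pointer size. A simplified standalone reproduction of its shape, assuming the bpf_iter__task layout shown in kernel headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t __u64;

/* Each ctx member is an anonymous union padded to 64 bits, so the
 * struct layout is identical on 32-bit and 64-bit. */
#define __bpf_md_ptr(type, name)	\
union {					\
	type name;			\
	__u64 :64;			\
} __attribute__((aligned(8)))

struct bpf_iter_meta;
struct task_struct;

/* Shape of the iterator ctx as the prog sees it. */
struct bpf_iter__task {
	__bpf_md_ptr(struct bpf_iter_meta *, meta);
	__bpf_md_ptr(struct task_struct *, task);
};
```

The layout itself is straightforward (meta at offset 0, task at 8 on 64-bit), so if the wrapping confuses the verifier, a plausible suspect is the extra anonymous-union level between the ctx struct and the pointer member rather than the offsets; that is speculation, though, not a confirmed diagnosis.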
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-10-23 17:14 ` Alexei Starovoitov
@ 2023-10-30 14:37 ` Chuyi Zhou
2023-10-31 11:37 ` Chuyi Zhou
0 siblings, 1 reply; 13+ messages in thread
From: Chuyi Zhou @ 2023-10-30 14:37 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau
Hello,
On 2023/10/24 01:14, Alexei Starovoitov wrote:
> On Mon, Oct 23, 2023 at 6:50 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>>
>> R1_w=ptr_task_struct(off=0,imm=0) R7_w=ptr_task_struct(off=0,imm=0)
>> 18: (85) call bpf_task_acquire#26990
>> R1 must be a rcu pointer
>>
>> I will try to figure it out.
>
> Thanks. That would be great.
> So far it looks like a regression.
> I'm guessing __bpf_md_ptr wrapping is confusing the verifier.
>
> Since it's more complicated than I thought, please respin
> the current set with fixes to patch 1 and leave the patch 2 as-is.
> That can be another follow up.
After adding this change:
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 15d71d2986d3..4565e5457754 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6071,6 +6071,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 			info->reg_type = ctx_arg_info->reg_type;
 			info->btf = btf_vmlinux;
 			info->btf_id = ctx_arg_info->btf_id;
+			if (prog_args_trusted(prog))
+				info->reg_type |= PTR_TRUSTED;
 			return true;
 		}
 	}
the task pointer would be recognized as 'trusted_ptr_or_null_task_struct'.
The VERIFIER LOG:
=============
reg type unsupported for arg#0 function dump_task#7
0: R1=ctx(off=0,imm=0) R10=fp0
; struct task_struct *task = ctx->task;
0: (79) r1 = *(u64 *)(r1 +8) ;
R1_w=trusted_ptr_or_null_task_struct(id=1,off=0,imm=0)
The following BPF prog works well:

SEC("iter/task")
int dump_task(struct bpf_iter__task *ctx)
{
	struct task_struct *task = ctx->task;
	struct task_struct *acq;

	if (task == NULL)
		return 0;
	acq = bpf_task_acquire(task);
	if (!acq)
		return 0;
	bpf_task_release(acq);
	return 0;
}
Do you think this change is correct?
Or do you have better ideas?
Thanks.
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-10-30 14:37 ` Chuyi Zhou
@ 2023-10-31 11:37 ` Chuyi Zhou
2023-10-31 22:06 ` Alexei Starovoitov
0 siblings, 1 reply; 13+ messages in thread
From: Chuyi Zhou @ 2023-10-31 11:37 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau
On 2023/10/30 22:37, Chuyi Zhou wrote:
> Hello,
>
> On 2023/10/24 01:14, Alexei Starovoitov wrote:
>> On Mon, Oct 23, 2023 at 6:50 AM Chuyi Zhou <zhouchuyi@bytedance.com>
>> wrote:
>>>
>>>
>>> R1_w=ptr_task_struct(off=0,imm=0) R7_w=ptr_task_struct(off=0,imm=0)
>>> 18: (85) call bpf_task_acquire#26990
>>> R1 must be a rcu pointer
>>>
>>> I will try to figure out it.
>>
>> Thanks. That would be great.
>> So far it looks like a regression.
>> I'm guessing __bpf_md_ptr wrapping is confusing the verifier.
>>
>> Since it's more complicated than I thought, please respin
>> the current set with fixes to patch 1 and leave the patch 2 as-is.
>> That can be another follow up.
>
> After adding this change:
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 15d71d2986d3..4565e5457754 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -6071,6 +6071,8 @@ bool btf_ctx_access(int off, int size, enum
> bpf_access_type type,
> info->reg_type = ctx_arg_info->reg_type;
> info->btf = btf_vmlinux;
> info->btf_id = ctx_arg_info->btf_id;
> + if (prog_args_trusted(prog))
> + info->reg_type |= PTR_TRUSTED;
> return true;
> }
> }
>
> the task pointer would be recognized as 'trusted_ptr_or_null_task_struct'.
>
> The VERIFIER LOG:
> =============
> reg type unsupported for arg#0 function dump_task#7
> 0: R1=ctx(off=0,imm=0) R10=fp0
> ; struct task_struct *task = ctx->task;
> 0: (79) r1 = *(u64 *)(r1 +8) ;
> R1_w=trusted_ptr_or_null_task_struct(id=1,off=0,imm=0)
>
> The following BPF Prog works well.
>
> SEC("iter/task")
> int dump_task(struct bpf_iter__task *ctx)
> {
> struct task_struct *task = ctx->task;
> struct task_struct *acq;
> if (task == NULL)
> return 0;
> acq = bpf_task_acquire(task);
> if (!acq)
> return 0;
> bpf_task_release(acq);
>
> return 0;
> }
>
bpf_iter__task->meta is recognized as a trusted ptr.
In btf_ctx_access(), we add the PTR_TRUSTED flag if
prog_args_trusted(prog) returns true.
It seems that for BPF_TRACE_ITER, ctx access is always 'trusted'.
But I noticed that in task_iter.c we explicitly declare the type of
bpf_iter__task->task as PTR_TO_BTF_ID_OR_NULL:
static struct bpf_iter_reg task_reg_info = {
	.target = "task",
	.attach_target = bpf_iter_attach_task,
	.feature = BPF_ITER_RESCHED,
	.ctx_arg_info_size = 1,
	.ctx_arg_info = {
		{ offsetof(struct bpf_iter__task, task),
		  PTR_TO_BTF_ID_OR_NULL },
	},
	.seq_info = &task_seq_info,
	.fill_link_info = bpf_iter_fill_link_info,
	.show_fdinfo = bpf_iter_task_show_fdinfo,
};
btf_ctx_access() enforces that the reg_type for bpf_iter__task->task is
ctx_arg_info->reg_type (PTR_TO_BTF_ID_OR_NULL):
bool btf_ctx_access()
	....
	/* this is a pointer to another type */
	for (i = 0; i < prog->aux->ctx_arg_info_size; i++) {
		const struct bpf_ctx_arg_aux *ctx_arg_info = &prog->aux->ctx_arg_info[i];

		if (ctx_arg_info->offset == off) {
			...
			info->reg_type = ctx_arg_info->reg_type;
			info->btf = btf_vmlinux;
			info->btf_id = ctx_arg_info->btf_id;
			return true;
		}
	}
So, maybe another possible solution is:
diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
index 209e5135f9fb..72a6778e3fba 100644
--- a/kernel/bpf/cgroup_iter.c
+++ b/kernel/bpf/cgroup_iter.c
@@ -282,7 +282,7 @@ static struct bpf_iter_reg bpf_cgroup_reg_info = {
 	.ctx_arg_info_size = 1,
 	.ctx_arg_info = {
 		{ offsetof(struct bpf_iter__cgroup, cgroup),
-		  PTR_TO_BTF_ID_OR_NULL },
+		  PTR_TO_BTF_ID_OR_NULL | MEM_RCU },
 	},
 	.seq_info = &cgroup_iter_seq_info,
 };
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 59e747938bdb..4fd3f734dffd 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -706,7 +706,7 @@ static struct bpf_iter_reg task_reg_info = {
 	.ctx_arg_info_size = 1,
 	.ctx_arg_info = {
 		{ offsetof(struct bpf_iter__task, task),
-		  PTR_TO_BTF_ID_OR_NULL },
+		  PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
 	},
 	.seq_info = &task_seq_info,
 	.fill_link_info = bpf_iter_fill_link_info,
Just some thoughts.
Thanks.
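For reference, the combination this series enables (an open-coded css_task
iter driven from a cgroup iter program, as exercised by patch #2) would look
roughly like the sketch below. This is a minimal sketch, not the actual
selftest: it assumes the bpf_iter_css_task_* kfuncs introduced by this work,
and the printed field is only illustrative.

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* Sketch: count member processes of the cgroup handed to a cgroup
 * iter program, using the open-coded css_task iterator. With
 * ctx->cgroup treated as trusted, &cgrp->self can be passed to the
 * kfunc without an explicit bpf_cgroup_acquire(). */
SEC("iter/cgroup")
int count_cgroup_procs(struct bpf_iter__cgroup *ctx)
{
	struct cgroup *cgrp = ctx->cgroup;
	struct bpf_iter_css_task it;
	struct task_struct *task;
	int nr = 0;

	if (!cgrp)	/* the pointer is still OR_NULL */
		return 0;

	bpf_iter_css_task_new(&it, &cgrp->self, CSS_TASK_ITER_PROCS);
	while ((task = bpf_iter_css_task_next(&it)))
		nr++;
	bpf_iter_css_task_destroy(&it);

	BPF_SEQ_PRINTF(ctx->meta->seq, "nr_procs: %d\n", nr);
	return 0;
}
```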
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-10-31 11:37 ` Chuyi Zhou
@ 2023-10-31 22:06 ` Alexei Starovoitov
2023-11-01 2:41 ` Chuyi Zhou
0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2023-10-31 22:06 UTC (permalink / raw)
To: Chuyi Zhou
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau
On Tue, Oct 31, 2023 at 4:38 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
>
> So, maybe another possible solution is:
>
> diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
> index 209e5135f9fb..72a6778e3fba 100644
> --- a/kernel/bpf/cgroup_iter.c
> +++ b/kernel/bpf/cgroup_iter.c
> @@ -282,7 +282,7 @@ static struct bpf_iter_reg bpf_cgroup_reg_info = {
> .ctx_arg_info_size = 1,
> .ctx_arg_info = {
> { offsetof(struct bpf_iter__cgroup, cgroup),
> - PTR_TO_BTF_ID_OR_NULL },
> + PTR_TO_BTF_ID_OR_NULL | MEM_RCU },
> },
> .seq_info = &cgroup_iter_seq_info,
> };
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 59e747938bdb..4fd3f734dffd 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -706,7 +706,7 @@ static struct bpf_iter_reg task_reg_info = {
> .ctx_arg_info_size = 1,
> .ctx_arg_info = {
> { offsetof(struct bpf_iter__task, task),
> - PTR_TO_BTF_ID_OR_NULL },
> + PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
Yep. That looks good.
bpf_cgroup_reg_info -> cgroup is probably PTR_TRUSTED too.
Not sure... why did you go with MEM_RCU there?
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-10-31 22:06 ` Alexei Starovoitov
@ 2023-11-01 2:41 ` Chuyi Zhou
2023-11-02 6:07 ` Alexei Starovoitov
0 siblings, 1 reply; 13+ messages in thread
From: Chuyi Zhou @ 2023-11-01 2:41 UTC (permalink / raw)
To: Alexei Starovoitov, Yonghong Song
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau
Hello,
On 2023/11/1 06:06, Alexei Starovoitov wrote:
> On Tue, Oct 31, 2023 at 4:38 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>>
>> So, maybe another possible solution is:
>>
>> diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
>> index 209e5135f9fb..72a6778e3fba 100644
>> --- a/kernel/bpf/cgroup_iter.c
>> +++ b/kernel/bpf/cgroup_iter.c
>> @@ -282,7 +282,7 @@ static struct bpf_iter_reg bpf_cgroup_reg_info = {
>> .ctx_arg_info_size = 1,
>> .ctx_arg_info = {
>> { offsetof(struct bpf_iter__cgroup, cgroup),
>> - PTR_TO_BTF_ID_OR_NULL },
>> + PTR_TO_BTF_ID_OR_NULL | MEM_RCU },
>> },
>> .seq_info = &cgroup_iter_seq_info,
>> };
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 59e747938bdb..4fd3f734dffd 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -706,7 +706,7 @@ static struct bpf_iter_reg task_reg_info = {
>> .ctx_arg_info_size = 1,
>> .ctx_arg_info = {
>> { offsetof(struct bpf_iter__task, task),
>> - PTR_TO_BTF_ID_OR_NULL },
>> + PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
>
> Yep. That looks good.
> bpf_cgroup_reg_info -> cgroup is probably PTR_TRUSTED too.
> Not sure... why did you go with MEM_RCU there ?
hmm...
That is because in our previous discussion, you suggested we'd better
add BTF_TYPE_SAFE_RCU_OR_NULL(struct bpf_iter__cgroup) {...}
I didn't think too much about it. But I noticed that we only use
cgroup_mutex to protect the iteration in cgroup_iter.c.
Looking at cgroup_kn_lock_live() in kernel/cgroup/cgroup.c, it uses
cgroup_tryget()/cgroup_is_dead() to check whether the cgrp is 'dead'.
cgroup_tryget() seems to be equivalent to bpf_cgroup_acquire(). So, maybe
let's return an 'rcu' cgrp pointer. If a BPF prog wants a stronger
guarantee on the cgrp, it can just use bpf_cgroup_acquire().
Just some thoughts. Thanks.
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-11-01 2:41 ` Chuyi Zhou
@ 2023-11-02 6:07 ` Alexei Starovoitov
2023-11-02 9:21 ` Chuyi Zhou
0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2023-11-02 6:07 UTC (permalink / raw)
To: Chuyi Zhou
Cc: Yonghong Song, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau
On Tue, Oct 31, 2023 at 7:41 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> Hello,
>
> On 2023/11/1 06:06, Alexei Starovoitov wrote:
> > On Tue, Oct 31, 2023 at 4:38 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>
> >>
> >> So, maybe another possible solution is:
> >>
> >> diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
> >> index 209e5135f9fb..72a6778e3fba 100644
> >> --- a/kernel/bpf/cgroup_iter.c
> >> +++ b/kernel/bpf/cgroup_iter.c
> >> @@ -282,7 +282,7 @@ static struct bpf_iter_reg bpf_cgroup_reg_info = {
> >> .ctx_arg_info_size = 1,
> >> .ctx_arg_info = {
> >> { offsetof(struct bpf_iter__cgroup, cgroup),
> >> - PTR_TO_BTF_ID_OR_NULL },
> >> + PTR_TO_BTF_ID_OR_NULL | MEM_RCU },
> >> },
> >> .seq_info = &cgroup_iter_seq_info,
> >> };
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index 59e747938bdb..4fd3f734dffd 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -706,7 +706,7 @@ static struct bpf_iter_reg task_reg_info = {
> >> .ctx_arg_info_size = 1,
> >> .ctx_arg_info = {
> >> { offsetof(struct bpf_iter__task, task),
> >> - PTR_TO_BTF_ID_OR_NULL },
> >> + PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
> >
> > Yep. That looks good.
> > bpf_cgroup_reg_info -> cgroup is probably PTR_TRUSTED too.
> > Not sure... why did you go with MEM_RCU there ?
>
> hmm...
>
> That is because in our previous discussion, you suggested we'd better
> add BTF_TYPE_SAFE_RCU_OR_NULL(struct bpf_iter__cgroup) {...}
I mentioned that because we don't have
BTF_TYPE_SAFE_TRUSTED_OR_NULL, and the cgroup pointer can be NULL;
but since you found a cleaner way,
we can do PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED.
> I didn't think too much about it. But I noticed that we only use
> cgroup_mutex to protect the iteration in cgroup_iter.c.
>
> Looking at cgroup_kn_lock_live() in kernel/cgroup/cgroup.c, it would use
> cgroup_tryget()/cgroup_is_dead() to check whether the cgrp is 'dead'.
> cgroup_tryget() seems is equal to bpf_cgroup_acquire(). So, maybe let's
> return a 'rcu' cgrp pointer. If BPF Prog want stronger guarantee of
> cgrp, just use bpf_cgroup_acquire().
and that would be misleading.
MEM_RCU means that the pointer is valid, but it could have refcnt == 0,
while PTR_TRUSTED means that it's good to use as-is.
Here the cgroup pointer is trusted. It's not a dead cgroup.
See __cgroup_iter_seq_show() in kernel/bpf/cgroup_iter.c.
A bpf prog doesn't need to call bpf_cgroup_acquire().
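The practical difference between the two annotations, from the program's
point of view, can be sketched as follows. This is an illustrative sketch
only: the kn->id access is just an example of dereferencing the pointer, and
the commented-out MEM_RCU path shows what a prog would have to do if it
needed a refcounted reference.

```c
SEC("iter/cgroup")
int use_cgroup(struct bpf_iter__cgroup *ctx)
{
	struct cgroup *cgrp = ctx->cgroup;

	if (!cgrp)	/* OR_NULL still requires the NULL check */
		return 0;

	/* PTR_TRUSTED: good to use as-is, including as an argument to
	 * kfuncs that require a trusted pointer. */
	bpf_printk("cgroup id %llu", cgrp->kn->id);

	/* If the pointer were only MEM_RCU (valid under RCU, but
	 * possibly refcnt == 0), a prog needing a refcounted reference
	 * would have to pin it first:
	 *
	 *   struct cgroup *acq = bpf_cgroup_acquire(cgrp);
	 *   if (acq) { ... use acq ...; bpf_cgroup_release(acq); }
	 */
	return 0;
}
```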
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter
2023-11-02 6:07 ` Alexei Starovoitov
@ 2023-11-02 9:21 ` Chuyi Zhou
0 siblings, 0 replies; 13+ messages in thread
From: Chuyi Zhou @ 2023-11-02 9:21 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Yonghong Song, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau
On 2023/11/2 14:07, Alexei Starovoitov wrote:
> On Tue, Oct 31, 2023 at 7:41 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> Hello,
>>
>> On 2023/11/1 06:06, Alexei Starovoitov wrote:
>>> On Tue, Oct 31, 2023 at 4:38 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>>>
>>>>
[cut]
>>> Yep. That looks good.
>>> bpf_cgroup_reg_info -> cgroup is probably PTR_TRUSTED too.
>>> Not sure... why did you go with MEM_RCU there ?
>>
>> hmm...
>>
>> That is because in our previous discussion, you suggested we'd better
>> add BTF_TYPE_SAFE_RCU_OR_NULL(struct bpf_iter__cgroup) {...}
>
> I mentioned that because we don't have
> BTF_TYPE_SAFE_TRUSTED_OR_NULL.
>
> and cgroup pointer can be NULL, but since you found a cleaner way
> we can do PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED.
>
>> I didn't think too much about it. But I noticed that we only use
>> cgroup_mutex to protect the iteration in cgroup_iter.c.
>>
>> Looking at cgroup_kn_lock_live() in kernel/cgroup/cgroup.c, it would use
>> cgroup_tryget()/cgroup_is_dead() to check whether the cgrp is 'dead'.
>> cgroup_tryget() seems is equal to bpf_cgroup_acquire(). So, maybe let's
>> return a 'rcu' cgrp pointer. If BPF Prog want stronger guarantee of
>> cgrp, just use bpf_cgroup_acquire().
>
> and that would be misleading.
> MEM_RCU means that the pointer is valid, but it could have refcnt == 0,
> while PTR_TRUSTED means that it's good to use as-is.
>
> Here cgroup pointer is trusted. It's not a dead cgroup.
> See kernel/bpf/cgroup_iter.c __cgroup_iter_seq_show().
> bpf prog doesn't need to call bpf_cgroup_acquire.
Thanks for the detailed explanation.
I will send a follow-up.
Thread overview: 13+ messages
2023-10-22 15:45 [PATCH bpf-next 0/2] Relax allowlist for open-coded css_task iter Chuyi Zhou
2023-10-22 15:45 ` [PATCH bpf-next 1/2] bpf: Relax allowlist for " Chuyi Zhou
2023-10-22 15:58 ` Alexei Starovoitov
2023-10-22 15:45 ` [PATCH bpf-next 2/2] selftests/bpf: Add test for css_task iter combining with cgroup iter Chuyi Zhou
2023-10-22 16:03 ` Alexei Starovoitov
2023-10-23 13:50 ` Chuyi Zhou
2023-10-23 17:14 ` Alexei Starovoitov
2023-10-30 14:37 ` Chuyi Zhou
2023-10-31 11:37 ` Chuyi Zhou
2023-10-31 22:06 ` Alexei Starovoitov
2023-11-01 2:41 ` Chuyi Zhou
2023-11-02 6:07 ` Alexei Starovoitov
2023-11-02 9:21 ` Chuyi Zhou