* [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters
@ 2023-09-25 10:55 Chuyi Zhou
  2023-09-25 10:55 ` [PATCH bpf-next v3 1/7] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-25 10:55 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

Hi,

This is version 3 of task, css_task and css iters support.
Thanks for your review!

--- Changelog ---

v2 -> v3: https://lore.kernel.org/lkml/20230912070149.969939-1-zhouchuyi@bytedance.com/

Patch 1 (cgroup: Prepare for using css_task_iter_*() in BPF)
  * Add tj's Ack and Alexei's Suggested-by.
Patch 2 (bpf: Introduce css_task open-coded iterator kfuncs)
  * Use bpf_mem_alloc/bpf_mem_free rather than kzalloc()
  * Add KF_TRUSTED_ARGS for bpf_iter_css_task_new (Alexei)
  * Move bpf_iter_css_task's definition from uapi/linux/bpf.h to
    kernel/bpf/task_iter.c and we can use it from vmlinux.h
  * Move bpf_iter_css_task_XXX's declaration from bpf_helpers.h to
    bpf_experimental.h
Patch 3 (Introduce task open coded iterator kfuncs)
  * Change the API design to keep it consistent with SEC("iter/task"):
    support iterating all threads (BPF_TASK_ITER_ALL) and the threads of
    a specific task (BPF_TASK_ITER_THREAD). (Andrii)
  * Move bpf_iter_task's definition from uapi/linux/bpf.h to
    kernel/bpf/task_iter.c and we can use it from vmlinux.h
  * Move bpf_iter_task_XXX's declaration from bpf_helpers.h to
    bpf_experimental.h
Patch 4 (Introduce css open-coded iterator kfuncs)
  * Change the API design to keep it consistent with cgroup_iters; reuse
    BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/
    BPF_CGROUP_ITER_ANCESTORS_UP. (Andrii)
  * Add KF_TRUSTED_ARGS for bpf_iter_css_new
  * Move bpf_iter_css's definition from uapi/linux/bpf.h to
    kernel/bpf/task_iter.c and we can use it from vmlinux.h
  * Move bpf_iter_css_XXX's declaration from bpf_helpers.h to
    bpf_experimental.h
Patch 5 (teach the verifier to enforce css_iter and task_iter in RCU CS)
  * Add a new KF flag KF_RCU_PROTECTED to mark kfuncs which need an
    RCU CS. (Andrii)
  * Consider STACK_ITER when using bpf_for_each_spilled_reg.
Patch 6 (Let bpf_iter_task_new accept null task ptr)
  * Add this extra patch to let bpf_iter_task_new accept a 'nullable'
    task pointer. (Andrii)
Patch 7 (selftests/bpf: Add tests for open-coded task and css iter)
  * Add a failure testcase. (Alexei)

Changes from v1 (https://lore.kernel.org/lkml/20230827072057.1591929-1-zhouchuyi@bytedance.com/):
- Add a pre-patch to make some preparations before supporting css_task
  iters. (Alexei)
- Add an allowlist for css_task iters. (Alexei)
- Let bpf progs do an explicit bpf_rcu_read_lock() when using process
  iters and css_descendant iters. (Alexei)
---------------------

In some BPF usage scenarios, it is useful to iterate over processes and
css objects directly in a BPF program. One expected scenario is
customizable OOM victim selection via BPF [1].

Inspired by Dave's task_vma iter [2], this patchset adds three types of
open-coded iterator kfuncs:

1. task iters (bpf_iter_task_*). They can be used to
1) iterate all processes in the system, like for_each_process() in the
kernel;
2) iterate all threads in the system;
3) iterate all threads of a specific task.

2. css_task iters (bpf_iter_css_task_*). They work like
css_task_iter_{start, next, end} and can be used to iterate tasks/threads
under a css.

3. css iters (bpf_iter_css_*). They work like
css_next_descendant_{pre, post} and can be used to iterate all descendant
css objects.

BPF programs can use these kfuncs directly or through the bpf_for_each
macro; a short usage sketch follows the links below.

link[1]: https://lore.kernel.org/lkml/20230810081319.65668-1-zhouchuyi@bytedance.com/
link[2]: https://lore.kernel.org/all/20230810183513.684836-1-davemarchevsky@fb.com/
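
For reference, here is a minimal usage sketch of the task iters through
bpf_for_each, modeled on the selftests in patch 7 (it assumes the
vmlinux.h/bpf_experimental.h includes, the SYS_PREFIX macro and the
bpf_rcu_read_lock()/bpf_rcu_read_unlock() ksym declarations used there):

	SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
	int iter_all_procs(void *ctx)
	{
		struct task_struct *pos;

		/* sleepable prog: an explicit RCU CS is required (patch 5) */
		bpf_rcu_read_lock();
		bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC) {
			/* pos visits every process, like for_each_process() */
		}
		bpf_rcu_read_unlock();
		return 0;
	}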

Chuyi Zhou (7):
  cgroup: Prepare for using css_task_iter_*() in BPF
  bpf: Introduce css_task open-coded iterator kfuncs
  bpf: Introduce task open coded iterator kfuncs
  bpf: Introduce css open-coded iterator kfuncs
  bpf: teach the verifier to enforce css_iter and task_iter in RCU CS
  bpf: Let bpf_iter_task_new accept null task ptr
  selftests/bpf: Add tests for open-coded task and css iter

 include/linux/bpf.h                           |   8 +-
 include/linux/bpf_verifier.h                  |  19 ++-
 include/linux/btf.h                           |   1 +
 include/linux/cgroup.h                        |  12 +-
 kernel/bpf/cgroup_iter.c                      |  57 +++++++
 kernel/bpf/helpers.c                          |   9 +
 kernel/bpf/task_iter.c                        | 152 +++++++++++++++--
 kernel/bpf/verifier.c                         |  84 +++++++--
 kernel/cgroup/cgroup.c                        |  18 +-
 .../testing/selftests/bpf/bpf_experimental.h  |  18 ++
 .../selftests/bpf/prog_tests/bpf_iter.c       |  18 +-
 .../testing/selftests/bpf/prog_tests/iters.c  | 161 ++++++++++++++++++
 .../{bpf_iter_task.c => bpf_iter_tasks.c}     |   0
 .../testing/selftests/bpf/progs/iters_task.c  | 132 ++++++++++++++
 .../selftests/bpf/progs/iters_task_failure.c  | 103 +++++++++++
 15 files changed, 734 insertions(+), 58 deletions(-)
 rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_failure.c

-- 
2.20.1


* [PATCH bpf-next v3 1/7] cgroup: Prepare for using css_task_iter_*() in BPF
  2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
@ 2023-09-25 10:55 ` Chuyi Zhou
  2023-09-25 10:55 ` [PATCH bpf-next v3 2/7] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-25 10:55 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch makes some preparations for using css_task_iter_*() in BPF
programs.

1. The CSS_TASK_ITER_* flags are #defines and are not easy for a BPF prog
to use. Convert them to an enum so BPF progs can pick them up from
vmlinux.h.

2. The next patch will expose css_task_iter_*() as common kfuncs, which is
not safe today: css_task_iter_*() uses spin_unlock_irq(), which might
corrupt the irq flags depending on the context where the BPF prog runs.
Switch to spin_lock_irqsave()/spin_unlock_irqrestore(); the change is
harmless for the existing callers.
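
After the conversion, a BPF prog can pick the flag values up from
vmlinux.h directly; an illustrative fragment:

	#include "vmlinux.h"

	/* CSS_TASK_ITER_PROCS is now part of BTF and visible via
	 * vmlinux.h; no hand-copied #define is needed.
	 */
	unsigned int flags = CSS_TASK_ITER_PROCS;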

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h | 12 +++++-------
 kernel/cgroup/cgroup.c | 18 ++++++++++++------
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b307013b9c6c..0ef0af66080e 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -40,13 +40,11 @@ struct kernel_clone_args;
 #define CGROUP_WEIGHT_DFL		100
 #define CGROUP_WEIGHT_MAX		10000
 
-/* walk only threadgroup leaders */
-#define CSS_TASK_ITER_PROCS		(1U << 0)
-/* walk all threaded css_sets in the domain */
-#define CSS_TASK_ITER_THREADED		(1U << 1)
-
-/* internal flags */
-#define CSS_TASK_ITER_SKIPPED		(1U << 16)
+enum {
+	CSS_TASK_ITER_PROCS    = (1U << 0),  /* walk only threadgroup leaders */
+	CSS_TASK_ITER_THREADED = (1U << 1),  /* walk all threaded css_sets in the domain */
+	CSS_TASK_ITER_SKIPPED  = (1U << 16), /* internal flags */
+};
 
 /* a css_task_iter should be treated as an opaque object */
 struct css_task_iter {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..b6d64f3b8888 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4917,9 +4917,11 @@ static void css_task_iter_advance(struct css_task_iter *it)
 void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 			 struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	memset(it, 0, sizeof(*it));
 
-	spin_lock_irq(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, irqflags);
 
 	it->ss = css->ss;
 	it->flags = flags;
@@ -4933,7 +4935,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 
 	css_task_iter_advance(it);
 
-	spin_unlock_irq(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, irqflags);
 }
 
 /**
@@ -4946,12 +4948,14 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
  */
 struct task_struct *css_task_iter_next(struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	if (it->cur_task) {
 		put_task_struct(it->cur_task);
 		it->cur_task = NULL;
 	}
 
-	spin_lock_irq(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, irqflags);
 
 	/* @it may be half-advanced by skips, finish advancing */
 	if (it->flags & CSS_TASK_ITER_SKIPPED)
@@ -4964,7 +4968,7 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
 		css_task_iter_advance(it);
 	}
 
-	spin_unlock_irq(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, irqflags);
 
 	return it->cur_task;
 }
@@ -4977,11 +4981,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
  */
 void css_task_iter_end(struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	if (it->cur_cset) {
-		spin_lock_irq(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, irqflags);
 		list_del(&it->iters_node);
 		put_css_set_locked(it->cur_cset);
-		spin_unlock_irq(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, irqflags);
 	}
 
 	if (it->cur_dcset)
-- 
2.20.1


* [PATCH bpf-next v3 2/7] bpf: Introduce css_task open-coded iterator kfuncs
  2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
  2023-09-25 10:55 ` [PATCH bpf-next v3 1/7] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
@ 2023-09-25 10:55 ` Chuyi Zhou
  2023-09-25 10:55 ` [PATCH bpf-next v3 3/7] bpf: Introduce task open coded " Chuyi Zhou
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-25 10:55 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch adds the kfuncs bpf_iter_css_task_{new,next,destroy}, which
allow creation and manipulation of struct bpf_iter_css_task in open-coded
iterator style. These kfuncs wrap css_task_iter_{start,next,end}. BPF
programs can use them through the bpf_for_each macro to iterate all tasks
under a css.

css_task_iter_*() takes the global spinlock css_set_lock, so the BPF side
has to be careful about where this iterator may be used. Currently we only
allow it in bpf_lsm programs and sleepable bpf iter programs.
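
For illustration, a sketch of the intended usage from an LSM program,
modeled on the selftest added in patch 7 (cg_id and css_task_cnt are
illustrative globals; the bpf_cgroup_from_id()/bpf_cgroup_release() ksym
declarations are assumed):

	SEC("lsm/file_mprotect")
	int BPF_PROG(count_cg_tasks)
	{
		struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
		struct task_struct *task;

		if (!cgrp)
			return 0;
		/* visit each thread-group leader in the cgroup */
		bpf_for_each(css_task, task, &cgrp->self, CSS_TASK_ITER_PROCS)
			css_task_cnt += 1;
		bpf_cgroup_release(cgrp);
		return 0;
	}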

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/bpf/helpers.c                          |  3 ++
 kernel/bpf/task_iter.c                        | 53 +++++++++++++++++++
 kernel/bpf/verifier.c                         | 23 ++++++++
 .../testing/selftests/bpf/bpf_experimental.h  |  7 +++
 4 files changed, 86 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index b0a9834f1051..189d158c9b7f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 7473068ed313..2cfcb4dd8a37 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -7,6 +7,7 @@
 #include <linux/fs.h>
 #include <linux/fdtable.h>
 #include <linux/filter.h>
+#include <linux/bpf_mem_alloc.h>
 #include <linux/btf_ids.h>
 #include "mmap_unlock_work.h"
 
@@ -803,6 +804,58 @@ const struct bpf_func_proto bpf_find_vma_proto = {
 	.arg5_type	= ARG_ANYTHING,
 };
 
+struct bpf_iter_css_task {
+	__u64 __opaque[1];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_task_kern {
+	struct css_task_iter *css_it;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
+		struct cgroup_subsys_state *css, unsigned int flags)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
+					__alignof__(struct bpf_iter_css_task));
+	kit->css_it = NULL;
+	switch (flags) {
+	case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
+	case CSS_TASK_ITER_PROCS:
+	case 0:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	kit->css_it = bpf_mem_alloc(&bpf_global_ma, sizeof(struct css_task_iter));
+	if (!kit->css_it)
+		return -ENOMEM;
+	css_task_iter_start(css, flags, kit->css_it);
+	return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	if (!kit->css_it)
+		return NULL;
+	return css_task_iter_next(kit->css_it);
+}
+
+__bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	if (!kit->css_it)
+		return;
+	css_task_iter_end(kit->css_it);
+	bpf_mem_free(&bpf_global_ma, kit->css_it);
+}
+
 DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
 
 static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dbba2b806017..2367483bf4c2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10332,6 +10332,7 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_clone,
 	KF_bpf_percpu_obj_new_impl,
 	KF_bpf_percpu_obj_drop_impl,
+	KF_bpf_iter_css_task_new,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -10354,6 +10355,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
 BTF_ID(func, bpf_percpu_obj_new_impl)
 BTF_ID(func, bpf_percpu_obj_drop_impl)
+BTF_ID(func, bpf_iter_css_task_new)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -10378,6 +10380,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
 BTF_ID(func, bpf_percpu_obj_new_impl)
 BTF_ID(func, bpf_percpu_obj_drop_impl)
+BTF_ID(func, bpf_iter_css_task_new)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -10902,6 +10905,20 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
 						  &meta->arg_rbtree_root.field);
 }
 
+static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
+{
+	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
+
+	switch (prog_type) {
+	case BPF_PROG_TYPE_LSM:
+		return true;
+	case BPF_TRACE_ITER:
+		return env->prog->aux->sleepable;
+	default:
+		return false;
+	}
+}
+
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
 			    int insn_idx)
 {
@@ -11152,6 +11169,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		}
 		case KF_ARG_PTR_TO_ITER:
+			if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) {
+				if (!check_css_task_iter_allowlist(env)) {
+					verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n");
+					return -EINVAL;
+				}
+			}
 			ret = process_iter_arg(env, regno, insn_idx, meta);
 			if (ret < 0)
 				return ret;
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 4494eaa9937e..d3ea90f0e142 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -162,4 +162,11 @@ extern void bpf_percpu_obj_drop_impl(void *kptr, void *meta) __ksym;
 /* Convenience macro to wrap over bpf_obj_drop_impl */
 #define bpf_percpu_obj_drop(kptr) bpf_percpu_obj_drop_impl(kptr, NULL)
 
+struct bpf_iter_css_task;
+struct cgroup_subsys_state;
+extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
+		struct cgroup_subsys_state *css, unsigned int flags) __weak __ksym;
+extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
+extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
+
 #endif
-- 
2.20.1


* [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs
  2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
  2023-09-25 10:55 ` [PATCH bpf-next v3 1/7] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
  2023-09-25 10:55 ` [PATCH bpf-next v3 2/7] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
@ 2023-09-25 10:55 ` Chuyi Zhou
  2023-09-27 23:20   ` Andrii Nakryiko
  2023-09-25 10:55 ` [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded " Chuyi Zhou
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-25 10:55 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch adds the kfuncs bpf_iter_task_{new,next,destroy}, which allow
creation and manipulation of struct bpf_iter_task in open-coded iterator
style. BPF programs can use these kfuncs directly or through the
bpf_for_each macro to iterate tasks in the system.

The API design is kept consistent with SEC("iter/task").
bpf_iter_task_new() accepts a specific task and an iterating type, which
allows:
1. iterating all processes in the system

2. iterating all threads in the system

3. iterating all threads of a specific task
Here we also reuse enum bpf_iter_task_type, renaming BPF_TASK_ITER_TID to
BPF_TASK_ITER_THREAD and BPF_TASK_ITER_TGID to BPF_TASK_ITER_PROC.
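
For example, a sketch of the three modes (cur is assumed to be a trusted
task pointer; at this point in the series a valid task must be passed for
all modes, patch 6 later allows NULL for the first two):

	struct task_struct *pos;

	/* every process in the system */
	bpf_for_each(task, pos, cur, BPF_TASK_ITER_PROC) { /* ... */ }
	/* every thread in the system */
	bpf_for_each(task, pos, cur, BPF_TASK_ITER_ALL) { /* ... */ }
	/* every thread of @cur */
	bpf_for_each(task, pos, cur, BPF_TASK_ITER_THREAD) { /* ... */ }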

The newly-added struct bpf_iter_task has a name collision with a selftest
for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
renamed in order to avoid the collision.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 include/linux/bpf.h                           |  8 +-
 kernel/bpf/helpers.c                          |  3 +
 kernel/bpf/task_iter.c                        | 96 ++++++++++++++++---
 .../testing/selftests/bpf/bpf_experimental.h  |  5 +
 .../selftests/bpf/prog_tests/bpf_iter.c       | 18 ++--
 .../{bpf_iter_task.c => bpf_iter_tasks.c}     |  0
 6 files changed, 106 insertions(+), 24 deletions(-)
 rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 87eeb3a46a1d..0ef5b7a59d62 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2194,16 +2194,16 @@ int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
  * BPF_TASK_ITER_ALL (default)
  *	Iterate over resources of every task.
  *
- * BPF_TASK_ITER_TID
+ * BPF_TASK_ITER_THREAD
  *	Iterate over resources of a task/tid.
  *
- * BPF_TASK_ITER_TGID
+ * BPF_TASK_ITER_PROC
  *	Iterate over resources of every task of a process / task group.
  */
 enum bpf_iter_task_type {
 	BPF_TASK_ITER_ALL = 0,
-	BPF_TASK_ITER_TID,
-	BPF_TASK_ITER_TGID,
+	BPF_TASK_ITER_THREAD,
+	BPF_TASK_ITER_PROC,
 };
 
 struct bpf_iter_aux_info {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 189d158c9b7f..556262c27a75 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 2cfcb4dd8a37..9bcd3f9922b1 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -94,7 +94,7 @@ static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *co
 	struct task_struct *task = NULL;
 	struct pid *pid;
 
-	if (common->type == BPF_TASK_ITER_TID) {
+	if (common->type == BPF_TASK_ITER_THREAD) {
 		if (*tid && *tid != common->pid)
 			return NULL;
 		rcu_read_lock();
@@ -108,7 +108,7 @@ static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *co
 		return task;
 	}
 
-	if (common->type == BPF_TASK_ITER_TGID) {
+	if (common->type == BPF_TASK_ITER_PROC) {
 		rcu_read_lock();
 		task = task_group_seq_get_next(common, tid, skip_if_dup_files);
 		rcu_read_unlock();
@@ -217,15 +217,15 @@ static int bpf_iter_attach_task(struct bpf_prog *prog,
 
 	aux->task.type = BPF_TASK_ITER_ALL;
 	if (linfo->task.tid != 0) {
-		aux->task.type = BPF_TASK_ITER_TID;
+		aux->task.type = BPF_TASK_ITER_THREAD;
 		aux->task.pid = linfo->task.tid;
 	}
 	if (linfo->task.pid != 0) {
-		aux->task.type = BPF_TASK_ITER_TGID;
+		aux->task.type = BPF_TASK_ITER_PROC;
 		aux->task.pid = linfo->task.pid;
 	}
 	if (linfo->task.pid_fd != 0) {
-		aux->task.type = BPF_TASK_ITER_TGID;
+		aux->task.type = BPF_TASK_ITER_PROC;
 
 		pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
 		if (IS_ERR(pid))
@@ -305,7 +305,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 	rcu_read_unlock();
 	put_task_struct(curr_task);
 
-	if (info->common.type == BPF_TASK_ITER_TID) {
+	if (info->common.type == BPF_TASK_ITER_THREAD) {
 		info->task = NULL;
 		return NULL;
 	}
@@ -566,7 +566,7 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 	return curr_vma;
 
 next_task:
-	if (info->common.type == BPF_TASK_ITER_TID)
+	if (info->common.type == BPF_TASK_ITER_THREAD)
 		goto finish;
 
 	put_task_struct(curr_task);
@@ -677,10 +677,10 @@ static const struct bpf_iter_seq_info task_seq_info = {
 static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info)
 {
 	switch (aux->task.type) {
-	case BPF_TASK_ITER_TID:
+	case BPF_TASK_ITER_THREAD:
 		info->iter.task.tid = aux->task.pid;
 		break;
-	case BPF_TASK_ITER_TGID:
+	case BPF_TASK_ITER_PROC:
 		info->iter.task.pid = aux->task.pid;
 		break;
 	default:
@@ -692,9 +692,9 @@ static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct b
 static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
 {
 	seq_printf(seq, "task_type:\t%s\n", iter_task_type_names[aux->task.type]);
-	if (aux->task.type == BPF_TASK_ITER_TID)
+	if (aux->task.type == BPF_TASK_ITER_THREAD)
 		seq_printf(seq, "tid:\t%u\n", aux->task.pid);
-	else if (aux->task.type == BPF_TASK_ITER_TGID)
+	else if (aux->task.type == BPF_TASK_ITER_PROC)
 		seq_printf(seq, "pid:\t%u\n", aux->task.pid);
 }
 
@@ -856,6 +856,80 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
 	bpf_mem_free(&bpf_global_ma, kit->css_it);
 }
 
+struct bpf_iter_task {
+	__u64 __opaque[2];
+	__u32 __opaque_int[1];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_task_kern {
+	struct task_struct *task;
+	struct task_struct *pos;
+	unsigned int type;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)
+{
+	struct bpf_iter_task_kern *kit = (void *)it;
+	BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
+					__alignof__(struct bpf_iter_task));
+	kit->task = kit->pos = NULL;
+	switch (type) {
+	case BPF_TASK_ITER_ALL:
+	case BPF_TASK_ITER_PROC:
+	case BPF_TASK_ITER_THREAD:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (type == BPF_TASK_ITER_THREAD)
+		kit->task = task;
+	else
+		kit->task = &init_task;
+	kit->pos = kit->task;
+	kit->type = type;
+	return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
+{
+	struct bpf_iter_task_kern *kit = (void *)it;
+	struct task_struct *pos;
+	unsigned int type;
+
+	type = kit->type;
+	pos = kit->pos;
+
+	if (!pos)
+		goto out;
+
+	if (type == BPF_TASK_ITER_PROC)
+		goto get_next_task;
+
+	kit->pos = next_thread(kit->pos);
+	if (kit->pos == kit->task) {
+		if (type == BPF_TASK_ITER_THREAD) {
+			kit->pos = NULL;
+			goto out;
+		}
+	} else
+		goto out;
+
+get_next_task:
+	kit->pos = next_task(kit->pos);
+	kit->task = kit->pos;
+	if (kit->pos == &init_task)
+		kit->pos = NULL;
+
+out:
+	return pos;
+}
+
+__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
+{
+}
+
 DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
 
 static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index d3ea90f0e142..d989775dbdb5 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -169,4 +169,9 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
 extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
 extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
 
+struct bpf_iter_task;
+extern int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type) __weak __ksym;
+extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
+extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
+
 #endif
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 1f02168103dd..dc60e8e125cd 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -7,7 +7,7 @@
 #include "bpf_iter_ipv6_route.skel.h"
 #include "bpf_iter_netlink.skel.h"
 #include "bpf_iter_bpf_map.skel.h"
-#include "bpf_iter_task.skel.h"
+#include "bpf_iter_tasks.skel.h"
 #include "bpf_iter_task_stack.skel.h"
 #include "bpf_iter_task_file.skel.h"
 #include "bpf_iter_task_vma.skel.h"
@@ -215,12 +215,12 @@ static void *do_nothing_wait(void *arg)
 static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
 				     int *num_unknown, int *num_known)
 {
-	struct bpf_iter_task *skel;
+	struct bpf_iter_tasks *skel;
 	pthread_t thread_id;
 	void *ret;
 
-	skel = bpf_iter_task__open_and_load();
-	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
+	skel = bpf_iter_tasks__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load"))
 		return;
 
 	ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock");
@@ -239,7 +239,7 @@ static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
 	ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
 		     "pthread_join");
 
-	bpf_iter_task__destroy(skel);
+	bpf_iter_tasks__destroy(skel);
 }
 
 static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
@@ -307,10 +307,10 @@ static void test_task_pidfd(void)
 
 static void test_task_sleepable(void)
 {
-	struct bpf_iter_task *skel;
+	struct bpf_iter_tasks *skel;
 
-	skel = bpf_iter_task__open_and_load();
-	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
+	skel = bpf_iter_tasks__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load"))
 		return;
 
 	do_dummy_read(skel->progs.dump_task_sleepable);
@@ -320,7 +320,7 @@ static void test_task_sleepable(void)
 	ASSERT_GT(skel->bss->num_success_copy_from_user_task, 0,
 		  "num_success_copy_from_user_task");
 
-	bpf_iter_task__destroy(skel);
+	bpf_iter_tasks__destroy(skel);
 }
 
 static void test_task_stack(void)
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_tasks.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/bpf_iter_task.c
rename to tools/testing/selftests/bpf/progs/bpf_iter_tasks.c
-- 
2.20.1


* [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded iterator kfuncs
  2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
                   ` (2 preceding siblings ...)
  2023-09-25 10:55 ` [PATCH bpf-next v3 3/7] bpf: Introduce task open coded " Chuyi Zhou
@ 2023-09-25 10:55 ` Chuyi Zhou
  2023-09-27 23:24   ` Andrii Nakryiko
  2023-09-25 10:55 ` [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS Chuyi Zhou
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-25 10:55 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch adds the kfuncs bpf_iter_css_{new,next,destroy}, which allow
creation and manipulation of struct bpf_iter_css in open-coded iterator
style. These kfuncs wrap css_next_descendant_{pre, post}. css_iter can be
used to:

1) iterate a specific cgroup tree in pre/post/up order

2) iterate cgroup subsystem states in a BPF prog, like
for_each_mem_cgroup_tree()/cpuset_for_each_descendant_pre() in the kernel.

The API design is consistent with cgroup_iter. bpf_iter_css_new() accepts
parameters defining the iteration order and the starting css. Here we also
reuse the BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST
and BPF_CGROUP_ITER_ANCESTORS_UP enums.
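
For example, a sketch of a pre-order walk (root is assumed to be a
trusted cgroup_subsys_state pointer, e.g. &cgrp->self):

	struct cgroup_subsys_state *pos;

	/* visits @root first, then each descendant css */
	bpf_for_each(css, pos, root, BPF_CGROUP_ITER_DESCENDANTS_PRE) {
		/* ... */
	}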

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/bpf/cgroup_iter.c                      | 57 +++++++++++++++++++
 kernel/bpf/helpers.c                          |  3 +
 .../testing/selftests/bpf/bpf_experimental.h  |  6 ++
 3 files changed, 66 insertions(+)

diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
index 810378f04fbc..ebc3d9471f52 100644
--- a/kernel/bpf/cgroup_iter.c
+++ b/kernel/bpf/cgroup_iter.c
@@ -294,3 +294,60 @@ static int __init bpf_cgroup_iter_init(void)
 }
 
 late_initcall(bpf_cgroup_iter_init);
+
+struct bpf_iter_css {
+	__u64 __opaque[2];
+	__u32 __opaque_int[1];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_kern {
+	struct cgroup_subsys_state *start;
+	struct cgroup_subsys_state *pos;
+	int order;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
+		struct cgroup_subsys_state *start, enum bpf_cgroup_iter_order order)
+{
+	struct bpf_iter_css_kern *kit = (void *)it;
+	kit->start = NULL;
+	BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css));
+	switch (order) {
+	case BPF_CGROUP_ITER_DESCENDANTS_PRE:
+	case BPF_CGROUP_ITER_DESCENDANTS_POST:
+	case BPF_CGROUP_ITER_ANCESTORS_UP:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	kit->start = start;
+	kit->pos = NULL;
+	kit->order = order;
+	return 0;
+}
+
+__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it)
+{
+	struct bpf_iter_css_kern *kit = (void *)it;
+	if (!kit->start)
+		return NULL;
+
+	switch (kit->order) {
+	case BPF_CGROUP_ITER_DESCENDANTS_PRE:
+		kit->pos = css_next_descendant_pre(kit->pos, kit->start);
+		break;
+	case BPF_CGROUP_ITER_DESCENDANTS_POST:
+		kit->pos = css_next_descendant_post(kit->pos, kit->start);
+		break;
+	default:
+		kit->pos = kit->pos ? kit->pos->parent : kit->start;
+	}
+
+	return kit->pos;
+}
+
+__bpf_kfunc void bpf_iter_css_destroy(struct bpf_iter_css *it)
+{
+}
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 556262c27a75..9c3af36249a2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2510,6 +2510,9 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index d989775dbdb5..aa247d1d81d1 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -174,4 +174,10 @@ extern int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task,
 extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
 extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
 
+struct bpf_iter_css;
+extern int bpf_iter_css_new(struct bpf_iter_css *it,
+				struct cgroup_subsys_state *start, enum bpf_cgroup_iter_order order) __weak __ksym;
+extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
+extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
+
 #endif
-- 
2.20.1


* [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS
  2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
                   ` (3 preceding siblings ...)
  2023-09-25 10:55 ` [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded " Chuyi Zhou
@ 2023-09-25 10:55 ` Chuyi Zhou
  2023-09-27 10:00   ` Yafang Shao
  2023-09-27 23:29   ` Andrii Nakryiko
  2023-09-25 10:55 ` [PATCH bpf-next v3 6/7] bpf: Let bpf_iter_task_new accept null task ptr Chuyi Zhou
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-25 10:55 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

css_iter and task_iter should be used inside an RCU section. Specifically,
in sleepable progs an explicit bpf_rcu_read_lock() is needed before using
these iterators. In normal bpf progs that run under an implicit
rcu_read_lock(), it's OK to use them directly.

This patch adds a new KF flag, KF_RCU_PROTECTED, for bpf_iter_task_new and
bpf_iter_css_new. It means the kfunc must be used inside an RCU CS. We
check whether we are in an RCU CS before the kfunc is invoked. If RCU
protection is guaranteed, we set st->type = PTR_TO_STACK | MEM_RCU. Once
the user does an rcu_read_unlock during the iteration, the MEM_RCU state
of the regs is cleared. is_iter_reg_valid_init() will reject the reg if
its type is PTR_UNTRUSTED.
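
In other words, a sleepable prog is expected to wrap the iteration like
this sketch (using the selftests' bpf_for_each macro); without the RCU CS
the iterator state is marked PTR_UNTRUSTED and subsequent
iter_next()/iter_destroy() calls are rejected:

	bpf_rcu_read_lock();
	bpf_for_each(task, pos, cur, BPF_TASK_ITER_ALL) {
		/* iterator slots are PTR_TO_STACK | MEM_RCU here */
	}
	bpf_rcu_read_unlock();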

It is worth noting that, currently, bpf_rcu_read_unlock does not clear
the state of STACK_ITER regs, since bpf_for_each_spilled_reg only
considers STACK_SPILL. This patch also lets bpf_for_each_spilled_reg
search STACK_ITER slots.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 include/linux/bpf_verifier.h | 19 ++++++++------
 include/linux/btf.h          |  1 +
 kernel/bpf/helpers.c         |  4 +--
 kernel/bpf/verifier.c        | 48 +++++++++++++++++++++++++++---------
 4 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a3236651ec64..b5cdcc332b0a 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -385,19 +385,18 @@ struct bpf_verifier_state {
 	u32 jmp_history_cnt;
 };
 
-#define bpf_get_spilled_reg(slot, frame)				\
+#define bpf_get_spilled_reg(slot, frame, mask)				\
 	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&		\
-	  (frame->stack[slot].slot_type[0] == STACK_SPILL))		\
+	  ((1 << frame->stack[slot].slot_type[0]) & (mask))) \
 	 ? &frame->stack[slot].spilled_ptr : NULL)
 
 /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
-#define bpf_for_each_spilled_reg(iter, frame, reg)			\
-	for (iter = 0, reg = bpf_get_spilled_reg(iter, frame);		\
+#define bpf_for_each_spilled_reg(iter, frame, reg, mask)			\
+	for (iter = 0, reg = bpf_get_spilled_reg(iter, frame, mask);		\
 	     iter < frame->allocated_stack / BPF_REG_SIZE;		\
-	     iter++, reg = bpf_get_spilled_reg(iter, frame))
+	     iter++, reg = bpf_get_spilled_reg(iter, frame, mask))
 
-/* Invoke __expr over regsiters in __vst, setting __state and __reg */
-#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr)   \
+#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr)   \
 	({                                                               \
 		struct bpf_verifier_state *___vstate = __vst;            \
 		int ___i, ___j;                                          \
@@ -409,7 +408,7 @@ struct bpf_verifier_state {
 				__reg = &___regs[___j];                  \
 				(void)(__expr);                          \
 			}                                                \
-			bpf_for_each_spilled_reg(___j, __state, __reg) { \
+			bpf_for_each_spilled_reg(___j, __state, __reg, __mask) { \
 				if (!__reg)                              \
 					continue;                        \
 				(void)(__expr);                          \
@@ -417,6 +416,10 @@ struct bpf_verifier_state {
 		}                                                        \
 	})
 
+/* Invoke __expr over regsiters in __vst, setting __state and __reg */
+#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \
+	bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, 1 << STACK_SPILL, __expr)
+
 /* linked list of verifier states used to prune search */
 struct bpf_verifier_state_list {
 	struct bpf_verifier_state state;
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 928113a80a95..c2231c64d60b 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -74,6 +74,7 @@
 #define KF_ITER_NEW     (1 << 8) /* kfunc implements BPF iter constructor */
 #define KF_ITER_NEXT    (1 << 9) /* kfunc implements BPF iter next method */
 #define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */
+#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */
 
 /*
  * Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9c3af36249a2..aa9e03fbfe1a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2507,10 +2507,10 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
-BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
 BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
-BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
 BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2367483bf4c2..a065e18a0b3a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1172,7 +1172,12 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
 
 static void __mark_reg_known_zero(struct bpf_reg_state *reg);
 
+static bool in_rcu_cs(struct bpf_verifier_env *env);
+
+static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta);
+
 static int mark_stack_slots_iter(struct bpf_verifier_env *env,
+				 struct bpf_kfunc_call_arg_meta *meta,
 				 struct bpf_reg_state *reg, int insn_idx,
 				 struct btf *btf, u32 btf_id, int nr_slots)
 {
@@ -1193,6 +1198,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
 
 		__mark_reg_known_zero(st);
 		st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
+		if (is_kfunc_rcu_protected(meta)) {
+			if (in_rcu_cs(env))
+				st->type |= MEM_RCU;
+			else
+				st->type |= PTR_UNTRUSTED;
+		}
 		st->live |= REG_LIVE_WRITTEN;
 		st->ref_obj_id = i == 0 ? id : 0;
 		st->iter.btf = btf;
@@ -1267,7 +1278,7 @@ static bool is_iter_reg_valid_uninit(struct bpf_verifier_env *env,
 	return true;
 }
 
-static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 				   struct btf *btf, u32 btf_id, int nr_slots)
 {
 	struct bpf_func_state *state = func(env, reg);
@@ -1275,26 +1286,28 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
 
 	spi = iter_get_spi(env, reg, nr_slots);
 	if (spi < 0)
-		return false;
+		return -EINVAL;
 
 	for (i = 0; i < nr_slots; i++) {
 		struct bpf_stack_state *slot = &state->stack[spi - i];
 		struct bpf_reg_state *st = &slot->spilled_ptr;
 
+		if (st->type & PTR_UNTRUSTED)
+			return -EPERM;
 		/* only main (first) slot has ref_obj_id set */
 		if (i == 0 && !st->ref_obj_id)
-			return false;
+			return -EINVAL;
 		if (i != 0 && st->ref_obj_id)
-			return false;
+			return -EINVAL;
 		if (st->iter.btf != btf || st->iter.btf_id != btf_id)
-			return false;
+			return -EINVAL;
 
 		for (j = 0; j < BPF_REG_SIZE; j++)
 			if (slot->slot_type[j] != STACK_ITER)
-				return false;
+				return -EINVAL;
 	}
 
-	return true;
+	return 0;
 }
 
 /* Check if given stack slot is "special":
@@ -7503,15 +7516,20 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
 				return err;
 		}
 
-		err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
+		err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
 		if (err)
 			return err;
 	} else {
 		/* iter_next() or iter_destroy() expect initialized iter state*/
-		if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
-			verbose(env, "expected an initialized iter_%s as arg #%d\n",
+		err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots);
+		switch (err) {
+		case -EINVAL:
+			verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
 				iter_type_str(meta->btf, btf_id), regno);
-			return -EINVAL;
+			return err;
+		case -EPERM:
+			verbose(env, "expected an RCU CS when using %s\n", meta->func_name);
+			return err;
 		}
 
 		spi = iter_get_spi(env, reg, nr_slots);
@@ -10092,6 +10110,11 @@ static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
 	return meta->kfunc_flags & KF_RCU;
 }
 
+static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_RCU_PROTECTED;
+}
+
 static bool __kfunc_param_match_suffix(const struct btf *btf,
 				       const struct btf_param *arg,
 				       const char *suffix)
@@ -11428,6 +11451,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	if (env->cur_state->active_rcu_lock) {
 		struct bpf_func_state *state;
 		struct bpf_reg_state *reg;
+		u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER);
 
 		if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
 			verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
@@ -11438,7 +11462,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
 			return -EINVAL;
 		} else if (rcu_unlock) {
-			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+			bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({
 				if (reg->type & MEM_RCU) {
 					reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
 					reg->type |= PTR_UNTRUSTED;
-- 
2.20.1


* [PATCH bpf-next v3 6/7] bpf: Let bpf_iter_task_new accept null task ptr
  2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
                   ` (4 preceding siblings ...)
  2023-09-25 10:55 ` [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS Chuyi Zhou
@ 2023-09-25 10:55 ` Chuyi Zhou
  2023-09-27 23:37   ` Andrii Nakryiko
  2023-09-25 10:55 ` [PATCH bpf-next v3 7/7] selftests/bpf: Add tests for open-coded task and css iter Chuyi Zhou
  2023-09-25 18:48 ` [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Tejun Heo
  7 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-25 10:55 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

When using task_iter to iterate all threads of a specific task, we enforce
that the user must pass a valid task pointer to ensure safety. However,
when iterating all threads/processes in the system, the BPF verifier still
requires a valid pointer instead of a "nullable" one, even though it is
pointless, which is surprising from a usability standpoint. It would be
nice if the kfunc could accept an explicit NULL pointer when we are using
BPF_TASK_ITER_ALL/BPF_TASK_ITER_PROC and a valid pointer when using
BPF_TASK_ITER_THREAD.

Given a trivial kfunc:
	__bpf_kfunc void FN(struct TYPE_A *obj)

the verifier would reject a NULL pointer for obj. The error info is:
"arg#x pointer type xx xx must point to scalar, or struct with scalar",
reported by get_kfunc_ptr_arg_type(). The reg->type is SCALAR_VALUE and
the BTF type of ref_t is neither a scalar nor a scalar struct, which leads
to the rejection in get_kfunc_ptr_arg_type().

This patch reuses the __opt annotation, which was used to indicate that
the buffer associated with an __sz or __szk argument may be null:
	__bpf_kfunc void FN(struct TYPE_A *obj__opt)
Here __opt indicates obj is optional: the user can pass an explicit NULL
or a normal TYPE_A pointer. In get_kfunc_ptr_arg_type(), we detect whether
the current arg is optional and the register is NULL. If so, we return a
new kfunc_ptr_arg_type, KF_ARG_PTR_TO_NULL, and skip to the next arg in
check_kfunc_args().
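
With this in place, the following sketch is accepted (pos and cur_task as
in the selftests):

	/* an explicit NULL is fine for system-wide iteration */
	bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL) { /* ... */ }
	bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC) { /* ... */ }
	/* a valid, trusted task is still required for THREAD mode */
	bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_THREAD) { /* ... */ }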

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/bpf/task_iter.c |  7 +++++--
 kernel/bpf/verifier.c  | 13 ++++++++++++-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 9bcd3f9922b1..7ac007f161cc 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -867,7 +867,7 @@ struct bpf_iter_task_kern {
 	unsigned int type;
 } __attribute__((aligned(8)));
 
-__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)
+__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task__opt, unsigned int type)
 {
 	struct bpf_iter_task_kern *kit = (void *)it;
 	BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
@@ -877,14 +877,17 @@ __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *
 	switch (type) {
 	case BPF_TASK_ITER_ALL:
 	case BPF_TASK_ITER_PROC:
+		break;
 	case BPF_TASK_ITER_THREAD:
+		if (!task__opt)
+			return -EINVAL;
 		break;
 	default:
 		return -EINVAL;
 	}
 
 	if (type == BPF_TASK_ITER_THREAD)
-		kit->task = task;
+		kit->task = task__opt;
 	else
 		kit->task = &init_task;
 	kit->pos = kit->task;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a065e18a0b3a..a79204c75a90 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10331,6 +10331,7 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_CALLBACK,
 	KF_ARG_PTR_TO_RB_ROOT,
 	KF_ARG_PTR_TO_RB_NODE,
+	KF_ARG_PTR_TO_NULL,
 };
 
 enum special_kfunc_type {
@@ -10425,6 +10426,12 @@ static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta)
 	return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock];
 }
 
+static inline bool is_kfunc_arg_optional_null(struct bpf_reg_state *reg,
+				const struct btf *btf, const struct btf_param *arg)
+{
+	return register_is_null(reg) && is_kfunc_arg_optional(btf, arg);
+}
+
 static enum kfunc_ptr_arg_type
 get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 		       struct bpf_kfunc_call_arg_meta *meta,
@@ -10497,6 +10504,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	 */
 	if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) &&
 	    (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
+			if (is_kfunc_arg_optional_null(reg, meta->btf, &args[argno]))
+				return KF_ARG_PTR_TO_NULL;
 		verbose(env, "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
 			argno, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
 		return -EINVAL;
@@ -11028,7 +11037,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		}
 
 		if ((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) &&
-		    (register_is_null(reg) || type_may_be_null(reg->type))) {
+		    (register_is_null(reg) || type_may_be_null(reg->type)) && !is_kfunc_arg_optional(meta->btf, &args[i])) {
 			verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i);
 			return -EACCES;
 		}
@@ -11053,6 +11062,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			return kf_arg_type;
 
 		switch (kf_arg_type) {
+		case KF_ARG_PTR_TO_NULL:
+			continue;
 		case KF_ARG_PTR_TO_ALLOC_BTF_ID:
 		case KF_ARG_PTR_TO_BTF_ID:
 			if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
-- 
2.20.1


* [PATCH bpf-next v3 7/7] selftests/bpf: Add tests for open-coded task and css iter
  2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
                   ` (5 preceding siblings ...)
  2023-09-25 10:55 ` [PATCH bpf-next v3 6/7] bpf: Let bpf_iter_task_new accept null task ptr Chuyi Zhou
@ 2023-09-25 10:55 ` Chuyi Zhou
  2023-09-25 18:48 ` [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Tejun Heo
  7 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-25 10:55 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch adds three subtests to demonstrate these patterns and validate
their correctness.

subtest1:
1) We use task_iter to iterate all processes in the system and search for
the current process with a given pid.
2) We create some threads in the current process context and use
BPF_TASK_ITER_THREAD to iterate all threads of the current process. As
expected, we find all the threads of the current process.
3) We create some threads and use BPF_TASK_ITER_ALL to iterate all threads
in the system. As expected, we find all the threads which were created.

subtest2: We create a cgroup and add the current task to the cgroup. In
the BPF program, we use bpf_for_each(css_task, task, css) to iterate all
tasks under the cgroup. As expected, we find the current process.

subtest3:
1) We create a cgroup tree. In the BPF program, we use
bpf_for_each(css, pos, root, XXX) to iterate all descendants under the
root in pre- and post-order. As expected, we find all descendants; the
last cgroup visited in post-order is the root cgroup, and the first
cgroup visited in pre-order is the root cgroup.
2) We use BPF_CGROUP_ITER_ANCESTORS_UP to traverse the cgroup tree
starting from the leaf and from the root separately, and record the
heights. The difference between the two heights is the total tree height
minus 1.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 .../testing/selftests/bpf/prog_tests/iters.c  | 161 ++++++++++++++++++
 .../testing/selftests/bpf/progs/iters_task.c  | 132 ++++++++++++++
 .../selftests/bpf/progs/iters_task_failure.c  | 103 +++++++++++
 3 files changed, 396 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_failure.c

diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index 10804ae5ae97..f5bb3c5887db 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -1,13 +1,22 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
 
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <malloc.h>
+#include <stdlib.h>
 #include <test_progs.h>
+#include "cgroup_helpers.h"
 
 #include "iters.skel.h"
 #include "iters_state_safety.skel.h"
 #include "iters_looping.skel.h"
 #include "iters_num.skel.h"
 #include "iters_testmod_seq.skel.h"
+#include "iters_task.skel.h"
+#include "iters_task_failure.skel.h"
 
 static void subtest_num_iters(void)
 {
@@ -90,6 +99,151 @@ static void subtest_testmod_seq_iters(void)
 	iters_testmod_seq__destroy(skel);
 }
 
+static pthread_mutex_t do_nothing_mutex;
+
+static void *do_nothing_wait(void *arg)
+{
+	pthread_mutex_lock(&do_nothing_mutex);
+	pthread_mutex_unlock(&do_nothing_mutex);
+
+	pthread_exit(arg);
+}
+
+#define thread_num 5
+
+static void subtest_task_iters(void)
+{
+	struct iters_task *skel;
+	pthread_t thread_ids[thread_num];
+	void *ret;
+	int err;
+
+	skel = iters_task__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+	bpf_program__set_autoload(skel->progs.iter_task_for_each_sleep, true);
+	err = iters_task__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+	skel->bss->target_pid = getpid();
+	err = iters_task__attach(skel);
+	if (!ASSERT_OK(err, "iters_task__attach"))
+		goto cleanup;
+	pthread_mutex_lock(&do_nothing_mutex);
+	for (int i = 0; i < thread_num; i++)
+		ASSERT_OK(pthread_create(&thread_ids[i], NULL, &do_nothing_wait, NULL), "pthread_create");
+
+	syscall(SYS_getpgid);
+	iters_task__detach(skel);
+	ASSERT_EQ(skel->bss->process_cnt, 1, "process_cnt");
+	ASSERT_EQ(skel->bss->thread_cnt, thread_num + 1, "thread_cnt");
+	ASSERT_EQ(skel->bss->all_thread_cnt, thread_num + 1, "all_thread_cnt");
+	pthread_mutex_unlock(&do_nothing_mutex);
+	for (int i = 0; i < thread_num; i++)
+		pthread_join(thread_ids[i], &ret);
+cleanup:
+	iters_task__destroy(skel);
+}
+
+extern int stack_mprotect(void);
+
+static void subtest_css_task_iters(void)
+{
+	struct iters_task *skel;
+	int err, cg_fd, cg_id;
+	const char *cgrp_path = "/cg1";
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		goto cleanup;
+	cg_fd = create_and_get_cgroup(cgrp_path);
+	if (!ASSERT_GE(cg_fd, 0, "cg_create"))
+		goto cleanup;
+	cg_id = get_cgroup_id(cgrp_path);
+	err = join_cgroup(cgrp_path);
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		goto cleanup;
+
+	skel = iters_task__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	bpf_program__set_autoload(skel->progs.iter_css_task_for_each, true);
+	err = iters_task__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	skel->bss->target_pid = getpid();
+	skel->bss->root_cg_id = cg_id;
+	err = iters_task__attach(skel);
+
+	err = stack_mprotect();
+	if (!ASSERT_OK(err, "iters_task__attach"))
+		goto cleanup;
+
+	iters_task__detach(skel);
+	ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt");
+
+cleanup:
+	cleanup_cgroup_environment();
+	iters_task__destroy(skel);
+}
+
+static void subtest_css_iters(void)
+{
+	struct iters_task *skel;
+	struct {
+		const char *path;
+		int fd;
+	} cgs[] = {
+		{ "/cg1" },
+		{ "/cg1/cg2" },
+		{ "/cg1/cg2/cg3" },
+		{ "/cg1/cg2/cg3/cg4" },
+	};
+	int err, cg_nr = ARRAY_SIZE(cgs);
+	int i;
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		goto cleanup;
+	for (i = 0; i < cg_nr; i++) {
+		cgs[i].fd = create_and_get_cgroup(cgs[i].path);
+		if (!ASSERT_GE(cgs[i].fd, 0, "cg_create"))
+			goto cleanup;
+	}
+
+	skel = iters_task__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+	bpf_program__set_autoload(skel->progs.iter_css_for_each, true);
+	err = iters_task__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	skel->bss->target_pid = getpid();
+	skel->bss->root_cg_id = get_cgroup_id(cgs[0].path);
+	skel->bss->leaf_cg_id = get_cgroup_id(cgs[cg_nr - 1].path);
+	err = iters_task__attach(skel);
+	if (!ASSERT_OK(err, "iters_task__attach"))
+		goto cleanup;
+
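+	/* trigger iter_css_for_each() attached to sys_getpgid */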
+	syscall(SYS_getpgid);
+	ASSERT_EQ(skel->bss->pre_css_dec_cnt, cg_nr, "pre-order descendant count");
+	ASSERT_EQ(skel->bss->first_cg_id, get_cgroup_id(cgs[0].path),
+				"pre-order first cgroup id");
+
+	ASSERT_EQ(skel->bss->post_css_dec_cnt, cg_nr, "post-order descendant count");
+	ASSERT_EQ(skel->bss->last_cg_id, get_cgroup_id(cgs[0].path),
+				"post-order last cgroup id");
+	ASSERT_EQ(skel->bss->tree_high, cg_nr - 1, "tree height");
+	iters_task__detach(skel);
+cleanup:
+	cleanup_cgroup_environment();
+	iters_task__destroy(skel);
+}
+
 void test_iters(void)
 {
 	RUN_TESTS(iters_state_safety);
@@ -103,4 +257,11 @@ void test_iters(void)
 		subtest_num_iters();
 	if (test__start_subtest("testmod_seq"))
 		subtest_testmod_seq_iters();
+	if (test__start_subtest("task"))
+		subtest_task_iters();
+	if (test__start_subtest("css_task"))
+		subtest_css_task_iters();
+	if (test__start_subtest("css"))
+		subtest_css_iters();
+	RUN_TESTS(iters_task_failure);
 }
diff --git a/tools/testing/selftests/bpf/progs/iters_task.c b/tools/testing/selftests/bpf/progs/iters_task.c
new file mode 100644
index 000000000000..0bf922fc750f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+pid_t target_pid = 0;
+int process_cnt = 0;
+int thread_cnt = 0;
+int all_thread_cnt = 0;
+int css_task_cnt = 0;
+int post_css_dec_cnt = 0;
+int pre_css_dec_cnt = 0;
+int tree_high = 0;
+
+u64 last_cg_id;
+u64 first_cg_id;
+
+u64 root_cg_id;
+u64 leaf_cg_id;
+
+struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
+struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) __ksym;
+void bpf_cgroup_release(struct cgroup *p) __ksym;
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_task_for_each_sleep(void *ctx)
+{
+	struct task_struct *pos;
+	struct task_struct *cur_task = bpf_get_current_task_btf();
+
+	if (cur_task->pid != target_pid)
+		return 0;
+	bpf_rcu_read_lock();
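+	/* PROC: every process; THREAD: cur_task's threads; ALL: every thread */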
+	bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC) {
+		if (pos->pid == target_pid)
+			process_cnt += 1;
+	}
+	bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_THREAD) {
+		thread_cnt += 1;
+	}
+	bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL) {
+		if (pos->tgid == target_pid)
+			all_thread_cnt += 1;
+	}
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?lsm/file_mprotect")
+int BPF_PROG(iter_css_task_for_each)
+{
+	struct task_struct *task;
+	struct task_struct *cur_task = bpf_get_current_task_btf();
+
+	if (cur_task->pid != target_pid)
+		return 0;
+
+	struct cgroup *cgrp = bpf_cgroup_from_id(root_cg_id);
+
+	if (cgrp == NULL)
+		return 0;
+	struct cgroup_subsys_state *css = &cgrp->self;
+
+	bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) {
+		if (!task)
+			continue;
+		if (task->pid == target_pid)
+			css_task_cnt += 1;
+	}
+	bpf_cgroup_release(cgrp);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_css_for_each(const void *ctx)
+{
+	struct task_struct *cur_task = bpf_get_current_task_btf();
+
+	if (cur_task->pid != target_pid)
+		return 0;
+
+	struct cgroup *root_cgrp = bpf_cgroup_from_id(root_cg_id);
+
+	if (!root_cgrp)
+		return 0;
+
+	struct cgroup *leaf_cgrp = bpf_cgroup_from_id(leaf_cg_id);
+
+	if (!leaf_cgrp) {
+		bpf_cgroup_release(root_cgrp);
+		return 0;
+	}
+	struct cgroup_subsys_state *root_css = &root_cgrp->self;
+	struct cgroup_subsys_state *leaf_css = &leaf_cgrp->self;
+	struct cgroup_subsys_state *pos = NULL;
+
+	bpf_rcu_read_lock();
+
+	bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) {
+		struct cgroup *cur_cgrp = pos->cgroup;
+
+		post_css_dec_cnt += 1;
+		if (cur_cgrp)
+			last_cg_id = cur_cgrp->kn->id;
+	}
+
+	bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_PRE) {
+		struct cgroup *cur_cgrp = pos->cgroup;
+
+		pre_css_dec_cnt += 1;
+		if (cur_cgrp && !first_cg_id)
+			first_cg_id = cur_cgrp->kn->id;
+	}
+
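+	/* ancestors(leaf) - ancestors(root) == depth difference == cg_nr - 1 */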
+	bpf_for_each(css, pos, leaf_css, BPF_CGROUP_ITER_ANCESTORS_UP)
+		tree_high += 1;
+
+	bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_ANCESTORS_UP)
+		tree_high -= 1;
+
+	bpf_rcu_read_unlock();
+	bpf_cgroup_release(root_cgrp);
+	bpf_cgroup_release(leaf_cgrp);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_failure.c b/tools/testing/selftests/bpf/progs/iters_task_failure.c
new file mode 100644
index 000000000000..40eb2704d94f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_failure.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
+struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) __ksym;
+void bpf_cgroup_release(struct cgroup *p) __ksym;
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("expected an RCU CS when using bpf_iter_task_next")
+int BPF_PROG(iter_tasks_without_lock)
+{
+	struct task_struct *pos;
+
+	bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC) {
+
+	}
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("expected an RCU CS when using bpf_iter_css_next")
+int BPF_PROG(iter_css_without_lock)
+{
+	u64 cg_id = 0;
+	struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
+
+	if (!cgrp)
+		return 0;
+	struct cgroup_subsys_state *root_css = &cgrp->self;
+	struct cgroup_subsys_state *pos;
+
+	bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) {
+
+	}
+	bpf_cgroup_release(cgrp);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("expected an RCU CS when using bpf_iter_task_next")
+int BPF_PROG(iter_tasks_lock_and_unlock)
+{
+	struct task_struct *pos;
+
+	bpf_rcu_read_lock();
+	bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC) {
+		bpf_rcu_read_unlock();
+
+		bpf_rcu_read_lock();
+	}
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("expected an RCU CS when using bpf_iter_css_next")
+int BPF_PROG(iter_css_lock_and_unlock)
+{
+	u64 cg_id = 0;
+	struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
+
+	if (!cgrp)
+		return 0;
+	struct cgroup_subsys_state *root_css = &cgrp->self;
+	struct cgroup_subsys_state *pos;
+
+	bpf_rcu_read_lock();
+	bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) {
+		bpf_rcu_read_unlock();
+
+		bpf_rcu_read_lock();
+	}
+	bpf_rcu_read_unlock();
+	bpf_cgroup_release(cgrp);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("css_task_iter is only allowed in bpf_lsm and bpf iter-s")
+int BPF_PROG(iter_css_task_for_each)
+{
+	struct task_struct *task;
+	u64 cg_id = bpf_get_current_cgroup_id();
+	struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
+
+	if (cgrp == NULL)
+		return 0;
+	struct cgroup_subsys_state *css = &cgrp->self;
+
+	bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) {
+
+	}
+	bpf_cgroup_release(cgrp);
+	return 0;
+}
-- 
2.20.1


* Re: [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters
  2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
                   ` (6 preceding siblings ...)
  2023-09-25 10:55 ` [PATCH bpf-next v3 7/7] selftests/bpf: Add tests for open-coded task and css iter Chuyi Zhou
@ 2023-09-25 18:48 ` Tejun Heo
  7 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2023-09-25 18:48 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, linux-kernel

On Mon, Sep 25, 2023 at 06:55:45PM +0800, Chuyi Zhou wrote:
> Hi,
> 
> This is version 3 of task, css_task and css iters support.
> Thanks for your review!

From cgroup POV, looks good to me.

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

* Re: [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS
  2023-09-25 10:55 ` [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS Chuyi Zhou
@ 2023-09-27 10:00   ` Yafang Shao
  2023-09-27 10:16     ` Chuyi Zhou
  2023-09-27 23:29   ` Andrii Nakryiko
  1 sibling, 1 reply; 22+ messages in thread
From: Yafang Shao @ 2023-09-27 10:00 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Mon, Sep 25, 2023 at 6:56 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> css_iter and task_iter should be used in an RCU section. Specifically, in
> sleepable progs an explicit bpf_rcu_read_lock() is needed before using
> these iters. In normal bpf progs that have an implicit rcu_read_lock(),
> it's OK to use them directly.
> 
> This patch adds a new KF flag KF_RCU_PROTECTED for bpf_iter_task_new and
> bpf_iter_css_new. It means the kfunc should be used in an RCU CS. We check
> whether we are in an rcu cs before we invoke this kfunc. If the rcu
> protection is guaranteed, we let st->type = PTR_TO_STACK | MEM_RCU.
> Once the user does rcu_unlock during the iteration, the MEM_RCU state of
> the regs is cleared. is_iter_reg_valid_init() will reject the reg if its
> type is UNTRUSTED.
> 
> It is worth noting that currently bpf_rcu_read_unlock does not clear the
> state of the STACK_ITER reg, since bpf_for_each_spilled_reg only considers
> STACK_SPILL. This patch also lets bpf_for_each_spilled_reg search
> STACK_ITER.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>

This patch should come before patch #2: introduce KF_RCU_PROTECTED
there first, then use the flag in the later patches.
BTW, I can't apply your series on top of bpf-next. I think you should
rebase it onto the latest bpf-next, otherwise the BPF CI can't be
triggered.

> ---
>  include/linux/bpf_verifier.h | 19 ++++++++------
>  include/linux/btf.h          |  1 +
>  kernel/bpf/helpers.c         |  4 +--
>  kernel/bpf/verifier.c        | 48 +++++++++++++++++++++++++++---------
>  4 files changed, 50 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index a3236651ec64..b5cdcc332b0a 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -385,19 +385,18 @@ struct bpf_verifier_state {
>         u32 jmp_history_cnt;
>  };
>
> -#define bpf_get_spilled_reg(slot, frame)                               \
> +#define bpf_get_spilled_reg(slot, frame, mask)                         \
>         (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
> -         (frame->stack[slot].slot_type[0] == STACK_SPILL))             \
> +         ((1 << frame->stack[slot].slot_type[0]) & (mask))) \
>          ? &frame->stack[slot].spilled_ptr : NULL)
>
>  /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
> -#define bpf_for_each_spilled_reg(iter, frame, reg)                     \
> -       for (iter = 0, reg = bpf_get_spilled_reg(iter, frame);          \
> +#define bpf_for_each_spilled_reg(iter, frame, reg, mask)                       \
> +       for (iter = 0, reg = bpf_get_spilled_reg(iter, frame, mask);            \
>              iter < frame->allocated_stack / BPF_REG_SIZE;              \
> -            iter++, reg = bpf_get_spilled_reg(iter, frame))
> +            iter++, reg = bpf_get_spilled_reg(iter, frame, mask))
>
> -/* Invoke __expr over regsiters in __vst, setting __state and __reg */
> -#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr)   \
> +#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr)   \
>         ({                                                               \
>                 struct bpf_verifier_state *___vstate = __vst;            \
>                 int ___i, ___j;                                          \
> @@ -409,7 +408,7 @@ struct bpf_verifier_state {
>                                 __reg = &___regs[___j];                  \
>                                 (void)(__expr);                          \
>                         }                                                \
> -                       bpf_for_each_spilled_reg(___j, __state, __reg) { \
> +                       bpf_for_each_spilled_reg(___j, __state, __reg, __mask) { \
>                                 if (!__reg)                              \
>                                         continue;                        \
>                                 (void)(__expr);                          \
> @@ -417,6 +416,10 @@ struct bpf_verifier_state {
>                 }                                                        \
>         })
>
> +/* Invoke __expr over regsiters in __vst, setting __state and __reg */
> +#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \
> +       bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, 1 << STACK_SPILL, __expr)
> +
>  /* linked list of verifier states used to prune search */
>  struct bpf_verifier_state_list {
>         struct bpf_verifier_state state;
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 928113a80a95..c2231c64d60b 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -74,6 +74,7 @@
>  #define KF_ITER_NEW     (1 << 8) /* kfunc implements BPF iter constructor */
>  #define KF_ITER_NEXT    (1 << 9) /* kfunc implements BPF iter next method */
>  #define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */
> +#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */
>
>  /*
>   * Tag marking a kernel function as a kfunc. This is meant to minimize the
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9c3af36249a2..aa9e03fbfe1a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2507,10 +2507,10 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
>  BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> -BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
>  BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
> -BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
>  BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2367483bf4c2..a065e18a0b3a 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1172,7 +1172,12 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
>
>  static void __mark_reg_known_zero(struct bpf_reg_state *reg);
>
> +static bool in_rcu_cs(struct bpf_verifier_env *env);
> +
> +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta);
> +
>  static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> +                                struct bpf_kfunc_call_arg_meta *meta,
>                                  struct bpf_reg_state *reg, int insn_idx,
>                                  struct btf *btf, u32 btf_id, int nr_slots)
>  {
> @@ -1193,6 +1198,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
>
>                 __mark_reg_known_zero(st);
>                 st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
> +               if (is_kfunc_rcu_protected(meta)) {
> +                       if (in_rcu_cs(env))
> +                               st->type |= MEM_RCU;

I think this change is incorrect.  The type of st->type is enum
bpf_reg_type, but MEM_RCU is enum bpf_type_flag.
Or am I missing something?

> +                       else
> +                               st->type |= PTR_UNTRUSTED;
> +               }
>                 st->live |= REG_LIVE_WRITTEN;
>                 st->ref_obj_id = i == 0 ? id : 0;
>                 st->iter.btf = btf;
> @@ -1267,7 +1278,7 @@ static bool is_iter_reg_valid_uninit(struct bpf_verifier_env *env,
>         return true;
>  }
>
> -static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>                                    struct btf *btf, u32 btf_id, int nr_slots)
>  {
>         struct bpf_func_state *state = func(env, reg);
> @@ -1275,26 +1286,28 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
>
>         spi = iter_get_spi(env, reg, nr_slots);
>         if (spi < 0)
> -               return false;
> +               return -EINVAL;
>
>         for (i = 0; i < nr_slots; i++) {
>                 struct bpf_stack_state *slot = &state->stack[spi - i];
>                 struct bpf_reg_state *st = &slot->spilled_ptr;
>
> +               if (st->type & PTR_UNTRUSTED)
> +                       return -EPERM;
>                 /* only main (first) slot has ref_obj_id set */
>                 if (i == 0 && !st->ref_obj_id)
> -                       return false;
> +                       return -EINVAL;
>                 if (i != 0 && st->ref_obj_id)
> -                       return false;
> +                       return -EINVAL;
>                 if (st->iter.btf != btf || st->iter.btf_id != btf_id)
> -                       return false;
> +                       return -EINVAL;
>
>                 for (j = 0; j < BPF_REG_SIZE; j++)
>                         if (slot->slot_type[j] != STACK_ITER)
> -                               return false;
> +                               return -EINVAL;
>         }
>
> -       return true;
> +       return 0;
>  }
>
>  /* Check if given stack slot is "special":
> @@ -7503,15 +7516,20 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
>                                 return err;
>                 }
>
> -               err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
> +               err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
>                 if (err)
>                         return err;
>         } else {
>                 /* iter_next() or iter_destroy() expect initialized iter state*/
> -               if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
> -                       verbose(env, "expected an initialized iter_%s as arg #%d\n",
> +               err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots);
> +               switch (err) {
> +               case -EINVAL:
> +                       verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
>                                 iter_type_str(meta->btf, btf_id), regno);
> -                       return -EINVAL;
> +                       return err;
> +               case -EPERM:
> +                       verbose(env, "expected an RCU CS when using %s\n", meta->func_name);
> +                       return err;
>                 }
>
>                 spi = iter_get_spi(env, reg, nr_slots);
> @@ -10092,6 +10110,11 @@ static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
>         return meta->kfunc_flags & KF_RCU;
>  }
>
> +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +       return meta->kfunc_flags & KF_RCU_PROTECTED;
> +}
> +
>  static bool __kfunc_param_match_suffix(const struct btf *btf,
>                                        const struct btf_param *arg,
>                                        const char *suffix)
> @@ -11428,6 +11451,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>         if (env->cur_state->active_rcu_lock) {
>                 struct bpf_func_state *state;
>                 struct bpf_reg_state *reg;
> +               u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER);
>
>                 if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>                         verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
> @@ -11438,7 +11462,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                         verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>                         return -EINVAL;
>                 } else if (rcu_unlock) {
> -                       bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> +                       bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({
>                                 if (reg->type & MEM_RCU) {
>                                         reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
>                                         reg->type |= PTR_UNTRUSTED;
> --
> 2.20.1
>
>


-- 
Regards
Yafang

* Re: [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS
  2023-09-27 10:00   ` Yafang Shao
@ 2023-09-27 10:16     ` Chuyi Zhou
  0 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-27 10:16 UTC (permalink / raw)
  To: Yafang Shao; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel



On 2023/9/27 18:00, Yafang Shao wrote:
> On Mon, Sep 25, 2023 at 6:56 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> css_iter and task_iter should be used in an RCU section. Specifically, in
>> sleepable progs an explicit bpf_rcu_read_lock() is needed before using
>> these iters. In normal bpf progs that have an implicit rcu_read_lock(),
>> it's OK to use them directly.
>>
>> This patch adds a new KF flag KF_RCU_PROTECTED for bpf_iter_task_new and
>> bpf_iter_css_new. It means the kfunc should be used in an RCU CS. We check
>> whether we are in an rcu cs before we invoke this kfunc. If the rcu
>> protection is guaranteed, we let st->type = PTR_TO_STACK | MEM_RCU.
>> Once the user does rcu_unlock during the iteration, the MEM_RCU state of
>> the regs is cleared. is_iter_reg_valid_init() will reject the reg if its
>> type is UNTRUSTED.
>>
>> It is worth noting that currently bpf_rcu_read_unlock does not clear the
>> state of the STACK_ITER reg, since bpf_for_each_spilled_reg only considers
>> STACK_SPILL. This patch also lets bpf_for_each_spilled_reg search
>> STACK_ITER.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> 
> This patch should come before patch #2: introduce KF_RCU_PROTECTED
> there first, then use the flag in the later patches.
> BTW, I can't apply your series on top of bpf-next. I think you should
> rebase it onto the latest bpf-next, otherwise the BPF CI can't be
> triggered.
> 

Sorry for the mistake, will rebase in v4.

>> ---
>>   include/linux/bpf_verifier.h | 19 ++++++++------
>>   include/linux/btf.h          |  1 +
>>   kernel/bpf/helpers.c         |  4 +--
>>   kernel/bpf/verifier.c        | 48 +++++++++++++++++++++++++++---------
>>   4 files changed, 50 insertions(+), 22 deletions(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index a3236651ec64..b5cdcc332b0a 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -385,19 +385,18 @@ struct bpf_verifier_state {
>>          u32 jmp_history_cnt;
>>   };
>>
>> -#define bpf_get_spilled_reg(slot, frame)                               \
>> +#define bpf_get_spilled_reg(slot, frame, mask)                         \
>>          (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
>> -         (frame->stack[slot].slot_type[0] == STACK_SPILL))             \
>> +         ((1 << frame->stack[slot].slot_type[0]) & (mask))) \
>>           ? &frame->stack[slot].spilled_ptr : NULL)
>>
>>   /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
>> -#define bpf_for_each_spilled_reg(iter, frame, reg)                     \
>> -       for (iter = 0, reg = bpf_get_spilled_reg(iter, frame);          \
>> +#define bpf_for_each_spilled_reg(iter, frame, reg, mask)                       \
>> +       for (iter = 0, reg = bpf_get_spilled_reg(iter, frame, mask);            \
>>               iter < frame->allocated_stack / BPF_REG_SIZE;              \
>> -            iter++, reg = bpf_get_spilled_reg(iter, frame))
>> +            iter++, reg = bpf_get_spilled_reg(iter, frame, mask))
>>
>> -/* Invoke __expr over regsiters in __vst, setting __state and __reg */
>> -#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr)   \
>> +#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr)   \
>>          ({                                                               \
>>                  struct bpf_verifier_state *___vstate = __vst;            \
>>                  int ___i, ___j;                                          \
>> @@ -409,7 +408,7 @@ struct bpf_verifier_state {
>>                                  __reg = &___regs[___j];                  \
>>                                  (void)(__expr);                          \
>>                          }                                                \
>> -                       bpf_for_each_spilled_reg(___j, __state, __reg) { \
>> +                       bpf_for_each_spilled_reg(___j, __state, __reg, __mask) { \
>>                                  if (!__reg)                              \
>>                                          continue;                        \
>>                                  (void)(__expr);                          \
>> @@ -417,6 +416,10 @@ struct bpf_verifier_state {
>>                  }                                                        \
>>          })
>>
>> +/* Invoke __expr over regsiters in __vst, setting __state and __reg */
>> +#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \
>> +       bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, 1 << STACK_SPILL, __expr)
>> +
>>   /* linked list of verifier states used to prune search */
>>   struct bpf_verifier_state_list {
>>          struct bpf_verifier_state state;
>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index 928113a80a95..c2231c64d60b 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>> @@ -74,6 +74,7 @@
>>   #define KF_ITER_NEW     (1 << 8) /* kfunc implements BPF iter constructor */
>>   #define KF_ITER_NEXT    (1 << 9) /* kfunc implements BPF iter next method */
>>   #define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */
>> +#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */
>>
>>   /*
>>    * Tag marking a kernel function as a kfunc. This is meant to minimize the
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 9c3af36249a2..aa9e03fbfe1a 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2507,10 +2507,10 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>> -BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
>> +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
>>   BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
>> -BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
>> +BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
>>   BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 2367483bf4c2..a065e18a0b3a 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -1172,7 +1172,12 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
>>
>>   static void __mark_reg_known_zero(struct bpf_reg_state *reg);
>>
>> +static bool in_rcu_cs(struct bpf_verifier_env *env);
>> +
>> +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta);
>> +
>>   static int mark_stack_slots_iter(struct bpf_verifier_env *env,
>> +                                struct bpf_kfunc_call_arg_meta *meta,
>>                                   struct bpf_reg_state *reg, int insn_idx,
>>                                   struct btf *btf, u32 btf_id, int nr_slots)
>>   {
>> @@ -1193,6 +1198,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
>>
>>                  __mark_reg_known_zero(st);
>>                  st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
>> +               if (is_kfunc_rcu_protected(meta)) {
>> +                       if (in_rcu_cs(env))
>> +                               st->type |= MEM_RCU;
> 
> I think this change is incorrect.  The type of st->type is enum
> bpf_reg_type, but MEM_RCU is enum bpf_type_flag.
> Or am I missing something?
Looking at is_rcu_reg(), it seems OK to add the MEM_RCU flag to st->type.

static bool is_rcu_reg(const struct bpf_reg_state *reg)
{
	return reg->type & MEM_RCU;
}

Here is the previous discussion link:
https://lore.kernel.org/lkml/CAADnVQKu+a6MKKfJy8NVmwtpEw1ae-_8opsGjdvvfoUjwE1sog@mail.gmail.com/

Thanks.

> 
>> +                       else
>> +                               st->type |= PTR_UNTRUSTED;
>> +               }
>>                  st->live |= REG_LIVE_WRITTEN;
>>                  st->ref_obj_id = i == 0 ? id : 0;
>>                  st->iter.btf = btf;
>> @@ -1267,7 +1278,7 @@ static bool is_iter_reg_valid_uninit(struct bpf_verifier_env *env,
>>          return true;
>>   }
>>
>> -static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>> +static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>>                                     struct btf *btf, u32 btf_id, int nr_slots)
>>   {
>>          struct bpf_func_state *state = func(env, reg);
>> @@ -1275,26 +1286,28 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
>>
>>          spi = iter_get_spi(env, reg, nr_slots);
>>          if (spi < 0)
>> -               return false;
>> +               return -EINVAL;
>>
>>          for (i = 0; i < nr_slots; i++) {
>>                  struct bpf_stack_state *slot = &state->stack[spi - i];
>>                  struct bpf_reg_state *st = &slot->spilled_ptr;
>>
>> +               if (st->type & PTR_UNTRUSTED)
>> +                       return -EPERM;
>>                  /* only main (first) slot has ref_obj_id set */
>>                  if (i == 0 && !st->ref_obj_id)
>> -                       return false;
>> +                       return -EINVAL;
>>                  if (i != 0 && st->ref_obj_id)
>> -                       return false;
>> +                       return -EINVAL;
>>                  if (st->iter.btf != btf || st->iter.btf_id != btf_id)
>> -                       return false;
>> +                       return -EINVAL;
>>
>>                  for (j = 0; j < BPF_REG_SIZE; j++)
>>                          if (slot->slot_type[j] != STACK_ITER)
>> -                               return false;
>> +                               return -EINVAL;
>>          }
>>
>> -       return true;
>> +       return 0;
>>   }
>>
>>   /* Check if given stack slot is "special":
>> @@ -7503,15 +7516,20 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
>>                                  return err;
>>                  }
>>
>> -               err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
>> +               err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
>>                  if (err)
>>                          return err;
>>          } else {
>>                  /* iter_next() or iter_destroy() expect initialized iter state*/
>> -               if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
>> -                       verbose(env, "expected an initialized iter_%s as arg #%d\n",
>> +               err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots);
>> +               switch (err) {
>> +               case -EINVAL:
>> +                       verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
>>                                  iter_type_str(meta->btf, btf_id), regno);
>> -                       return -EINVAL;
>> +                       return err;
>> +               case -EPERM:
>> +                       verbose(env, "expected an RCU CS when using %s\n", meta->func_name);
>> +                       return err;
>>                  }
>>
>>                  spi = iter_get_spi(env, reg, nr_slots);
>> @@ -10092,6 +10110,11 @@ static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
>>          return meta->kfunc_flags & KF_RCU;
>>   }
>>
>> +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta)
>> +{
>> +       return meta->kfunc_flags & KF_RCU_PROTECTED;
>> +}
>> +
>>   static bool __kfunc_param_match_suffix(const struct btf *btf,
>>                                         const struct btf_param *arg,
>>                                         const char *suffix)
>> @@ -11428,6 +11451,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>          if (env->cur_state->active_rcu_lock) {
>>                  struct bpf_func_state *state;
>>                  struct bpf_reg_state *reg;
>> +               u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER);
>>
>>                  if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>>                          verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
>> @@ -11438,7 +11462,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>                          verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>>                          return -EINVAL;
>>                  } else if (rcu_unlock) {
>> -                       bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>> +                       bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({
>>                                  if (reg->type & MEM_RCU) {
>>                                          reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
>>                                          reg->type |= PTR_UNTRUSTED;
>> --
>> 2.20.1
>>
>>
> 
> 

* Re: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs
  2023-09-25 10:55 ` [PATCH bpf-next v3 3/7] bpf: Introduce task open coded " Chuyi Zhou
@ 2023-09-27 23:20   ` Andrii Nakryiko
  2023-09-28  3:29     ` Chuyi Zhou
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 23:20 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_task in open-coded iterator
> style. BPF programs can use these kfuncs directly or through the
> bpf_for_each macro to iterate all processes in the system.
>
> The API design keeps consistent with SEC("iter/task"). bpf_iter_task_new()
> accepts a specific task and an iterating type which allows:
> 1. iterating all processes in the system
>
> 2. iterating all threads in the system
>
> 3. iterating all threads of a specific task
> Here we also reuse enum bpf_iter_task_type and rename BPF_TASK_ITER_TID
> to BPF_TASK_ITER_THREAD, rename BPF_TASK_ITER_TGID to BPF_TASK_ITER_PROC.
>
> The newly-added struct bpf_iter_task has a name collision with a selftest
> for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
> renamed in order to avoid the collision.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  include/linux/bpf.h                           |  8 +-
>  kernel/bpf/helpers.c                          |  3 +
>  kernel/bpf/task_iter.c                        | 96 ++++++++++++++++---
>  .../testing/selftests/bpf/bpf_experimental.h  |  5 +
>  .../selftests/bpf/prog_tests/bpf_iter.c       | 18 ++--
>  .../{bpf_iter_task.c => bpf_iter_tasks.c}     |  0
>  6 files changed, 106 insertions(+), 24 deletions(-)
>  rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
>

[...]

> @@ -692,9 +692,9 @@ static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct b
>  static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
>  {
>         seq_printf(seq, "task_type:\t%s\n", iter_task_type_names[aux->task.type]);
> -       if (aux->task.type == BPF_TASK_ITER_TID)
> +       if (aux->task.type == BPF_TASK_ITER_THREAD)
>                 seq_printf(seq, "tid:\t%u\n", aux->task.pid);
> -       else if (aux->task.type == BPF_TASK_ITER_TGID)
> +       else if (aux->task.type == BPF_TASK_ITER_PROC)
>                 seq_printf(seq, "pid:\t%u\n", aux->task.pid);
>  }
>
> @@ -856,6 +856,80 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
>         bpf_mem_free(&bpf_global_ma, kit->css_it);
>  }
>
> +struct bpf_iter_task {
> +       __u64 __opaque[2];
> +       __u32 __opaque_int[1];

this should be __u64 __opaque[3], because the struct takes the full 24 bytes
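
Something like this (untested sketch), so sizeof() still matches
struct bpf_iter_task_kern:

struct bpf_iter_task {
	__u64 __opaque[3];
} __attribute__((aligned(8)));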

> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_task_kern {
> +       struct task_struct *task;
> +       struct task_struct *pos;
> +       unsigned int type;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)

nit: type -> flags, so we can add a bit more stuff, if necessary

> +{
> +       struct bpf_iter_task_kern *kit = (void *)it;

empty line after variable declarations

> +       BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
> +                                       __alignof__(struct bpf_iter_task));

and I'd add empty line here to keep BUILD_BUG_ON block separate

> +       kit->task = kit->pos = NULL;
> +       switch (type) {
> +       case BPF_TASK_ITER_ALL:
> +       case BPF_TASK_ITER_PROC:
> +       case BPF_TASK_ITER_THREAD:
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       if (type == BPF_TASK_ITER_THREAD)
> +               kit->task = task;
> +       else
> +               kit->task = &init_task;
> +       kit->pos = kit->task;
> +       kit->type = type;
> +       return 0;
> +}
> +
> +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
> +{
> +       struct bpf_iter_task_kern *kit = (void *)it;
> +       struct task_struct *pos;
> +       unsigned int type;
> +
> +       type = kit->type;
> +       pos = kit->pos;
> +
> +       if (!pos)
> +               goto out;
> +
> +       if (type == BPF_TASK_ITER_PROC)
> +               goto get_next_task;
> +
> +       kit->pos = next_thread(kit->pos);
> +       if (kit->pos == kit->task) {
> +               if (type == BPF_TASK_ITER_THREAD) {
> +                       kit->pos = NULL;
> +                       goto out;
> +               }
> +       } else
> +               goto out;
> +
> +get_next_task:
> +       kit->pos = next_task(kit->pos);
> +       kit->task = kit->pos;
> +       if (kit->pos == &init_task)
> +               kit->pos = NULL;

I can't say I completely follow the logic (e.g., for
BPF_TASK_ITER_PROC, why do we do next_task() on the first next() call).
Can you elaborate on the expected behavior for various combinations of
types and starting task argument?

> +
> +out:
> +       return pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
> +{
> +}
> +
>  DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
>  static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> index d3ea90f0e142..d989775dbdb5 100644
> --- a/tools/testing/selftests/bpf/bpf_experimental.h
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -169,4 +169,9 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
>  extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
>  extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
>
> +struct bpf_iter_task;
> +extern int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type) __weak __ksym;
> +extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
> +extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
> +
>  #endif
> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c

please split out selftests changes from kernel-side changes. We only
combine them if kernel changes break selftests, preventing bisection.

[...]

* Re: [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded iterator kfuncs
  2023-09-25 10:55 ` [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded " Chuyi Zhou
@ 2023-09-27 23:24   ` Andrii Nakryiko
  2023-09-28  2:51     ` Chuyi Zhou
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 23:24 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> This patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_css in open-coded iterator
> style. These kfuncs actually wrap css_next_descendant_{pre, post}.
> css_iter can be used to:
>
> 1) iterate a specific cgroup tree in pre/post/up order
>
> 2) iterate a cgroup subsystem in a BPF prog, like
> for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in the kernel.
>
> The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
> parameters defining iteration order and starting css. Here we also reuse
> BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
> BPF_CGROUP_ITER_ANCESTORS_UP enums.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  kernel/bpf/cgroup_iter.c                      | 57 +++++++++++++++++++
>  kernel/bpf/helpers.c                          |  3 +
>  .../testing/selftests/bpf/bpf_experimental.h  |  6 ++
>  3 files changed, 66 insertions(+)
>
> diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
> index 810378f04fbc..ebc3d9471f52 100644
> --- a/kernel/bpf/cgroup_iter.c
> +++ b/kernel/bpf/cgroup_iter.c
> @@ -294,3 +294,60 @@ static int __init bpf_cgroup_iter_init(void)
>  }
>
>  late_initcall(bpf_cgroup_iter_init);
> +
> +struct bpf_iter_css {
> +       __u64 __opaque[2];
> +       __u32 __opaque_int[1];
> +} __attribute__((aligned(8)));
> +

same as before, __opaque[3] only


> +struct bpf_iter_css_kern {
> +       struct cgroup_subsys_state *start;
> +       struct cgroup_subsys_state *pos;
> +       int order;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
> +               struct cgroup_subsys_state *start, enum bpf_cgroup_iter_order order)

Similarly, I wonder if we should go for a more generic "flags" argument?

> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;

empty line

> +       kit->start = NULL;
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css));

please move this up before kit->start assignment, and separate by empty lines

> +       switch (order) {
> +       case BPF_CGROUP_ITER_DESCENDANTS_PRE:
> +       case BPF_CGROUP_ITER_DESCENDANTS_POST:
> +       case BPF_CGROUP_ITER_ANCESTORS_UP:
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       kit->start = start;
> +       kit->pos = NULL;
> +       kit->order = order;
> +       return 0;
> +}
> +
> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it)
> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;

empty line

> +       if (!kit->start)
> +               return NULL;
> +
> +       switch (kit->order) {
> +       case BPF_CGROUP_ITER_DESCENDANTS_PRE:
> +               kit->pos = css_next_descendant_pre(kit->pos, kit->start);
> +               break;
> +       case BPF_CGROUP_ITER_DESCENDANTS_POST:
> +               kit->pos = css_next_descendant_post(kit->pos, kit->start);
> +               break;
> +       default:

we know it's BPF_CGROUP_ITER_ANCESTORS_UP, so why not have that here explicitly?
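
e.g., just a sketch:

	case BPF_CGROUP_ITER_ANCESTORS_UP:
		kit->pos = kit->pos ? kit->pos->parent : kit->start;
		break;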

> +               kit->pos = kit->pos ? kit->pos->parent : kit->start;
> +       }
> +
> +       return kit->pos;

wouldn't this implementation never return the "start" css? Is that intentional?

> +}
> +
> +__bpf_kfunc void bpf_iter_css_destroy(struct bpf_iter_css *it)
> +{
> +}
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 556262c27a75..9c3af36249a2 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2510,6 +2510,9 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
>  BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> index d989775dbdb5..aa247d1d81d1 100644
> --- a/tools/testing/selftests/bpf/bpf_experimental.h
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -174,4 +174,10 @@ extern int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task,
>  extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
>  extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
>
> +struct bpf_iter_css;
> +extern int bpf_iter_css_new(struct bpf_iter_css *it,
> +                               struct cgroup_subsys_state *start, enum bpf_cgroup_iter_order order) __weak __ksym;
> +extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
> +extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
> +
>  #endif
> --
> 2.20.1
>

* Re: [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS
  2023-09-25 10:55 ` [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS Chuyi Zhou
  2023-09-27 10:00   ` Yafang Shao
@ 2023-09-27 23:29   ` Andrii Nakryiko
  1 sibling, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 23:29 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> css_iter and task_iter should be used in an RCU section. Specifically, in
> sleepable progs an explicit bpf_rcu_read_lock() is needed before using
> these iters. In normal bpf progs that have an implicit rcu_read_lock(),
> it's OK to use them directly.
> 
> This patch adds a new KF flag KF_RCU_PROTECTED for bpf_iter_task_new and
> bpf_iter_css_new. It means the kfunc should be used in an RCU CS. We check
> whether we are in an rcu cs before we invoke this kfunc. If the rcu
> protection is guaranteed, we let st->type = PTR_TO_STACK | MEM_RCU.
> Once the user does rcu_unlock during the iteration, the MEM_RCU state of
> the regs is cleared. is_iter_reg_valid_init() will reject the reg if its
> type is UNTRUSTED.
> 
> It is worth noting that currently bpf_rcu_read_unlock does not clear the
> state of the STACK_ITER reg, since bpf_for_each_spilled_reg only considers
> STACK_SPILL. This patch also lets bpf_for_each_spilled_reg search
> STACK_ITER.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  include/linux/bpf_verifier.h | 19 ++++++++------
>  include/linux/btf.h          |  1 +
>  kernel/bpf/helpers.c         |  4 +--
>  kernel/bpf/verifier.c        | 48 +++++++++++++++++++++++++++---------
>  4 files changed, 50 insertions(+), 22 deletions(-)
>

[...]

>
> -static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>                                    struct btf *btf, u32 btf_id, int nr_slots)
>  {
>         struct bpf_func_state *state = func(env, reg);
> @@ -1275,26 +1286,28 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
>
>         spi = iter_get_spi(env, reg, nr_slots);
>         if (spi < 0)
> -               return false;
> +               return -EINVAL;
>
>         for (i = 0; i < nr_slots; i++) {
>                 struct bpf_stack_state *slot = &state->stack[spi - i];
>                 struct bpf_reg_state *st = &slot->spilled_ptr;
>
> +               if (st->type & PTR_UNTRUSTED)
> +                       return -EPERM;
>                 /* only main (first) slot has ref_obj_id set */
>                 if (i == 0 && !st->ref_obj_id)
> -                       return false;
> +                       return -EINVAL;
>                 if (i != 0 && st->ref_obj_id)
> -                       return false;
> +                       return -EINVAL;
>                 if (st->iter.btf != btf || st->iter.btf_id != btf_id)
> -                       return false;
> +                       return -EINVAL;
>
>                 for (j = 0; j < BPF_REG_SIZE; j++)
>                         if (slot->slot_type[j] != STACK_ITER)
> -                               return false;
> +                               return -EINVAL;
>         }
>
> -       return true;
> +       return 0;
>  }
>
>  /* Check if given stack slot is "special":
> @@ -7503,15 +7516,20 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
>                                 return err;
>                 }
>
> -               err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
> +               err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
>                 if (err)
>                         return err;
>         } else {
>                 /* iter_next() or iter_destroy() expect initialized iter state*/
> -               if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
> -                       verbose(env, "expected an initialized iter_%s as arg #%d\n",
> +               err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots);
> +               switch (err) {
> +               case -EINVAL:

I'd also add default: here, in case we ever emit some other error from
is_iter_reg_valid_init()

> +                       verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
>                                 iter_type_str(meta->btf, btf_id), regno);
> -                       return -EINVAL;
> +                       return err;
> +               case -EPERM:

I find -EPERM a bit confusing. Let's use something a bit more
specific, e.g., -EPROTO? We are basically not following a protocol if
we don't keep RCU-protected iterators within a single RCU region,
right?
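
Combined with the default: suggestion above, something like this
(just a sketch):

		err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots);
		switch (err) {
		case 0:
			break;
		case -EPROTO:
			verbose(env, "expected an RCU CS when using %s\n", meta->func_name);
			return err;
		case -EINVAL:
		default:
			verbose(env, "expected an initialized iter_%s as arg #%d\n",
				iter_type_str(meta->btf, btf_id), regno);
			return err;
		}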

> +                       verbose(env, "expected an RCU CS when using %s\n", meta->func_name);
> +                       return err;
>                 }
>
>                 spi = iter_get_spi(env, reg, nr_slots);
> @@ -10092,6 +10110,11 @@ static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
>         return meta->kfunc_flags & KF_RCU;
>  }
>
> +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +       return meta->kfunc_flags & KF_RCU_PROTECTED;
> +}
> +
>  static bool __kfunc_param_match_suffix(const struct btf *btf,
>                                        const struct btf_param *arg,
>                                        const char *suffix)
> @@ -11428,6 +11451,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>         if (env->cur_state->active_rcu_lock) {
>                 struct bpf_func_state *state;
>                 struct bpf_reg_state *reg;
> +               u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER);
>
>                 if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>                         verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
> @@ -11438,7 +11462,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                         verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>                         return -EINVAL;
>                 } else if (rcu_unlock) {
> -                       bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> +                       bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({
>                                 if (reg->type & MEM_RCU) {
>                                         reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
>                                         reg->type |= PTR_UNTRUSTED;
> --
> 2.20.1
>

* Re: [PATCH bpf-next v3 6/7] bpf: Let bpf_iter_task_new accept null task ptr
  2023-09-25 10:55 ` [PATCH bpf-next v3 6/7] bpf: Let bpf_iter_task_new accept null task ptr Chuyi Zhou
@ 2023-09-27 23:37   ` Andrii Nakryiko
  2023-10-01  8:30     ` Chuyi Zhou
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 23:37 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> When using task_iter to iterate all threads of a specific task, we enforce
> that the user must pass a valid task pointer to ensure safety. However,
> when iterating all threads/processes in the system, the BPF verifier still
> requires a valid ptr instead of a "nullable" pointer, even though it's
> pointless, which is kind of surprising from a usability standpoint. It
> would be nice if we could let that kfunc accept an explicit null pointer
> when we are using BPF_TASK_ITER_ALL/BPF_TASK_ITER_PROC and a valid pointer
> when using BPF_TASK_ITER_THREAD.
> 
> Given a trivial kfunc:
>         __bpf_kfunc void FN(struct TYPE_A *obj)
> 
> the verifier would reject a nullptr for obj. The error info is:
> "arg#x pointer type xx xx must point to scalar, or struct with scalar"
> reported by get_kfunc_ptr_arg_type(). The reg->type is SCALAR_VALUE and
> the btf type of ref_t is not scalar or scalar_struct, which leads to the
> rejection in get_kfunc_ptr_arg_type().
> 
> This patch reuses the __opt annotation which was used to indicate that
> the buffer associated with an __sz or __szk argument may be null:
>         __bpf_kfunc void FN(struct TYPE_A *obj__opt)
> Here __opt indicates that obj is optional: the user can pass an explicit
> nullptr or a normal TYPE_A pointer. In get_kfunc_ptr_arg_type(), we detect
> whether the current arg is optional and the register is null. If so, we
> return a new kfunc_ptr_arg_type KF_ARG_PTR_TO_NULL and skip to the next
> arg in check_kfunc_args().
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  kernel/bpf/task_iter.c |  7 +++++--
>  kernel/bpf/verifier.c  | 13 ++++++++++++-
>  2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 9bcd3f9922b1..7ac007f161cc 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -867,7 +867,7 @@ struct bpf_iter_task_kern {
>         unsigned int type;
>  } __attribute__((aligned(8)));
>
> -__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)
> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task__opt, unsigned int type)
>  {
>         struct bpf_iter_task_kern *kit = (void *)it;
>         BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
> @@ -877,14 +877,17 @@ __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *
>         switch (type) {
>         case BPF_TASK_ITER_ALL:
>         case BPF_TASK_ITER_PROC:
> +               break;
>         case BPF_TASK_ITER_THREAD:
> +               if (!task__opt)
> +                       return -EINVAL;
>                 break;
>         default:
>                 return -EINVAL;
>         }
>
>         if (type == BPF_TASK_ITER_THREAD)
> -               kit->task = task;
> +               kit->task = task__opt;
>         else
>                 kit->task = &init_task;
>         kit->pos = kit->task;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a065e18a0b3a..a79204c75a90 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10331,6 +10331,7 @@ enum kfunc_ptr_arg_type {
>         KF_ARG_PTR_TO_CALLBACK,
>         KF_ARG_PTR_TO_RB_ROOT,
>         KF_ARG_PTR_TO_RB_NODE,
> +       KF_ARG_PTR_TO_NULL,
>  };
>
>  enum special_kfunc_type {
> @@ -10425,6 +10426,12 @@ static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta)
>         return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock];
>  }
>
> +static inline bool is_kfunc_arg_optional_null(struct bpf_reg_state *reg,
> +                               const struct btf *btf, const struct btf_param *arg)
> +{
> +       return register_is_null(reg) && is_kfunc_arg_optional(btf, arg);
> +}
> +
>  static enum kfunc_ptr_arg_type
>  get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>                        struct bpf_kfunc_call_arg_meta *meta,
> @@ -10497,6 +10504,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>          */
>         if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) &&
>             (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
> +                       if (is_kfunc_arg_optional_null(reg, meta->btf, &args[argno]))
> +                               return KF_ARG_PTR_TO_NULL;

This nested check seems misplaced. Maybe we shouldn't reuse __opt
suffix which already has a different meaning (coupled with __sz). Why
not add "__nullable" convention and just check it separately?

>                 verbose(env, "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
>                         argno, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
>                 return -EINVAL;
> @@ -11028,7 +11037,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                 }
>
>                 if ((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) &&
> -                   (register_is_null(reg) || type_may_be_null(reg->type))) {
> +                   (register_is_null(reg) || type_may_be_null(reg->type)) && !is_kfunc_arg_optional(meta->btf, &args[i])) {

nit: looks like a very long line, probably wrap to the next line?

>                         verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i);
>                         return -EACCES;
>                 }
> @@ -11053,6 +11062,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                         return kf_arg_type;
>
>                 switch (kf_arg_type) {
> +               case KF_ARG_PTR_TO_NULL:
> +                       continue;
>                 case KF_ARG_PTR_TO_ALLOC_BTF_ID:
>                 case KF_ARG_PTR_TO_BTF_ID:
>                         if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
> --
> 2.20.1
>


* Re: [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded iterator kfuncs
  2023-09-27 23:24   ` Andrii Nakryiko
@ 2023-09-28  2:51     ` Chuyi Zhou
  2023-09-29 21:29       ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-28  2:51 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello,

On 2023/9/28 07:24, Andrii Nakryiko wrote:
> On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> This patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_css in open-coded iterator
>> style. These kfuncs actually wrap css_next_descendant_{pre, post}.
>> css_iter can be used for:
>>
>> 1) iterating a specific cgroup tree in pre/post/up order
>>
>> 2) iterating a cgroup_subsystem in a BPF prog, like
>> for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in the kernel.
>>
>> The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
>> parameters defining iteration order and starting css. Here we also reuse
>> BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
>> BPF_CGROUP_ITER_ANCESTORS_UP enums.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   kernel/bpf/cgroup_iter.c                      | 57 +++++++++++++++++++
>>   kernel/bpf/helpers.c                          |  3 +
>>   .../testing/selftests/bpf/bpf_experimental.h  |  6 ++
>>   3 files changed, 66 insertions(+)
>>
>> diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
>> index 810378f04fbc..ebc3d9471f52 100644
>> --- a/kernel/bpf/cgroup_iter.c
>> +++ b/kernel/bpf/cgroup_iter.c
>> @@ -294,3 +294,60 @@ static int __init bpf_cgroup_iter_init(void)
>>   }
>>
>>   late_initcall(bpf_cgroup_iter_init);
>> +
>> +struct bpf_iter_css {
>> +       __u64 __opaque[2];
>> +       __u32 __opaque_int[1];
>> +} __attribute__((aligned(8)));
>> +
> 
> same as before, __opaque[3] only
> 
> 
>> +struct bpf_iter_css_kern {
>> +       struct cgroup_subsys_state *start;
>> +       struct cgroup_subsys_state *pos;
>> +       int order;
>> +} __attribute__((aligned(8)));
>> +
>> +__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
>> +               struct cgroup_subsys_state *start, enum bpf_cgroup_iter_order order)
> 
> Similarly, I wonder if we should go for a more generic "flags" argument?
> 
>> +{
>> +       struct bpf_iter_css_kern *kit = (void *)it;
> 
> empty line
> 
>> +       kit->start = NULL;
>> +       BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css));
>> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css));
> 
> please move this up before kit->start assignment, and separate by empty lines
> 
>> +       switch (order) {
>> +       case BPF_CGROUP_ITER_DESCENDANTS_PRE:
>> +       case BPF_CGROUP_ITER_DESCENDANTS_POST:
>> +       case BPF_CGROUP_ITER_ANCESTORS_UP:
>> +               break;
>> +       default:
>> +               return -EINVAL;
>> +       }
>> +
>> +       kit->start = start;
>> +       kit->pos = NULL;
>> +       kit->order = order;
>> +       return 0;
>> +}
>> +
>> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it)
>> +{
>> +       struct bpf_iter_css_kern *kit = (void *)it;
> 
> empty line
> 
>> +       if (!kit->start)
>> +               return NULL;
>> +
>> +       switch (kit->order) {
>> +       case BPF_CGROUP_ITER_DESCENDANTS_PRE:
>> +               kit->pos = css_next_descendant_pre(kit->pos, kit->start);
>> +               break;
>> +       case BPF_CGROUP_ITER_DESCENDANTS_POST:
>> +               kit->pos = css_next_descendant_post(kit->pos, kit->start);
>> +               break;
>> +       default:
> 
> we know it's BPF_CGROUP_ITER_ANCESTORS_UP, so why not have that here explicitly?
> 
>> +               kit->pos = kit->pos ? kit->pos->parent : kit->start;
>> +       }
>> +
>> +       return kit->pos;
> 
> wouldn't this implementation never return the "start" css? is that intentional?
> 

Thanks for the review.

This implementation actually does return the "start" css.

1. BPF_CGROUP_ITER_DESCENDANTS_PRE:
1.1 On the first next() call, css_next_descendant_pre(NULL, kit->start)
returns kit->start.
1.2 On the second next() call, css_next_descendant_pre(kit->start,
kit->start) returns the first valid child under kit->start in pre-order.
1.3 On the third next() call, css_next_descendant_pre(last_valid_child,
kit->start) returns the next valid child.
...
until css_next_descendant_pre() returns a NULL pointer, which means we
have visited every valid child, including the "start" css itself.

The above logic is equivalent to the kernel macro
'css_for_each_descendant_pre'.

Likewise, BPF_CGROUP_ITER_DESCENDANTS_POST is equivalent to the macro
'css_for_each_descendant_post', which returns the 'start' css once we
have visited every valid child.

2. BPF_CGROUP_ITER_ANCESTORS_UP
2.1 On the first next() call, kit->pos is NULL, and we return
kit->start.


The selftest in patch 7 checks that:
1. when we use BPF_CGROUP_ITER_DESCENDANTS_PRE to iterate a cgroup tree,
the first cgroup we visit should be the root ('start') cgroup.
2. when we use BPF_CGROUP_ITER_DESCENDANTS_POST to iterate a cgroup
tree, the last cgroup we visit should be the root ('start') cgroup.


Am I missing something important?


Thanks.
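
To make the visiting order concrete, here is a minimal BPF-side sketch of the pre-order case (an illustration, assuming the kfuncs from this patch, a trusted 'root' css obtained elsewhere, the RCU protection that patch 5 enforces, and program scaffolding omitted):

        struct bpf_iter_css it;
        struct cgroup_subsys_state *pos;

        bpf_rcu_read_lock();
        bpf_iter_css_new(&it, root, BPF_CGROUP_ITER_DESCENDANTS_PRE);
        while ((pos = bpf_iter_css_next(&it))) {
                /* the first pos returned is 'root' itself, per 1.1 above */
        }
        bpf_iter_css_destroy(&it);
        bpf_rcu_read_unlock();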





* Re: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs
  2023-09-27 23:20   ` Andrii Nakryiko
@ 2023-09-28  3:29     ` Chuyi Zhou
  2023-09-29 21:27       ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2023-09-28  3:29 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello,

On 2023/9/28 07:20, Andrii Nakryiko wrote:
> On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_task in open-coded iterator
>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>> iterate all processes in the system.
>>
>> The API design is kept consistent with SEC("iter/task"). bpf_iter_task_new()
>> accepts a specific task and an iterating type, which allows:
>> 1. iterating all processes in the system
>>
>> 2. iterating all threads in the system
>>
>> 3. iterating all threads of a specific task
>> Here we also reuse enum bpf_iter_task_type and rename BPF_TASK_ITER_TID
>> to BPF_TASK_ITER_THREAD and BPF_TASK_ITER_TGID to BPF_TASK_ITER_PROC.
>>
>> The newly-added struct bpf_iter_task has a name collision with a selftest
>> for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
>> renamed in order to avoid the collision.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   include/linux/bpf.h                           |  8 +-
>>   kernel/bpf/helpers.c                          |  3 +
>>   kernel/bpf/task_iter.c                        | 96 ++++++++++++++++---
>>   .../testing/selftests/bpf/bpf_experimental.h  |  5 +
>>   .../selftests/bpf/prog_tests/bpf_iter.c       | 18 ++--
>>   .../{bpf_iter_task.c => bpf_iter_tasks.c}     |  0
>>   6 files changed, 106 insertions(+), 24 deletions(-)
>>   rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
>>
> 
> [...]
> 
>> @@ -692,9 +692,9 @@ static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct b
>>   static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
>>   {
>>          seq_printf(seq, "task_type:\t%s\n", iter_task_type_names[aux->task.type]);
>> -       if (aux->task.type == BPF_TASK_ITER_TID)
>> +       if (aux->task.type == BPF_TASK_ITER_THREAD)
>>                  seq_printf(seq, "tid:\t%u\n", aux->task.pid);
>> -       else if (aux->task.type == BPF_TASK_ITER_TGID)
>> +       else if (aux->task.type == BPF_TASK_ITER_PROC)
>>                  seq_printf(seq, "pid:\t%u\n", aux->task.pid);
>>   }
>>
>> @@ -856,6 +856,80 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
>>          bpf_mem_free(&bpf_global_ma, kit->css_it);
>>   }
>>
>> +struct bpf_iter_task {
>> +       __u64 __opaque[2];
>> +       __u32 __opaque_int[1];
> 
> this should be __u64 __opaque[3], because struct takes full 24 bytes
> 
>> +} __attribute__((aligned(8)));
>> +
>> +struct bpf_iter_task_kern {
>> +       struct task_struct *task;
>> +       struct task_struct *pos;
>> +       unsigned int type;
>> +} __attribute__((aligned(8)));
>> +
>> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)
> 
> nit: type -> flags, so we can add a bit more stuff, if necessary
> 
>> +{
>> +       struct bpf_iter_task_kern *kit = (void *)it;
> 
> empty line after variable declarations
> 
>> +       BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
>> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
>> +                                       __alignof__(struct bpf_iter_task));
> 
> and I'd add empty line here to keep BUILD_BUG_ON block separate
> 
>> +       kit->task = kit->pos = NULL;
>> +       switch (type) {
>> +       case BPF_TASK_ITER_ALL:
>> +       case BPF_TASK_ITER_PROC:
>> +       case BPF_TASK_ITER_THREAD:
>> +               break;
>> +       default:
>> +               return -EINVAL;
>> +       }
>> +
>> +       if (type == BPF_TASK_ITER_THREAD)
>> +               kit->task = task;
>> +       else
>> +               kit->task = &init_task;
>> +       kit->pos = kit->task;
>> +       kit->type = type;
>> +       return 0;
>> +}
>> +
>> +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
>> +{
>> +       struct bpf_iter_task_kern *kit = (void *)it;
>> +       struct task_struct *pos;
>> +       unsigned int type;
>> +
>> +       type = kit->type;
>> +       pos = kit->pos;
>> +
>> +       if (!pos)
>> +               goto out;
>> +
>> +       if (type == BPF_TASK_ITER_PROC)
>> +               goto get_next_task;
>> +
>> +       kit->pos = next_thread(kit->pos);
>> +       if (kit->pos == kit->task) {
>> +               if (type == BPF_TASK_ITER_THREAD) {
>> +                       kit->pos = NULL;
>> +                       goto out;
>> +               }
>> +       } else
>> +               goto out;
>> +
>> +get_next_task:
>> +       kit->pos = next_task(kit->pos);
>> +       kit->task = kit->pos;
>> +       if (kit->pos == &init_task)
>> +               kit->pos = NULL;
> 
> I can't say I completely follow the logic (e.g., for
> BPF_TASK_ITER_PROC, why do we do next_task() on first next() call)?
> Can you elaborate on the expected behavior for various combinations of
> types and starting task argument?
> 

Thanks for the review.

The expected behavior of the current implementation is:

BPF_TASK_ITER_PROC:

init_task->first_process->second_process->...->last_process->init_task

We would exit before visiting init_task again.

BPF_TASK_ITER_THREAD:

group_task->first_thread->second_thread->...->last_thread->group_task

We would exit before visiting group_task again.

BPF_TASK_ITER_ALL:

init_task -> first_process -> second_process -> ...
                 |                    |
                 -> first_thread..    |
                                      -> first_thread

Actually, on every next() call we return the "pos" that was prepared by
the previous next() call, and use next_task()/next_thread() to update
kit->pos. Once we meet the exit condition (next_task() returns init_task
or next_thread() returns group_task), we set kit->pos to NULL. In this
way, when next() is called again, we terminate the iteration.

Here "kit->pos = NULL;" means we still return the last valid "pos" now
and will return NULL on the next call to exit the iteration.

Am I missing something important?

Thanks.
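
To summarize the modes above in code, a hedged sketch using the v3 naming, which is refined later in this thread ('task' must be a valid trusted pointer only for BPF_TASK_ITER_THREAD; RCU protection per patch 5; program scaffolding omitted):

        struct bpf_iter_task it;
        struct task_struct *pos;

        bpf_rcu_read_lock();
        /* BPF_TASK_ITER_THREAD visits 'task' first, then its other
         * threads, stopping before 'task' would be visited again.
         * With BPF_TASK_ITER_PROC/BPF_TASK_ITER_ALL the input task is
         * ignored and iteration starts from init_task instead.
         */
        bpf_iter_task_new(&it, task, BPF_TASK_ITER_THREAD);
        while ((pos = bpf_iter_task_next(&it))) {
                /* each pos was prepared by the previous next() call */
        }
        bpf_iter_task_destroy(&it);
        bpf_rcu_read_unlock();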





* Re: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs
  2023-09-28  3:29     ` Chuyi Zhou
@ 2023-09-29 21:27       ` Andrii Nakryiko
  2023-10-01  8:21         ` Chuyi Zhou
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-09-29 21:27 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Wed, Sep 27, 2023 at 8:29 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> Hello,
>
> On 2023/9/28 07:20, Andrii Nakryiko wrote:
> > On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_task in open-coded iterator
> >> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >> iterate all processes in the system.
> >>
> >> The API design is kept consistent with SEC("iter/task"). bpf_iter_task_new()
> >> accepts a specific task and an iterating type, which allows:
> >> 1. iterating all processes in the system
> >>
> >> 2. iterating all threads in the system
> >>
> >> 3. iterating all threads of a specific task
> >> Here we also reuse enum bpf_iter_task_type and rename BPF_TASK_ITER_TID
> >> to BPF_TASK_ITER_THREAD and BPF_TASK_ITER_TGID to BPF_TASK_ITER_PROC.
> >>
> >> The newly-added struct bpf_iter_task has a name collision with a selftest
> >> for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
> >> renamed in order to avoid the collision.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> >> ---
> >>   include/linux/bpf.h                           |  8 +-
> >>   kernel/bpf/helpers.c                          |  3 +
> >>   kernel/bpf/task_iter.c                        | 96 ++++++++++++++++---
> >>   .../testing/selftests/bpf/bpf_experimental.h  |  5 +
> >>   .../selftests/bpf/prog_tests/bpf_iter.c       | 18 ++--
> >>   .../{bpf_iter_task.c => bpf_iter_tasks.c}     |  0
> >>   6 files changed, 106 insertions(+), 24 deletions(-)
> >>   rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
> >>
> >
> > [...]
> >
> >> @@ -692,9 +692,9 @@ static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct b
> >>   static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
> >>   {
> >>          seq_printf(seq, "task_type:\t%s\n", iter_task_type_names[aux->task.type]);
> >> -       if (aux->task.type == BPF_TASK_ITER_TID)
> >> +       if (aux->task.type == BPF_TASK_ITER_THREAD)
> >>                  seq_printf(seq, "tid:\t%u\n", aux->task.pid);
> >> -       else if (aux->task.type == BPF_TASK_ITER_TGID)
> >> +       else if (aux->task.type == BPF_TASK_ITER_PROC)
> >>                  seq_printf(seq, "pid:\t%u\n", aux->task.pid);
> >>   }
> >>
> >> @@ -856,6 +856,80 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> >>          bpf_mem_free(&bpf_global_ma, kit->css_it);
> >>   }
> >>
> >> +struct bpf_iter_task {
> >> +       __u64 __opaque[2];
> >> +       __u32 __opaque_int[1];
> >
> > this should be __u64 __opaque[3], because struct takes full 24 bytes
> >
> >> +} __attribute__((aligned(8)));
> >> +
> >> +struct bpf_iter_task_kern {
> >> +       struct task_struct *task;
> >> +       struct task_struct *pos;
> >> +       unsigned int type;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)
> >
> > nit: type -> flags, so we can add a bit more stuff, if necessary
> >
> >> +{
> >> +       struct bpf_iter_task_kern *kit = (void *)it;
> >
> > empty line after variable declarations
> >
> >> +       BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
> >> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
> >> +                                       __alignof__(struct bpf_iter_task));
> >
> > and I'd add empty line here to keep BUILD_BUG_ON block separate
> >
> >> +       kit->task = kit->pos = NULL;
> >> +       switch (type) {
> >> +       case BPF_TASK_ITER_ALL:
> >> +       case BPF_TASK_ITER_PROC:
> >> +       case BPF_TASK_ITER_THREAD:
> >> +               break;
> >> +       default:
> >> +               return -EINVAL;
> >> +       }
> >> +
> >> +       if (type == BPF_TASK_ITER_THREAD)
> >> +               kit->task = task;
> >> +       else
> >> +               kit->task = &init_task;
> >> +       kit->pos = kit->task;
> >> +       kit->type = type;
> >> +       return 0;
> >> +}
> >> +
> >> +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
> >> +{
> >> +       struct bpf_iter_task_kern *kit = (void *)it;
> >> +       struct task_struct *pos;
> >> +       unsigned int type;
> >> +
> >> +       type = kit->type;
> >> +       pos = kit->pos;
> >> +
> >> +       if (!pos)
> >> +               goto out;
> >> +
> >> +       if (type == BPF_TASK_ITER_PROC)
> >> +               goto get_next_task;
> >> +
> >> +       kit->pos = next_thread(kit->pos);
> >> +       if (kit->pos == kit->task) {
> >> +               if (type == BPF_TASK_ITER_THREAD) {
> >> +                       kit->pos = NULL;
> >> +                       goto out;
> >> +               }
> >> +       } else
> >> +               goto out;
> >> +
> >> +get_next_task:
> >> +       kit->pos = next_task(kit->pos);
> >> +       kit->task = kit->pos;
> >> +       if (kit->pos == &init_task)
> >> +               kit->pos = NULL;
> >
> > I can't say I completely follow the logic (e.g., for
> > BPF_TASK_ITER_PROC, why do we do next_task() on first next() call)?
> > Can you elaborate on the expected behavior for various combinations of
> > types and starting task argument?
> >
>
> Thanks for the review.
>
> The expected behavior of the current implementation is:
>
> BPF_TASK_ITER_PROC:
>
> init_task->first_process->second_process->...->last_process->init_task
>
> We would exit before visiting init_task again.

ah, ok, so in this case it's more like BPF_TASK_ITER_ALL_PROCS, i.e.,
we iterate all processes in the system. Input `task` that we provide
is ignored/meaningless, right? Maybe we should express it as
ALL_PROCS?

>
> BPF_TASK_ITER_THREAD:
>
> group_task->first_thread->second_thread->...->last_thread->group_task
>
> We would exit before visiting group_task again.
>

And this one is iterating threads of the process specified by the given
`task`, right? This is where my confusion comes from. ITER_PROC and
ITER_THREAD, by their names, seem very similar, but in reality
ITER_PROC is more like ITER_ALL (except process vs thread iteration),
while ITER_THREAD is parameterized by the input `task`.

I'm not sure what's the least confusing way to name and organize
everything, but I think it's quite confusing right now, unfortunately.
I wonder if you or someone else have a better suggestion on making
this more straightforward?


> BPF_TASK_ITER_ALL:
>
> init_task -> first_process -> second_process -> ...
>                  |                    |
>                 -> first_thread..    |
>                                      -> first_thread
>

Ok, and this one is "all threads in the system". So
BPF_TASK_ITER_ALL_THREADS would describe it more precisely, right?

> Actually, on every next() call we return the "pos" that was prepared by
> the previous next() call, and use next_task()/next_thread() to update
> kit->pos. Once we meet the exit condition (next_task() returns init_task
> or next_thread() returns group_task), we set kit->pos to NULL. In this
> way, when next() is called again, we terminate the iteration.
>
> Here "kit->pos = NULL;" means we still return the last valid "pos" now
> and will return NULL on the next call to exit the iteration.
>
> Am I missing something important?

No, it's my bad. I did check, but somehow concluded that you were
returning kit->pos, when you are actually returning the locally cached
previous value of kit->pos. All good here, I think.

>
> Thanks.
>
>
>


* Re: [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded iterator kfuncs
  2023-09-28  2:51     ` Chuyi Zhou
@ 2023-09-29 21:29       ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-09-29 21:29 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Wed, Sep 27, 2023 at 7:51 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> Hello,
>
> On 2023/9/28 07:24, Andrii Nakryiko wrote:
> > On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_css in open-coded iterator
> >> style. These kfuncs actually wrap css_next_descendant_{pre, post}.
> >> css_iter can be used for:
> >>
> >> 1) iterating a specific cgroup tree in pre/post/up order
> >>
> >> 2) iterating a cgroup_subsystem in a BPF prog, like
> >> for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in the kernel.
> >>
> >> The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
> >> parameters defining iteration order and starting css. Here we also reuse
> >> BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
> >> BPF_CGROUP_ITER_ANCESTORS_UP enums.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> >> ---
> >>   kernel/bpf/cgroup_iter.c                      | 57 +++++++++++++++++++
> >>   kernel/bpf/helpers.c                          |  3 +
> >>   .../testing/selftests/bpf/bpf_experimental.h  |  6 ++
> >>   3 files changed, 66 insertions(+)
> >>
> >> diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
> >> index 810378f04fbc..ebc3d9471f52 100644
> >> --- a/kernel/bpf/cgroup_iter.c
> >> +++ b/kernel/bpf/cgroup_iter.c
> >> @@ -294,3 +294,60 @@ static int __init bpf_cgroup_iter_init(void)
> >>   }
> >>
> >>   late_initcall(bpf_cgroup_iter_init);
> >> +
> >> +struct bpf_iter_css {
> >> +       __u64 __opaque[2];
> >> +       __u32 __opaque_int[1];
> >> +} __attribute__((aligned(8)));
> >> +
> >
> > same as before, __opaque[3] only
> >
> >
> >> +struct bpf_iter_css_kern {
> >> +       struct cgroup_subsys_state *start;
> >> +       struct cgroup_subsys_state *pos;
> >> +       int order;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
> >> +               struct cgroup_subsys_state *start, enum bpf_cgroup_iter_order order)
> >
> > Similarly, I wonder if we should go for a more generic "flags" argument?
> >
> >> +{
> >> +       struct bpf_iter_css_kern *kit = (void *)it;
> >
> > empty line
> >
> >> +       kit->start = NULL;
> >> +       BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css));
> >> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css));
> >
> > please move this up before kit->start assignment, and separate by empty lines
> >
> >> +       switch (order) {
> >> +       case BPF_CGROUP_ITER_DESCENDANTS_PRE:
> >> +       case BPF_CGROUP_ITER_DESCENDANTS_POST:
> >> +       case BPF_CGROUP_ITER_ANCESTORS_UP:
> >> +               break;
> >> +       default:
> >> +               return -EINVAL;
> >> +       }
> >> +
> >> +       kit->start = start;
> >> +       kit->pos = NULL;
> >> +       kit->order = order;
> >> +       return 0;
> >> +}
> >> +
> >> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it)
> >> +{
> >> +       struct bpf_iter_css_kern *kit = (void *)it;
> >
> > empty line
> >
> >> +       if (!kit->start)
> >> +               return NULL;
> >> +
> >> +       switch (kit->order) {
> >> +       case BPF_CGROUP_ITER_DESCENDANTS_PRE:
> >> +               kit->pos = css_next_descendant_pre(kit->pos, kit->start);
> >> +               break;
> >> +       case BPF_CGROUP_ITER_DESCENDANTS_POST:
> >> +               kit->pos = css_next_descendant_post(kit->pos, kit->start);
> >> +               break;
> >> +       default:
> >
> > we know it's BPF_CGROUP_ITER_ANCESTORS_UP, so why not have that here explicitly?
> >
> >> +               kit->pos = kit->pos ? kit->pos->parent : kit->start;
> >> +       }
> >> +
> >> +       return kit->pos;
> >
> > wouldn't this implementation never return the "start" css? is that intentional?
> >
>
> Thanks for the review.
>
> This implementation actually does return the "start" css.
>
> 1. BPF_CGROUP_ITER_DESCENDANTS_PRE:
> 1.1 On the first next() call, css_next_descendant_pre(NULL, kit->start)
> returns kit->start.
> 1.2 On the second next() call, css_next_descendant_pre(kit->start,
> kit->start) returns the first valid child under kit->start in pre-order.
> 1.3 On the third next() call, css_next_descendant_pre(last_valid_child,
> kit->start) returns the next valid child.
> ...
> until css_next_descendant_pre() returns a NULL pointer, which means we
> have visited every valid child, including the "start" css itself.
>
> The above logic is equivalent to the kernel macro
> 'css_for_each_descendant_pre'.
>
> Likewise, BPF_CGROUP_ITER_DESCENDANTS_POST is equivalent to the macro
> 'css_for_each_descendant_post', which returns the 'start' css once we
> have visited every valid child.
>
> 2. BPF_CGROUP_ITER_ANCESTORS_UP
> 2.1 On the first next() call, kit->pos is NULL, and we return
> kit->start.
>
>
> The selftest in patch 7 checks that:
> 1. when we use BPF_CGROUP_ITER_DESCENDANTS_PRE to iterate a cgroup tree,
> the first cgroup we visit should be the root ('start') cgroup.
> 2. when we use BPF_CGROUP_ITER_DESCENDANTS_POST to iterate a cgroup
> tree, the last cgroup we visit should be the root ('start') cgroup.
>
>
> Am I missing something important?
>

No, again, my bad, I didn't trace the logic completely before asking.
It all makes sense with kit->pos being initialized to NULL. Thanks for
elaborating!

>
> Thanks.
>
>
>


* Re: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs
  2023-09-29 21:27       ` Andrii Nakryiko
@ 2023-10-01  8:21         ` Chuyi Zhou
  2023-10-03 22:05           ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2023-10-01  8:21 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello, Andrii

On 2023/9/30 05:27, Andrii Nakryiko wrote:
> On Wed, Sep 27, 2023 at 8:29 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> Hello,
>>
>> On 2023/9/28 07:20, Andrii Nakryiko wrote:
>>> On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>>>
>>>> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
>>>> creation and manipulation of struct bpf_iter_task in open-coded iterator
>>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>>>> iterate all processes in the system.
>>>>
>>>> The API design is kept consistent with SEC("iter/task"). bpf_iter_task_new()
>>>> accepts a specific task and an iterating type, which allows:
>>>> 1. iterating all processes in the system
>>>>
>>>> 2. iterating all threads in the system
>>>>
>>>> 3. iterating all threads of a specific task
>>>> Here we also reuse enum bpf_iter_task_type and rename BPF_TASK_ITER_TID
>>>> to BPF_TASK_ITER_THREAD and BPF_TASK_ITER_TGID to BPF_TASK_ITER_PROC.
>>>>
>>>> The newly-added struct bpf_iter_task has a name collision with a selftest
>>>> for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
>>>> renamed in order to avoid the collision.
>>>>
>>>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>>>> ---
>>>>    include/linux/bpf.h                           |  8 +-
>>>>    kernel/bpf/helpers.c                          |  3 +
>>>>    kernel/bpf/task_iter.c                        | 96 ++++++++++++++++---
>>>>    .../testing/selftests/bpf/bpf_experimental.h  |  5 +
>>>>    .../selftests/bpf/prog_tests/bpf_iter.c       | 18 ++--
>>>>    .../{bpf_iter_task.c => bpf_iter_tasks.c}     |  0
>>>>    6 files changed, 106 insertions(+), 24 deletions(-)
>>>>    rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
>>>>
>>>


[...]

>>>> +get_next_task:
>>>> +       kit->pos = next_task(kit->pos);
>>>> +       kit->task = kit->pos;
>>>> +       if (kit->pos == &init_task)
>>>> +               kit->pos = NULL;
>>>
>>> I can't say I completely follow the logic (e.g., for
>>> BPF_TASK_ITER_PROC, why do we do next_task() on first next() call)?
>>> Can you elaborate on the expected behavior for various combinations of
>>> types and starting task argument?
>>>
>>
>> Thanks for the review.
>>
>> The expected behavior of the current implementation is:
>>
>> BPF_TASK_ITER_PROC:
>>
>> init_task->first_process->second_process->...->last_process->init_task
>>
>> We would exit before visiting init_task again.
> 
> ah, ok, so in this case it's more like BPF_TASK_ITER_ALL_PROCS, i.e.,
> we iterate all processes in the system. Input `task` that we provide
> is ignored/meaningless, right? Maybe we should express it as
> ALL_PROCS?
> 
>>
>> BPF_TASK_ITER_THREAD:
>>
>> group_task->first_thread->second_thread->...->last_thread->group_task
>>
>> We would exit before visiting group_task again.
>>
> 
> And this one is iterating threads of the process specified by the given
> `task`, right? This is where my confusion comes from. ITER_PROC and
> ITER_THREAD, by their names, seem very similar, but in reality
> ITER_PROC is more like ITER_ALL (except process vs thread iteration),
> while ITER_THREAD is parameterized by the input `task`.
> 
> I'm not sure what's the least confusing way to name and organize
> everything, but I think it's quite confusing right now, unfortunately.
> I wonder if you or someone else have a better suggestion on making
> this more straightforward?
> 

Maybe here we can introduce new enums and not reuse or rename 
BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID?

{
BPF_TASK_ITER_ALL_PROC,
BPF_TASK_ITER_ALL_THREAD,
BPF_TASK_ITER_THREAD
}

BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID are internal flags. Looking at the
example usage of SEC("iter/task"), and unlike
BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST, we
don't actually use BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID directly. When
using SEC("iter/task"), we just set pid/tid in struct
bpf_iter_link_info, as sketched below. So exposing new enums to users
for open-coded task iters will not confuse them.
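
For contrast, parameterizing the existing seq_file iterator happens entirely from user space. A libbpf sketch ('skel->progs.dump_task' is an illustrative program name; error handling omitted):

        union bpf_iter_link_info linfo = {};
        DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
        struct bpf_link *link;

        linfo.task.tid = tid;   /* or linfo.task.pid to pick a whole process */
        opts.link_info = &linfo;
        opts.link_info_len = sizeof(linfo);
        link = bpf_program__attach_iter(skel->progs.dump_task, &opts);

With the open-coded kfuncs, by contrast, this parameterization moves into the BPF program itself via the task/type arguments.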

Thanks.



* Re: [PATCH bpf-next v3 6/7] bpf: Let bpf_iter_task_new accept null task ptr
  2023-09-27 23:37   ` Andrii Nakryiko
@ 2023-10-01  8:30     ` Chuyi Zhou
  0 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2023-10-01  8:30 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello,

On 2023/9/28 07:37, Andrii Nakryiko wrote:
> On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> When using task_iter to iterate all threads of a specific task, we enforce
>> that the user must pass a valid task pointer to ensure safety. However,
>> when iterating all threads/processes in the system, the BPF verifier still
>> requires a valid ptr instead of a "nullable" pointer, even though it's
>> pointless, which is surprising from a usability standpoint. It
>> would be nice if we could let that kfunc accept an explicit null pointer
>> when we are using BPF_TASK_ITER_ALL/BPF_TASK_ITER_PROC and a valid pointer
>> when using BPF_TASK_ITER_THREAD.
>>
>> Given a trivial kfunc:
>>          __bpf_kfunc void FN(struct TYPE_A *obj)
>>
>> BPF progs would reject a nullptr for obj. The error info is:
>> "arg#x pointer type xx xx must point to scalar, or struct with scalar"
>> reported by get_kfunc_ptr_arg_type(). The reg->type is SCALAR_VALUE and
>> the btf type of ref_t is not a scalar or scalar_struct, which leads to
>> the rejection in get_kfunc_ptr_arg_type().
>>
>> This patch reuses the __opt annotation, which was used to indicate that
>> the buffer associated with an __sz or __szk argument may be null:
>>          __bpf_kfunc void FN(struct TYPE_A *obj__opt)
>> Here __opt indicates obj can be optional: the user can pass an explicit
>> nullptr or a normal TYPE_A pointer. In get_kfunc_ptr_arg_type(), we detect
>> whether the current arg is optional and the register is null; if so, we
>> return a new kfunc_ptr_arg_type KF_ARG_PTR_TO_NULL and skip to the next
>> arg in check_kfunc_args().
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   kernel/bpf/task_iter.c |  7 +++++--
>>   kernel/bpf/verifier.c  | 13 ++++++++++++-
>>   2 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 9bcd3f9922b1..7ac007f161cc 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -867,7 +867,7 @@ struct bpf_iter_task_kern {
>>          unsigned int type;
>>   } __attribute__((aligned(8)));
>>
>> -__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)
>> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task__opt, unsigned int type)
>>   {
>>          struct bpf_iter_task_kern *kit = (void *)it;
>>          BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
>> @@ -877,14 +877,17 @@ __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *
>>          switch (type) {
>>          case BPF_TASK_ITER_ALL:
>>          case BPF_TASK_ITER_PROC:
>> +               break;
>>          case BPF_TASK_ITER_THREAD:
>> +               if (!task__opt)
>> +                       return -EINVAL;
>>                  break;
>>          default:
>>                  return -EINVAL;
>>          }
>>
>>          if (type == BPF_TASK_ITER_THREAD)
>> -               kit->task = task;
>> +               kit->task = task__opt;
>>          else
>>                  kit->task = &init_task;
>>          kit->pos = kit->task;
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index a065e18a0b3a..a79204c75a90 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -10331,6 +10331,7 @@ enum kfunc_ptr_arg_type {
>>          KF_ARG_PTR_TO_CALLBACK,
>>          KF_ARG_PTR_TO_RB_ROOT,
>>          KF_ARG_PTR_TO_RB_NODE,
>> +       KF_ARG_PTR_TO_NULL,
>>   };
>>
>>   enum special_kfunc_type {
>> @@ -10425,6 +10426,12 @@ static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta)
>>          return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock];
>>   }
>>
>> +static inline bool is_kfunc_arg_optional_null(struct bpf_reg_state *reg,
>> +                               const struct btf *btf, const struct btf_param *arg)
>> +{
>> +       return register_is_null(reg) && is_kfunc_arg_optional(btf, arg);
>> +}
>> +
>>   static enum kfunc_ptr_arg_type
>>   get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>>                         struct bpf_kfunc_call_arg_meta *meta,
>> @@ -10497,6 +10504,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>>           */
>>          if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) &&
>>              (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
>> +                       if (is_kfunc_arg_optional_null(reg, meta->btf, &args[argno]))
>> +                               return KF_ARG_PTR_TO_NULL;
> 
> This nested check seems misplaced. Maybe we shouldn't reuse __opt
> suffix which already has a different meaning (coupled with __sz). Why
> not add "__nullable" convention and just check it separately?
> 

IIUC, do you mean:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dbba2b806017..05d197365fcb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10458,6 +10458,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
         if (is_kfunc_arg_callback(env, meta->btf, &args[argno]))
                 return KF_ARG_PTR_TO_CALLBACK;

+       if (is_kfunc_arg_nullable(meta->btf, &args[argno]) && register_is_null(reg))
+               return KF_ARG_PTR_TO_NULL;

         if (argno + 1 < nargs &&
             (is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]) ||


OK, I will change this in the next version.
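
For completeness, the suffix check itself could be a tiny helper modeled on the existing __kfunc_param_match_suffix() quoted earlier in this thread (a sketch; the exact name and placement are assumptions until the next revision lands):

        static bool is_kfunc_arg_nullable(const struct btf *btf,
                                          const struct btf_param *arg)
        {
                return __kfunc_param_match_suffix(btf, arg, "__nullable");
        }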

Thanks.



* Re: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs
  2023-10-01  8:21         ` Chuyi Zhou
@ 2023-10-03 22:05           ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-10-03 22:05 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Sun, Oct 1, 2023 at 1:21 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> Hello, Andrii
>
> On 2023/9/30 05:27, Andrii Nakryiko wrote:
> > On Wed, Sep 27, 2023 at 8:29 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>
> >> Hello,
> >>
> >> On 2023/9/28 07:20, Andrii Nakryiko wrote:
> >>> On Mon, Sep 25, 2023 at 3:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>>>
> >>>> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
> >>>> creation and manipulation of struct bpf_iter_task in open-coded iterator
> >>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >>>> iterate all processes in the system.
> >>>>
> >>>> The API design is kept consistent with SEC("iter/task"). bpf_iter_task_new()
> >>>> accepts a specific task and an iterating type, which allows:
> >>>> 1. iterating all processes in the system
> >>>>
> >>>> 2. iterating all threads in the system
> >>>>
> >>>> 3. iterating all threads of a specific task
> >>>> Here we also reuse enum bpf_iter_task_type and rename BPF_TASK_ITER_TID
> >>>> to BPF_TASK_ITER_THREAD and BPF_TASK_ITER_TGID to BPF_TASK_ITER_PROC.
> >>>>
> >>>> The newly-added struct bpf_iter_task has a name collision with a selftest
> >>>> for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
> >>>> renamed in order to avoid the collision.
> >>>>
> >>>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> >>>> ---
> >>>>    include/linux/bpf.h                           |  8 +-
> >>>>    kernel/bpf/helpers.c                          |  3 +
> >>>>    kernel/bpf/task_iter.c                        | 96 ++++++++++++++++---
> >>>>    .../testing/selftests/bpf/bpf_experimental.h  |  5 +
> >>>>    .../selftests/bpf/prog_tests/bpf_iter.c       | 18 ++--
> >>>>    .../{bpf_iter_task.c => bpf_iter_tasks.c}     |  0
> >>>>    6 files changed, 106 insertions(+), 24 deletions(-)
> >>>>    rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
> >>>>
> >>>
>
>
> [...]
>
> >>>> +get_next_task:
> >>>> +       kit->pos = next_task(kit->pos);
> >>>> +       kit->task = kit->pos;
> >>>> +       if (kit->pos == &init_task)
> >>>> +               kit->pos = NULL;
> >>>
> >>> I can't say I completely follow the logic (e.g., for
> >>> BPF_TASK_ITER_PROC, why do we do next_task() on first next() call)?
> >>> Can you elaborate on the expected behavior for various combinations of
> >>> types and starting task argument?
> >>>
> >>
> >> Thanks for the review.
> >>
> >> The expected behavior of the current implementation is:
> >>
> >> BPF_TASK_ITER_PROC:
> >>
> >> init_task->first_process->second_process->...->last_process->init_task
> >>
> >> We would exit before visiting init_task again.
> >
> > ah, ok, so in this case it's more like BPF_TASK_ITER_ALL_PROCS, i.e.,
> > we iterate all processes in the system. Input `task` that we provide
> > is ignored/meaningless, right? Maybe we should express it as
> > ALL_PROCS?
> >
> >>
> >> BPF_TASK_ITER_THREAD:
> >>
> >> group_task->first_thread->second_thread->...->last_thread->group_task
> >>
> >> We would exit before visiting group_task again.
> >>
> >
> > And this one is iterating threads of the process specified by the given
> > `task`, right? This is where my confusion comes from. ITER_PROC and
> > ITER_THREAD, by their names, seem very similar, but in reality
> > ITER_PROC is more like ITER_ALL (except process vs thread iteration),
> > while ITER_THREAD is parameterized by the input `task`.
> >
> > I'm not sure what's the least confusing way to name and organize
> > everything, but I think it's quite confusing right now, unfortunately.
> > I wonder if you or someone else have a better suggestion on making
> > this more straightforward?
> >
>
> Maybe here we can introduce new enums and not reuse or rename
> BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID?

Yep, probably it's cleaner

>
> {
> BPF_TASK_ITER_ALL_PROC,

BPF_TASK_ITER_ALL_PROCS

> BPF_TASK_ITER_ALL_THREAD,

BPF_TASK_ITER_ALL_THREADS

> BPF_TASK_ITER_THREAD

BPF_TASK_ITER_PROC_THREADS ?

> }
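
Collected in one place, the naming suggested above would read as follows (hypothetical here; the final spelling lands in a later revision):

        enum bpf_iter_task_type {
                BPF_TASK_ITER_ALL_PROCS,        /* all processes in the system */
                BPF_TASK_ITER_ALL_THREADS,      /* all threads in the system */
                BPF_TASK_ITER_PROC_THREADS,     /* all threads of a given task */
        };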
>
> BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID are internal flags. Looking at the
> example usage of SEC("iter/task"), and unlike
> BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST, we
> don't actually use BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID directly. When
> using SEC("iter/task"), we just set pid/tid in struct
> bpf_iter_link_info. So exposing new enums to users for open-coded
> task iters will not confuse them.
>
> Thanks.
>



Thread overview: 22+ messages
2023-09-25 10:55 [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Chuyi Zhou
2023-09-25 10:55 ` [PATCH bpf-next v3 1/7] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
2023-09-25 10:55 ` [PATCH bpf-next v3 2/7] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
2023-09-25 10:55 ` [PATCH bpf-next v3 3/7] bpf: Introduce task open coded " Chuyi Zhou
2023-09-27 23:20   ` Andrii Nakryiko
2023-09-28  3:29     ` Chuyi Zhou
2023-09-29 21:27       ` Andrii Nakryiko
2023-10-01  8:21         ` Chuyi Zhou
2023-10-03 22:05           ` Andrii Nakryiko
2023-09-25 10:55 ` [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded " Chuyi Zhou
2023-09-27 23:24   ` Andrii Nakryiko
2023-09-28  2:51     ` Chuyi Zhou
2023-09-29 21:29       ` Andrii Nakryiko
2023-09-25 10:55 ` [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS Chuyi Zhou
2023-09-27 10:00   ` Yafang Shao
2023-09-27 10:16     ` Chuyi Zhou
2023-09-27 23:29   ` Andrii Nakryiko
2023-09-25 10:55 ` [PATCH bpf-next v3 6/7] bpf: Let bpf_iter_task_new accept null task ptr Chuyi Zhou
2023-09-27 23:37   ` Andrii Nakryiko
2023-10-01  8:30     ` Chuyi Zhou
2023-09-25 10:55 ` [PATCH bpf-next v3 7/7] selftests/bpf: Add tests for open-coded task and css iter Chuyi Zhou
2023-09-25 18:48 ` [PATCH bpf-next v3 0/7] Add Open-coded task, css_task and css iters Tejun Heo
