bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/6] Add Open-coded process and css iters
@ 2023-09-12  7:01 Chuyi Zhou
  2023-09-12  7:01 ` [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-12  7:01 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

Hi,

This is version 2 of process and css iters support. All the changes were
suggested by Alexei.

Thanks for your review!

--- Changelog ---
Changes from v1:
- Add a pre-patch to make some preparations before supporting css_task
  iters.
- Add an allowlist for css_task iters
- Let bpf progs do explicit bpf_rcu_read_lock() when using process iters
and css_descendant iters.
---------------------

In some BPF usage scenarios, it will be useful to iterate the process and
css directly in the BPF program. One of the expected scenarios is
customizable OOM victim selection via BPF[1].

Inspired by Dave's task_vma iter[2], this patchset adds three types of
open-coded iterator kfuncs:

1. bpf_for_each(process, p). Just like for_each_process(p) in kernel to
itearing all tasks in the system.

2. bpf_for_each(css_task, task, css). It works like
css_task_iter_{start, next, end} and would be used to iterating
tasks/threads under a css.

3. bpf_for_each(css_{post, pre}, pos, root_css). It works like
css_next_descendant_{pre, post} to iterating all descendant css.

BPF programs can use these kfuncs directly or through bpf_for_each macro.

link[1]: https://lore.kernel.org/lkml/20230810081319.65668-1-zhouchuyi@bytedance.com/
link[2]: https://lore.kernel.org/all/20230810183513.684836-1-davemarchevsky@fb.com/

Chuyi Zhou (6):
  cgroup: Prepare for using css_task_iter_*() in BPF
  bpf: Introduce css_task open-coded iterator kfuncs
  bpf: Introduce process open coded iterator kfuncs
  bpf: Introduce css_descendant open-coded iterator kfuncs
  bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  selftests/bpf: Add tests for open-coded task and css iter

 include/linux/cgroup.h                        |  12 +-
 include/uapi/linux/bpf.h                      |  16 ++
 kernel/bpf/helpers.c                          |  12 ++
 kernel/bpf/task_iter.c                        | 130 +++++++++++++++++
 kernel/bpf/verifier.c                         |  53 ++++++-
 kernel/cgroup/cgroup.c                        |  18 ++-
 tools/include/uapi/linux/bpf.h                |  16 ++
 tools/lib/bpf/bpf_helpers.h                   |  24 +++
 .../testing/selftests/bpf/prog_tests/iters.c  | 138 ++++++++++++++++++
 .../testing/selftests/bpf/progs/iters_task.c  | 104 +++++++++++++
 10 files changed, 508 insertions(+), 15 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF
  2023-09-12  7:01 [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
@ 2023-09-12  7:01 ` Chuyi Zhou
  2023-09-18 20:42   ` Tejun Heo
  2023-09-12  7:01 ` [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-12  7:01 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch makes some preparations for using css_task_iter_*() in BPF
Program.

1. Flags CSS_TASK_ITER_* are #define-s and it's not easy for bpf prog to
use them. Convert them to enum so bpf prog can take them from vmlinux.h.

2. In the next patch we will add css_task_iter_*() in common kfuncs which
is not safe. Since css_task_iter_*() does spin_unlock_irq() which might
screw up irq flags depending on the context where bpf prog is running.
So we should use irqsave/irqrestore here and the switching is harmless.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 include/linux/cgroup.h | 12 +++++-------
 kernel/cgroup/cgroup.c | 18 ++++++++++++------
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b307013b9c6c..0ef0af66080e 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -40,13 +40,11 @@ struct kernel_clone_args;
 #define CGROUP_WEIGHT_DFL		100
 #define CGROUP_WEIGHT_MAX		10000
 
-/* walk only threadgroup leaders */
-#define CSS_TASK_ITER_PROCS		(1U << 0)
-/* walk all threaded css_sets in the domain */
-#define CSS_TASK_ITER_THREADED		(1U << 1)
-
-/* internal flags */
-#define CSS_TASK_ITER_SKIPPED		(1U << 16)
+enum {
+	CSS_TASK_ITER_PROCS    = (1U << 0),  /* walk only threadgroup leaders */
+	CSS_TASK_ITER_THREADED = (1U << 1),  /* walk all threaded css_sets in the domain */
+	CSS_TASK_ITER_SKIPPED  = (1U << 16), /* internal flags */
+};
 
 /* a css_task_iter should be treated as an opaque object */
 struct css_task_iter {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..b6d64f3b8888 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4917,9 +4917,11 @@ static void css_task_iter_advance(struct css_task_iter *it)
 void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 			 struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	memset(it, 0, sizeof(*it));
 
-	spin_lock_irq(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, irqflags);
 
 	it->ss = css->ss;
 	it->flags = flags;
@@ -4933,7 +4935,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 
 	css_task_iter_advance(it);
 
-	spin_unlock_irq(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, irqflags);
 }
 
 /**
@@ -4946,12 +4948,14 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
  */
 struct task_struct *css_task_iter_next(struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	if (it->cur_task) {
 		put_task_struct(it->cur_task);
 		it->cur_task = NULL;
 	}
 
-	spin_lock_irq(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, irqflags);
 
 	/* @it may be half-advanced by skips, finish advancing */
 	if (it->flags & CSS_TASK_ITER_SKIPPED)
@@ -4964,7 +4968,7 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
 		css_task_iter_advance(it);
 	}
 
-	spin_unlock_irq(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, irqflags);
 
 	return it->cur_task;
 }
@@ -4977,11 +4981,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
  */
 void css_task_iter_end(struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	if (it->cur_cset) {
-		spin_lock_irq(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, irqflags);
 		list_del(&it->iters_node);
 		put_css_set_locked(it->cur_cset);
-		spin_unlock_irq(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, irqflags);
 	}
 
 	if (it->cur_dcset)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs
  2023-09-12  7:01 [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
  2023-09-12  7:01 ` [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
@ 2023-09-12  7:01 ` Chuyi Zhou
  2023-09-12 17:13   ` Alexei Starovoitov
                     ` (2 more replies)
  2023-09-12  7:01 ` [PATCH bpf-next v2 3/6] bpf: Introduce process open coded " Chuyi Zhou
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-12  7:01 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css_task in open-coded
iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
end}. BPF programs can use these kfuncs through bpf_for_each macro for
iteration of all tasks under a css.

css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
the bpf side has to be careful in where it allows to use this iter.
Currently we only allow it in bpf_lsm and bpf iter-s.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 include/uapi/linux/bpf.h       |  4 +++
 kernel/bpf/helpers.c           |  3 +++
 kernel/bpf/task_iter.c         | 48 ++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c          | 23 ++++++++++++++++
 tools/include/uapi/linux/bpf.h |  4 +++
 tools/lib/bpf/bpf_helpers.h    |  7 +++++
 6 files changed, 89 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 73b155e52204..de02c0971428 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7318,4 +7318,8 @@ struct bpf_iter_num {
 	__u64 __opaque[1];
 } __attribute__((aligned(8)));
 
+struct bpf_iter_css_task {
+	__u64 __opaque[1];
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index b0a9834f1051..d6a16becfbb9 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 7473068ed313..d8539cc05ffd 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -803,6 +803,54 @@ const struct bpf_func_proto bpf_find_vma_proto = {
 	.arg5_type	= ARG_ANYTHING,
 };
 
+struct bpf_iter_css_task_kern {
+	struct css_task_iter *css_it;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
+		struct cgroup_subsys_state *css, unsigned int flags)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
+					__alignof__(struct bpf_iter_css_task));
+
+	switch (flags) {
+	case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
+	case CSS_TASK_ITER_PROCS:
+	case 0:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);
+	if (!kit->css_it)
+		return -ENOMEM;
+	css_task_iter_start(css, flags, kit->css_it);
+	return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	if (!kit->css_it)
+		return NULL;
+	return css_task_iter_next(kit->css_it);
+}
+
+__bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	if (!kit->css_it)
+		return;
+	css_task_iter_end(kit->css_it);
+	kfree(kit->css_it);
+}
+
 DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
 
 static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dbba2b806017..2367483bf4c2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10332,6 +10332,7 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_clone,
 	KF_bpf_percpu_obj_new_impl,
 	KF_bpf_percpu_obj_drop_impl,
+	KF_bpf_iter_css_task_new,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -10354,6 +10355,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
 BTF_ID(func, bpf_percpu_obj_new_impl)
 BTF_ID(func, bpf_percpu_obj_drop_impl)
+BTF_ID(func, bpf_iter_css_task_new)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -10378,6 +10380,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
 BTF_ID(func, bpf_percpu_obj_new_impl)
 BTF_ID(func, bpf_percpu_obj_drop_impl)
+BTF_ID(func, bpf_iter_css_task_new)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -10902,6 +10905,20 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
 						  &meta->arg_rbtree_root.field);
 }
 
+static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
+{
+	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
+
+	switch (prog_type) {
+	case BPF_PROG_TYPE_LSM:
+		return true;
+	case BPF_TRACE_ITER:
+		return env->prog->aux->sleepable;
+	default:
+		return false;
+	}
+}
+
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
 			    int insn_idx)
 {
@@ -11152,6 +11169,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		}
 		case KF_ARG_PTR_TO_ITER:
+			if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) {
+				if (!check_css_task_iter_allowlist(env)) {
+					verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n");
+					return -EINVAL;
+				}
+			}
 			ret = process_iter_arg(env, regno, insn_idx, meta);
 			if (ret < 0)
 				return ret;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 73b155e52204..de02c0971428 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7318,4 +7318,8 @@ struct bpf_iter_num {
 	__u64 __opaque[1];
 } __attribute__((aligned(8)));
 
+struct bpf_iter_css_task {
+	__u64 __opaque[1];
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 77ceea575dc7..f48723c6c593 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -303,6 +303,13 @@ extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __weak
 extern int *bpf_iter_num_next(struct bpf_iter_num *it) __weak __ksym;
 extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
 
+struct bpf_iter_css_task;
+struct cgroup_subsys_state;
+extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
+		struct cgroup_subsys_state *css, unsigned int flags) __weak __ksym;
+extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
+extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
+
 #ifndef bpf_for_each
 /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
  * using BPF open-coded iterators without having to write mundane explicit
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs
  2023-09-12  7:01 [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
  2023-09-12  7:01 ` [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
  2023-09-12  7:01 ` [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
@ 2023-09-12  7:01 ` Chuyi Zhou
  2023-09-14 23:26   ` Andrii Nakryiko
  2023-09-12  7:01 ` [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded " Chuyi Zhou
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-12  7:01 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_process in open-coded iterator
style. BPF programs can use these kfuncs or through bpf_for_each macro to
iterate all processes in the system.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 include/uapi/linux/bpf.h       |  4 ++++
 kernel/bpf/helpers.c           |  3 +++
 kernel/bpf/task_iter.c         | 29 +++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  4 ++++
 tools/lib/bpf/bpf_helpers.h    |  5 +++++
 5 files changed, 45 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index de02c0971428..befa55b52e29 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
 	__u64 __opaque[1];
 } __attribute__((aligned(8)));
 
+struct bpf_iter_process {
+	__u64 __opaque[1];
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index d6a16becfbb9..9b7d2c6f99d1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index d8539cc05ffd..9d1927dc3a06 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
 	kfree(kit->css_it);
 }
 
+struct bpf_iter_process_kern {
+	struct task_struct *tsk;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
+{
+	struct bpf_iter_process_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_process_kern) != sizeof(struct bpf_iter_process));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_process_kern) !=
+					__alignof__(struct bpf_iter_process));
+
+	kit->tsk = &init_task;
+	return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
+{
+	struct bpf_iter_process_kern *kit = (void *)it;
+
+	kit->tsk = next_task(kit->tsk);
+
+	return kit->tsk == &init_task ? NULL : kit->tsk;
+}
+
+__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
+{
+}
+
 DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
 
 static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index de02c0971428..befa55b52e29 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
 	__u64 __opaque[1];
 } __attribute__((aligned(8)));
 
+struct bpf_iter_process {
+	__u64 __opaque[1];
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index f48723c6c593..858252c2641c 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -310,6 +310,11 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
 extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
 extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
 
+struct bpf_iter_process;
+extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
+extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
+extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;
+
 #ifndef bpf_for_each
 /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
  * using BPF open-coded iterators without having to write mundane explicit
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
  2023-09-12  7:01 [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
                   ` (2 preceding siblings ...)
  2023-09-12  7:01 ` [PATCH bpf-next v2 3/6] bpf: Introduce process open coded " Chuyi Zhou
@ 2023-09-12  7:01 ` Chuyi Zhou
  2023-09-13  7:25   ` kernel test robot
                     ` (3 more replies)
  2023-09-12  7:01 ` [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS Chuyi Zhou
                   ` (2 subsequent siblings)
  6 siblings, 4 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-12  7:01 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
allow creation and manipulation of struct bpf_iter_css in open-coded
iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
post}. BPF programs can use these kfuncs through bpf_for_each macro for
iteration of all descendant css under a root css.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 include/uapi/linux/bpf.h       |  8 +++++
 kernel/bpf/helpers.c           |  6 ++++
 kernel/bpf/task_iter.c         | 53 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  8 +++++
 tools/lib/bpf/bpf_helpers.h    | 12 ++++++++
 5 files changed, 87 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index befa55b52e29..57760afc13d0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7326,4 +7326,12 @@ struct bpf_iter_process {
 	__u64 __opaque[1];
 } __attribute__((aligned(8)));
 
+struct bpf_iter_css_pre {
+	__u64 __opaque[2];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_post {
+	__u64 __opaque[2];
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9b7d2c6f99d1..ca1f6404af9e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 9d1927dc3a06..8963fc779b87 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
 {
 }
 
+struct bpf_iter_css_kern {
+	struct cgroup_subsys_state *root;
+	struct cgroup_subsys_state *pos;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
+		struct cgroup_subsys_state *root)
+{
+	struct bpf_iter_css_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
+	kit->root = root;
+	kit->pos = NULL;
+	return 0;
+}
+
+__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
+{
+	struct bpf_iter_css_kern *kit = (void *)it;
+
+	kit->pos = css_next_descendant_pre(kit->pos, kit->root);
+	return kit->pos;
+}
+
+__bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
+{
+}
+
+__bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
+		struct cgroup_subsys_state *root)
+{
+	struct bpf_iter_css_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
+	kit->root = root;
+	kit->pos = NULL;
+	return 0;
+}
+
+__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
+{
+	struct bpf_iter_css_kern *kit = (void *)it;
+
+	kit->pos = css_next_descendant_post(kit->pos, kit->root);
+	return kit->pos;
+}
+
+__bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
+{
+}
+
 DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
 
 static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index befa55b52e29..57760afc13d0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7326,4 +7326,12 @@ struct bpf_iter_process {
 	__u64 __opaque[1];
 } __attribute__((aligned(8)));
 
+struct bpf_iter_css_pre {
+	__u64 __opaque[2];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_post {
+	__u64 __opaque[2];
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 858252c2641c..6e5bd9ef14d6 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -315,6 +315,18 @@ extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
 extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
 extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;
 
+struct bpf_iter_css_pre;
+extern int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
+		struct cgroup_subsys_state *root) __weak __ksym;
+extern struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it) __weak __ksym;
+extern void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it) __weak __ksym;
+
+struct bpf_iter_css_post;
+extern int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
+		struct cgroup_subsys_state *root) __weak __ksym;
+extern struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it) __weak __ksym;
+extern void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it) __weak __ksym;
+
 #ifndef bpf_for_each
 /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
  * using BPF open-coded iterators without having to write mundane explicit
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  2023-09-12  7:01 [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
                   ` (3 preceding siblings ...)
  2023-09-12  7:01 ` [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded " Chuyi Zhou
@ 2023-09-12  7:01 ` Chuyi Zhou
  2023-09-13 13:53   ` Chuyi Zhou
  2023-09-14 23:26   ` Andrii Nakryiko
  2023-09-12  7:01 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add tests for open-coded task and css iter Chuyi Zhou
  2023-09-12  7:11 ` [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
  6 siblings, 2 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-12  7:01 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

css_iter and process_iter should be used in rcu section. Specifically, in
sleepable progs explicit bpf_rcu_read_lock() is needed before use these
iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
use them directly.

This patch checks whether we are in rcu cs before we want to invoke
bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
reject if reg->type is UNTRUSTED.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/bpf/verifier.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2367483bf4c2..6a6827ba7a18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1172,7 +1172,13 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
 
 static void __mark_reg_known_zero(struct bpf_reg_state *reg);
 
+static bool in_rcu_cs(struct bpf_verifier_env *env);
+
+/* check whether we are using bpf_iter_process_*() or bpf_iter_css_*() */
+static bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta);
+
 static int mark_stack_slots_iter(struct bpf_verifier_env *env,
+				 struct bpf_kfunc_call_arg_meta *meta,
 				 struct bpf_reg_state *reg, int insn_idx,
 				 struct btf *btf, u32 btf_id, int nr_slots)
 {
@@ -1193,6 +1199,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
 
 		__mark_reg_known_zero(st);
 		st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
+		if (is_iter_need_rcu(meta)) {
+			if (in_rcu_cs(env))
+				st->type |= MEM_RCU;
+			else
+				st->type |= PTR_UNTRUSTED;
+		}
 		st->live |= REG_LIVE_WRITTEN;
 		st->ref_obj_id = i == 0 ? id : 0;
 		st->iter.btf = btf;
@@ -1281,6 +1293,8 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
 		struct bpf_stack_state *slot = &state->stack[spi - i];
 		struct bpf_reg_state *st = &slot->spilled_ptr;
 
+		if (st->type & PTR_UNTRUSTED)
+			return false;
 		/* only main (first) slot has ref_obj_id set */
 		if (i == 0 && !st->ref_obj_id)
 			return false;
@@ -7503,13 +7517,13 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
 				return err;
 		}
 
-		err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
+		err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
 		if (err)
 			return err;
 	} else {
 		/* iter_next() or iter_destroy() expect initialized iter state*/
 		if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
-			verbose(env, "expected an initialized iter_%s as arg #%d\n",
+			verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
 				iter_type_str(meta->btf, btf_id), regno);
 			return -EINVAL;
 		}
@@ -10382,6 +10396,18 @@ BTF_ID(func, bpf_percpu_obj_new_impl)
 BTF_ID(func, bpf_percpu_obj_drop_impl)
 BTF_ID(func, bpf_iter_css_task_new)
 
+BTF_SET_START(rcu_protect_kfuns_set)
+BTF_ID(func, bpf_iter_process_new)
+BTF_ID(func, bpf_iter_css_pre_new)
+BTF_ID(func, bpf_iter_css_post_new)
+BTF_SET_END(rcu_protect_kfuns_set)
+
+static inline bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return btf_id_set_contains(&rcu_protect_kfuns_set, meta->func_id);
+}
+
+
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
 	if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH bpf-next v2 6/6] selftests/bpf: Add tests for open-coded task and css iter
  2023-09-12  7:01 [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
                   ` (4 preceding siblings ...)
  2023-09-12  7:01 ` [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS Chuyi Zhou
@ 2023-09-12  7:01 ` Chuyi Zhou
  2023-09-12  7:11 ` [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
  6 siblings, 0 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-12  7:01 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, Chuyi Zhou

This patch adds three tests to demonstrate these patterns and validating
correctness.

test1: We use bpf_for_each(process, task) to iterate all process in the
system and search for the current process with a given pid.

test2: We create a cgroup and add the current task to the cgroup. In the
BPF program, we would use bpf_for_each(css_task, task, css) to iterate all
tasks under the cgroup. As expected, we would find the current process.

test3: We create a cgroup tree. In the BPF program, we use
bpf_for_each(css_{pre,post}, pos, root) to iterate all descendant under the
root with pre and post order. As expected, we would find all descendant and
the last iterating cgroup in post-order is root cgroup, the first
iterating cgroup in pre-order is root cgroup.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 .../testing/selftests/bpf/prog_tests/iters.c  | 138 ++++++++++++++++++
 .../testing/selftests/bpf/progs/iters_task.c  | 104 +++++++++++++
 2 files changed, 242 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c

diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index 10804ae5ae97..f4e69a506509 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -1,13 +1,21 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
 
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <malloc.h>
+#include <stdlib.h>
 #include <test_progs.h>
+#include "cgroup_helpers.h"
 
 #include "iters.skel.h"
 #include "iters_state_safety.skel.h"
 #include "iters_looping.skel.h"
 #include "iters_num.skel.h"
 #include "iters_testmod_seq.skel.h"
+#include "iters_task.skel.h"
 
 static void subtest_num_iters(void)
 {
@@ -90,6 +98,130 @@ static void subtest_testmod_seq_iters(void)
 	iters_testmod_seq__destroy(skel);
 }
 
+static void subtest_process_iters(void)
+{
+	struct iters_task *skel;
+	int err;
+
+	skel = iters_task__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+	bpf_program__set_autoload(skel->progs.iter_task_for_each_sleep, true);
+	err = iters_task__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+	skel->bss->target_pid = getpid();
+	err = iters_task__attach(skel);
+	if (!ASSERT_OK(err, "iters_task__attach"))
+		goto cleanup;
+	syscall(SYS_getpgid);
+	iters_task__detach(skel);
+	ASSERT_EQ(skel->bss->process_cnt, 1, "process_cnt");
+
+cleanup:
+	iters_task__destroy(skel);
+}
+
+extern int stack_mprotect(void);
+
+static void subtest_css_task_iters(void)
+{
+	struct iters_task *skel;
+	int err, cg_fd, cg_id;
+	const char *cgrp_path = "/cg1";
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		goto cleanup;
+	cg_fd = create_and_get_cgroup(cgrp_path);
+	if (!ASSERT_GE(cg_fd, 0, "cg_create"))
+		goto cleanup;
+	cg_id = get_cgroup_id(cgrp_path);
+	err = join_cgroup(cgrp_path);
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		goto cleanup;
+
+	skel = iters_task__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	bpf_program__set_autoload(skel->progs.iter_css_task_for_each, true);
+	err = iters_task__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	skel->bss->target_pid = getpid();
+	skel->bss->cg_id = cg_id;
+	err = iters_task__attach(skel);
+
+	err = stack_mprotect();
+	if (!ASSERT_OK(err, "iters_task__attach"))
+		goto cleanup;
+
+	iters_task__detach(skel);
+	ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt");
+
+cleanup:
+	cleanup_cgroup_environment();
+	iters_task__destroy(skel);
+}
+
+static void subtest_css_dec_iters(void)
+{
+	struct iters_task *skel;
+	struct {
+		const char *path;
+		int fd;
+	} cgs[] = {
+		{ "/cg1" },
+		{ "/cg1/cg2" },
+		{ "/cg1/cg2/cg3" },
+		{ "/cg1/cg2/cg3/cg4" },
+	};
+	int err, cg_nr = ARRAY_SIZE(cgs);
+	int i;
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		goto cleanup;
+	for (i = 0; i < cg_nr; i++) {
+		cgs[i].fd = create_and_get_cgroup(cgs[i].path);
+		if (!ASSERT_GE(cgs[i].fd, 0, "cg_create"))
+			goto cleanup;
+	}
+
+	skel = iters_task__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+	bpf_program__set_autoload(skel->progs.iter_css_dec_for_each, true);
+	err = iters_task__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	skel->bss->target_pid = getpid();
+	skel->bss->cg_id = get_cgroup_id(cgs[0].path);
+
+	err = iters_task__attach(skel);
+
+	if (!ASSERT_OK(err, "iters_task__attach"))
+		goto cleanup;
+
+	syscall(SYS_getpgid);
+	ASSERT_EQ(skel->bss->css_dec_cnt, cg_nr, "pre order search dec count");
+	ASSERT_EQ(skel->bss->first_cg_id, get_cgroup_id(cgs[0].path),
+				"pre order search first cgroup id");
+	skel->bss->css_dec_cnt = 0;
+	skel->bss->is_post_order = 1;
+	syscall(SYS_getpgid);
+	ASSERT_EQ(skel->bss->css_dec_cnt, cg_nr, "post order search dec count");
+	ASSERT_EQ(skel->bss->last_cg_id, get_cgroup_id(cgs[0].path),
+				"post order search last cgroup id");
+	iters_task__detach(skel);
+cleanup:
+	cleanup_cgroup_environment();
+	iters_task__destroy(skel);
+}
+
 void test_iters(void)
 {
 	RUN_TESTS(iters_state_safety);
@@ -103,4 +235,10 @@ void test_iters(void)
 		subtest_num_iters();
 	if (test__start_subtest("testmod_seq"))
 		subtest_testmod_seq_iters();
+	if (test__start_subtest("process"))
+		subtest_process_iters();
+	if (test__start_subtest("css_task"))
+		subtest_css_task_iters();
+	if (test__start_subtest("css_dec"))
+		subtest_css_dec_iters();
 }
diff --git a/tools/testing/selftests/bpf/progs/iters_task.c b/tools/testing/selftests/bpf/progs/iters_task.c
new file mode 100644
index 000000000000..cf24a3b177f2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+char _license[] SEC("license") = "GPL";
+
+pid_t target_pid = 0;
+int process_cnt = 0;
+int css_task_cnt = 0;
+int css_dec_cnt = 0;
+
+char is_post_order;
+u64 cg_id;
+u64 last_cg_id;
+u64 first_cg_id;
+
+struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
+struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) __ksym;
+void bpf_cgroup_release(struct cgroup *p) __ksym;
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_task_for_each_sleep(void *ctx)
+{
+	struct task_struct *task;
+	struct task_struct *cur_task = bpf_get_current_task_btf();
+
+	if (cur_task->pid != target_pid)
+		return 0;
+	bpf_rcu_read_lock();
+	bpf_for_each(process, task) {
+		if (task->pid == target_pid)
+			process_cnt += 1;
+	}
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+
+SEC("?lsm/file_mprotect")
+int BPF_PROG(iter_css_task_for_each)
+{
+	struct task_struct *task;
+	struct task_struct *cur_task = bpf_get_current_task_btf();
+
+	if (cur_task->pid != target_pid)
+		return 0;
+
+	struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
+
+	if (cgrp == NULL)
+		return 0;
+	struct cgroup_subsys_state *css = &cgrp->self;
+
+	bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) {
+		if (!task)
+			continue;
+		if (task->pid == target_pid)
+			css_task_cnt += 1;
+	}
+	bpf_cgroup_release(cgrp);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_css_dec_for_each(const void *ctx)
+{
+	struct task_struct *cur_task = bpf_get_current_task_btf();
+
+	if (cur_task->pid != target_pid)
+		return 0;
+
+	struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
+
+	if (cgrp == NULL)
+		return 0;
+	struct cgroup_subsys_state *root = &cgrp->self;
+	struct cgroup_subsys_state *pos = NULL;
+
+	bpf_rcu_read_lock();
+	if (is_post_order) {
+		bpf_for_each(css_post, pos, root) {
+			struct cgroup *cur_cgrp = pos->cgroup;
+
+			css_dec_cnt += 1;
+			if (cur_cgrp)
+				last_cg_id = cur_cgrp->kn->id;
+		}
+	} else {
+		bpf_for_each(css_pre, pos, root) {
+			struct cgroup *cur_cgrp = pos->cgroup;
+
+			css_dec_cnt += 1;
+			if (cur_cgrp && !first_cg_id)
+				first_cg_id = cur_cgrp->kn->id;
+		}
+	}
+	bpf_rcu_read_unlock();
+	bpf_cgroup_release(cgrp);
+	return 0;
+}
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 0/6] Add Open-coded process and css iters
  2023-09-12  7:01 [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
                   ` (5 preceding siblings ...)
  2023-09-12  7:01 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add tests for open-coded task and css iter Chuyi Zhou
@ 2023-09-12  7:11 ` Chuyi Zhou
  6 siblings, 0 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-12  7:11 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel



在 2023/9/12 15:01, Chuyi Zhou 写道:
> Hi,
> 
> This is version 2 of process and css iters support. All the changes were
> suggested by Alexei.
> 
> Thanks for your review!
> 
> --- Changelog ---
> Changes from v1:
> - Add a pre-patch to make some preparations before supporting css_task
>    iters.
> - Add an allowlist for css_task iters
> - Let bpf progs do explicit bpf_rcu_read_lock() when using process iters
> and css_descendant iters.

Sorry for missing the link to v1 
(https://lore.kernel.org/lkml/20230827072057.1591929-1-zhouchuyi@bytedance.com/).


> ---------------------
> 
> In some BPF usage scenarios, it will be useful to iterate the process and
> css directly in the BPF program. One of the expected scenarios is
> customizable OOM victim selection via BPF[1].
> 
> Inspired by Dave's task_vma iter[2], this patchset adds three types of
> open-coded iterator kfuncs:
> 
> 1. bpf_for_each(process, p). Just like for_each_process(p) in kernel to
> itearing all tasks in the system.
> 
> 2. bpf_for_each(css_task, task, css). It works like
> css_task_iter_{start, next, end} and would be used to iterating
> tasks/threads under a css.
> 
> 3. bpf_for_each(css_{post, pre}, pos, root_css). It works like
> css_next_descendant_{pre, post} to iterating all descendant css.
> 
> BPF programs can use these kfuncs directly or through bpf_for_each macro.
> 
> link[1]: https://lore.kernel.org/lkml/20230810081319.65668-1-zhouchuyi@bytedance.com/
> link[2]: https://lore.kernel.org/all/20230810183513.684836-1-davemarchevsky@fb.com/
> 
> Chuyi Zhou (6):
>    cgroup: Prepare for using css_task_iter_*() in BPF
>    bpf: Introduce css_task open-coded iterator kfuncs
>    bpf: Introduce process open coded iterator kfuncs
>    bpf: Introduce css_descendant open-coded iterator kfuncs
>    bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
>    selftests/bpf: Add tests for open-coded task and css iter
> 
>   include/linux/cgroup.h                        |  12 +-
>   include/uapi/linux/bpf.h                      |  16 ++
>   kernel/bpf/helpers.c                          |  12 ++
>   kernel/bpf/task_iter.c                        | 130 +++++++++++++++++
>   kernel/bpf/verifier.c                         |  53 ++++++-
>   kernel/cgroup/cgroup.c                        |  18 ++-
>   tools/include/uapi/linux/bpf.h                |  16 ++
>   tools/lib/bpf/bpf_helpers.h                   |  24 +++
>   .../testing/selftests/bpf/prog_tests/iters.c  | 138 ++++++++++++++++++
>   .../testing/selftests/bpf/progs/iters_task.c  | 104 +++++++++++++
>   10 files changed, 508 insertions(+), 15 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs
  2023-09-12  7:01 ` [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
@ 2023-09-12 17:13   ` Alexei Starovoitov
  2023-09-13 14:02     ` Chuyi Zhou
  2023-09-12 19:57   ` Martin KaFai Lau
  2023-09-14 23:26   ` Andrii Nakryiko
  2 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2023-09-12 17:13 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Tue, Sep 12, 2023 at 03:01:45PM +0800, Chuyi Zhou wrote:
> This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_css_task in open-coded
> iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
> end}. BPF programs can use these kfuncs through bpf_for_each macro for
> iteration of all tasks under a css.
> 
> css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
> the bpf side has to be careful in where it allows to use this iter.
> Currently we only allow it in bpf_lsm and bpf iter-s.
> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  include/uapi/linux/bpf.h       |  4 +++
>  kernel/bpf/helpers.c           |  3 +++
>  kernel/bpf/task_iter.c         | 48 ++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c          | 23 ++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  4 +++
>  tools/lib/bpf/bpf_helpers.h    |  7 +++++
>  6 files changed, 89 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 73b155e52204..de02c0971428 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7318,4 +7318,8 @@ struct bpf_iter_num {
>  	__u64 __opaque[1];
>  } __attribute__((aligned(8)));
>  
> +struct bpf_iter_css_task {
> +	__u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index b0a9834f1051..d6a16becfbb9 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)

Looking at selftest:
struct cgroup_subsys_state *css = &cgrp->self;

realized that we're missing KF_TRUSTED_ARGS here.

> +BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 7473068ed313..d8539cc05ffd 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -803,6 +803,54 @@ const struct bpf_func_proto bpf_find_vma_proto = {
>  	.arg5_type	= ARG_ANYTHING,
>  };
>  
> +struct bpf_iter_css_task_kern {
> +	struct css_task_iter *css_it;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> +		struct cgroup_subsys_state *css, unsigned int flags)

the verifier does a type check, but it's not strong enough.
We need KF_TRUSTED_ARGS to make sure the pointer is valid.
The BTF_TYPE_SAFE_RCU(struct cgroup) {
probably doesn't need to change, since '&cgrp->self' is not a pointer deref.
The verifier should understand that cgroup_subsys_state is also PTR_TRUSTED
just like 'cgrp' pointer.

Also please add negative tests in patch 6.
Like doing bpf_rcu_read_unlock() in the middle and check that the verifier
catches such mistake.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs
  2023-09-12  7:01 ` [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
  2023-09-12 17:13   ` Alexei Starovoitov
@ 2023-09-12 19:57   ` Martin KaFai Lau
  2023-09-13  4:56     ` Chuyi Zhou
  2023-09-14 23:26   ` Andrii Nakryiko
  2 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2023-09-12 19:57 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, bpf

On 9/12/23 12:01 AM, Chuyi Zhou wrote:
> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> +		struct cgroup_subsys_state *css, unsigned int flags)
> +{
> +	struct bpf_iter_css_task_kern *kit = (void *)it;
> +
> +	BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task));
> +	BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
> +					__alignof__(struct bpf_iter_css_task));
> +
> +	switch (flags) {
> +	case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
> +	case CSS_TASK_ITER_PROCS:
> +	case 0:
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);
> +	if (!kit->css_it)
> +		return -ENOMEM;
> +	css_task_iter_start(css, flags, kit->css_it);
> +	return 0;
> +}
> +

> +static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
> +{
> +	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
> +
> +	switch (prog_type) {
> +	case BPF_PROG_TYPE_LSM:

This will allow the non-sleepable lsm prog to call bpf_iter_css_task_new. The 
above kzalloc(GFP_KERNEL) in bpf_iter_css_task_new should trigger a lockdep 
error in the lsm selftest in patch 6.

> +		return true;
> +	case BPF_TRACE_ITER:
> +		return env->prog->aux->sleepable;
> +	default:
> +		return false;
> +	}
> +}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs
  2023-09-12 19:57   ` Martin KaFai Lau
@ 2023-09-13  4:56     ` Chuyi Zhou
  0 siblings, 0 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-13  4:56 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel, bpf

Hello Martin.

在 2023/9/13 03:57, Martin KaFai Lau 写道:
> On 9/12/23 12:01 AM, Chuyi Zhou wrote:
>> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
>> +        struct cgroup_subsys_state *css, unsigned int flags)
>> +{
>> +    struct bpf_iter_css_task_kern *kit = (void *)it;
>> +
>> +    BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != 
>> sizeof(struct bpf_iter_css_task));
>> +    BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
>> +                    __alignof__(struct bpf_iter_css_task));
>> +
>> +    switch (flags) {
>> +    case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
>> +    case CSS_TASK_ITER_PROCS:
>> +    case 0:
>> +        break;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +
>> +    kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);
>> +    if (!kit->css_it)
>> +        return -ENOMEM;
>> +    css_task_iter_start(css, flags, kit->css_it);
>> +    return 0;
>> +}
>> +
> 
>> +static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
>> +{
>> +    enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
>> +
>> +    switch (prog_type) {
>> +    case BPF_PROG_TYPE_LSM:
> 
> This will allow the non-sleepable lsm prog to call 
> bpf_iter_css_task_new. The above kzalloc(GFP_KERNEL) in 
> bpf_iter_css_task_new should trigger a lockdep error in the lsm selftest 
> in patch 6.

Thanks for your test. I missed the dmesg error since I did not set 
CONFIG_LOCKDEP.

I think here we can use kzalloc(GFP_ATOMIC) to eliminate this error if 
we want to use this iter in non-sleepable lsm prog. I just tested, it 
works well.

> 
>> +        return true;
>> +    case BPF_TRACE_ITER:
>> +        return env->prog->aux->sleepable;
>> +    default:
>> +        return false;
>> +    }
>> +}
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
  2023-09-12  7:01 ` [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded " Chuyi Zhou
@ 2023-09-13  7:25   ` kernel test robot
  2023-09-13  9:02   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2023-09-13  7:25 UTC (permalink / raw)
  To: Chuyi Zhou, bpf
  Cc: oe-kbuild-all, ast, daniel, andrii, martin.lau, tj, linux-kernel,
	Chuyi Zhou

Hi Chuyi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Chuyi-Zhou/cgroup-Prepare-for-using-css_task_iter_-in-BPF/20230912-150454
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230912070149.969939-5-zhouchuyi%40bytedance.com
patch subject: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
config: s390-defconfig (https://download.01.org/0day-ci/archive/20230913/202309131500.J19z0Dil-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/202309131500.J19z0Dil-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309131500.J19z0Dil-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/bpf/task_iter.c:810:17: warning: no previous prototype for 'bpf_iter_css_task_new' [-Wmissing-prototypes]
     810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
         |                 ^~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/task_iter.c:835:33: warning: no previous prototype for 'bpf_iter_css_task_next' [-Wmissing-prototypes]
     835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
         |                                 ^~~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/task_iter.c:844:18: warning: no previous prototype for 'bpf_iter_css_task_destroy' [-Wmissing-prototypes]
     844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
         |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/task_iter.c:858:17: warning: no previous prototype for 'bpf_iter_process_new' [-Wmissing-prototypes]
     858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
         |                 ^~~~~~~~~~~~~~~~~~~~
   kernel/bpf/task_iter.c:870:33: warning: no previous prototype for 'bpf_iter_process_next' [-Wmissing-prototypes]
     870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
         |                                 ^~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/task_iter.c:879:18: warning: no previous prototype for 'bpf_iter_process_destroy' [-Wmissing-prototypes]
     879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
         |                  ^~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:888:17: warning: no previous prototype for 'bpf_iter_css_pre_new' [-Wmissing-prototypes]
     888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
         |                 ^~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:900:41: warning: no previous prototype for 'bpf_iter_css_pre_next' [-Wmissing-prototypes]
     900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
         |                                         ^~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:908:18: warning: no previous prototype for 'bpf_iter_css_pre_destroy' [-Wmissing-prototypes]
     908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
         |                  ^~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:912:17: warning: no previous prototype for 'bpf_iter_css_post_new' [-Wmissing-prototypes]
     912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
         |                 ^~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:924:41: warning: no previous prototype for 'bpf_iter_css_post_next' [-Wmissing-prototypes]
     924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
         |                                         ^~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:932:18: warning: no previous prototype for 'bpf_iter_css_post_destroy' [-Wmissing-prototypes]
     932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
         |                  ^~~~~~~~~~~~~~~~~~~~~~~~~


vim +/bpf_iter_css_pre_new +888 kernel/bpf/task_iter.c

   887	
 > 888	__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
   889			struct cgroup_subsys_state *root)
   890	{
   891		struct bpf_iter_css_kern *kit = (void *)it;
   892	
   893		BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
   894		BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
   895		kit->root = root;
   896		kit->pos = NULL;
   897		return 0;
   898	}
   899	
 > 900	__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
   901	{
   902		struct bpf_iter_css_kern *kit = (void *)it;
   903	
   904		kit->pos = css_next_descendant_pre(kit->pos, kit->root);
   905		return kit->pos;
   906	}
   907	
 > 908	__bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
   909	{
   910	}
   911	
 > 912	__bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
   913			struct cgroup_subsys_state *root)
   914	{
   915		struct bpf_iter_css_kern *kit = (void *)it;
   916	
   917		BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
   918		BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
   919		kit->root = root;
   920		kit->pos = NULL;
   921		return 0;
   922	}
   923	
 > 924	__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
   925	{
   926		struct bpf_iter_css_kern *kit = (void *)it;
   927	
   928		kit->pos = css_next_descendant_post(kit->pos, kit->root);
   929		return kit->pos;
   930	}
   931	
 > 932	__bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
   933	{
   934	}
   935	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
  2023-09-12  7:01 ` [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded " Chuyi Zhou
  2023-09-13  7:25   ` kernel test robot
@ 2023-09-13  9:02   ` kernel test robot
  2023-09-13  9:13   ` kernel test robot
  2023-09-14 23:26   ` Andrii Nakryiko
  3 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2023-09-13  9:02 UTC (permalink / raw)
  To: Chuyi Zhou, bpf
  Cc: llvm, oe-kbuild-all, ast, daniel, andrii, martin.lau, tj,
	linux-kernel, Chuyi Zhou

Hi Chuyi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Chuyi-Zhou/cgroup-Prepare-for-using-css_task_iter_-in-BPF/20230912-150454
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230912070149.969939-5-zhouchuyi%40bytedance.com
patch subject: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
config: hexagon-randconfig-r032-20230913 (https://download.01.org/0day-ci/archive/20230913/202309131621.h5ogfV0Z-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/202309131621.h5ogfV0Z-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309131621.h5ogfV0Z-lkp@intel.com/

All warnings (new ones prefixed by >>):

     573 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
         |                                                   ^
   In file included from kernel/bpf/task_iter.c:9:
   In file included from include/linux/filter.h:9:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:337:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     584 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   kernel/bpf/task_iter.c:820:7: error: use of undeclared identifier 'CSS_TASK_ITER_PROCS'
     820 |         case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
         |              ^
   kernel/bpf/task_iter.c:820:29: error: use of undeclared identifier 'CSS_TASK_ITER_THREADED'
     820 |         case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
         |                                    ^
   kernel/bpf/task_iter.c:821:7: error: use of undeclared identifier 'CSS_TASK_ITER_PROCS'
     821 |         case CSS_TASK_ITER_PROCS:
         |              ^
   kernel/bpf/task_iter.c:828:24: error: invalid application of 'sizeof' to an incomplete type 'struct css_task_iter'
     828 |         kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);
         |                               ^     ~~~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/task_iter.c:807:9: note: forward declaration of 'struct css_task_iter'
     807 |         struct css_task_iter *css_it;
         |                ^
   kernel/bpf/task_iter.c:831:2: error: call to undeclared function 'css_task_iter_start'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     831 |         css_task_iter_start(css, flags, kit->css_it);
         |         ^
   kernel/bpf/task_iter.c:831:2: note: did you mean '__sg_page_iter_start'?
   include/linux/scatterlist.h:573:6: note: '__sg_page_iter_start' declared here
     573 | void __sg_page_iter_start(struct sg_page_iter *piter,
         |      ^
   kernel/bpf/task_iter.c:810:17: warning: no previous prototype for function 'bpf_iter_css_task_new' [-Wmissing-prototypes]
     810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
         |                 ^
   kernel/bpf/task_iter.c:810:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
         |             ^
         |             static 
   kernel/bpf/task_iter.c:841:9: error: call to undeclared function 'css_task_iter_next'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     841 |         return css_task_iter_next(kit->css_it);
         |                ^
   kernel/bpf/task_iter.c:841:9: error: incompatible integer to pointer conversion returning 'int' from a function with result type 'struct task_struct *' [-Wint-conversion]
     841 |         return css_task_iter_next(kit->css_it);
         |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/task_iter.c:835:33: warning: no previous prototype for function 'bpf_iter_css_task_next' [-Wmissing-prototypes]
     835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
         |                                 ^
   kernel/bpf/task_iter.c:835:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:850:2: error: call to undeclared function 'css_task_iter_end'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     850 |         css_task_iter_end(kit->css_it);
         |         ^
   kernel/bpf/task_iter.c:844:18: warning: no previous prototype for function 'bpf_iter_css_task_destroy' [-Wmissing-prototypes]
     844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
         |                  ^
   kernel/bpf/task_iter.c:844:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:858:17: warning: no previous prototype for function 'bpf_iter_process_new' [-Wmissing-prototypes]
     858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
         |                 ^
   kernel/bpf/task_iter.c:858:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:870:33: warning: no previous prototype for function 'bpf_iter_process_next' [-Wmissing-prototypes]
     870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
         |                                 ^
   kernel/bpf/task_iter.c:870:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:879:18: warning: no previous prototype for function 'bpf_iter_process_destroy' [-Wmissing-prototypes]
     879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
         |                  ^
   kernel/bpf/task_iter.c:879:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
         |             ^
         |             static 
>> kernel/bpf/task_iter.c:888:17: warning: no previous prototype for function 'bpf_iter_css_pre_new' [-Wmissing-prototypes]
     888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
         |                 ^
   kernel/bpf/task_iter.c:888:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
         |             ^
         |             static 
   kernel/bpf/task_iter.c:904:13: error: call to undeclared function 'css_next_descendant_pre'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     904 |         kit->pos = css_next_descendant_pre(kit->pos, kit->root);
         |                    ^
   kernel/bpf/task_iter.c:904:11: error: incompatible integer to pointer conversion assigning to 'struct cgroup_subsys_state *' from 'int' [-Wint-conversion]
     904 |         kit->pos = css_next_descendant_pre(kit->pos, kit->root);
         |                  ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:900:41: warning: no previous prototype for function 'bpf_iter_css_pre_next' [-Wmissing-prototypes]
     900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
         |                                         ^
   kernel/bpf/task_iter.c:900:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
         |             ^
         |             static 
>> kernel/bpf/task_iter.c:908:18: warning: no previous prototype for function 'bpf_iter_css_pre_destroy' [-Wmissing-prototypes]
     908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
         |                  ^
   kernel/bpf/task_iter.c:908:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
         |             ^
         |             static 
>> kernel/bpf/task_iter.c:912:17: warning: no previous prototype for function 'bpf_iter_css_post_new' [-Wmissing-prototypes]
     912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
         |                 ^
   kernel/bpf/task_iter.c:912:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
         |             ^
         |             static 
   kernel/bpf/task_iter.c:928:13: error: call to undeclared function 'css_next_descendant_post'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     928 |         kit->pos = css_next_descendant_post(kit->pos, kit->root);
         |                    ^
   kernel/bpf/task_iter.c:928:11: error: incompatible integer to pointer conversion assigning to 'struct cgroup_subsys_state *' from 'int' [-Wint-conversion]
     928 |         kit->pos = css_next_descendant_post(kit->pos, kit->root);
         |                  ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:924:41: warning: no previous prototype for function 'bpf_iter_css_post_next' [-Wmissing-prototypes]
     924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
         |                                         ^
   kernel/bpf/task_iter.c:924:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
         |             ^
         |             static 
>> kernel/bpf/task_iter.c:932:18: warning: no previous prototype for function 'bpf_iter_css_post_destroy' [-Wmissing-prototypes]
     932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
         |                  ^
   kernel/bpf/task_iter.c:932:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
         |             ^
         |             static 
   18 warnings and 12 errors generated.


vim +/bpf_iter_css_pre_new +888 kernel/bpf/task_iter.c

   878	
 > 879	__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
   880	{
   881	}
   882	
   883	struct bpf_iter_css_kern {
   884		struct cgroup_subsys_state *root;
   885		struct cgroup_subsys_state *pos;
   886	} __attribute__((aligned(8)));
   887	
 > 888	__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
   889			struct cgroup_subsys_state *root)
   890	{
   891		struct bpf_iter_css_kern *kit = (void *)it;
   892	
   893		BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
   894		BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
   895		kit->root = root;
   896		kit->pos = NULL;
   897		return 0;
   898	}
   899	
 > 900	__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
   901	{
   902		struct bpf_iter_css_kern *kit = (void *)it;
   903	
 > 904		kit->pos = css_next_descendant_pre(kit->pos, kit->root);
   905		return kit->pos;
   906	}
   907	
 > 908	__bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
   909	{
   910	}
   911	
 > 912	__bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
   913			struct cgroup_subsys_state *root)
   914	{
   915		struct bpf_iter_css_kern *kit = (void *)it;
   916	
   917		BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
   918		BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
   919		kit->root = root;
   920		kit->pos = NULL;
   921		return 0;
   922	}
   923	
 > 924	__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
   925	{
   926		struct bpf_iter_css_kern *kit = (void *)it;
   927	
 > 928		kit->pos = css_next_descendant_post(kit->pos, kit->root);
   929		return kit->pos;
   930	}
   931	
 > 932	__bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
   933	{
   934	}
   935	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
  2023-09-12  7:01 ` [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded " Chuyi Zhou
  2023-09-13  7:25   ` kernel test robot
  2023-09-13  9:02   ` kernel test robot
@ 2023-09-13  9:13   ` kernel test robot
  2023-09-14 23:26   ` Andrii Nakryiko
  3 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2023-09-13  9:13 UTC (permalink / raw)
  To: Chuyi Zhou, bpf
  Cc: llvm, oe-kbuild-all, ast, daniel, andrii, martin.lau, tj,
	linux-kernel, Chuyi Zhou

Hi Chuyi,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Chuyi-Zhou/cgroup-Prepare-for-using-css_task_iter_-in-BPF/20230912-150454
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230912070149.969939-5-zhouchuyi%40bytedance.com
patch subject: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
config: arm-randconfig-r032-20230913 (https://download.01.org/0day-ci/archive/20230913/202309131634.hJlxw7NA-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/202309131634.hJlxw7NA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309131634.hJlxw7NA-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/bpf/task_iter.c:810:17: warning: no previous prototype for function 'bpf_iter_css_task_new' [-Wmissing-prototypes]
     810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
         |                 ^
   kernel/bpf/task_iter.c:810:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
         |             ^
         |             static 
   kernel/bpf/task_iter.c:835:33: warning: no previous prototype for function 'bpf_iter_css_task_next' [-Wmissing-prototypes]
     835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
         |                                 ^
   kernel/bpf/task_iter.c:835:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:844:18: warning: no previous prototype for function 'bpf_iter_css_task_destroy' [-Wmissing-prototypes]
     844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
         |                  ^
   kernel/bpf/task_iter.c:844:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:858:17: warning: no previous prototype for function 'bpf_iter_process_new' [-Wmissing-prototypes]
     858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
         |                 ^
   kernel/bpf/task_iter.c:858:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:870:33: warning: no previous prototype for function 'bpf_iter_process_next' [-Wmissing-prototypes]
     870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
         |                                 ^
   kernel/bpf/task_iter.c:870:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:879:18: warning: no previous prototype for function 'bpf_iter_process_destroy' [-Wmissing-prototypes]
     879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
         |                  ^
   kernel/bpf/task_iter.c:879:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:888:17: warning: no previous prototype for function 'bpf_iter_css_pre_new' [-Wmissing-prototypes]
     888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
         |                 ^
   kernel/bpf/task_iter.c:888:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
         |             ^
         |             static 
   kernel/bpf/task_iter.c:900:41: warning: no previous prototype for function 'bpf_iter_css_pre_next' [-Wmissing-prototypes]
     900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
         |                                         ^
   kernel/bpf/task_iter.c:900:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:908:18: warning: no previous prototype for function 'bpf_iter_css_pre_destroy' [-Wmissing-prototypes]
     908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
         |                  ^
   kernel/bpf/task_iter.c:908:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:912:17: warning: no previous prototype for function 'bpf_iter_css_post_new' [-Wmissing-prototypes]
     912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
         |                 ^
   kernel/bpf/task_iter.c:912:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
         |             ^
         |             static 
   kernel/bpf/task_iter.c:924:41: warning: no previous prototype for function 'bpf_iter_css_post_next' [-Wmissing-prototypes]
     924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
         |                                         ^
   kernel/bpf/task_iter.c:924:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
         |             ^
         |             static 
   kernel/bpf/task_iter.c:932:18: warning: no previous prototype for function 'bpf_iter_css_post_destroy' [-Wmissing-prototypes]
     932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
         |                  ^
   kernel/bpf/task_iter.c:932:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
     932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
         |             ^
         |             static 
>> kernel/bpf/task_iter.c:893:2: error: call to '__compiletime_assert_389' declared with 'error' attribute: BUILD_BUG_ON failed: sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre)
     893 |         BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
         |         ^
   include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
      50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
         |         ^
   include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^
   include/linux/compiler_types.h:425:2: note: expanded from macro 'compiletime_assert'
     425 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^
   include/linux/compiler_types.h:413:2: note: expanded from macro '_compiletime_assert'
     413 |         __compiletime_assert(condition, msg, prefix, suffix)
         |         ^
   include/linux/compiler_types.h:406:4: note: expanded from macro '__compiletime_assert'
     406 |                         prefix ## suffix();                             \
         |                         ^
   <scratch space>:6:1: note: expanded from here
       6 | __compiletime_assert_389
         | ^
>> kernel/bpf/task_iter.c:917:2: error: call to '__compiletime_assert_391' declared with 'error' attribute: BUILD_BUG_ON failed: sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post)
     917 |         BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
         |         ^
   include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
      50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
         |         ^
   include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^
   include/linux/compiler_types.h:425:2: note: expanded from macro 'compiletime_assert'
     425 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^
   include/linux/compiler_types.h:413:2: note: expanded from macro '_compiletime_assert'
     413 |         __compiletime_assert(condition, msg, prefix, suffix)
         |         ^
   include/linux/compiler_types.h:406:4: note: expanded from macro '__compiletime_assert'
     406 |                         prefix ## suffix();                             \
         |                         ^
   <scratch space>:14:1: note: expanded from here
      14 | __compiletime_assert_391
         | ^
   12 warnings and 2 errors generated.


vim +893 kernel/bpf/task_iter.c

   887	
   888	__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
   889			struct cgroup_subsys_state *root)
   890	{
   891		struct bpf_iter_css_kern *kit = (void *)it;
   892	
 > 893		BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
   894		BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
   895		kit->root = root;
   896		kit->pos = NULL;
   897		return 0;
   898	}
   899	
   900	__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
   901	{
   902		struct bpf_iter_css_kern *kit = (void *)it;
   903	
   904		kit->pos = css_next_descendant_pre(kit->pos, kit->root);
   905		return kit->pos;
   906	}
   907	
   908	__bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
   909	{
   910	}
   911	
   912	__bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
   913			struct cgroup_subsys_state *root)
   914	{
   915		struct bpf_iter_css_kern *kit = (void *)it;
   916	
 > 917		BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
   918		BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
   919		kit->root = root;
   920		kit->pos = NULL;
   921		return 0;
   922	}
   923	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  2023-09-12  7:01 ` [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS Chuyi Zhou
@ 2023-09-13 13:53   ` Chuyi Zhou
  2023-09-14  8:56     ` Chuyi Zhou
  2023-09-14 23:26   ` Andrii Nakryiko
  1 sibling, 1 reply; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-13 13:53 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello.

在 2023/9/12 15:01, Chuyi Zhou 写道:
> css_iter and process_iter should be used in rcu section. Specifically, in
> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> use them directly.
> 
> This patch checks whether we are in rcu cs before we want to invoke
> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> reject if reg->type is UNTRUSTED.

I use the following BPF Prog to test this patch:

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int iter_task_for_each_sleep(void *ctx)
{
	struct task_struct *task;
	struct task_struct *cur_task = bpf_get_current_task_btf();

	if (cur_task->pid != target_pid)
		return 0;
	bpf_rcu_read_lock();
	bpf_for_each(process, task) {
		bpf_rcu_read_unlock();
		if (task->pid == target_pid)
			process_cnt += 1;
		bpf_rcu_read_lock();
	}
	bpf_rcu_read_unlock();
	return 0;
}

Unfortunately, we can pass the verifier.

Then I add some printk-messages before setting/clearing state to help debug:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d151e6b43a5f..35f3fa9471a9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct 
bpf_verifier_env *env,
                 __mark_reg_known_zero(st);
                 st->type = PTR_TO_STACK; /* we don't have dedicated reg 
type */
                 if (is_iter_need_rcu(meta)) {
+                       printk("mark reg_addr : %px", st);
                         if (in_rcu_cs(env))
                                 st->type |= MEM_RCU;
                         else
@@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct 
bpf_verifier_env *env, struct bpf_insn *insn,
                         return -EINVAL;
                 } else if (rcu_unlock) {
                         bpf_for_each_reg_in_vstate(env->cur_state, 
state, reg, ({
+                               printk("clear reg_addr : %px MEM_RCU : 
%d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type & 
PTR_UNTRUSTED);
                                 if (reg->type & MEM_RCU) {
-                                       printk("clear reg addr : %lld", 
reg);
                                         reg->type &= ~(MEM_RCU | 
PTR_MAYBE_NULL);
                                         reg->type |= PTR_UNTRUSTED;
                                 }


The demsg log:

[  393.705324] mark reg_addr : ffff88814e40e200

[  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0 
PTR_UNTRUSTED : 0

[  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0 
PTR_UNTRUSTED : 0

[  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0 
PTR_UNTRUSTED : 0
....
....

I didn't see ffff88814e40e200 is cleared as expected because 
bpf_for_each_reg_in_vstate didn't find it.

It seems when we are doing bpf_read_unlock() in the middle of iteration 
and want to clearing state through bpf_for_each_reg_in_vstate, we can 
not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in 
mark_stack_slots_iter().

I thought maybe the correct answer here is operating the *iter_reg* 
parameter in mark_stack_slots_iter() direcly so we can find it in 
bpf_for_each_reg_in_vstate.

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6a6827ba7a18..53330ddf2b3c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1218,6 +1218,12 @@ static int mark_stack_slots_iter(struct 
bpf_verifier_env *env,
                 mark_stack_slot_scratched(env, spi - i);
         }

+       if (is_iter_need_rcu(meta)) {
+               if (in_rcu_cs(env))
+                       reg->type |= MEM_RCU;
+               else
+                       reg->type |= PTR_UNTRUSTED;
+       }
         return 0;
  }

@@ -1307,7 +1315,8 @@ static bool is_iter_reg_valid_init(struct 
bpf_verifier_env *env, struct bpf_reg_
                         if (slot->slot_type[j] != STACK_ITER)
                                 Kumarreturn false;
         }
-
+       if (reg->type & PTR_UNTRUSTED)
+               return false;
         return true;
  }

However, it did not work either. The reason it didn't work is the state 
of iter_reg will be cleared implicitly before the 
is_iter_reg_valid_init() even we don't call bpf_rcu_unlock.

It would be appreciate if you could give some suggestion. Maby it worthy 
to try the solution proposed by Kumar?[1]

[1] 
https://lore.kernel.org/lkml/CAP01T77cWxWNwq5HLr+Woiu7k4-P3QQfJWX1OeQJUkxW3=P4bA@mail.gmail.com/#t




^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs
  2023-09-12 17:13   ` Alexei Starovoitov
@ 2023-09-13 14:02     ` Chuyi Zhou
  0 siblings, 0 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-13 14:02 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello, Alexei.

在 2023/9/13 01:13, Alexei Starovoitov 写道:
> On Tue, Sep 12, 2023 at 03:01:45PM +0800, Chuyi Zhou wrote:
>> This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_css_task in open-coded
>> iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
>> end}. BPF programs can use these kfuncs through bpf_for_each macro for
>> iteration of all tasks under a css.
>>
>> css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
>> the bpf side has to be careful in where it allows to use this iter.
>> Currently we only allow it in bpf_lsm and bpf iter-s.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   include/uapi/linux/bpf.h       |  4 +++
>>   kernel/bpf/helpers.c           |  3 +++
>>   kernel/bpf/task_iter.c         | 48 ++++++++++++++++++++++++++++++++++
>>   kernel/bpf/verifier.c          | 23 ++++++++++++++++
>>   tools/include/uapi/linux/bpf.h |  4 +++
>>   tools/lib/bpf/bpf_helpers.h    |  7 +++++
>>   6 files changed, 89 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 73b155e52204..de02c0971428 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7318,4 +7318,8 @@ struct bpf_iter_num {
>>   	__u64 __opaque[1];
>>   } __attribute__((aligned(8)));
>>   
>> +struct bpf_iter_css_task {
>> +	__u64 __opaque[1];
>> +} __attribute__((aligned(8)));
>> +
>>   #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index b0a9834f1051..d6a16becfbb9 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
>>   BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> 
> Looking at selftest:
> struct cgroup_subsys_state *css = &cgrp->self;
> 
> realized that we're missing KF_TRUSTED_ARGS here.
> 
>> +BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 7473068ed313..d8539cc05ffd 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -803,6 +803,54 @@ const struct bpf_func_proto bpf_find_vma_proto = {
>>   	.arg5_type	= ARG_ANYTHING,
>>   };
>>   
>> +struct bpf_iter_css_task_kern {
>> +	struct css_task_iter *css_it;
>> +} __attribute__((aligned(8)));
>> +
>> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
>> +		struct cgroup_subsys_state *css, unsigned int flags)
> 
> the verifier does a type check, but it's not strong enough.
> We need KF_TRUSTED_ARGS to make sure the pointer is valid.
> The BTF_TYPE_SAFE_RCU(struct cgroup) {
> probably doesn't need to change, since '&cgrp->self' is not a pointer deref.
> The verifier should understand that cgroup_subsys_state is also PTR_TRUSTED
> just like 'cgrp' pointer.

Got it. It seems we should also apply this to bpf_iter_css_{pre,post}_new.

> 
> Also please add negative tests in patch 6.
> Like doing bpf_rcu_read_unlock() in the middle and check that the verifier
> catches such mistake.

I will do it in next version.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  2023-09-13 13:53   ` Chuyi Zhou
@ 2023-09-14  8:56     ` Chuyi Zhou
  2023-09-14 23:26       ` Andrii Nakryiko
  0 siblings, 1 reply; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-14  8:56 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov; +Cc: ast, daniel, andrii, martin.lau, tj, linux-kernel



在 2023/9/13 21:53, Chuyi Zhou 写道:
> Hello.
> 
> 在 2023/9/12 15:01, Chuyi Zhou 写道:
>> css_iter and process_iter should be used in rcu section. Specifically, in
>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
>> use them directly.
>>
>> This patch checks whether we are in rcu cs before we want to invoke
>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
>> reject if reg->type is UNTRUSTED.
> 
> I use the following BPF Prog to test this patch:
> 
> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> int iter_task_for_each_sleep(void *ctx)
> {
>      struct task_struct *task;
>      struct task_struct *cur_task = bpf_get_current_task_btf();
> 
>      if (cur_task->pid != target_pid)
>          return 0;
>      bpf_rcu_read_lock();
>      bpf_for_each(process, task) {
>          bpf_rcu_read_unlock();
>          if (task->pid == target_pid)
>              process_cnt += 1;
>          bpf_rcu_read_lock();
>      }
>      bpf_rcu_read_unlock();
>      return 0;
> }
> 
> Unfortunately, we can pass the verifier.
> 
> Then I add some printk-messages before setting/clearing state to help 
> debug:
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d151e6b43a5f..35f3fa9471a9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct 
> bpf_verifier_env *env,
>                  __mark_reg_known_zero(st);
>                  st->type = PTR_TO_STACK; /* we don't have dedicated reg 
> type */
>                  if (is_iter_need_rcu(meta)) {
> +                       printk("mark reg_addr : %px", st);
>                          if (in_rcu_cs(env))
>                                  st->type |= MEM_RCU;
>                          else
> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct 
> bpf_verifier_env *env, struct bpf_insn *insn,
>                          return -EINVAL;
>                  } else if (rcu_unlock) {
>                          bpf_for_each_reg_in_vstate(env->cur_state, 
> state, reg, ({
> +                               printk("clear reg_addr : %px MEM_RCU : 
> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type & 
> PTR_UNTRUSTED);
>                                  if (reg->type & MEM_RCU) {
> -                                       printk("clear reg addr : %lld", 
> reg);
>                                          reg->type &= ~(MEM_RCU | 
> PTR_MAYBE_NULL);
>                                          reg->type |= PTR_UNTRUSTED;
>                                  }
> 
> 
> The demsg log:
> 
> [  393.705324] mark reg_addr : ffff88814e40e200
> 
> [  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0 
> PTR_UNTRUSTED : 0
> 
> [  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0 
> PTR_UNTRUSTED : 0
> 
> [  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0 
> PTR_UNTRUSTED : 0
> ....
> ....
> 
> I didn't see ffff88814e40e200 is cleared as expected because 
> bpf_for_each_reg_in_vstate didn't find it.
> 
> It seems when we are doing bpf_read_unlock() in the middle of iteration 
> and want to clearing state through bpf_for_each_reg_in_vstate, we can 
> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in 
> mark_stack_slots_iter().
> 

bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in 
mark_stack_slots_iter() we has marked the slots *STACK_ITER*

With the following change, everything seems work OK.

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a3236651ec64..83c5ecccadb4 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -387,7 +387,7 @@ struct bpf_verifier_state {

  #define bpf_get_spilled_reg(slot, frame)                               \
         (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
-         (frame->stack[slot].slot_type[0] == STACK_SPILL))             \
+         (frame->stack[slot].slot_type[0] == STACK_SPILL || 
frame->stack[slot].slot_type[0] == STACK_ITER))            \
          ? &frame->stack[slot].spilled_ptr : NULL)

I am not sure whether this would harm some logic implicitly when using 
bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so, 
maybe we should add a extra parameter to control the picking behaviour.

#define bpf_get_spilled_reg(slot, frame, stack_type)
			\
	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&		\
	  (frame->stack[slot].slot_type[0] == stack_type))		\
	 ? &frame->stack[slot].spilled_ptr : NULL)

Thanks.

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs
  2023-09-12  7:01 ` [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
  2023-09-12 17:13   ` Alexei Starovoitov
  2023-09-12 19:57   ` Martin KaFai Lau
@ 2023-09-14 23:26   ` Andrii Nakryiko
  2 siblings, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-14 23:26 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_css_task in open-coded
> iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
> end}. BPF programs can use these kfuncs through bpf_for_each macro for
> iteration of all tasks under a css.
>
> css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
> the bpf side has to be careful in where it allows to use this iter.
> Currently we only allow it in bpf_lsm and bpf iter-s.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  include/uapi/linux/bpf.h       |  4 +++
>  kernel/bpf/helpers.c           |  3 +++
>  kernel/bpf/task_iter.c         | 48 ++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c          | 23 ++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  4 +++
>  tools/lib/bpf/bpf_helpers.h    |  7 +++++
>  6 files changed, 89 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 73b155e52204..de02c0971428 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7318,4 +7318,8 @@ struct bpf_iter_num {
>         __u64 __opaque[1];
>  } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_task {
> +       __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index b0a9834f1051..d6a16becfbb9 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 7473068ed313..d8539cc05ffd 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -803,6 +803,54 @@ const struct bpf_func_proto bpf_find_vma_proto = {
>         .arg5_type      = ARG_ANYTHING,
>  };
>
> +struct bpf_iter_css_task_kern {
> +       struct css_task_iter *css_it;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> +               struct cgroup_subsys_state *css, unsigned int flags)
> +{
> +       struct bpf_iter_css_task_kern *kit = (void *)it;
> +
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
> +                                       __alignof__(struct bpf_iter_css_task));
> +
> +       switch (flags) {
> +       case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
> +       case CSS_TASK_ITER_PROCS:
> +       case 0:
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);

Dave used bpf_mem_alloc() inside his iterator, any reason to not use it here?


> +       if (!kit->css_it)
> +               return -ENOMEM;
> +       css_task_iter_start(css, flags, kit->css_it);
> +       return 0;
> +}
> +
> +__bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
> +{
> +       struct bpf_iter_css_task_kern *kit = (void *)it;
> +
> +       if (!kit->css_it)
> +               return NULL;
> +       return css_task_iter_next(kit->css_it);
> +}
> +
> +__bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> +{
> +       struct bpf_iter_css_task_kern *kit = (void *)it;
> +
> +       if (!kit->css_it)
> +               return;
> +       css_task_iter_end(kit->css_it);
> +       kfree(kit->css_it);
> +}
> +
>  DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
>  static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index dbba2b806017..2367483bf4c2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10332,6 +10332,7 @@ enum special_kfunc_type {
>         KF_bpf_dynptr_clone,
>         KF_bpf_percpu_obj_new_impl,
>         KF_bpf_percpu_obj_drop_impl,
> +       KF_bpf_iter_css_task_new,
>  };
>
>  BTF_SET_START(special_kfunc_set)
> @@ -10354,6 +10355,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
>  BTF_ID(func, bpf_dynptr_clone)
>  BTF_ID(func, bpf_percpu_obj_new_impl)
>  BTF_ID(func, bpf_percpu_obj_drop_impl)
> +BTF_ID(func, bpf_iter_css_task_new)
>  BTF_SET_END(special_kfunc_set)
>
>  BTF_ID_LIST(special_kfunc_list)
> @@ -10378,6 +10380,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
>  BTF_ID(func, bpf_dynptr_clone)
>  BTF_ID(func, bpf_percpu_obj_new_impl)
>  BTF_ID(func, bpf_percpu_obj_drop_impl)
> +BTF_ID(func, bpf_iter_css_task_new)
>
>  static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
>  {
> @@ -10902,6 +10905,20 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
>                                                   &meta->arg_rbtree_root.field);
>  }
>
> +static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
> +{
> +       enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
> +
> +       switch (prog_type) {
> +       case BPF_PROG_TYPE_LSM:
> +               return true;
> +       case BPF_TRACE_ITER:
> +               return env->prog->aux->sleepable;
> +       default:
> +               return false;
> +       }
> +}
> +
>  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
>                             int insn_idx)
>  {
> @@ -11152,6 +11169,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                         break;
>                 }
>                 case KF_ARG_PTR_TO_ITER:
> +                       if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) {
> +                               if (!check_css_task_iter_allowlist(env)) {
> +                                       verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n");
> +                                       return -EINVAL;
> +                               }
> +                       }
>                         ret = process_iter_arg(env, regno, insn_idx, meta);
>                         if (ret < 0)
>                                 return ret;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 73b155e52204..de02c0971428 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7318,4 +7318,8 @@ struct bpf_iter_num {
>         __u64 __opaque[1];
>  } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_task {
> +       __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 77ceea575dc7..f48723c6c593 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -303,6 +303,13 @@ extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __weak
>  extern int *bpf_iter_num_next(struct bpf_iter_num *it) __weak __ksym;
>  extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
>
> +struct bpf_iter_css_task;
> +struct cgroup_subsys_state;
> +extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> +               struct cgroup_subsys_state *css, unsigned int flags) __weak __ksym;
> +extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
> +extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;

please move this into bpf_experimental.h under selftests, this
shouldn't be in libbpf's stable API headers


> +
>  #ifndef bpf_for_each
>  /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
>   * using BPF open-coded iterators without having to write mundane explicit
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs
  2023-09-12  7:01 ` [PATCH bpf-next v2 3/6] bpf: Introduce process open coded " Chuyi Zhou
@ 2023-09-14 23:26   ` Andrii Nakryiko
  2023-09-15  4:48     ` Chuyi Zhou
  2023-09-15 15:03     ` Chuyi Zhou
  0 siblings, 2 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-14 23:26 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_process in open-coded iterator
> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> iterate all processes in the system.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  include/uapi/linux/bpf.h       |  4 ++++
>  kernel/bpf/helpers.c           |  3 +++
>  kernel/bpf/task_iter.c         | 29 +++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  4 ++++
>  tools/lib/bpf/bpf_helpers.h    |  5 +++++
>  5 files changed, 45 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index de02c0971428..befa55b52e29 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
>         __u64 __opaque[1];
>  } __attribute__((aligned(8)));
>
> +struct bpf_iter_process {
> +       __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index d6a16becfbb9..9b7d2c6f99d1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index d8539cc05ffd..9d1927dc3a06 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
>         kfree(kit->css_it);
>  }
>
> +struct bpf_iter_process_kern {
> +       struct task_struct *tsk;
> +} __attribute__((aligned(8)));
> +

Few high level thoughts. I think it would be good to follow
SEC("iter/task") naming and approach. Open-coded iterators in many
ways are in-kernel counterpart to iterator programs, so keeping them
close enough within reason is useful for knowledge transfer.

SEC("iter/task") allows to:
a) iterate all threads in the system
b) iterate all threads for a given TGID
c) it also allows to "iterate" a single thread or process, but that's
a bit less relevant for in-kernel iterator, but we can still support
them, why not?

I'm not sure if it supports iterating all processes (as in group
leaders of each task group) in the system, but if it's possible I
think we should support it at least for open-coded iterator, seems
like a very useful functionality.

So to that end, let's design a small set of input arguments for
bpf_iter_process_new() that would allow to specify this as flags +
either (optional) struct task_struct * pointer to represent
task/process or PID/TGID.


> +__bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)

Also, given iterator in the previous is called css_task, and that we
have iter/task, iter/task_vma, etc iterator programs, shouldn't this
one be called bpf_iter_task_new(), which also will be consistent with
Dave's task_vma open-coded iterator?

> +{
> +       struct bpf_iter_process_kern *kit = (void *)it;
> +
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_process_kern) != sizeof(struct bpf_iter_process));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_process_kern) !=
> +                                       __alignof__(struct bpf_iter_process));
> +
> +       kit->tsk = &init_task;
> +       return 0;
> +}
> +
> +__bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
> +{
> +       struct bpf_iter_process_kern *kit = (void *)it;
> +
> +       kit->tsk = next_task(kit->tsk);
> +
> +       return kit->tsk == &init_task ? NULL : kit->tsk;
> +}
> +
> +__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> +{
> +}
> +
>  DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
>  static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index de02c0971428..befa55b52e29 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
>         __u64 __opaque[1];
>  } __attribute__((aligned(8)));
>
> +struct bpf_iter_process {
> +       __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index f48723c6c593..858252c2641c 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -310,6 +310,11 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
>  extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
>  extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
>
> +struct bpf_iter_process;
> +extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
> +extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
> +extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;
> +

same, please add this to bpf_experimental, not bpf_helpers.h


>  #ifndef bpf_for_each
>  /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
>   * using BPF open-coded iterators without having to write mundane explicit
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
  2023-09-12  7:01 ` [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded " Chuyi Zhou
                     ` (2 preceding siblings ...)
  2023-09-13  9:13   ` kernel test robot
@ 2023-09-14 23:26   ` Andrii Nakryiko
  2023-09-15 11:57     ` Chuyi Zhou
  3 siblings, 1 reply; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-14 23:26 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
> allow creation and manipulation of struct bpf_iter_css in open-coded
> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
> post}. BPF programs can use these kfuncs through bpf_for_each macro for
> iteration of all descendant css under a root css.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  include/uapi/linux/bpf.h       |  8 +++++
>  kernel/bpf/helpers.c           |  6 ++++
>  kernel/bpf/task_iter.c         | 53 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  8 +++++
>  tools/lib/bpf/bpf_helpers.h    | 12 ++++++++
>  5 files changed, 87 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index befa55b52e29..57760afc13d0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
>         __u64 __opaque[1];
>  } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_pre {
> +       __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_css_post {
> +       __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9b7d2c6f99d1..ca1f6404af9e 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 9d1927dc3a06..8963fc779b87 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
>  {
>  }
>
> +struct bpf_iter_css_kern {
> +       struct cgroup_subsys_state *root;
> +       struct cgroup_subsys_state *pos;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> +               struct cgroup_subsys_state *root)

similar to my comment on previous patches, please see
kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
consistent. We have one iterator that accepts parameters defining
iteration order and starting cgroup. Unless there are some technical
reasons we can't follow similar approach with this open-coded iter,
let's use the same approach. We can even reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
BPF_CGROUP_ITER_ANCESTORS_UP enums.


> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;
> +
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
> +       kit->root = root;
> +       kit->pos = NULL;
> +       return 0;
> +}
> +
> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;
> +
> +       kit->pos = css_next_descendant_pre(kit->pos, kit->root);
> +       return kit->pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
> +{
> +}
> +
> +__bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
> +               struct cgroup_subsys_state *root)
> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;
> +
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
> +       kit->root = root;
> +       kit->pos = NULL;
> +       return 0;
> +}
> +
> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;
> +
> +       kit->pos = css_next_descendant_post(kit->pos, kit->root);
> +       return kit->pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
> +{
> +}
> +
>  DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
>  static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index befa55b52e29..57760afc13d0 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
>         __u64 __opaque[1];
>  } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_pre {
> +       __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_css_post {
> +       __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 858252c2641c..6e5bd9ef14d6 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -315,6 +315,18 @@ extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
>  extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
>  extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;
>
> +struct bpf_iter_css_pre;
> +extern int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> +               struct cgroup_subsys_state *root) __weak __ksym;
> +extern struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it) __weak __ksym;
> +extern void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it) __weak __ksym;
> +
> +struct bpf_iter_css_post;
> +extern int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
> +               struct cgroup_subsys_state *root) __weak __ksym;
> +extern struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it) __weak __ksym;
> +extern void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it) __weak __ksym;
> +
>  #ifndef bpf_for_each
>  /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
>   * using BPF open-coded iterators without having to write mundane explicit
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  2023-09-12  7:01 ` [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS Chuyi Zhou
  2023-09-13 13:53   ` Chuyi Zhou
@ 2023-09-14 23:26   ` Andrii Nakryiko
  1 sibling, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-14 23:26 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> css_iter and process_iter should be used in rcu section. Specifically, in
> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> use them directly.
>
> This patch checks whether we are in rcu cs before we want to invoke
> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> reject if reg->type is UNTRUSTED.

it would be nice to mention where this MEM_RCU is turned into
UNTRUSTED when we do rcu_read_unlock(). For someone unfamiliar with
these parts of verifier (like me) this is completely unobvious.

>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  kernel/bpf/verifier.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2367483bf4c2..6a6827ba7a18 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1172,7 +1172,13 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
>
>  static void __mark_reg_known_zero(struct bpf_reg_state *reg);
>
> +static bool in_rcu_cs(struct bpf_verifier_env *env);
> +
> +/* check whether we are using bpf_iter_process_*() or bpf_iter_css_*() */
> +static bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta);
> +
>  static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> +                                struct bpf_kfunc_call_arg_meta *meta,
>                                  struct bpf_reg_state *reg, int insn_idx,
>                                  struct btf *btf, u32 btf_id, int nr_slots)
>  {
> @@ -1193,6 +1199,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
>
>                 __mark_reg_known_zero(st);
>                 st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
> +               if (is_iter_need_rcu(meta)) {
> +                       if (in_rcu_cs(env))
> +                               st->type |= MEM_RCU;
> +                       else
> +                               st->type |= PTR_UNTRUSTED;
> +               }
>                 st->live |= REG_LIVE_WRITTEN;
>                 st->ref_obj_id = i == 0 ? id : 0;
>                 st->iter.btf = btf;
> @@ -1281,6 +1293,8 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
>                 struct bpf_stack_state *slot = &state->stack[spi - i];
>                 struct bpf_reg_state *st = &slot->spilled_ptr;
>
> +               if (st->type & PTR_UNTRUSTED)
> +                       return false;
>                 /* only main (first) slot has ref_obj_id set */
>                 if (i == 0 && !st->ref_obj_id)
>                         return false;
> @@ -7503,13 +7517,13 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
>                                 return err;
>                 }
>
> -               err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
> +               err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
>                 if (err)
>                         return err;
>         } else {
>                 /* iter_next() or iter_destroy() expect initialized iter state*/
>                 if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
> -                       verbose(env, "expected an initialized iter_%s as arg #%d\n",
> +                       verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
>                                 iter_type_str(meta->btf, btf_id), regno);

this message makes no sense, but even if reworded it would be
confusing for users. So maybe do the RCU check separately and report a
clear message that this iterator is expected to be within a single
continuous rcu_read_{lock+unlock} region.

I do think tracking RCU regions explicitly would make for much easier
to follow code, better messages, etc. Probably would be beneficial for
some other RCU-protected features. But that's a separate topic.

>                         return -EINVAL;
>                 }
> @@ -10382,6 +10396,18 @@ BTF_ID(func, bpf_percpu_obj_new_impl)
>  BTF_ID(func, bpf_percpu_obj_drop_impl)
>  BTF_ID(func, bpf_iter_css_task_new)
>
> +BTF_SET_START(rcu_protect_kfuns_set)
> +BTF_ID(func, bpf_iter_process_new)
> +BTF_ID(func, bpf_iter_css_pre_new)
> +BTF_ID(func, bpf_iter_css_post_new)
> +BTF_SET_END(rcu_protect_kfuns_set)
> +

instead of maintaining these extra special sets, why not add a KF
flag, like KF_RCU_PROTECTED?


> +static inline bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +       return btf_id_set_contains(&rcu_protect_kfuns_set, meta->func_id);
> +}
> +
> +
>  static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
>  {
>         if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  2023-09-14  8:56     ` Chuyi Zhou
@ 2023-09-14 23:26       ` Andrii Nakryiko
  2023-09-15  5:46         ` [External] " Chuyi Zhou
  0 siblings, 1 reply; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-14 23:26 UTC (permalink / raw)
  To: Chuyi Zhou
  Cc: bpf, Alexei Starovoitov, ast, daniel, andrii, martin.lau, tj,
	linux-kernel

On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
>
>
> 在 2023/9/13 21:53, Chuyi Zhou 写道:
> > Hello.
> >
> > 在 2023/9/12 15:01, Chuyi Zhou 写道:
> >> css_iter and process_iter should be used in rcu section. Specifically, in
> >> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> >> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> >> use them directly.
> >>
> >> This patch checks whether we are in rcu cs before we want to invoke
> >> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> >> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> >> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> >> reject if reg->type is UNTRUSTED.
> >
> > I use the following BPF Prog to test this patch:
> >
> > SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> > int iter_task_for_each_sleep(void *ctx)
> > {
> >      struct task_struct *task;
> >      struct task_struct *cur_task = bpf_get_current_task_btf();
> >
> >      if (cur_task->pid != target_pid)
> >          return 0;
> >      bpf_rcu_read_lock();
> >      bpf_for_each(process, task) {
> >          bpf_rcu_read_unlock();
> >          if (task->pid == target_pid)
> >              process_cnt += 1;
> >          bpf_rcu_read_lock();
> >      }
> >      bpf_rcu_read_unlock();
> >      return 0;
> > }
> >
> > Unfortunately, we can pass the verifier.
> >
> > Then I add some printk-messages before setting/clearing state to help
> > debug:
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index d151e6b43a5f..35f3fa9471a9 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
> > bpf_verifier_env *env,
> >                  __mark_reg_known_zero(st);
> >                  st->type = PTR_TO_STACK; /* we don't have dedicated reg
> > type */
> >                  if (is_iter_need_rcu(meta)) {
> > +                       printk("mark reg_addr : %px", st);
> >                          if (in_rcu_cs(env))
> >                                  st->type |= MEM_RCU;
> >                          else
> > @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
> > bpf_verifier_env *env, struct bpf_insn *insn,
> >                          return -EINVAL;
> >                  } else if (rcu_unlock) {
> >                          bpf_for_each_reg_in_vstate(env->cur_state,
> > state, reg, ({
> > +                               printk("clear reg_addr : %px MEM_RCU :
> > %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
> > PTR_UNTRUSTED);
> >                                  if (reg->type & MEM_RCU) {
> > -                                       printk("clear reg addr : %lld",
> > reg);
> >                                          reg->type &= ~(MEM_RCU |
> > PTR_MAYBE_NULL);
> >                                          reg->type |= PTR_UNTRUSTED;
> >                                  }
> >
> >
> > The demsg log:
> >
> > [  393.705324] mark reg_addr : ffff88814e40e200
> >
> > [  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> >
> > [  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> >
> > [  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> > ....
> > ....
> >
> > I didn't see ffff88814e40e200 is cleared as expected because
> > bpf_for_each_reg_in_vstate didn't find it.
> >
> > It seems when we are doing bpf_read_unlock() in the middle of iteration
> > and want to clearing state through bpf_for_each_reg_in_vstate, we can
> > not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
> > mark_stack_slots_iter().
> >
>
> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
> mark_stack_slots_iter() we has marked the slots *STACK_ITER*
>
> With the following change, everything seems work OK.
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index a3236651ec64..83c5ecccadb4 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
>
>   #define bpf_get_spilled_reg(slot, frame)                               \
>          (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
> -         (frame->stack[slot].slot_type[0] == STACK_SPILL))             \
> +         (frame->stack[slot].slot_type[0] == STACK_SPILL ||
> frame->stack[slot].slot_type[0] == STACK_ITER))            \
>           ? &frame->stack[slot].spilled_ptr : NULL)
>
> I am not sure whether this would harm some logic implicitly when using
> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
> maybe we should add a extra parameter to control the picking behaviour.
>
> #define bpf_get_spilled_reg(slot, frame, stack_type)
>                         \
>         (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
>           (frame->stack[slot].slot_type[0] == stack_type))              \
>          ? &frame->stack[slot].spilled_ptr : NULL)
>
> Thanks.

I don't think it's safe to just make bpf_get_spilled_reg, and
subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg
just suddenly start iterating iterator states and/or dynptrs. At least
some of existing uses of those assume they are really working just
with registers.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs
  2023-09-14 23:26   ` Andrii Nakryiko
@ 2023-09-15  4:48     ` Chuyi Zhou
  2023-09-15 15:03     ` Chuyi Zhou
  1 sibling, 0 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-15  4:48 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello.

在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_process in open-coded iterator
>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>> iterate all processes in the system.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   include/uapi/linux/bpf.h       |  4 ++++
>>   kernel/bpf/helpers.c           |  3 +++
>>   kernel/bpf/task_iter.c         | 29 +++++++++++++++++++++++++++++
>>   tools/include/uapi/linux/bpf.h |  4 ++++
>>   tools/lib/bpf/bpf_helpers.h    |  5 +++++
>>   5 files changed, 45 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index de02c0971428..befa55b52e29 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
>>          __u64 __opaque[1];
>>   } __attribute__((aligned(8)));
>>
>> +struct bpf_iter_process {
>> +       __u64 __opaque[1];
>> +} __attribute__((aligned(8)));
>> +
>>   #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index d6a16becfbb9..9b7d2c6f99d1 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index d8539cc05ffd..9d1927dc3a06 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
>>          kfree(kit->css_it);
>>   }
>>
>> +struct bpf_iter_process_kern {
>> +       struct task_struct *tsk;
>> +} __attribute__((aligned(8)));
>> +
> 
> Few high level thoughts. I think it would be good to follow
> SEC("iter/task") naming and approach. Open-coded iterators in many
> ways are in-kernel counterpart to iterator programs, so keeping them
> close enough within reason is useful for knowledge transfer.
> 
> SEC("iter/task") allows to:
> a) iterate all threads in the system
> b) iterate all threads for a given TGID
> c) it also allows to "iterate" a single thread or process, but that's
> a bit less relevant for in-kernel iterator, but we can still support
> them, why not?
> 
> I'm not sure if it supports iterating all processes (as in group
> leaders of each task group) in the system, but if it's possible I
> think we should support it at least for open-coded iterator, seems
> like a very useful functionality.
> 
> So to that end, let's design a small set of input arguments for
> bpf_iter_process_new() that would allow to specify this as flags +
> either (optional) struct task_struct * pointer to represent
> task/process or PID/TGID.
> 

IIUC, we should define the following task_new kfunc

struct bpf_iter_task_kern {
	struct task_struct *start;
	struct task_struct *cur;	
	unsigned int flag;
} __attribute__((aligned(8)));

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_css_task *it,
		struct task_struct *start, unsigned int flags)

If we want to iterate all threads of a task, just pass it to *start*,
and if we want to iterating all process in the system, users may need to 
pass a nullptr to the *start*. But it seems current BPF verifier will 
reject the nullptr to task_struct. The error message meybe:
"Possibly NULL pointer passed to trusted argx"

I noticed the __opt annotation in kfunc document. It seems with 
following we can pass the nullptr to *start*

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_css_task *it,
		void *start__opt, unsigned int flags__szk)

However, in this way, user can pass any invalid pointer to the kfunc 
without verifying. Besides, it seems __opt is only allowed to use with 
__szk together and flags__szk is ambiguous in semantics.

Do you have better ideas? Or I missing something ?

Thanks


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [External] Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  2023-09-14 23:26       ` Andrii Nakryiko
@ 2023-09-15  5:46         ` Chuyi Zhou
  2023-09-15 20:23           ` Andrii Nakryiko
  0 siblings, 1 reply; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-15  5:46 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: bpf, daniel, andrii, martin.lau, tj, linux-kernel, ast

Hello.

在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>>
>>
>> 在 2023/9/13 21:53, Chuyi Zhou 写道:
>>> Hello.
>>>
>>> 在 2023/9/12 15:01, Chuyi Zhou 写道:
>>>> css_iter and process_iter should be used in rcu section. Specifically, in
>>>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
>>>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
>>>> use them directly.
>>>>
>>>> This patch checks whether we are in rcu cs before we want to invoke
>>>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
>>>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
>>>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
>>>> reject if reg->type is UNTRUSTED.
>>>
>>> I use the following BPF Prog to test this patch:
>>>
>>> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
>>> int iter_task_for_each_sleep(void *ctx)
>>> {
>>>       struct task_struct *task;
>>>       struct task_struct *cur_task = bpf_get_current_task_btf();
>>>
>>>       if (cur_task->pid != target_pid)
>>>           return 0;
>>>       bpf_rcu_read_lock();
>>>       bpf_for_each(process, task) {
>>>           bpf_rcu_read_unlock();
>>>           if (task->pid == target_pid)
>>>               process_cnt += 1;
>>>           bpf_rcu_read_lock();
>>>       }
>>>       bpf_rcu_read_unlock();
>>>       return 0;
>>> }
>>>
>>> Unfortunately, we can pass the verifier.
>>>
>>> Then I add some printk-messages before setting/clearing state to help
>>> debug:
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index d151e6b43a5f..35f3fa9471a9 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
>>> bpf_verifier_env *env,
>>>                   __mark_reg_known_zero(st);
>>>                   st->type = PTR_TO_STACK; /* we don't have dedicated reg
>>> type */
>>>                   if (is_iter_need_rcu(meta)) {
>>> +                       printk("mark reg_addr : %px", st);
>>>                           if (in_rcu_cs(env))
>>>                                   st->type |= MEM_RCU;
>>>                           else
>>> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
>>> bpf_verifier_env *env, struct bpf_insn *insn,
>>>                           return -EINVAL;
>>>                   } else if (rcu_unlock) {
>>>                           bpf_for_each_reg_in_vstate(env->cur_state,
>>> state, reg, ({
>>> +                               printk("clear reg_addr : %px MEM_RCU :
>>> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
>>> PTR_UNTRUSTED);
>>>                                   if (reg->type & MEM_RCU) {
>>> -                                       printk("clear reg addr : %lld",
>>> reg);
>>>                                           reg->type &= ~(MEM_RCU |
>>> PTR_MAYBE_NULL);
>>>                                           reg->type |= PTR_UNTRUSTED;
>>>                                   }
>>>
>>>
>>> The demsg log:
>>>
>>> [  393.705324] mark reg_addr : ffff88814e40e200
>>>
>>> [  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
>>> PTR_UNTRUSTED : 0
>>>
>>> [  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
>>> PTR_UNTRUSTED : 0
>>>
>>> [  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
>>> PTR_UNTRUSTED : 0
>>> ....
>>> ....
>>>
>>> I didn't see ffff88814e40e200 is cleared as expected because
>>> bpf_for_each_reg_in_vstate didn't find it.
>>>
>>> It seems when we are doing bpf_read_unlock() in the middle of iteration
>>> and want to clearing state through bpf_for_each_reg_in_vstate, we can
>>> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
>>> mark_stack_slots_iter().
>>>
>>
>> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
>> mark_stack_slots_iter() we has marked the slots *STACK_ITER*
>>
>> With the following change, everything seems work OK.
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index a3236651ec64..83c5ecccadb4 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
>>
>>    #define bpf_get_spilled_reg(slot, frame)                               \
>>           (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
>> -         (frame->stack[slot].slot_type[0] == STACK_SPILL))             \
>> +         (frame->stack[slot].slot_type[0] == STACK_SPILL ||
>> frame->stack[slot].slot_type[0] == STACK_ITER))            \
>>            ? &frame->stack[slot].spilled_ptr : NULL)
>>
>> I am not sure whether this would harm some logic implicitly when using
>> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
>> maybe we should add a extra parameter to control the picking behaviour.
>>
>> #define bpf_get_spilled_reg(slot, frame, stack_type)
>>                          \
>>          (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
>>            (frame->stack[slot].slot_type[0] == stack_type))              \
>>           ? &frame->stack[slot].spilled_ptr : NULL)
>>
>> Thanks.
> 
> I don't think it's safe to just make bpf_get_spilled_reg, and
> subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg
> just suddenly start iterating iterator states and/or dynptrs. At least
> some of existing uses of those assume they are really working just
> with registers.

IIUC, when we are doing bpf_rcu_unlock, we do need to clear the state of 
reg including STACK_ITER.

Maybe here we only need change the logic when using 
bpf_for_each_reg_in_vstate to clear state in bpf_rcu_unlock and keep 
everything else unchanged ?

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
  2023-09-14 23:26   ` Andrii Nakryiko
@ 2023-09-15 11:57     ` Chuyi Zhou
  2023-09-15 20:25       ` Andrii Nakryiko
  0 siblings, 1 reply; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-15 11:57 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello.

在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
>> allow creation and manipulation of struct bpf_iter_css in open-coded
>> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
>> post}. BPF programs can use these kfuncs through bpf_for_each macro for
>> iteration of all descendant css under a root css.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   include/uapi/linux/bpf.h       |  8 +++++
>>   kernel/bpf/helpers.c           |  6 ++++
>>   kernel/bpf/task_iter.c         | 53 ++++++++++++++++++++++++++++++++++
>>   tools/include/uapi/linux/bpf.h |  8 +++++
>>   tools/lib/bpf/bpf_helpers.h    | 12 ++++++++
>>   5 files changed, 87 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index befa55b52e29..57760afc13d0 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
>>          __u64 __opaque[1];
>>   } __attribute__((aligned(8)));
>>
>> +struct bpf_iter_css_pre {
>> +       __u64 __opaque[2];
>> +} __attribute__((aligned(8)));
>> +
>> +struct bpf_iter_css_post {
>> +       __u64 __opaque[2];
>> +} __attribute__((aligned(8)));
>> +
>>   #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 9b7d2c6f99d1..ca1f6404af9e 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>>   BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 9d1927dc3a06..8963fc779b87 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
>>   {
>>   }
>>
>> +struct bpf_iter_css_kern {
>> +       struct cgroup_subsys_state *root;
>> +       struct cgroup_subsys_state *pos;
>> +} __attribute__((aligned(8)));
>> +
>> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
>> +               struct cgroup_subsys_state *root)
> 
> similar to my comment on previous patches, please see
> kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
> consistent. We have one iterator that accepts parameters defining
> iteration order and starting cgroup. Unless there are some technical
> reasons we can't follow similar approach with this open-coded iter,
> let's use the same approach. We can even reuse
> BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
> BPF_CGROUP_ITER_ANCESTORS_UP enums.
> 

I know your concern. It would be nice if we keep consistent with 
kernel/bpf/cgroup_iter.c

But this patch actually want to support iterating css 
(cgroup_subsys_state) not cgroup (css is more low lever).
With css_iter we can do something like 
"for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre"
in BPF Progs which is hard for cgroup_iter. In the future we can use 
this iterator to plug some customizable policy in other resource control 
system.

BTW, what I did in RFC actually very similar with the approach of 
cgroup_iter. 
(https://lore.kernel.org/all/20230827072057.1591929-4-zhouchuyi@bytedance.com/).

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs
  2023-09-14 23:26   ` Andrii Nakryiko
  2023-09-15  4:48     ` Chuyi Zhou
@ 2023-09-15 15:03     ` Chuyi Zhou
  2023-09-15 20:37       ` Andrii Nakryiko
  1 sibling, 1 reply; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-15 15:03 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel



在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_process in open-coded iterator
>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>> iterate all processes in the system.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   include/uapi/linux/bpf.h       |  4 ++++
>>   kernel/bpf/helpers.c           |  3 +++
>>   kernel/bpf/task_iter.c         | 29 +++++++++++++++++++++++++++++
>>   tools/include/uapi/linux/bpf.h |  4 ++++
>>   tools/lib/bpf/bpf_helpers.h    |  5 +++++
>>   5 files changed, 45 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index de02c0971428..befa55b52e29 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
>>          __u64 __opaque[1];
>>   } __attribute__((aligned(8)));
>>
>> +struct bpf_iter_process {
>> +       __u64 __opaque[1];
>> +} __attribute__((aligned(8)));
>> +
>>   #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index d6a16becfbb9..9b7d2c6f99d1 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index d8539cc05ffd..9d1927dc3a06 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
>>          kfree(kit->css_it);
>>   }
>>
>> +struct bpf_iter_process_kern {
>> +       struct task_struct *tsk;
>> +} __attribute__((aligned(8)));
>> +
> 
> Few high level thoughts. I think it would be good to follow
> SEC("iter/task") naming and approach. Open-coded iterators in many
> ways are in-kernel counterpart to iterator programs, so keeping them
> close enough within reason is useful for knowledge transfer.
> 
> SEC("iter/task") allows to:
> a) iterate all threads in the system
> b) iterate all threads for a given TGID
> c) it also allows to "iterate" a single thread or process, but that's
> a bit less relevant for in-kernel iterator, but we can still support
> them, why not?
> 
> I'm not sure if it supports iterating all processes (as in group
> leaders of each task group) in the system, but if it's possible I
> think we should support it at least for open-coded iterator, seems
> like a very useful functionality.
> 
> So to that end, let's design a small set of input arguments for
> bpf_iter_process_new() that would allow to specify this as flags +
> either (optional) struct task_struct * pointer to represent
> task/process or PID/TGID.
> 

Another concern from Alexei was the readability of the API of open-coded 
in BPF Program[1].

bpf_for_each(task, curr) is straightforward. Users can easily understand 
that this API does the same thing as 'for_each_process' in kernel.

However, if we keep the approach of SEC("iter/task")

enum ITER_ITEM {
	ITER_TASK,
	ITER_THREAD,
}

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct 
task_struct *group_task, enum ITER_ITEM type)

the API have to chang:


bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in 
the  system
bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all 
thread of group_leader
bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of 
all the process in the system

Useres may guess what are this API actually doing....

So, I'm thinking if we can add a layer of abstraction to hide the 
details from the users:

#define bpf_for_each_process(task) \
	bpf_for_each(task, curr, NULL, ITERATE_TASK)


It would be nice if you could give me some better suggestions.

Thanks!

[1] 
https://lore.kernel.org/lkml/CAADnVQLbDWUxFen-RS67C86sOE5DykEPD8xyihJ2RnG1WEnTQg@mail.gmail.com/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [External] Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  2023-09-15  5:46         ` [External] " Chuyi Zhou
@ 2023-09-15 20:23           ` Andrii Nakryiko
  0 siblings, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-15 20:23 UTC (permalink / raw)
  To: Chuyi Zhou
  Cc: Alexei Starovoitov, bpf, daniel, andrii, martin.lau, tj,
	linux-kernel, ast

On Thu, Sep 14, 2023 at 10:46 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> Hello.
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>
> >>
> >>
> >> 在 2023/9/13 21:53, Chuyi Zhou 写道:
> >>> Hello.
> >>>
> >>> 在 2023/9/12 15:01, Chuyi Zhou 写道:
> >>>> css_iter and process_iter should be used in rcu section. Specifically, in
> >>>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> >>>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> >>>> use them directly.
> >>>>
> >>>> This patch checks whether we are in rcu cs before we want to invoke
> >>>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> >>>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> >>>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> >>>> reject if reg->type is UNTRUSTED.
> >>>
> >>> I use the following BPF Prog to test this patch:
> >>>
> >>> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> >>> int iter_task_for_each_sleep(void *ctx)
> >>> {
> >>>       struct task_struct *task;
> >>>       struct task_struct *cur_task = bpf_get_current_task_btf();
> >>>
> >>>       if (cur_task->pid != target_pid)
> >>>           return 0;
> >>>       bpf_rcu_read_lock();
> >>>       bpf_for_each(process, task) {
> >>>           bpf_rcu_read_unlock();
> >>>           if (task->pid == target_pid)
> >>>               process_cnt += 1;
> >>>           bpf_rcu_read_lock();
> >>>       }
> >>>       bpf_rcu_read_unlock();
> >>>       return 0;
> >>> }
> >>>
> >>> Unfortunately, we can pass the verifier.
> >>>
> >>> Then I add some printk-messages before setting/clearing state to help
> >>> debug:
> >>>
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index d151e6b43a5f..35f3fa9471a9 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
> >>> bpf_verifier_env *env,
> >>>                   __mark_reg_known_zero(st);
> >>>                   st->type = PTR_TO_STACK; /* we don't have dedicated reg
> >>> type */
> >>>                   if (is_iter_need_rcu(meta)) {
> >>> +                       printk("mark reg_addr : %px", st);
> >>>                           if (in_rcu_cs(env))
> >>>                                   st->type |= MEM_RCU;
> >>>                           else
> >>> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
> >>> bpf_verifier_env *env, struct bpf_insn *insn,
> >>>                           return -EINVAL;
> >>>                   } else if (rcu_unlock) {
> >>>                           bpf_for_each_reg_in_vstate(env->cur_state,
> >>> state, reg, ({
> >>> +                               printk("clear reg_addr : %px MEM_RCU :
> >>> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
> >>> PTR_UNTRUSTED);
> >>>                                   if (reg->type & MEM_RCU) {
> >>> -                                       printk("clear reg addr : %lld",
> >>> reg);
> >>>                                           reg->type &= ~(MEM_RCU |
> >>> PTR_MAYBE_NULL);
> >>>                                           reg->type |= PTR_UNTRUSTED;
> >>>                                   }
> >>>
> >>>
> >>> The demsg log:
> >>>
> >>> [  393.705324] mark reg_addr : ffff88814e40e200
> >>>
> >>> [  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>>
> >>> [  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>>
> >>> [  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>> ....
> >>> ....
> >>>
> >>> I didn't see ffff88814e40e200 is cleared as expected because
> >>> bpf_for_each_reg_in_vstate didn't find it.
> >>>
> >>> It seems when we are doing bpf_read_unlock() in the middle of iteration
> >>> and want to clearing state through bpf_for_each_reg_in_vstate, we can
> >>> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
> >>> mark_stack_slots_iter().
> >>>
> >>
> >> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
> >> mark_stack_slots_iter() we has marked the slots *STACK_ITER*
> >>
> >> With the following change, everything seems work OK.
> >>
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index a3236651ec64..83c5ecccadb4 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
> >>
> >>    #define bpf_get_spilled_reg(slot, frame)                               \
> >>           (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
> >> -         (frame->stack[slot].slot_type[0] == STACK_SPILL))             \
> >> +         (frame->stack[slot].slot_type[0] == STACK_SPILL ||
> >> frame->stack[slot].slot_type[0] == STACK_ITER))            \
> >>            ? &frame->stack[slot].spilled_ptr : NULL)
> >>
> >> I am not sure whether this would harm some logic implicitly when using
> >> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
> >> maybe we should add a extra parameter to control the picking behaviour.
> >>
> >> #define bpf_get_spilled_reg(slot, frame, stack_type)
> >>                          \
> >>          (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
> >>            (frame->stack[slot].slot_type[0] == stack_type))              \
> >>           ? &frame->stack[slot].spilled_ptr : NULL)
> >>
> >> Thanks.
> >
> > I don't think it's safe to just make bpf_get_spilled_reg, and
> > subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg
> > just suddenly start iterating iterator states and/or dynptrs. At least
> > some of existing uses of those assume they are really working just
> > with registers.
>
> IIUC, when we are doing bpf_rcu_unlock, we do need to clear the state of
> reg including STACK_ITER.
>
> Maybe here we only need change the logic when using
> bpf_for_each_reg_in_vstate to clear state in bpf_rcu_unlock and keep
> everything else unchanged ?

Right, maybe. I see 10 uses of bpf_for_each_reg_in_vstate() in
kernel/bpf/verifier.c. Before we change the definition of
bpf_for_each_reg_in_vstate() we should validate that iterating dynptr
and iter states doesn't break any of them, that's all.

>
> Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
  2023-09-15 11:57     ` Chuyi Zhou
@ 2023-09-15 20:25       ` Andrii Nakryiko
  0 siblings, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-15 20:25 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Fri, Sep 15, 2023 at 4:57 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> Hello.
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>
> >> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
> >> allow creation and manipulation of struct bpf_iter_css in open-coded
> >> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
> >> post}. BPF programs can use these kfuncs through bpf_for_each macro for
> >> iteration of all descendant css under a root css.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> >> ---
> >>   include/uapi/linux/bpf.h       |  8 +++++
> >>   kernel/bpf/helpers.c           |  6 ++++
> >>   kernel/bpf/task_iter.c         | 53 ++++++++++++++++++++++++++++++++++
> >>   tools/include/uapi/linux/bpf.h |  8 +++++
> >>   tools/lib/bpf/bpf_helpers.h    | 12 ++++++++
> >>   5 files changed, 87 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index befa55b52e29..57760afc13d0 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
> >>          __u64 __opaque[1];
> >>   } __attribute__((aligned(8)));
> >>
> >> +struct bpf_iter_css_pre {
> >> +       __u64 __opaque[2];
> >> +} __attribute__((aligned(8)));
> >> +
> >> +struct bpf_iter_css_post {
> >> +       __u64 __opaque[2];
> >> +} __attribute__((aligned(8)));
> >> +
> >>   #endif /* _UAPI__LINUX_BPF_H__ */
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index 9b7d2c6f99d1..ca1f6404af9e 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >>   BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> >>   BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> >>   BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index 9d1927dc3a06..8963fc779b87 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> >>   {
> >>   }
> >>
> >> +struct bpf_iter_css_kern {
> >> +       struct cgroup_subsys_state *root;
> >> +       struct cgroup_subsys_state *pos;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> >> +               struct cgroup_subsys_state *root)
> >
> > similar to my comment on previous patches, please see
> > kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
> > consistent. We have one iterator that accepts parameters defining
> > iteration order and starting cgroup. Unless there are some technical
> > reasons we can't follow similar approach with this open-coded iter,
> > let's use the same approach. We can even reuse
> > BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
> > BPF_CGROUP_ITER_ANCESTORS_UP enums.
> >
>
> I know your concern. It would be nice if we keep consistent with
> kernel/bpf/cgroup_iter.c
>
> But this patch actually want to support iterating css
> (cgroup_subsys_state) not cgroup (css is more low lever).
> With css_iter we can do something like
> "for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre"
> in BPF Progs which is hard for cgroup_iter. In the future we can use
> this iterator to plug some customizable policy in other resource control
> system.

That's fine if it's not exactly cgroup iter and returns a different
kernel object. But let's at least consistently use
BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/BPF_CGROUP_ITER_ANCESTORS_UP
approach as a way to specify iteration order?

>
> BTW, what I did in RFC actually very similar with the approach of
> cgroup_iter.
> (https://lore.kernel.org/all/20230827072057.1591929-4-zhouchuyi@bytedance.com/).
>
> Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs
  2023-09-15 15:03     ` Chuyi Zhou
@ 2023-09-15 20:37       ` Andrii Nakryiko
  2023-09-16 14:03         ` Chuyi Zhou
  0 siblings, 1 reply; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-15 20:37 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
>
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_process in open-coded iterator
> >> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >> iterate all processes in the system.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> >> ---
> >>   include/uapi/linux/bpf.h       |  4 ++++
> >>   kernel/bpf/helpers.c           |  3 +++
> >>   kernel/bpf/task_iter.c         | 29 +++++++++++++++++++++++++++++
> >>   tools/include/uapi/linux/bpf.h |  4 ++++
> >>   tools/lib/bpf/bpf_helpers.h    |  5 +++++
> >>   5 files changed, 45 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index de02c0971428..befa55b52e29 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
> >>          __u64 __opaque[1];
> >>   } __attribute__((aligned(8)));
> >>
> >> +struct bpf_iter_process {
> >> +       __u64 __opaque[1];
> >> +} __attribute__((aligned(8)));
> >> +
> >>   #endif /* _UAPI__LINUX_BPF_H__ */
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index d6a16becfbb9..9b7d2c6f99d1 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> >>   BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> >>   BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> >>   BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index d8539cc05ffd..9d1927dc3a06 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> >>          kfree(kit->css_it);
> >>   }
> >>
> >> +struct bpf_iter_process_kern {
> >> +       struct task_struct *tsk;
> >> +} __attribute__((aligned(8)));
> >> +
> >
> > Few high level thoughts. I think it would be good to follow
> > SEC("iter/task") naming and approach. Open-coded iterators in many
> > ways are in-kernel counterpart to iterator programs, so keeping them
> > close enough within reason is useful for knowledge transfer.
> >
> > SEC("iter/task") allows to:
> > a) iterate all threads in the system
> > b) iterate all threads for a given TGID
> > c) it also allows to "iterate" a single thread or process, but that's
> > a bit less relevant for in-kernel iterator, but we can still support
> > them, why not?
> >
> > I'm not sure if it supports iterating all processes (as in group
> > leaders of each task group) in the system, but if it's possible I
> > think we should support it at least for open-coded iterator, seems
> > like a very useful functionality.
> >
> > So to that end, let's design a small set of input arguments for
> > bpf_iter_process_new() that would allow to specify this as flags +
> > either (optional) struct task_struct * pointer to represent
> > task/process or PID/TGID.
> >
>
> Another concern from Alexei was the readability of the API of open-coded
> in BPF Program[1].
>
> bpf_for_each(task, curr) is straightforward. Users can easily understand
> that this API does the same thing as 'for_each_process' in kernel.

In general, users might have no idea about for_each_process macro in
the kernel, so I don't find this particular argument very convincing.

We can add a separate set of iterator kfuncs for every useful
combination of conditions, of course, but it's a double-edged sword.
Needing to use a different iterator just to specify a different
direction of cgroup iteration (from the example you referred in [1])
also means that it's now harder to write some generic function that
needs to do something for all cgroups matching some criteria where the
order might be coming as an argument.

Similarly for task iterators. It's not hard to imagine some processing
that can be equivalently done per thread or per process in the system,
or on each thread of the process, depending on some conditions or
external configuration. Having to do three different
bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
nature of the thing that is iterated over is the same, and it's just a
different set of filters to specify which subset of those items should
be iterated, I think it's better to try to stick to the same iterator
with few simple arguments. IMO, of course, there is no objectively
best approach.

>
> However, if we keep the approach of SEC("iter/task")
>
> enum ITER_ITEM {
>         ITER_TASK,
>         ITER_THREAD,
> }
>
> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
> task_struct *group_task, enum ITER_ITEM type)
>
> the API have to chang:
>
>
> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
> the  system
> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
> thread of group_leader
> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
> all the process in the system
>
> Useres may guess what are this API actually doing....

I'd expect users to consult documentation before trying to use an
unfamiliar cutting-edge functionality. So let's try to keep
documentation clear and up to the point. Extra flag argument doesn't
seem to be a big deal.


>
> So, I'm thinking if we can add a layer of abstraction to hide the
> details from the users:
>
> #define bpf_for_each_process(task) \
>         bpf_for_each(task, curr, NULL, ITERATE_TASK)
>
>
> It would be nice if you could give me some better suggestions.

No, please no. This macro wrapper is useless and just obfuscates what
is going on. If users think it's helpful for them, they can trivially
define it for their own code.

>
> Thanks!
>
> [1]
> https://lore.kernel.org/lkml/CAADnVQLbDWUxFen-RS67C86sOE5DykEPD8xyihJ2RnG1WEnTQg@mail.gmail.com/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs
  2023-09-15 20:37       ` Andrii Nakryiko
@ 2023-09-16 14:03         ` Chuyi Zhou
  2023-09-19 23:30           ` Andrii Nakryiko
  0 siblings, 1 reply; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-16 14:03 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

Hello.

在 2023/9/16 04:37, Andrii Nakryiko 写道:
> On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
>>> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>>>
>>>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
>>>> creation and manipulation of struct bpf_iter_process in open-coded iterator
>>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>>>> iterate all processes in the system.
>>>>
[...cut...]
>>>
>>> Few high level thoughts. I think it would be good to follow
>>> SEC("iter/task") naming and approach. Open-coded iterators in many
>>> ways are in-kernel counterpart to iterator programs, so keeping them
>>> close enough within reason is useful for knowledge transfer.
>>>
>>> SEC("iter/task") allows to:
>>> a) iterate all threads in the system
>>> b) iterate all threads for a given TGID
>>> c) it also allows to "iterate" a single thread or process, but that's
>>> a bit less relevant for in-kernel iterator, but we can still support
>>> them, why not?
>>>
>>> I'm not sure if it supports iterating all processes (as in group
>>> leaders of each task group) in the system, but if it's possible I
>>> think we should support it at least for open-coded iterator, seems
>>> like a very useful functionality.
>>>
>>> So to that end, let's design a small set of input arguments for
>>> bpf_iter_process_new() that would allow to specify this as flags +
>>> either (optional) struct task_struct * pointer to represent
>>> task/process or PID/TGID.
>>>
>>
>> Another concern from Alexei was the readability of the API of open-coded
>> in BPF Program[1].
>>
>> bpf_for_each(task, curr) is straightforward. Users can easily understand
>> that this API does the same thing as 'for_each_process' in kernel.
> 
> In general, users might have no idea about for_each_process macro in
> the kernel, so I don't find this particular argument very convincing.
> 
> We can add a separate set of iterator kfuncs for every useful
> combination of conditions, of course, but it's a double-edged sword.
> Needing to use a different iterator just to specify a different
> direction of cgroup iteration (from the example you referred in [1])
> also means that it's now harder to write some generic function that
> needs to do something for all cgroups matching some criteria where the
> order might be coming as an argument.
> 
> Similarly for task iterators. It's not hard to imagine some processing
> that can be equivalently done per thread or per process in the system,
> or on each thread of the process, depending on some conditions or
> external configuration. Having to do three different
> bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
> nature of the thing that is iterated over is the same, and it's just a
> different set of filters to specify which subset of those items should
> be iterated, I think it's better to try to stick to the same iterator
> with few simple arguments. IMO, of course, there is no objectively
> best approach.
> 
>>
>> However, if we keep the approach of SEC("iter/task")
>>
>> enum ITER_ITEM {
>>          ITER_TASK,
>>          ITER_THREAD,
>> }
>>
>> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
>> task_struct *group_task, enum ITER_ITEM type)
>>
>> the API have to chang:
>>
>>
>> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
>> the  system
>> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
>> thread of group_leader
>> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
>> all the process in the system
>>
>> Useres may guess what are this API actually doing....
> 
> I'd expect users to consult documentation before trying to use an
> unfamiliar cutting-edge functionality. So let's try to keep
> documentation clear and up to the point. Extra flag argument doesn't
> seem to be a big deal.

Thanks for your suggestion!

Before we begin working on the next version, I have outlined a detailed 
API design here:

1.task_iter

It will be used to iterate process/threads like SEC("iter/task"). Here 
we should better to follow the naming and approach SEC("iter/task"):

enum {
	ITERATE_PROCESS,
	ITERATE_THREAD,
}

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct 
task_struct *task, int flag);

If we want to iterate all processes in the system, the iteration will 
start from the *task* which is passed from user.(since process in the 
system are connected through a linked list)

Additionally, the *task* can allow users to specify iterating all 
threads within a task group.

SEC("xxx")
int xxxx(void *ctx)
{
	struct task_struct *pos;
	struct task_struct *cur_task = bpf_get_current_task_btf();

	bpf_rcu_read_lock();

	// iterating all process in the system start from cur_task
	bpf_for_each(task, pos, cur_task, ITERATE_PROCESS) {
		
	}

	// iterate all thread belongs to cur_task group.
	bpf_for_each(task, pos, cur_task, ITERATE_THREAD) {
	
	}
	
	bpf_rcu_read_unlock();
	return 0;
}

Iterating all thread of each process is great(ITERATE_ALL). But maybe 
let's break it down step by step and implement 
ITERATE_PROCESS/ITERATE_THREAD first? (I'm little worried about the cpu 
overhead of ITERATE_ALL, since we are doing a heavy job in BPF Prog)

I wanted to reuse BPF_TASK_ITER_ALL/BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID 
insted of new enums like ITERATE_PROCESS/ITERATE_THREAD. But it seems 
necessary. In BPF Prog, we usually operate task_struct directly instead 
of pid/tgid. It's a little weird to use 
BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID here:

bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_TID) {
}

On the other hand, 
BPF_TASK_ITER_ALL/BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID are inner flags 
that are hidden from the users.
Exposing ITERATE_PROCESS/ITERATE_THREAD will not cause confusion to user.


2. css_iter.

css_iter will be used to:
(1) iterating subsystem, like 
for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in kernel.
(2) iterating cgroup. (patch-6's selfetest has a basic example)

css(cgroup_subsys_state) is more fundamental than struct cgroup. I think 
we'd better operating css rather than cgroup, since it's can be hard for 
cgroup_iter to achive (2). So here we keep the name of "css_iter", 
BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/BPF_CGROUP_ITER_ANCESTORS_UP 
can be reused.


__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
		struct cgroup_subsys_state *root, unsigned int flag)

bpf_for_each(css, root, BPF_CGROUP_ITER_DESCENDANTS_PRE)

Thanks.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF
  2023-09-12  7:01 ` [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
@ 2023-09-18 20:42   ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2023-09-18 20:42 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, linux-kernel

On Tue, Sep 12, 2023 at 03:01:44PM +0800, Chuyi Zhou wrote:
> This patch makes some preparations for using css_task_iter_*() in BPF
> Program.
> 
> 1. Flags CSS_TASK_ITER_* are #define-s and it's not easy for bpf prog to
> use them. Convert them to enum so bpf prog can take them from vmlinux.h.
> 
> 2. In the next patch we will add css_task_iter_*() in common kfuncs which
> is not safe. Since css_task_iter_*() does spin_unlock_irq() which might
> screw up irq flags depending on the context where bpf prog is running.
> So we should use irqsave/irqrestore here and the switching is harmless.
> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs
  2023-09-16 14:03         ` Chuyi Zhou
@ 2023-09-19 23:30           ` Andrii Nakryiko
  2023-09-20 12:20             ` Chuyi Zhou
  0 siblings, 1 reply; 33+ messages in thread
From: Andrii Nakryiko @ 2023-09-19 23:30 UTC (permalink / raw)
  To: Chuyi Zhou; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel

On Sat, Sep 16, 2023 at 7:03 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> Hello.
>
> 在 2023/9/16 04:37, Andrii Nakryiko 写道:
> > On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> >>> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>>>
> >>>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> >>>> creation and manipulation of struct bpf_iter_process in open-coded iterator
> >>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >>>> iterate all processes in the system.
> >>>>
> [...cut...]
> >>>
> >>> Few high level thoughts. I think it would be good to follow
> >>> SEC("iter/task") naming and approach. Open-coded iterators in many
> >>> ways are in-kernel counterpart to iterator programs, so keeping them
> >>> close enough within reason is useful for knowledge transfer.
> >>>
> >>> SEC("iter/task") allows to:
> >>> a) iterate all threads in the system
> >>> b) iterate all threads for a given TGID
> >>> c) it also allows to "iterate" a single thread or process, but that's
> >>> a bit less relevant for in-kernel iterator, but we can still support
> >>> them, why not?
> >>>
> >>> I'm not sure if it supports iterating all processes (as in group
> >>> leaders of each task group) in the system, but if it's possible I
> >>> think we should support it at least for open-coded iterator, seems
> >>> like a very useful functionality.
> >>>
> >>> So to that end, let's design a small set of input arguments for
> >>> bpf_iter_process_new() that would allow to specify this as flags +
> >>> either (optional) struct task_struct * pointer to represent
> >>> task/process or PID/TGID.
> >>>
> >>
> >> Another concern from Alexei was the readability of the API of open-coded
> >> in BPF Program[1].
> >>
> >> bpf_for_each(task, curr) is straightforward. Users can easily understand
> >> that this API does the same thing as 'for_each_process' in kernel.
> >
> > In general, users might have no idea about for_each_process macro in
> > the kernel, so I don't find this particular argument very convincing.
> >
> > We can add a separate set of iterator kfuncs for every useful
> > combination of conditions, of course, but it's a double-edged sword.
> > Needing to use a different iterator just to specify a different
> > direction of cgroup iteration (from the example you referred in [1])
> > also means that it's now harder to write some generic function that
> > needs to do something for all cgroups matching some criteria where the
> > order might be coming as an argument.
> >
> > Similarly for task iterators. It's not hard to imagine some processing
> > that can be equivalently done per thread or per process in the system,
> > or on each thread of the process, depending on some conditions or
> > external configuration. Having to do three different
> > bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
> > nature of the thing that is iterated over is the same, and it's just a
> > different set of filters to specify which subset of those items should
> > be iterated, I think it's better to try to stick to the same iterator
> > with few simple arguments. IMO, of course, there is no objectively
> > best approach.
> >
> >>
> >> However, if we keep the approach of SEC("iter/task")
> >>
> >> enum ITER_ITEM {
> >>          ITER_TASK,
> >>          ITER_THREAD,
> >> }
> >>
> >> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
> >> task_struct *group_task, enum ITER_ITEM type)
> >>
> >> the API have to chang:
> >>
> >>
> >> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
> >> the  system
> >> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
> >> thread of group_leader
> >> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
> >> all the process in the system
> >>
> >> Useres may guess what are this API actually doing....
> >
> > I'd expect users to consult documentation before trying to use an
> > unfamiliar cutting-edge functionality. So let's try to keep
> > documentation clear and up to the point. Extra flag argument doesn't
> > seem to be a big deal.
>
> Thanks for your suggestion!
>
> Before we begin working on the next version, I have outlined a detailed
> API design here:
>
> 1.task_iter
>
> It will be used to iterate process/threads like SEC("iter/task"). Here
> we should better to follow the naming and approach SEC("iter/task"):
>
> enum {
>         ITERATE_PROCESS,
>         ITERATE_THREAD,
> }
>
> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct
> task_struct *task, int flag);
>
> If we want to iterate all processes in the system, the iteration will
> start from the *task* which is passed from user.(since process in the
> system are connected through a linked list)

but will go through all of them anyways, right? it's kind of
surprising from usability standpoint to have to pass some task_struct
to iterate all of them, tbh. I wonder if it's hard to adjust kfunc
validation to allow "nullable" pointers? We can look at that
separately, of course.

>
> Additionally, the *task* can allow users to specify iterating all
> threads within a task group.
>
> SEC("xxx")
> int xxxx(void *ctx)
> {
>         struct task_struct *pos;
>         struct task_struct *cur_task = bpf_get_current_task_btf();
>
>         bpf_rcu_read_lock();
>
>         // iterating all process in the system start from cur_task
>         bpf_for_each(task, pos, cur_task, ITERATE_PROCESS) {
>
>         }
>
>         // iterate all thread belongs to cur_task group.
>         bpf_for_each(task, pos, cur_task, ITERATE_THREAD) {
>
>         }
>
>         bpf_rcu_read_unlock();
>         return 0;
> }
>
> Iterating all thread of each process is great(ITERATE_ALL). But maybe
> let's break it down step by step and implement
> ITERATE_PROCESS/ITERATE_THREAD first? (I'm little worried about the cpu
> overhead of ITERATE_ALL, since we are doing a heavy job in BPF Prog)
>

Hm... but if it was a sleepable BPF program and
bpf_rcu_read_{lock,unlock}() was only per task, then it shouldn't be
the problem? See enum bpf_cgroup_iter_order.


> I wanted to reuse BPF_TASK_ITER_ALL/BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID
> insted of new enums like ITERATE_PROCESS/ITERATE_THREAD. But it seems
> necessary. In BPF Prog, we usually operate task_struct directly instead
> of pid/tgid. It's a little weird to use
> BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID here:

enum bpf_iter_task_type is internal type, so we can rename
BPF_TASK_ITER_TID to BPF_TASK_ITER_THREAD and BPF_TASK_ITER_PROC (or
add them as aliases). At the very least, we should use consistent
BPF_TASK_ITER_xxx naming, instead of just ITERATE_PROCESS. See

>
> bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_TID) {
> }
>
> On the other hand,
> BPF_TASK_ITER_ALL/BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID are inner flags
> that are hidden from the users.
> Exposing ITERATE_PROCESS/ITERATE_THREAD will not cause confusion to user.
>

inner types are not a problem when used with vmlinux.h


>
> 2. css_iter.
>
> css_iter will be used to:
> (1) iterating subsystem, like
> for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in kernel.
> (2) iterating cgroup. (patch-6's selfetest has a basic example)
>
> css(cgroup_subsys_state) is more fundamental than struct cgroup. I think
> we'd better operating css rather than cgroup, since it's can be hard for
> cgroup_iter to achive (2). So here we keep the name of "css_iter",
> BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/BPF_CGROUP_ITER_ANCESTORS_UP
> can be reused.
>
>
> __bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
>                 struct cgroup_subsys_state *root, unsigned int flag)
>
> bpf_for_each(css, root, BPF_CGROUP_ITER_DESCENDANTS_PRE)
>

Makes sense, yep, thanks.

> Thanks.
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs
  2023-09-19 23:30           ` Andrii Nakryiko
@ 2023-09-20 12:20             ` Chuyi Zhou
  0 siblings, 0 replies; 33+ messages in thread
From: Chuyi Zhou @ 2023-09-20 12:20 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, andrii, martin.lau, tj, linux-kernel



在 2023/9/20 07:30, Andrii Nakryiko 写道:
> On Sat, Sep 16, 2023 at 7:03 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> Hello.
>>
>> 在 2023/9/16 04:37, Andrii Nakryiko 写道:
>>> On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>>> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
>>>>> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>>>>>
>>>>>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
>>>>>> creation and manipulation of struct bpf_iter_process in open-coded iterator
>>>>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>>>>>> iterate all processes in the system.
>>>>>>
>> [...cut...]
>>>>>
>>>>> Few high level thoughts. I think it would be good to follow
>>>>> SEC("iter/task") naming and approach. Open-coded iterators in many
>>>>> ways are in-kernel counterpart to iterator programs, so keeping them
>>>>> close enough within reason is useful for knowledge transfer.
>>>>>
>>>>> SEC("iter/task") allows to:
>>>>> a) iterate all threads in the system
>>>>> b) iterate all threads for a given TGID
>>>>> c) it also allows to "iterate" a single thread or process, but that's
>>>>> a bit less relevant for in-kernel iterator, but we can still support
>>>>> them, why not?
>>>>>
>>>>> I'm not sure if it supports iterating all processes (as in group
>>>>> leaders of each task group) in the system, but if it's possible I
>>>>> think we should support it at least for open-coded iterator, seems
>>>>> like a very useful functionality.
>>>>>
>>>>> So to that end, let's design a small set of input arguments for
>>>>> bpf_iter_process_new() that would allow to specify this as flags +
>>>>> either (optional) struct task_struct * pointer to represent
>>>>> task/process or PID/TGID.
>>>>>
>>>>
>>>> Another concern from Alexei was the readability of the API of open-coded
>>>> in BPF Program[1].
>>>>
>>>> bpf_for_each(task, curr) is straightforward. Users can easily understand
>>>> that this API does the same thing as 'for_each_process' in kernel.
>>>
>>> In general, users might have no idea about for_each_process macro in
>>> the kernel, so I don't find this particular argument very convincing.
>>>
>>> We can add a separate set of iterator kfuncs for every useful
>>> combination of conditions, of course, but it's a double-edged sword.
>>> Needing to use a different iterator just to specify a different
>>> direction of cgroup iteration (from the example you referred in [1])
>>> also means that it's now harder to write some generic function that
>>> needs to do something for all cgroups matching some criteria where the
>>> order might be coming as an argument.
>>>
>>> Similarly for task iterators. It's not hard to imagine some processing
>>> that can be equivalently done per thread or per process in the system,
>>> or on each thread of the process, depending on some conditions or
>>> external configuration. Having to do three different
>>> bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
>>> nature of the thing that is iterated over is the same, and it's just a
>>> different set of filters to specify which subset of those items should
>>> be iterated, I think it's better to try to stick to the same iterator
>>> with few simple arguments. IMO, of course, there is no objectively
>>> best approach.
>>>
>>>>
>>>> However, if we keep the approach of SEC("iter/task")
>>>>
>>>> enum ITER_ITEM {
>>>>           ITER_TASK,
>>>>           ITER_THREAD,
>>>> }
>>>>
>>>> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
>>>> task_struct *group_task, enum ITER_ITEM type)
>>>>
>>>> the API have to chang:
>>>>
>>>>
>>>> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
>>>> the  system
>>>> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
>>>> thread of group_leader
>>>> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
>>>> all the process in the system
>>>>
>>>> Useres may guess what are this API actually doing....
>>>
>>> I'd expect users to consult documentation before trying to use an
>>> unfamiliar cutting-edge functionality. So let's try to keep
>>> documentation clear and up to the point. Extra flag argument doesn't
>>> seem to be a big deal.
>>
>> Thanks for your suggestion!
>>
>> Before we begin working on the next version, I have outlined a detailed
>> API design here:
>>
>> 1.task_iter
>>
>> It will be used to iterate process/threads like SEC("iter/task"). Here
>> we should better to follow the naming and approach SEC("iter/task"):
>>
>> enum {
>>          ITERATE_PROCESS,
>>          ITERATE_THREAD,
>> }
>>
>> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct
>> task_struct *task, int flag);
>>
>> If we want to iterate all processes in the system, the iteration will
>> start from the *task* which is passed from user.(since process in the
>> system are connected through a linked list)
> 
> but will go through all of them anyways, right? it's kind of
> surprising from usability standpoint to have to pass some task_struct
> to iterate all of them, tbh. I wonder if it's hard to adjust kfunc
> validation to allow "nullable" pointers? We can look at that
> separately, of course.
> 

Yes, little subtle here. When we want to iterate threads of a task, we 
may want the verifier to ensure the *task* is a valid pointer, which 
would require KF_TRUSTED_ARGS. However, when we want to iterate all 
process/threads in the system, we want to accept a null pointer.

Anyway, I will try to explore a workaround in the verifier here.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2023-09-20 12:21 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-12  7:01 [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou
2023-09-12  7:01 ` [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF Chuyi Zhou
2023-09-18 20:42   ` Tejun Heo
2023-09-12  7:01 ` [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs Chuyi Zhou
2023-09-12 17:13   ` Alexei Starovoitov
2023-09-13 14:02     ` Chuyi Zhou
2023-09-12 19:57   ` Martin KaFai Lau
2023-09-13  4:56     ` Chuyi Zhou
2023-09-14 23:26   ` Andrii Nakryiko
2023-09-12  7:01 ` [PATCH bpf-next v2 3/6] bpf: Introduce process open coded " Chuyi Zhou
2023-09-14 23:26   ` Andrii Nakryiko
2023-09-15  4:48     ` Chuyi Zhou
2023-09-15 15:03     ` Chuyi Zhou
2023-09-15 20:37       ` Andrii Nakryiko
2023-09-16 14:03         ` Chuyi Zhou
2023-09-19 23:30           ` Andrii Nakryiko
2023-09-20 12:20             ` Chuyi Zhou
2023-09-12  7:01 ` [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded " Chuyi Zhou
2023-09-13  7:25   ` kernel test robot
2023-09-13  9:02   ` kernel test robot
2023-09-13  9:13   ` kernel test robot
2023-09-14 23:26   ` Andrii Nakryiko
2023-09-15 11:57     ` Chuyi Zhou
2023-09-15 20:25       ` Andrii Nakryiko
2023-09-12  7:01 ` [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS Chuyi Zhou
2023-09-13 13:53   ` Chuyi Zhou
2023-09-14  8:56     ` Chuyi Zhou
2023-09-14 23:26       ` Andrii Nakryiko
2023-09-15  5:46         ` [External] " Chuyi Zhou
2023-09-15 20:23           ` Andrii Nakryiko
2023-09-14 23:26   ` Andrii Nakryiko
2023-09-12  7:01 ` [PATCH bpf-next v2 6/6] selftests/bpf: Add tests for open-coded task and css iter Chuyi Zhou
2023-09-12  7:11 ` [PATCH bpf-next v2 0/6] Add Open-coded process and css iters Chuyi Zhou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).