All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v5 0/3] Parameterize task iterators.
@ 2022-08-11  0:16 Kui-Feng Lee
  2022-08-11  0:16 ` [PATCH bpf-next v5 1/3] bpf: " Kui-Feng Lee
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-11  0:16 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Allow creating an iterator that loops through resources of one task/thread.

People could only create iterators to loop through all resources of
files, vma, and tasks in the system, even though they were interested in only the
resources of a specific task or process.  Passing the addintional
parameters, people can now create an iterator to go through all
resources or only the resources of a task.

Major Changes:

 - Add new parameters in bpf_iter_link_info to indicate to go through
   all tasks or to go through a specific task.

 - Change the implementations of BPF iterators of vma, files, and
   tasks to allow going through only the resources of a specific task.

 - Provide the arguments of parameterized task iterators in
   bpf_link_info.

Differences from v4:

 - Remove 'type' from bpf_iter_link_info and bpf_link_info.

v4: https://lore.kernel.org/bpf/20220809195429.1043220-1-kuifeng@fb.com/
v3: https://lore.kernel.org/bpf/20220809063501.667610-1-kuifeng@fb.com/
v2: https://lore.kernel.org/bpf/20220801232649.2306614-1-kuifeng@fb.com/
v1: https://lore.kernel.org/bpf/20220726051713.840431-1-kuifeng@fb.com/

Kui-Feng Lee (3):
  bpf: Parameterize task iterators.
  bpf: Handle bpf_link_info for the parameterized task BPF iterators.
  selftests/bpf: Test parameterized task BPF iterators.

 include/linux/bpf.h                           |  29 +++
 include/uapi/linux/bpf.h                      |  12 ++
 kernel/bpf/task_iter.c                        | 144 ++++++++++---
 tools/include/uapi/linux/bpf.h                |  12 ++
 .../selftests/bpf/prog_tests/bpf_iter.c       | 204 ++++++++++++++++--
 .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
 .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
 .../selftests/bpf/progs/bpf_iter_task_file.c  |   7 +
 .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
 9 files changed, 376 insertions(+), 49 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-11  0:16 [PATCH bpf-next v5 0/3] Parameterize task iterators Kui-Feng Lee
@ 2022-08-11  0:16 ` Kui-Feng Lee
  2022-08-13 22:17   ` Yonghong Song
                     ` (3 more replies)
  2022-08-11  0:16 ` [PATCH bpf-next v5 2/3] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
  2022-08-11  0:16 ` [PATCH bpf-next v5 3/3] selftests/bpf: Test " Kui-Feng Lee
  2 siblings, 4 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-11  0:16 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Allow creating an iterator that loops through resources of one task/thread.

People could only create iterators to loop through all resources of
files, vma, and tasks in the system, even though they were interested
in only the resources of a specific task or process.  Passing the
additional parameters, people can now create an iterator to go
through all resources or only the resources of a task.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 include/linux/bpf.h            |  29 ++++++++
 include/uapi/linux/bpf.h       |   8 +++
 kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
 tools/include/uapi/linux/bpf.h |   8 +++
 4 files changed, 147 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 11950029284f..6bbe53d06faa 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1716,8 +1716,37 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
 	extern int bpf_iter_ ## target(args);			\
 	int __init bpf_iter_ ## target(args) { return 0; }
 
+/*
+ * The task type of iterators.
+ *
+ * For BPF task iterators, they can be parameterized with various
+ * parameters to visit only some of tasks.
+ *
+ * BPF_TASK_ITER_ALL (default)
+ *	Iterate over resources of every task.
+ *
+ * BPF_TASK_ITER_TID
+ *	Iterate over resources of a task/tid.
+ *
+ * BPF_TASK_ITER_TGID
+ *	Iterate over reosurces of evevry task of a process / task group.
+ */
+enum bpf_iter_task_type {
+	BPF_TASK_ITER_ALL = 0,
+	BPF_TASK_ITER_TID,
+	BPF_TASK_ITER_TGID,
+};
+
 struct bpf_iter_aux_info {
 	struct bpf_map *map;
+	struct {
+		enum bpf_iter_task_type	type;
+		union {
+			u32 tid;
+			u32 tgid;
+			u32 pid_fd;
+		};
+	} task;
 };
 
 typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ffcbf79a556b..6328aca0cf5c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -91,6 +91,14 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	/*
+	 * Parameters of task iterators.
+	 */
+	struct {
+		__u32	tid;
+		__u32	tgid;
+		__u32	pid_fd;
+	} task;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 8c921799def4..f2e21efe075d 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -12,6 +12,12 @@
 
 struct bpf_iter_seq_task_common {
 	struct pid_namespace *ns;
+	enum bpf_iter_task_type	type;
+	union {
+		u32 tid;
+		u32 tgid;
+		u32 pid_fd;
+	};
 };
 
 struct bpf_iter_seq_task_info {
@@ -22,24 +28,40 @@ struct bpf_iter_seq_task_info {
 	u32 tid;
 };
 
-static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
+static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *common,
 					     u32 *tid,
 					     bool skip_if_dup_files)
 {
 	struct task_struct *task = NULL;
 	struct pid *pid;
 
+	if (common->type == BPF_TASK_ITER_TID) {
+		if (*tid && *tid != common->tid)
+			return NULL;
+		rcu_read_lock();
+		pid = find_pid_ns(common->tid, common->ns);
+		if (pid) {
+			task = get_pid_task(pid, PIDTYPE_PID);
+			*tid = common->tid;
+		}
+		rcu_read_unlock();
+		return task;
+	}
+
 	rcu_read_lock();
 retry:
-	pid = find_ge_pid(*tid, ns);
+	pid = find_ge_pid(*tid, common->ns);
 	if (pid) {
-		*tid = pid_nr_ns(pid, ns);
+		*tid = pid_nr_ns(pid, common->ns);
 		task = get_pid_task(pid, PIDTYPE_PID);
+
 		if (!task) {
 			++*tid;
 			goto retry;
-		} else if (skip_if_dup_files && !thread_group_leader(task) &&
-			   task->files == task->group_leader->files) {
+		} else if ((skip_if_dup_files && !thread_group_leader(task) &&
+			    task->files == task->group_leader->files) ||
+			   (common->type == BPF_TASK_ITER_TGID &&
+			    __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) != common->tgid)) {
 			put_task_struct(task);
 			task = NULL;
 			++*tid;
@@ -56,7 +78,8 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos)
 	struct bpf_iter_seq_task_info *info = seq->private;
 	struct task_struct *task;
 
-	task = task_seq_get_next(info->common.ns, &info->tid, false);
+	task = task_seq_get_next(&info->common, &info->tid, false);
+
 	if (!task)
 		return NULL;
 
@@ -73,7 +96,8 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	++*pos;
 	++info->tid;
 	put_task_struct((struct task_struct *)v);
-	task = task_seq_get_next(info->common.ns, &info->tid, false);
+
+	task = task_seq_get_next(&info->common, &info->tid, false);
 	if (!task)
 		return NULL;
 
@@ -117,6 +141,43 @@ static void task_seq_stop(struct seq_file *seq, void *v)
 		put_task_struct((struct task_struct *)v);
 }
 
+static int bpf_iter_attach_task(struct bpf_prog *prog,
+				union bpf_iter_link_info *linfo,
+				struct bpf_iter_aux_info *aux)
+{
+	unsigned int flags;
+	struct pid_namespace *ns;
+	struct pid *pid;
+	pid_t tgid;
+
+	if (linfo->task.tid != 0) {
+		aux->task.type = BPF_TASK_ITER_TID;
+		aux->task.tid = linfo->task.tid;
+	} else if (linfo->task.tgid != 0) {
+		aux->task.type = BPF_TASK_ITER_TGID;
+		aux->task.tgid = linfo->task.tgid;
+	} else if (linfo->task.pid_fd != 0) {
+		aux->task.type = BPF_TASK_ITER_TGID;
+		pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
+		if (IS_ERR(pid))
+			return PTR_ERR(pid);
+
+		ns = task_active_pid_ns(current);
+		if (IS_ERR(ns))
+			return PTR_ERR(ns);
+
+		tgid = pid_nr_ns(pid, ns);
+		if (tgid <= 0)
+			return -EINVAL;
+
+		aux->task.tgid = tgid;
+	} else {
+		aux->task.type = BPF_TASK_ITER_ALL;
+	}
+
+	return 0;
+}
+
 static const struct seq_operations task_seq_ops = {
 	.start	= task_seq_start,
 	.next	= task_seq_next,
@@ -137,8 +198,7 @@ struct bpf_iter_seq_task_file_info {
 static struct file *
 task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 {
-	struct pid_namespace *ns = info->common.ns;
-	u32 curr_tid = info->tid;
+	u32 saved_tid = info->tid;
 	struct task_struct *curr_task;
 	unsigned int curr_fd = info->fd;
 
@@ -151,21 +211,18 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 		curr_task = info->task;
 		curr_fd = info->fd;
 	} else {
-                curr_task = task_seq_get_next(ns, &curr_tid, true);
+		curr_task = task_seq_get_next(&info->common, &info->tid, true);
                 if (!curr_task) {
                         info->task = NULL;
-                        info->tid = curr_tid;
                         return NULL;
                 }
 
-                /* set info->task and info->tid */
+		/* set info->task */
 		info->task = curr_task;
-		if (curr_tid == info->tid) {
+		if (saved_tid == info->tid)
 			curr_fd = info->fd;
-		} else {
-			info->tid = curr_tid;
+		else
 			curr_fd = 0;
-		}
 	}
 
 	rcu_read_lock();
@@ -186,9 +243,15 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 	/* the current task is done, go to the next task */
 	rcu_read_unlock();
 	put_task_struct(curr_task);
+
+	if (info->common.type == BPF_TASK_ITER_TID) {
+		info->task = NULL;
+		return NULL;
+	}
+
 	info->task = NULL;
 	info->fd = 0;
-	curr_tid = ++(info->tid);
+	saved_tid = ++(info->tid);
 	goto again;
 }
 
@@ -269,6 +332,17 @@ static int init_seq_pidns(void *priv_data, struct bpf_iter_aux_info *aux)
 	struct bpf_iter_seq_task_common *common = priv_data;
 
 	common->ns = get_pid_ns(task_active_pid_ns(current));
+	common->type = aux->task.type;
+	switch (common->type) {
+	case BPF_TASK_ITER_TID:
+		common->tid = aux->task.tid;
+		break;
+	case BPF_TASK_ITER_TGID:
+		common->tgid = aux->task.tgid;
+		break;
+	default:
+		break;
+	}
 	return 0;
 }
 
@@ -307,11 +381,10 @@ enum bpf_task_vma_iter_find_op {
 static struct vm_area_struct *
 task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 {
-	struct pid_namespace *ns = info->common.ns;
 	enum bpf_task_vma_iter_find_op op;
 	struct vm_area_struct *curr_vma;
 	struct task_struct *curr_task;
-	u32 curr_tid = info->tid;
+	u32 saved_tid = info->tid;
 
 	/* If this function returns a non-NULL vma, it holds a reference to
 	 * the task_struct, and holds read lock on vma->mm->mmap_lock.
@@ -371,14 +444,13 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 		}
 	} else {
 again:
-		curr_task = task_seq_get_next(ns, &curr_tid, true);
+		curr_task = task_seq_get_next(&info->common, &info->tid, true);
 		if (!curr_task) {
-			info->tid = curr_tid + 1;
+			info->tid++;
 			goto finish;
 		}
 
-		if (curr_tid != info->tid) {
-			info->tid = curr_tid;
+		if (saved_tid != info->tid) {
 			/* new task, process the first vma */
 			op = task_vma_iter_first_vma;
 		} else {
@@ -430,9 +502,12 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 	return curr_vma;
 
 next_task:
+	if (info->common.type == BPF_TASK_ITER_TID)
+		goto finish;
+
 	put_task_struct(curr_task);
 	info->task = NULL;
-	curr_tid++;
+	info->tid++;
 	goto again;
 
 finish:
@@ -533,6 +608,7 @@ static const struct bpf_iter_seq_info task_seq_info = {
 
 static struct bpf_iter_reg task_reg_info = {
 	.target			= "task",
+	.attach_target		= bpf_iter_attach_task,
 	.feature		= BPF_ITER_RESCHED,
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
@@ -551,6 +627,7 @@ static const struct bpf_iter_seq_info task_file_seq_info = {
 
 static struct bpf_iter_reg task_file_reg_info = {
 	.target			= "task_file",
+	.attach_target		= bpf_iter_attach_task,
 	.feature		= BPF_ITER_RESCHED,
 	.ctx_arg_info_size	= 2,
 	.ctx_arg_info		= {
@@ -571,6 +648,7 @@ static const struct bpf_iter_seq_info task_vma_seq_info = {
 
 static struct bpf_iter_reg task_vma_reg_info = {
 	.target			= "task_vma",
+	.attach_target		= bpf_iter_attach_task,
 	.feature		= BPF_ITER_RESCHED,
 	.ctx_arg_info_size	= 2,
 	.ctx_arg_info		= {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ffcbf79a556b..6328aca0cf5c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -91,6 +91,14 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	/*
+	 * Parameters of task iterators.
+	 */
+	struct {
+		__u32	tid;
+		__u32	tgid;
+		__u32	pid_fd;
+	} task;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v5 2/3] bpf: Handle bpf_link_info for the parameterized task BPF iterators.
  2022-08-11  0:16 [PATCH bpf-next v5 0/3] Parameterize task iterators Kui-Feng Lee
  2022-08-11  0:16 ` [PATCH bpf-next v5 1/3] bpf: " Kui-Feng Lee
@ 2022-08-11  0:16 ` Kui-Feng Lee
  2022-08-13 22:23   ` Yonghong Song
  2022-08-11  0:16 ` [PATCH bpf-next v5 3/3] selftests/bpf: Test " Kui-Feng Lee
  2 siblings, 1 reply; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-11  0:16 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Add new fields to bpf_link_info that users can query it through
bpf_obj_get_info_by_fd().

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 include/uapi/linux/bpf.h       |  4 ++++
 kernel/bpf/task_iter.c         | 18 ++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  4 ++++
 3 files changed, 26 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6328aca0cf5c..627a16981c90 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6143,6 +6143,10 @@ struct bpf_link_info {
 				struct {
 					__u32 map_id;
 				} map;
+				struct {
+					__u32 tid;
+					__u32 tgid;
+				} task;
 			};
 		} iter;
 		struct  {
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index f2e21efe075d..d392b46c6d19 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -606,6 +606,21 @@ static const struct bpf_iter_seq_info task_seq_info = {
 	.seq_priv_size		= sizeof(struct bpf_iter_seq_task_info),
 };
 
+static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info)
+{
+	switch (aux->task.type) {
+	case BPF_TASK_ITER_TID:
+		info->iter.task.tid = aux->task.tid;
+		break;
+	case BPF_TASK_ITER_TGID:
+		info->iter.task.tgid = aux->task.tgid;
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static struct bpf_iter_reg task_reg_info = {
 	.target			= "task",
 	.attach_target		= bpf_iter_attach_task,
@@ -616,6 +631,7 @@ static struct bpf_iter_reg task_reg_info = {
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
 	.seq_info		= &task_seq_info,
+	.fill_link_info		= bpf_iter_fill_link_info,
 };
 
 static const struct bpf_iter_seq_info task_file_seq_info = {
@@ -637,6 +653,7 @@ static struct bpf_iter_reg task_file_reg_info = {
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
 	.seq_info		= &task_file_seq_info,
+	.fill_link_info		= bpf_iter_fill_link_info,
 };
 
 static const struct bpf_iter_seq_info task_vma_seq_info = {
@@ -658,6 +675,7 @@ static struct bpf_iter_reg task_vma_reg_info = {
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
 	.seq_info		= &task_vma_seq_info,
+	.fill_link_info		= bpf_iter_fill_link_info,
 };
 
 BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6328aca0cf5c..627a16981c90 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6143,6 +6143,10 @@ struct bpf_link_info {
 				struct {
 					__u32 map_id;
 				} map;
+				struct {
+					__u32 tid;
+					__u32 tgid;
+				} task;
 			};
 		} iter;
 		struct  {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v5 3/3] selftests/bpf: Test parameterized task BPF iterators.
  2022-08-11  0:16 [PATCH bpf-next v5 0/3] Parameterize task iterators Kui-Feng Lee
  2022-08-11  0:16 ` [PATCH bpf-next v5 1/3] bpf: " Kui-Feng Lee
  2022-08-11  0:16 ` [PATCH bpf-next v5 2/3] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
@ 2022-08-11  0:16 ` Kui-Feng Lee
  2022-08-13 22:50   ` Yonghong Song
  2022-08-16  5:15   ` Andrii Nakryiko
  2 siblings, 2 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-11  0:16 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Test iterators of vma, files, and tasks of tasks.

Ensure the API works appropriately to visit all tasks,
tasks in a process, or a particular task.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 204 ++++++++++++++++--
 .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
 .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
 .../selftests/bpf/progs/bpf_iter_task_file.c  |   7 +
 .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
 5 files changed, 203 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index a33874b081b6..e66f1f3db562 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2020 Facebook */
 #include <test_progs.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <signal.h>
 #include "bpf_iter_ipv6_route.skel.h"
 #include "bpf_iter_netlink.skel.h"
 #include "bpf_iter_bpf_map.skel.h"
@@ -42,13 +45,13 @@ static void test_btf_id_or_null(void)
 	}
 }
 
-static void do_dummy_read(struct bpf_program *prog)
+static void do_dummy_read(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
 {
 	struct bpf_link *link;
 	char buf[16] = {};
 	int iter_fd, len;
 
-	link = bpf_program__attach_iter(prog, NULL);
+	link = bpf_program__attach_iter(prog, opts);
 	if (!ASSERT_OK_PTR(link, "attach_iter"))
 		return;
 
@@ -91,7 +94,7 @@ static void test_ipv6_route(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_ipv6_route__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_ipv6_route);
+	do_dummy_read(skel->progs.dump_ipv6_route, NULL);
 
 	bpf_iter_ipv6_route__destroy(skel);
 }
@@ -104,7 +107,7 @@ static void test_netlink(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_netlink__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_netlink);
+	do_dummy_read(skel->progs.dump_netlink, NULL);
 
 	bpf_iter_netlink__destroy(skel);
 }
@@ -117,24 +120,139 @@ static void test_bpf_map(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_map__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_bpf_map);
+	do_dummy_read(skel->progs.dump_bpf_map, NULL);
 
 	bpf_iter_bpf_map__destroy(skel);
 }
 
-static void test_task(void)
+static int pidfd_open(pid_t pid, unsigned int flags)
+{
+	return syscall(SYS_pidfd_open, pid, flags);
+}
+
+static void check_bpf_link_info(const struct bpf_program *prog)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	struct bpf_link_info info = {};
+	__u32 info_len;
+	struct bpf_link *link;
+	int err;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	link = bpf_program__attach_iter(prog, &opts);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	info_len = sizeof(info);
+	err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
+	if (ASSERT_OK(err, "bpf_obj_get_info_by_fd"))
+		ASSERT_EQ(info.iter.task.tid, getpid(), "check_task_tid");
+
+	bpf_link__destroy(link);
+}
+
+static pthread_mutex_t do_nothing_mutex;
+
+static void *do_nothing_wait(void *arg)
+{
+	pthread_mutex_lock(&do_nothing_mutex);
+	pthread_mutex_unlock(&do_nothing_mutex);
+
+	pthread_exit(arg);
+}
+
+static void test_task_(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
 {
 	struct bpf_iter_task *skel;
+	pthread_t thread_id;
+	void *ret;
 
 	skel = bpf_iter_task__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_task);
+	if (!ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL), "pthread_mutex_init"))
+		goto done;
+	if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
+		goto done;
+
+	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
+		  "pthread_create"))
+		goto done;
+
+
+	skel->bss->tid = getpid();
+
+	do_dummy_read(skel->progs.dump_task, opts);
+
+	if (!ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock"))
+		goto done;
+
+	if (num_unknown >= 0)
+		ASSERT_EQ(skel->bss->num_unknown_tid, num_unknown, "check_num_unknown_tid");
+	if (num_known >= 0)
+		ASSERT_EQ(skel->bss->num_known_tid, num_known, "check_num_known_tid");
 
+	ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
+		     "pthread_join");
+
+done:
 	bpf_iter_task__destroy(skel);
 }
 
+static void test_task(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_(&opts, 0, 1);
+
+	test_task_(NULL, -1, 1);
+}
+
+static void test_task_tgid(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tgid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_(&opts, 1, 1);
+}
+
+static void test_task_pidfd(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	int pidfd;
+
+	pidfd = pidfd_open(getpid(), 0);
+	if (!ASSERT_GE(pidfd, 0, "pidfd_open"))
+		return;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.pid_fd = pidfd;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_(&opts, 1, 1);
+
+	close(pidfd);
+}
+
 static void test_task_sleepable(void)
 {
 	struct bpf_iter_task *skel;
@@ -143,7 +261,7 @@ static void test_task_sleepable(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_task_sleepable);
+	do_dummy_read(skel->progs.dump_task_sleepable, NULL);
 
 	ASSERT_GT(skel->bss->num_expected_failure_copy_from_user_task, 0,
 		  "num_expected_failure_copy_from_user_task");
@@ -161,8 +279,8 @@ static void test_task_stack(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_task_stack__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_task_stack);
-	do_dummy_read(skel->progs.get_task_user_stacks);
+	do_dummy_read(skel->progs.dump_task_stack, NULL);
+	do_dummy_read(skel->progs.get_task_user_stacks, NULL);
 
 	bpf_iter_task_stack__destroy(skel);
 }
@@ -174,7 +292,9 @@ static void *do_nothing(void *arg)
 
 static void test_task_file(void)
 {
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
 	struct bpf_iter_task_file *skel;
+	union bpf_iter_link_info linfo;
 	pthread_t thread_id;
 	void *ret;
 
@@ -188,15 +308,31 @@ static void test_task_file(void)
 		  "pthread_create"))
 		goto done;
 
-	do_dummy_read(skel->progs.dump_task_file);
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	do_dummy_read(skel->progs.dump_task_file, &opts);
 
 	if (!ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
 		  "pthread_join"))
 		goto done;
 
 	ASSERT_EQ(skel->bss->count, 0, "check_count");
+	ASSERT_EQ(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
 
-done:
+	skel->bss->count = 0;
+	skel->bss->unique_tgid_count = 0;
+
+	do_dummy_read(skel->progs.dump_task_file, NULL);
+
+	ASSERT_GE(skel->bss->count, 0, "check_count");
+	ASSERT_GE(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
+
+	check_bpf_link_info(skel->progs.dump_task_file);
+
+ done:
 	bpf_iter_task_file__destroy(skel);
 }
 
@@ -274,7 +410,7 @@ static void test_tcp4(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_tcp4__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_tcp4);
+	do_dummy_read(skel->progs.dump_tcp4, NULL);
 
 	bpf_iter_tcp4__destroy(skel);
 }
@@ -287,7 +423,7 @@ static void test_tcp6(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_tcp6__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_tcp6);
+	do_dummy_read(skel->progs.dump_tcp6, NULL);
 
 	bpf_iter_tcp6__destroy(skel);
 }
@@ -300,7 +436,7 @@ static void test_udp4(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_udp4__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_udp4);
+	do_dummy_read(skel->progs.dump_udp4, NULL);
 
 	bpf_iter_udp4__destroy(skel);
 }
@@ -313,7 +449,7 @@ static void test_udp6(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_udp6__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_udp6);
+	do_dummy_read(skel->progs.dump_udp6, NULL);
 
 	bpf_iter_udp6__destroy(skel);
 }
@@ -326,7 +462,7 @@ static void test_unix(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_unix__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_unix);
+	do_dummy_read(skel->progs.dump_unix, NULL);
 
 	bpf_iter_unix__destroy(skel);
 }
@@ -988,7 +1124,7 @@ static void test_bpf_sk_storage_get(void)
 	if (!ASSERT_OK(err, "bpf_map_update_elem"))
 		goto close_socket;
 
-	do_dummy_read(skel->progs.fill_socket_owner);
+	do_dummy_read(skel->progs.fill_socket_owner, NULL);
 
 	err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
 	if (CHECK(err || val != getpid(), "bpf_map_lookup_elem",
@@ -996,7 +1132,7 @@ static void test_bpf_sk_storage_get(void)
 	    getpid(), val, err))
 		goto close_socket;
 
-	do_dummy_read(skel->progs.negate_socket_local_storage);
+	do_dummy_read(skel->progs.negate_socket_local_storage, NULL);
 
 	err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
 	CHECK(err || val != -getpid(), "bpf_map_lookup_elem",
@@ -1116,7 +1252,7 @@ static void test_link_iter(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_link__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_bpf_link);
+	do_dummy_read(skel->progs.dump_bpf_link, NULL);
 
 	bpf_iter_bpf_link__destroy(skel);
 }
@@ -1129,7 +1265,7 @@ static void test_ksym_iter(void)
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_ksym__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_ksym);
+	do_dummy_read(skel->progs.dump_ksym, NULL);
 
 	bpf_iter_ksym__destroy(skel);
 }
@@ -1154,7 +1290,7 @@ static void str_strip_first_line(char *str)
 	*dst = '\0';
 }
 
-static void test_task_vma(void)
+static void test_task_vma_(struct bpf_iter_attach_opts *opts)
 {
 	int err, iter_fd = -1, proc_maps_fd = -1;
 	struct bpf_iter_task_vma *skel;
@@ -1166,13 +1302,14 @@ static void test_task_vma(void)
 		return;
 
 	skel->bss->pid = getpid();
+	skel->bss->one_task = opts ? 1 : 0;
 
 	err = bpf_iter_task_vma__load(skel);
 	if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
 		goto out;
 
 	skel->links.proc_maps = bpf_program__attach_iter(
-		skel->progs.proc_maps, NULL);
+		skel->progs.proc_maps, opts);
 
 	if (!ASSERT_OK_PTR(skel->links.proc_maps, "bpf_program__attach_iter")) {
 		skel->links.proc_maps = NULL;
@@ -1211,12 +1348,29 @@ static void test_task_vma(void)
 	str_strip_first_line(proc_maps_output);
 
 	ASSERT_STREQ(task_vma_output, proc_maps_output, "compare_output");
+
+	check_bpf_link_info(skel->progs.proc_maps);
+
 out:
 	close(proc_maps_fd);
 	close(iter_fd);
 	bpf_iter_task_vma__destroy(skel);
 }
 
+static void test_task_vma(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_vma_(&opts);
+	test_task_vma_(NULL);
+}
+
 void test_bpf_iter(void)
 {
 	if (test__start_subtest("btf_id_or_null"))
@@ -1229,6 +1383,10 @@ void test_bpf_iter(void)
 		test_bpf_map();
 	if (test__start_subtest("task"))
 		test_task();
+	if (test__start_subtest("task_tgid"))
+		test_task_tgid();
+	if (test__start_subtest("task_pidfd"))
+		test_task_pidfd();
 	if (test__start_subtest("task_sleepable"))
 		test_task_sleepable();
 	if (test__start_subtest("task_stack"))
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
index 5fce7008d1ff..32c34ce9cbeb 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
@@ -764,7 +764,7 @@ static void test_btf_dump_struct_data(struct btf *btf, struct btf_dump *d,
 
 	/* union with nested struct */
 	TEST_BTF_DUMP_DATA(btf, d, "union", str, union bpf_iter_link_info, BTF_F_COMPACT,
-			   "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},}",
+			   "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},.task = (struct){.tid = (__u32)1,},}",
 			   { .map = { .map_fd = 1 }});
 
 	/* struct skb with nested structs/unions; because type output is so
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
index d22741272692..96131b9a1caa 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_task.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
@@ -6,6 +6,10 @@
 
 char _license[] SEC("license") = "GPL";
 
+uint32_t tid = 0;
+int num_unknown_tid = 0;
+int num_known_tid = 0;
+
 SEC("iter/task")
 int dump_task(struct bpf_iter__task *ctx)
 {
@@ -18,6 +22,11 @@ int dump_task(struct bpf_iter__task *ctx)
 		return 0;
 	}
 
+	if (task->pid != tid)
+		num_unknown_tid++;
+	else
+		num_known_tid++;
+
 	if (ctx->meta->seq_num == 0)
 		BPF_SEQ_PRINTF(seq, "    tgid      gid\n");
 
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
index 6e7b400888fe..031455ed8748 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
@@ -7,6 +7,8 @@ char _license[] SEC("license") = "GPL";
 
 int count = 0;
 int tgid = 0;
+int last_tgid = -1;
+int unique_tgid_count = 0;
 
 SEC("iter/task_file")
 int dump_task_file(struct bpf_iter__task_file *ctx)
@@ -27,6 +29,11 @@ int dump_task_file(struct bpf_iter__task_file *ctx)
 	if (tgid == task->tgid && task->tgid != task->pid)
 		count++;
 
+	if (last_tgid != task->tgid) {
+		last_tgid = task->tgid;
+		unique_tgid_count++;
+	}
+
 	BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
 		       (long)file->f_op);
 	return 0;
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
index 4ea6a37d1345..44f4a31c2ddd 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
@@ -20,6 +20,7 @@ char _license[] SEC("license") = "GPL";
 #define D_PATH_BUF_SIZE 1024
 char d_path_buf[D_PATH_BUF_SIZE] = {};
 __u32 pid = 0;
+__u32 one_task = 0;
 
 SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
 {
@@ -33,8 +34,11 @@ SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
 		return 0;
 
 	file = vma->vm_file;
-	if (task->tgid != pid)
+	if (task->tgid != pid) {
+		if (one_task)
+			BPF_SEQ_PRINTF(seq, "unexpected task (%d != %d)", task->tgid, pid);
 		return 0;
+	}
 	perm_str[0] = (vma->vm_flags & VM_READ) ? 'r' : '-';
 	perm_str[1] = (vma->vm_flags & VM_WRITE) ? 'w' : '-';
 	perm_str[2] = (vma->vm_flags & VM_EXEC) ? 'x' : '-';
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-11  0:16 ` [PATCH bpf-next v5 1/3] bpf: " Kui-Feng Lee
@ 2022-08-13 22:17   ` Yonghong Song
  2022-08-15  1:01     ` Alexei Starovoitov
                       ` (3 more replies)
  2022-08-14 20:24   ` Jiri Olsa
                     ` (2 subsequent siblings)
  3 siblings, 4 replies; 22+ messages in thread
From: Yonghong Song @ 2022-08-13 22:17 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> Allow creating an iterator that loops through resources of one task/thread.
> 
> People could only create iterators to loop through all resources of
> files, vma, and tasks in the system, even though they were interested
> in only the resources of a specific task or process.  Passing the
> additional parameters, people can now create an iterator to go
> through all resources or only the resources of a task.
> 
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>   include/linux/bpf.h            |  29 ++++++++
>   include/uapi/linux/bpf.h       |   8 +++
>   kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
>   tools/include/uapi/linux/bpf.h |   8 +++
>   4 files changed, 147 insertions(+), 24 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 11950029284f..6bbe53d06faa 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1716,8 +1716,37 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
>   	extern int bpf_iter_ ## target(args);			\
>   	int __init bpf_iter_ ## target(args) { return 0; }
>   
> +/*
> + * The task type of iterators.
> + *
> + * For BPF task iterators, they can be parameterized with various
> + * parameters to visit only some of tasks.
> + *
> + * BPF_TASK_ITER_ALL (default)
> + *	Iterate over resources of every task.
> + *
> + * BPF_TASK_ITER_TID
> + *	Iterate over resources of a task/tid.
> + *
> + * BPF_TASK_ITER_TGID
> + *	Iterate over reosurces of evevry task of a process / task group.
> + */
> +enum bpf_iter_task_type {
> +	BPF_TASK_ITER_ALL = 0,
> +	BPF_TASK_ITER_TID,
> +	BPF_TASK_ITER_TGID,
> +};
> +
>   struct bpf_iter_aux_info {
>   	struct bpf_map *map;
> +	struct {
> +		enum bpf_iter_task_type	type;
> +		union {
> +			u32 tid;
> +			u32 tgid;
> +			u32 pid_fd;
> +		};
> +	} task;
>   };
>   
>   typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index ffcbf79a556b..6328aca0cf5c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -91,6 +91,14 @@ union bpf_iter_link_info {
>   	struct {
>   		__u32	map_fd;
>   	} map;
> +	/*
> +	 * Parameters of task iterators.
> +	 */

The comment can be put into one line.

> +	struct {
> +		__u32	tid;
> +		__u32	tgid;
> +		__u32	pid_fd;

The above is a max of kernel and user space terminologies.
tid/pid are user space concept and tgid is a kernel space
concept.

In bpf uapi header, we have

struct bpf_pidns_info {
         __u32 pid;
         __u32 tgid;
};

which uses kernel terminologies.

So I suggest the bpf_iter_link_info.task can also
use pure kernel terminology pid/tgid/tgid_fd here.

Alternative, using pure user space terminology
can be tid/pid/pid_fd but seems the kernel terminology
might be better since we already have precedence.


> +	} task;
>   };
>   
>   /* BPF syscall commands, see bpf(2) man-page for more details. */
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 8c921799def4..f2e21efe075d 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -12,6 +12,12 @@
>   
>   struct bpf_iter_seq_task_common {
>   	struct pid_namespace *ns;
> +	enum bpf_iter_task_type	type;
> +	union {
> +		u32 tid;
> +		u32 tgid;
> +		u32 pid_fd;
> +	};
>   };
>   
>   struct bpf_iter_seq_task_info {
> @@ -22,24 +28,40 @@ struct bpf_iter_seq_task_info {
>   	u32 tid;
>   };
>   
> -static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
> +static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *common,
>   					     u32 *tid,
>   					     bool skip_if_dup_files)
>   {
>   	struct task_struct *task = NULL;
>   	struct pid *pid;
>   
> +	if (common->type == BPF_TASK_ITER_TID) {
> +		if (*tid && *tid != common->tid)
> +			return NULL;
> +		rcu_read_lock();
> +		pid = find_pid_ns(common->tid, common->ns);
> +		if (pid) {
> +			task = get_pid_task(pid, PIDTYPE_PID);
> +			*tid = common->tid;
> +		}
> +		rcu_read_unlock();
> +		return task;
> +	}
> +
>   	rcu_read_lock();
>   retry:
> -	pid = find_ge_pid(*tid, ns);
> +	pid = find_ge_pid(*tid, common->ns);
>   	if (pid) {
> -		*tid = pid_nr_ns(pid, ns);
> +		*tid = pid_nr_ns(pid, common->ns);
>   		task = get_pid_task(pid, PIDTYPE_PID);
> +

This extra line is unnecessary.

>   		if (!task) {
>   			++*tid;
>   			goto retry;
> -		} else if (skip_if_dup_files && !thread_group_leader(task) &&
> -			   task->files == task->group_leader->files) {
> +		} else if ((skip_if_dup_files && !thread_group_leader(task) &&
> +			    task->files == task->group_leader->files) ||
> +			   (common->type == BPF_TASK_ITER_TGID &&
> +			    __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) != common->tgid)) {
>   			put_task_struct(task);
>   			task = NULL;
>   			++*tid;
> @@ -56,7 +78,8 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos)
>   	struct bpf_iter_seq_task_info *info = seq->private;
>   	struct task_struct *task;
>   
> -	task = task_seq_get_next(info->common.ns, &info->tid, false);
> +	task = task_seq_get_next(&info->common, &info->tid, false);
> +

Extra line?

>   	if (!task)
>   		return NULL;
>   
> @@ -73,7 +96,8 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>   	++*pos;
>   	++info->tid;
>   	put_task_struct((struct task_struct *)v);
> -	task = task_seq_get_next(info->common.ns, &info->tid, false);
> +

Extra line?

> +	task = task_seq_get_next(&info->common, &info->tid, false);
>   	if (!task)
>   		return NULL;
>   
> @@ -117,6 +141,43 @@ static void task_seq_stop(struct seq_file *seq, void *v)
>   		put_task_struct((struct task_struct *)v);
>   }
>   
> +static int bpf_iter_attach_task(struct bpf_prog *prog,
> +				union bpf_iter_link_info *linfo,
> +				struct bpf_iter_aux_info *aux)
> +{
> +	unsigned int flags;
> +	struct pid_namespace *ns;
> +	struct pid *pid;
> +	pid_t tgid;

Follow reverse chrismas tree style?

> +
> +	if (linfo->task.tid != 0) {
> +		aux->task.type = BPF_TASK_ITER_TID;
> +		aux->task.tid = linfo->task.tid;
> +	} else if (linfo->task.tgid != 0) {
> +		aux->task.type = BPF_TASK_ITER_TGID;
> +		aux->task.tgid = linfo->task.tgid;
> +	} else if (linfo->task.pid_fd != 0) {
> +		aux->task.type = BPF_TASK_ITER_TGID;
> +		pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
> +		if (IS_ERR(pid))
> +			return PTR_ERR(pid);
> +
> +		ns = task_active_pid_ns(current);
> +		if (IS_ERR(ns))
> +			return PTR_ERR(ns);
> +
> +		tgid = pid_nr_ns(pid, ns);
> +		if (tgid <= 0)
> +			return -EINVAL;

Is it possible that tgid <= 0? I think no, so
the above two lines are unnecessary.

> +
> +		aux->task.tgid = tgid;

We leaks the reference count for 'pid' here.
We need to add
		put_pid(pid);
to release the reference for pid.
	
> +	} else {
> +		aux->task.type = BPF_TASK_ITER_ALL;
> +	}

What will happen if two or all of task.tid, task.tgid and
task.pid_fd non-zero? Should we fail here?

> +
> +	return 0;
> +}
> +
>   static const struct seq_operations task_seq_ops = {
>   	.start	= task_seq_start,
>   	.next	= task_seq_next,
> @@ -137,8 +198,7 @@ struct bpf_iter_seq_task_file_info {
>   static struct file *
[...]
>   
> @@ -307,11 +381,10 @@ enum bpf_task_vma_iter_find_op {
>   static struct vm_area_struct *
>   task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
>   {
> -	struct pid_namespace *ns = info->common.ns;
>   	enum bpf_task_vma_iter_find_op op;
>   	struct vm_area_struct *curr_vma;
>   	struct task_struct *curr_task;
> -	u32 curr_tid = info->tid;
> +	u32 saved_tid = info->tid;
>   
>   	/* If this function returns a non-NULL vma, it holds a reference to
>   	 * the task_struct, and holds read lock on vma->mm->mmap_lock.
> @@ -371,14 +444,13 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
>   		}
>   	} else {
>   again:
> -		curr_task = task_seq_get_next(ns, &curr_tid, true);
> +		curr_task = task_seq_get_next(&info->common, &info->tid, true);
>   		if (!curr_task) {
> -			info->tid = curr_tid + 1;
> +			info->tid++;
>   			goto finish;
>   		}
>   
> -		if (curr_tid != info->tid) {
> -			info->tid = curr_tid;
> +		if (saved_tid != info->tid) {
>   			/* new task, process the first vma */
>   			op = task_vma_iter_first_vma;
>   		} else {
> @@ -430,9 +502,12 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
>   	return curr_vma;
>   
>   next_task:
> +	if (info->common.type == BPF_TASK_ITER_TID)
> +		goto finish;
> +
>   	put_task_struct(curr_task);
>   	info->task = NULL;
> -	curr_tid++;
> +	info->tid++;

saved_tid = ++info->tid?

>   	goto again;
>   
>   finish:
[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 2/3] bpf: Handle bpf_link_info for the parameterized task BPF iterators.
  2022-08-11  0:16 ` [PATCH bpf-next v5 2/3] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
@ 2022-08-13 22:23   ` Yonghong Song
  0 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2022-08-13 22:23 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> Add new fields to bpf_link_info that users can query it through
> bpf_obj_get_info_by_fd().
> 
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>   include/uapi/linux/bpf.h       |  4 ++++
>   kernel/bpf/task_iter.c         | 18 ++++++++++++++++++
>   tools/include/uapi/linux/bpf.h |  4 ++++
>   3 files changed, 26 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6328aca0cf5c..627a16981c90 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6143,6 +6143,10 @@ struct bpf_link_info {
>   				struct {
>   					__u32 map_id;
>   				} map;
> +				struct {
> +					__u32 tid;
> +					__u32 tgid;

pid/tgid or tid/pid?

> +				} task;

Please use another union outside of struct { ... } map.
Please see cgroup_iter patch for details:
   https://lore.kernel.org/bpf/20220812202802.3774257-2-haoluo@google.com/

>   			};
>   		} iter;
>   		struct  {
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index f2e21efe075d..d392b46c6d19 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -606,6 +606,21 @@ static const struct bpf_iter_seq_info task_seq_info = {
>   	.seq_priv_size		= sizeof(struct bpf_iter_seq_task_info),
>   };
>   
> +static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info)
> +{
> +	switch (aux->task.type) {
> +	case BPF_TASK_ITER_TID:
> +		info->iter.task.tid = aux->task.tid;
> +		break;
> +	case BPF_TASK_ITER_TGID:
> +		info->iter.task.tgid = aux->task.tgid;
> +		break;
> +	default:
> +		break;
> +	}
> +	return 0;
> +}
> +
>   static struct bpf_iter_reg task_reg_info = {
>   	.target			= "task",
>   	.attach_target		= bpf_iter_attach_task,
> @@ -616,6 +631,7 @@ static struct bpf_iter_reg task_reg_info = {
>   		  PTR_TO_BTF_ID_OR_NULL },
>   	},
>   	.seq_info		= &task_seq_info,
> +	.fill_link_info		= bpf_iter_fill_link_info,

How about show_fdinfo?

>   };
>   
>   static const struct bpf_iter_seq_info task_file_seq_info = {
> @@ -637,6 +653,7 @@ static struct bpf_iter_reg task_file_reg_info = {
>   		  PTR_TO_BTF_ID_OR_NULL },
>   	},
>   	.seq_info		= &task_file_seq_info,
> +	.fill_link_info		= bpf_iter_fill_link_info,
>   };
>   
>   static const struct bpf_iter_seq_info task_vma_seq_info = {
> @@ -658,6 +675,7 @@ static struct bpf_iter_reg task_vma_reg_info = {
>   		  PTR_TO_BTF_ID_OR_NULL },
>   	},
>   	.seq_info		= &task_vma_seq_info,
> +	.fill_link_info		= bpf_iter_fill_link_info,
>   };
>   
[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 3/3] selftests/bpf: Test parameterized task BPF iterators.
  2022-08-11  0:16 ` [PATCH bpf-next v5 3/3] selftests/bpf: Test " Kui-Feng Lee
@ 2022-08-13 22:50   ` Yonghong Song
  2022-08-18 17:24     ` Kui-Feng Lee
  2022-08-16  5:15   ` Andrii Nakryiko
  1 sibling, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2022-08-13 22:50 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> Test iterators of vma, files, and tasks of tasks.
> 
> Ensure the API works appropriately to visit all tasks,
> tasks in a process, or a particular task.
> 
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>   .../selftests/bpf/prog_tests/bpf_iter.c       | 204 ++++++++++++++++--
>   .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
>   .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
>   .../selftests/bpf/progs/bpf_iter_task_file.c  |   7 +
>   .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
>   5 files changed, 203 insertions(+), 25 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> index a33874b081b6..e66f1f3db562 100644
> --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> @@ -1,6 +1,9 @@
>   // SPDX-License-Identifier: GPL-2.0
>   /* Copyright (c) 2020 Facebook */
>   #include <test_progs.h>
> +#include <sys/syscall.h>
> +#include <unistd.h>
> +#include <signal.h>

do we need unistd.h and signal.h?

>   #include "bpf_iter_ipv6_route.skel.h"
>   #include "bpf_iter_netlink.skel.h"
>   #include "bpf_iter_bpf_map.skel.h"
> @@ -42,13 +45,13 @@ static void test_btf_id_or_null(void)
>   	}
>   }
>   
> -static void do_dummy_read(struct bpf_program *prog)
> +static void do_dummy_read(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
>   {
>   	struct bpf_link *link;
>   	char buf[16] = {};
>   	int iter_fd, len;
>   
> -	link = bpf_program__attach_iter(prog, NULL);
> +	link = bpf_program__attach_iter(prog, opts);
>   	if (!ASSERT_OK_PTR(link, "attach_iter"))
>   		return;
>   
> @@ -91,7 +94,7 @@ static void test_ipv6_route(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_ipv6_route__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_ipv6_route);
> +	do_dummy_read(skel->progs.dump_ipv6_route, NULL);
>   
>   	bpf_iter_ipv6_route__destroy(skel);
>   }
> @@ -104,7 +107,7 @@ static void test_netlink(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_netlink__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_netlink);
> +	do_dummy_read(skel->progs.dump_netlink, NULL);
>   
>   	bpf_iter_netlink__destroy(skel);
>   }
> @@ -117,24 +120,139 @@ static void test_bpf_map(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_map__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_bpf_map);
> +	do_dummy_read(skel->progs.dump_bpf_map, NULL);
>   
>   	bpf_iter_bpf_map__destroy(skel);
>   }
>   
> -static void test_task(void)
> +static int pidfd_open(pid_t pid, unsigned int flags)
> +{
> +	return syscall(SYS_pidfd_open, pid, flags);
> +}
> +
> +static void check_bpf_link_info(const struct bpf_program *prog)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +	union bpf_iter_link_info linfo;
> +	struct bpf_link_info info = {};
> +	__u32 info_len;
> +	struct bpf_link *link;
> +	int err;

Reverse christmas tree style?

> +
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.tid = getpid();
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +
> +	link = bpf_program__attach_iter(prog, &opts);
> +	if (!ASSERT_OK_PTR(link, "attach_iter"))
> +		return;
> +
> +	info_len = sizeof(info);
> +	err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
> +	if (ASSERT_OK(err, "bpf_obj_get_info_by_fd"))
> +		ASSERT_EQ(info.iter.task.tid, getpid(), "check_task_tid");
> +
> +	bpf_link__destroy(link);
> +}
> +
> +static pthread_mutex_t do_nothing_mutex;
> +
> +static void *do_nothing_wait(void *arg)
> +{
> +	pthread_mutex_lock(&do_nothing_mutex);
> +	pthread_mutex_unlock(&do_nothing_mutex);
> +
> +	pthread_exit(arg);
> +}
> +
> +static void test_task_(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)

The function test_task_ name is weird. Maybe test_task_common?

>   {
>   	struct bpf_iter_task *skel;
> +	pthread_t thread_id;
> +	void *ret;
>   
>   	skel = bpf_iter_task__open_and_load();
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_task);
> +	if (!ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL), "pthread_mutex_init"))
> +		goto done;
> +	if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
> +		goto done;
> +
> +	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
> +		  "pthread_create"))
> +		goto done;
> +
> +
> +	skel->bss->tid = getpid();
> +
> +	do_dummy_read(skel->progs.dump_task, opts);
> +
> +	if (!ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock"))
> +		goto done;
> +
> +	if (num_unknown >= 0)
> +		ASSERT_EQ(skel->bss->num_unknown_tid, num_unknown, "check_num_unknown_tid");
> +	if (num_known >= 0)
> +		ASSERT_EQ(skel->bss->num_known_tid, num_known, "check_num_known_tid");
>   
> +	ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
> +		     "pthread_join");
> +
> +done:
>   	bpf_iter_task__destroy(skel);
>   }
>   
> +static void test_task(void)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +	union bpf_iter_link_info linfo;
> +
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.tid = getpid();
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +
> +	test_task_(&opts, 0, 1);
> +
> +	test_task_(NULL, -1, 1);
> +}
> +
> +static void test_task_tgid(void)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +	union bpf_iter_link_info linfo;
> +
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.tgid = getpid();
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +
> +	test_task_(&opts, 1, 1);
> +}
> +
> +static void test_task_pidfd(void)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +	union bpf_iter_link_info linfo;
> +	int pidfd;
> +
> +	pidfd = pidfd_open(getpid(), 0);
> +	if (!ASSERT_GE(pidfd, 0, "pidfd_open"))
> +		return;
> +
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.pid_fd = pidfd;

In kernel, pidfd has to be > 0 to be effective.
So in the above, you should use ASSERT_GT instead of
ASSERT_GE. For test_progs, pidfd == 0 won't happen
since the program does not close stdin.

> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +
> +	test_task_(&opts, 1, 1);
> +
> +	close(pidfd);
> +}
> +
>   static void test_task_sleepable(void)
>   {
>   	struct bpf_iter_task *skel;
> @@ -143,7 +261,7 @@ static void test_task_sleepable(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_task_sleepable);
> +	do_dummy_read(skel->progs.dump_task_sleepable, NULL);
>   
>   	ASSERT_GT(skel->bss->num_expected_failure_copy_from_user_task, 0,
>   		  "num_expected_failure_copy_from_user_task");
> @@ -161,8 +279,8 @@ static void test_task_stack(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_task_stack__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_task_stack);
> -	do_dummy_read(skel->progs.get_task_user_stacks);
> +	do_dummy_read(skel->progs.dump_task_stack, NULL);
> +	do_dummy_read(skel->progs.get_task_user_stacks, NULL);
>   
>   	bpf_iter_task_stack__destroy(skel);
>   }
> @@ -174,7 +292,9 @@ static void *do_nothing(void *arg)
>   
>   static void test_task_file(void)
>   {
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
>   	struct bpf_iter_task_file *skel;
> +	union bpf_iter_link_info linfo;
>   	pthread_t thread_id;
>   	void *ret;
>   
> @@ -188,15 +308,31 @@ static void test_task_file(void)
>   		  "pthread_create"))
>   		goto done;
>   
> -	do_dummy_read(skel->progs.dump_task_file);
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.tid = getpid();
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +
> +	do_dummy_read(skel->progs.dump_task_file, &opts);
>   
>   	if (!ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
>   		  "pthread_join"))
>   		goto done;
>   
>   	ASSERT_EQ(skel->bss->count, 0, "check_count");
> +	ASSERT_EQ(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
>   
> -done:
> +	skel->bss->count = 0;
> +	skel->bss->unique_tgid_count = 0;
> +
> +	do_dummy_read(skel->progs.dump_task_file, NULL);
> +
> +	ASSERT_GE(skel->bss->count, 0, "check_count");
> +	ASSERT_GE(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");

This is not precise. ASSERT_EQ will be better, right?
Maybe reset last_tgid as well?

> +
> +	check_bpf_link_info(skel->progs.dump_task_file);
> +
> + done:
>   	bpf_iter_task_file__destroy(skel);
>   }
>   
> @@ -274,7 +410,7 @@ static void test_tcp4(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_tcp4__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_tcp4);
> +	do_dummy_read(skel->progs.dump_tcp4, NULL);
>   
>   	bpf_iter_tcp4__destroy(skel);
>   }
> @@ -287,7 +423,7 @@ static void test_tcp6(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_tcp6__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_tcp6);
> +	do_dummy_read(skel->progs.dump_tcp6, NULL);
>   
>   	bpf_iter_tcp6__destroy(skel);
>   }
> @@ -300,7 +436,7 @@ static void test_udp4(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_udp4__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_udp4);
> +	do_dummy_read(skel->progs.dump_udp4, NULL);
>   
>   	bpf_iter_udp4__destroy(skel);
>   }
> @@ -313,7 +449,7 @@ static void test_udp6(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_udp6__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_udp6);
> +	do_dummy_read(skel->progs.dump_udp6, NULL);
>   
>   	bpf_iter_udp6__destroy(skel);
>   }
> @@ -326,7 +462,7 @@ static void test_unix(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_unix__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_unix);
> +	do_dummy_read(skel->progs.dump_unix, NULL);
>   
>   	bpf_iter_unix__destroy(skel);
>   }
> @@ -988,7 +1124,7 @@ static void test_bpf_sk_storage_get(void)
>   	if (!ASSERT_OK(err, "bpf_map_update_elem"))
>   		goto close_socket;
>   
> -	do_dummy_read(skel->progs.fill_socket_owner);
> +	do_dummy_read(skel->progs.fill_socket_owner, NULL);
>   
>   	err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
>   	if (CHECK(err || val != getpid(), "bpf_map_lookup_elem",
> @@ -996,7 +1132,7 @@ static void test_bpf_sk_storage_get(void)
>   	    getpid(), val, err))
>   		goto close_socket;
>   
> -	do_dummy_read(skel->progs.negate_socket_local_storage);
> +	do_dummy_read(skel->progs.negate_socket_local_storage, NULL);
>   
>   	err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
>   	CHECK(err || val != -getpid(), "bpf_map_lookup_elem",
> @@ -1116,7 +1252,7 @@ static void test_link_iter(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_link__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_bpf_link);
> +	do_dummy_read(skel->progs.dump_bpf_link, NULL);
>   
>   	bpf_iter_bpf_link__destroy(skel);
>   }
> @@ -1129,7 +1265,7 @@ static void test_ksym_iter(void)
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_ksym__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_ksym);
> +	do_dummy_read(skel->progs.dump_ksym, NULL);
>   
>   	bpf_iter_ksym__destroy(skel);
>   }
> @@ -1154,7 +1290,7 @@ static void str_strip_first_line(char *str)
>   	*dst = '\0';
>   }
>   
> -static void test_task_vma(void)
> +static void test_task_vma_(struct bpf_iter_attach_opts *opts)

test_task_vma_common?

>   {
>   	int err, iter_fd = -1, proc_maps_fd = -1;
>   	struct bpf_iter_task_vma *skel;
> @@ -1166,13 +1302,14 @@ static void test_task_vma(void)
>   		return;
>   
>   	skel->bss->pid = getpid();
> +	skel->bss->one_task = opts ? 1 : 0;
>   
>   	err = bpf_iter_task_vma__load(skel);
>   	if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
>   		goto out;
>   
>   	skel->links.proc_maps = bpf_program__attach_iter(
> -		skel->progs.proc_maps, NULL);
> +		skel->progs.proc_maps, opts);
>   
>   	if (!ASSERT_OK_PTR(skel->links.proc_maps, "bpf_program__attach_iter")) {
>   		skel->links.proc_maps = NULL;
> @@ -1211,12 +1348,29 @@ static void test_task_vma(void)
>   	str_strip_first_line(proc_maps_output);
>   
>   	ASSERT_STREQ(task_vma_output, proc_maps_output, "compare_output");
> +
> +	check_bpf_link_info(skel->progs.proc_maps);
> +
>   out:
>   	close(proc_maps_fd);
>   	close(iter_fd);
>   	bpf_iter_task_vma__destroy(skel);
>   }
>   
> +static void test_task_vma(void)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +	union bpf_iter_link_info linfo;
> +
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.tid = getpid();
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +
> +	test_task_vma_(&opts);
> +	test_task_vma_(NULL);
> +}
> +
[...]
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> index 4ea6a37d1345..44f4a31c2ddd 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> @@ -20,6 +20,7 @@ char _license[] SEC("license") = "GPL";
>   #define D_PATH_BUF_SIZE 1024
>   char d_path_buf[D_PATH_BUF_SIZE] = {};
>   __u32 pid = 0;
> +__u32 one_task = 0;
>   
>   SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
>   {
> @@ -33,8 +34,11 @@ SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
>   		return 0;
>   
>   	file = vma->vm_file;
> -	if (task->tgid != pid)
> +	if (task->tgid != pid) {
> +		if (one_task)
> +			BPF_SEQ_PRINTF(seq, "unexpected task (%d != %d)", task->tgid, pid);

This doesn't sound good. Is it possible we add a global variable to 
indicate this condition and do an ASSERT in bpf_iter.c file?

>   		return 0;
> +	}
>   	perm_str[0] = (vma->vm_flags & VM_READ) ? 'r' : '-';
>   	perm_str[1] = (vma->vm_flags & VM_WRITE) ? 'w' : '-';
>   	perm_str[2] = (vma->vm_flags & VM_EXEC) ? 'x' : '-';

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-11  0:16 ` [PATCH bpf-next v5 1/3] bpf: " Kui-Feng Lee
  2022-08-13 22:17   ` Yonghong Song
@ 2022-08-14 20:24   ` Jiri Olsa
  2022-08-16 17:21     ` Kui-Feng Lee
  2022-08-15 23:08   ` Hao Luo
  2022-08-16  5:02   ` Andrii Nakryiko
  3 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2022-08-14 20:24 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, daniel, andrii, kernel-team, yhs

On Wed, Aug 10, 2022 at 05:16:52PM -0700, Kui-Feng Lee wrote:

SNIP

> +static int bpf_iter_attach_task(struct bpf_prog *prog,
> +				union bpf_iter_link_info *linfo,
> +				struct bpf_iter_aux_info *aux)
> +{
> +	unsigned int flags;
> +	struct pid_namespace *ns;
> +	struct pid *pid;
> +	pid_t tgid;
> +
> +	if (linfo->task.tid != 0) {
> +		aux->task.type = BPF_TASK_ITER_TID;
> +		aux->task.tid = linfo->task.tid;
> +	} else if (linfo->task.tgid != 0) {
> +		aux->task.type = BPF_TASK_ITER_TGID;
> +		aux->task.tgid = linfo->task.tgid;
> +	} else if (linfo->task.pid_fd != 0) {
> +		aux->task.type = BPF_TASK_ITER_TGID;
> +		pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
> +		if (IS_ERR(pid))
> +			return PTR_ERR(pid);
> +
> +		ns = task_active_pid_ns(current);
> +		if (IS_ERR(ns))
> +			return PTR_ERR(ns);
> +
> +		tgid = pid_nr_ns(pid, ns);
> +		if (tgid <= 0)
> +			return -EINVAL;
> +
> +		aux->task.tgid = tgid;
> +	} else {
> +		aux->task.type = BPF_TASK_ITER_ALL;
> +	}
> +
> +	return 0;
> +}
> +
>  static const struct seq_operations task_seq_ops = {
>  	.start	= task_seq_start,
>  	.next	= task_seq_next,
> @@ -137,8 +198,7 @@ struct bpf_iter_seq_task_file_info {
>  static struct file *
>  task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>  {
> -	struct pid_namespace *ns = info->common.ns;
> -	u32 curr_tid = info->tid;
> +	u32 saved_tid = info->tid;

we use 'curr_' prefix for other stuff in the function, like
curr_task, curr_fd .. I think we should either change all of
them or actually keep curr_tid, which seem better to me

jirka

>  	struct task_struct *curr_task;
>  	unsigned int curr_fd = info->fd;
>  
> @@ -151,21 +211,18 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>  		curr_task = info->task;
>  		curr_fd = info->fd;
>  	} else {
> -                curr_task = task_seq_get_next(ns, &curr_tid, true);
> +		curr_task = task_seq_get_next(&info->common, &info->tid, true);
>                  if (!curr_task) {
>                          info->task = NULL;
> -                        info->tid = curr_tid;
>                          return NULL;
>                  }
>  
> -                /* set info->task and info->tid */
> +		/* set info->task */
>  		info->task = curr_task;
> -		if (curr_tid == info->tid) {
> +		if (saved_tid == info->tid)
>  			curr_fd = info->fd;
> -		} else {
> -			info->tid = curr_tid;
> +		else
>  			curr_fd = 0;
> -		}
>  	}
>  

SNIP

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-13 22:17   ` Yonghong Song
@ 2022-08-15  1:01     ` Alexei Starovoitov
  2022-08-16  4:42     ` Andrii Nakryiko
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2022-08-15  1:01 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team

On Sat, Aug 13, 2022 at 3:17 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> > Allow creating an iterator that loops through resources of one task/thread.
> >
> > People could only create iterators to loop through all resources of
> > files, vma, and tasks in the system, even though they were interested
> > in only the resources of a specific task or process.  Passing the
> > additional parameters, people can now create an iterator to go
> > through all resources or only the resources of a task.
> >
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >   include/linux/bpf.h            |  29 ++++++++
> >   include/uapi/linux/bpf.h       |   8 +++
> >   kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
> >   tools/include/uapi/linux/bpf.h |   8 +++
> >   4 files changed, 147 insertions(+), 24 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 11950029284f..6bbe53d06faa 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1716,8 +1716,37 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
> >       extern int bpf_iter_ ## target(args);                   \
> >       int __init bpf_iter_ ## target(args) { return 0; }
> >
> > +/*
> > + * The task type of iterators.
> > + *
> > + * For BPF task iterators, they can be parameterized with various
> > + * parameters to visit only some of tasks.
> > + *
> > + * BPF_TASK_ITER_ALL (default)
> > + *   Iterate over resources of every task.
> > + *
> > + * BPF_TASK_ITER_TID
> > + *   Iterate over resources of a task/tid.
> > + *
> > + * BPF_TASK_ITER_TGID
> > + *   Iterate over reosurces of evevry task of a process / task group.
> > + */
> > +enum bpf_iter_task_type {
> > +     BPF_TASK_ITER_ALL = 0,
> > +     BPF_TASK_ITER_TID,
> > +     BPF_TASK_ITER_TGID,
> > +};
> > +
> >   struct bpf_iter_aux_info {
> >       struct bpf_map *map;
> > +     struct {
> > +             enum bpf_iter_task_type type;
> > +             union {
> > +                     u32 tid;
> > +                     u32 tgid;
> > +                     u32 pid_fd;
> > +             };
> > +     } task;
> >   };
> >
> >   typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index ffcbf79a556b..6328aca0cf5c 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -91,6 +91,14 @@ union bpf_iter_link_info {
> >       struct {
> >               __u32   map_fd;
> >       } map;
> > +     /*
> > +      * Parameters of task iterators.
> > +      */
>
> The comment can be put into one line.
>
> > +     struct {
> > +             __u32   tid;
> > +             __u32   tgid;
> > +             __u32   pid_fd;
>
> The above is a max of kernel and user space terminologies.
> tid/pid are user space concept and tgid is a kernel space
> concept.
>
> In bpf uapi header, we have
>
> struct bpf_pidns_info {
>          __u32 pid;
>          __u32 tgid;
> };
>
> which uses kernel terminologies.
>
> So I suggest the bpf_iter_link_info.task can also
> use pure kernel terminology pid/tgid/tgid_fd here.
>
> Alternative, using pure user space terminology
> can be tid/pid/pid_fd but seems the kernel terminology
> might be better since we already have precedence.

Great catch and excellent point!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-11  0:16 ` [PATCH bpf-next v5 1/3] bpf: " Kui-Feng Lee
  2022-08-13 22:17   ` Yonghong Song
  2022-08-14 20:24   ` Jiri Olsa
@ 2022-08-15 23:08   ` Hao Luo
  2022-08-16 19:11     ` Kui-Feng Lee
  2022-08-16  5:02   ` Andrii Nakryiko
  3 siblings, 1 reply; 22+ messages in thread
From: Hao Luo @ 2022-08-15 23:08 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, daniel, andrii, kernel-team, yhs

Hi Kui-Feng,

On Wed, Aug 10, 2022 at 5:17 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Allow creating an iterator that loops through resources of one task/thread.
>
> People could only create iterators to loop through all resources of
> files, vma, and tasks in the system, even though they were interested
> in only the resources of a specific task or process.  Passing the
> additional parameters, people can now create an iterator to go
> through all resources or only the resources of a task.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>  include/linux/bpf.h            |  29 ++++++++
>  include/uapi/linux/bpf.h       |   8 +++
>  kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
>  tools/include/uapi/linux/bpf.h |   8 +++
>  4 files changed, 147 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 11950029284f..6bbe53d06faa 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1716,8 +1716,37 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
>         extern int bpf_iter_ ## target(args);                   \
>         int __init bpf_iter_ ## target(args) { return 0; }
>
> +/*
> + * The task type of iterators.
> + *
> + * For BPF task iterators, they can be parameterized with various
> + * parameters to visit only some of tasks.
> + *
> + * BPF_TASK_ITER_ALL (default)
> + *     Iterate over resources of every task.
> + *
> + * BPF_TASK_ITER_TID
> + *     Iterate over resources of a task/tid.
> + *
> + * BPF_TASK_ITER_TGID
> + *     Iterate over reosurces of evevry task of a process / task group.

typos: resources and every.

> + */
> +enum bpf_iter_task_type {
> +       BPF_TASK_ITER_ALL = 0,
> +       BPF_TASK_ITER_TID,
> +       BPF_TASK_ITER_TGID,
> +};
> +
>  struct bpf_iter_aux_info {
>         struct bpf_map *map;
> +       struct {
> +               enum bpf_iter_task_type type;
> +               union {
> +                       u32 tid;
> +                       u32 tgid;
> +                       u32 pid_fd;
> +               };
> +       } task;
>  };
>
>  typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index ffcbf79a556b..6328aca0cf5c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -91,6 +91,14 @@ union bpf_iter_link_info {
>         struct {
>                 __u32   map_fd;
>         } map;
> +       /*
> +        * Parameters of task iterators.
> +        */

We could remove this particular comment. It is kind of obvious.

> +       struct {
> +               __u32   tid;
> +               __u32   tgid;
> +               __u32   pid_fd;
> +       } task;
>  };
>
>  /* BPF syscall commands, see bpf(2) man-page for more details. */
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 8c921799def4..f2e21efe075d 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -12,6 +12,12 @@
>
>  struct bpf_iter_seq_task_common {
>         struct pid_namespace *ns;
> +       enum bpf_iter_task_type type;
> +       union {
> +               u32 tid;
> +               u32 tgid;
> +               u32 pid_fd;
> +       };
>  };
>
>  struct bpf_iter_seq_task_info {
> @@ -22,24 +28,40 @@ struct bpf_iter_seq_task_info {
>         u32 tid;
>  };
>
> -static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
> +static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *common,
>                                              u32 *tid,
>                                              bool skip_if_dup_files)
>  {
>         struct task_struct *task = NULL;
>         struct pid *pid;
>
> +       if (common->type == BPF_TASK_ITER_TID) {
> +               if (*tid && *tid != common->tid)
> +                       return NULL;
> +               rcu_read_lock();
> +               pid = find_pid_ns(common->tid, common->ns);
> +               if (pid) {
> +                       task = get_pid_task(pid, PIDTYPE_PID);
> +                       *tid = common->tid;
> +               }
> +               rcu_read_unlock();

nit: this is ok. But I think the commonly used pattern (e.g. proc_pid_lookup) is

        rcu_read_lock();
        task = find_task_by_pid_ns(tid, ns);
        if (task)
                get_task_struct(task);
        rcu_read_unlock();

> +               return task;
> +       }
> +
>         rcu_read_lock();
>  retry:
> -       pid = find_ge_pid(*tid, ns);
> +       pid = find_ge_pid(*tid, common->ns);
>         if (pid) {
> -               *tid = pid_nr_ns(pid, ns);
> +               *tid = pid_nr_ns(pid, common->ns);
>                 task = get_pid_task(pid, PIDTYPE_PID);
> +
>                 if (!task) {
>                         ++*tid;
>                         goto retry;
> -               } else if (skip_if_dup_files && !thread_group_leader(task) &&
> -                          task->files == task->group_leader->files) {
> +               } else if ((skip_if_dup_files && !thread_group_leader(task) &&
> +                           task->files == task->group_leader->files) ||
> +                          (common->type == BPF_TASK_ITER_TGID &&
> +                           __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) != common->tgid)) {

Use task_tgid_nr_ns instead of __task_pid_nr_ns?

>                         put_task_struct(task);
>                         task = NULL;
>                         ++*tid;
> @@ -56,7 +78,8 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos)
>         struct bpf_iter_seq_task_info *info = seq->private;
>         struct task_struct *task;
>
> -       task = task_seq_get_next(info->common.ns, &info->tid, false);
> +       task = task_seq_get_next(&info->common, &info->tid, false);
> +
>         if (!task)
>                 return NULL;
>
> @@ -73,7 +96,8 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>         ++*pos;
>         ++info->tid;
>         put_task_struct((struct task_struct *)v);
> -       task = task_seq_get_next(info->common.ns, &info->tid, false);
> +
> +       task = task_seq_get_next(&info->common, &info->tid, false);
>         if (!task)
>                 return NULL;
>
> @@ -117,6 +141,43 @@ static void task_seq_stop(struct seq_file *seq, void *v)
>                 put_task_struct((struct task_struct *)v);
>  }
>
> +static int bpf_iter_attach_task(struct bpf_prog *prog,
> +                               union bpf_iter_link_info *linfo,
> +                               struct bpf_iter_aux_info *aux)
> +{
> +       unsigned int flags;
> +       struct pid_namespace *ns;
> +       struct pid *pid;
> +       pid_t tgid;
> +
> +       if (linfo->task.tid != 0) {
> +               aux->task.type = BPF_TASK_ITER_TID;
> +               aux->task.tid = linfo->task.tid;
> +       } else if (linfo->task.tgid != 0) {
> +               aux->task.type = BPF_TASK_ITER_TGID;
> +               aux->task.tgid = linfo->task.tgid;
> +       } else if (linfo->task.pid_fd != 0) {
> +               aux->task.type = BPF_TASK_ITER_TGID;
> +               pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
> +               if (IS_ERR(pid))
> +                       return PTR_ERR(pid);
> +
> +               ns = task_active_pid_ns(current);
> +               if (IS_ERR(ns))
> +                       return PTR_ERR(ns);
> +
> +               tgid = pid_nr_ns(pid, ns);
> +               if (tgid <= 0)
> +                       return -EINVAL;

Is this just pid_vnr?

> +
> +               aux->task.tgid = tgid;
> +       } else {
> +               aux->task.type = BPF_TASK_ITER_ALL;
> +       }

The same question as Yonghong has. Do we need to enforce that at most
one of {tid, tgid, pid_fd} is non-zero?

> +
> +       return 0;
> +}
> +
>  static const struct seq_operations task_seq_ops = {
>         .start  = task_seq_start,
>         .next   = task_seq_next,
> @@ -137,8 +198,7 @@ struct bpf_iter_seq_task_file_info {
>  static struct file *
>  task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>  {
> -       struct pid_namespace *ns = info->common.ns;
> -       u32 curr_tid = info->tid;
> +       u32 saved_tid = info->tid;
>         struct task_struct *curr_task;
>         unsigned int curr_fd = info->fd;
>
> @@ -151,21 +211,18 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>                 curr_task = info->task;
>                 curr_fd = info->fd;
>         } else {
> -                curr_task = task_seq_get_next(ns, &curr_tid, true);
> +               curr_task = task_seq_get_next(&info->common, &info->tid, true);
>                  if (!curr_task) {
>                          info->task = NULL;
> -                        info->tid = curr_tid;
>                          return NULL;
>                  }
>
> -                /* set info->task and info->tid */
> +               /* set info->task */
>                 info->task = curr_task;
> -               if (curr_tid == info->tid) {
> +               if (saved_tid == info->tid)
>                         curr_fd = info->fd;
> -               } else {
> -                       info->tid = curr_tid;
> +               else
>                         curr_fd = 0;
> -               }
>         }
>
>         rcu_read_lock();
> @@ -186,9 +243,15 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
>         /* the current task is done, go to the next task */
>         rcu_read_unlock();
>         put_task_struct(curr_task);
> +
> +       if (info->common.type == BPF_TASK_ITER_TID) {
> +               info->task = NULL;
> +               return NULL;
> +       }
> +

Do we need to set info->fd to 0? I am not sure if the caller reads
info->fd anywhere. I think it would be good to do some refactoring on
task_file_seq_get_next().

>         info->task = NULL;
>         info->fd = 0;
> -       curr_tid = ++(info->tid);
> +       saved_tid = ++(info->tid);
>         goto again;
>  }
>
> @@ -269,6 +332,17 @@ static int init_seq_pidns(void *priv_data, struct bpf_iter_aux_info *aux)
>         struct bpf_iter_seq_task_common *common = priv_data;
>
>         common->ns = get_pid_ns(task_active_pid_ns(current));
> +       common->type = aux->task.type;
> +       switch (common->type) {
> +       case BPF_TASK_ITER_TID:
> +               common->tid = aux->task.tid;
> +               break;
> +       case BPF_TASK_ITER_TGID:
> +               common->tgid = aux->task.tgid;
> +               break;
> +       default:

very nit: IMHO we could place a warning here.

> +               break;
> +       }
>         return 0;
>  }
>
> @@ -307,11 +381,10 @@ enum bpf_task_vma_iter_find_op {
>  static struct vm_area_struct *
>  task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
>  {
> -       struct pid_namespace *ns = info->common.ns;
>         enum bpf_task_vma_iter_find_op op;
>         struct vm_area_struct *curr_vma;
>         struct task_struct *curr_task;
> -       u32 curr_tid = info->tid;
> +       u32 saved_tid = info->tid;
>

Why do we need to directly operate on info->tid while other task iters
(e.g. vma_iter) uses curr_tid? IMHO, prefer staying using curr_tid if
possible, for two reasons:
 - consistent with other iters.
 - decouple refactoring changes from the changes that introduce new features

>         /* If this function returns a non-NULL vma, it holds a reference to
>          * the task_struct, and holds read lock on vma->mm->mmap_lock.
> @@ -371,14 +444,13 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
>                 }
>         } else {
>  again:
> -               curr_task = task_seq_get_next(ns, &curr_tid, true);
> +               curr_task = task_seq_get_next(&info->common, &info->tid, true);
>                 if (!curr_task) {
> -                       info->tid = curr_tid + 1;
> +                       info->tid++;
>                         goto finish;
>                 }
>
> -               if (curr_tid != info->tid) {
> -                       info->tid = curr_tid;
> +               if (saved_tid != info->tid) {
>                         /* new task, process the first vma */
>                         op = task_vma_iter_first_vma;
>                 } else {
> @@ -430,9 +502,12 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
>         return curr_vma;
>
>  next_task:
> +       if (info->common.type == BPF_TASK_ITER_TID)
> +               goto finish;
> +
>         put_task_struct(curr_task);
>         info->task = NULL;
> -       curr_tid++;
> +       info->tid++;
>         goto again;
>
>  finish:
> @@ -533,6 +608,7 @@ static const struct bpf_iter_seq_info task_seq_info = {
>
>  static struct bpf_iter_reg task_reg_info = {
>         .target                 = "task",
> +       .attach_target          = bpf_iter_attach_task,
>         .feature                = BPF_ITER_RESCHED,
>         .ctx_arg_info_size      = 1,
>         .ctx_arg_info           = {
> @@ -551,6 +627,7 @@ static const struct bpf_iter_seq_info task_file_seq_info = {
>
>  static struct bpf_iter_reg task_file_reg_info = {
>         .target                 = "task_file",
> +       .attach_target          = bpf_iter_attach_task,
>         .feature                = BPF_ITER_RESCHED,
>         .ctx_arg_info_size      = 2,
>         .ctx_arg_info           = {
> @@ -571,6 +648,7 @@ static const struct bpf_iter_seq_info task_vma_seq_info = {
>
>  static struct bpf_iter_reg task_vma_reg_info = {
>         .target                 = "task_vma",
> +       .attach_target          = bpf_iter_attach_task,
>         .feature                = BPF_ITER_RESCHED,
>         .ctx_arg_info_size      = 2,
>         .ctx_arg_info           = {
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index ffcbf79a556b..6328aca0cf5c 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -91,6 +91,14 @@ union bpf_iter_link_info {
>         struct {
>                 __u32   map_fd;
>         } map;
> +       /*
> +        * Parameters of task iterators.
> +        */
> +       struct {
> +               __u32   tid;
> +               __u32   tgid;
> +               __u32   pid_fd;
> +       } task;
>  };
>
>  /* BPF syscall commands, see bpf(2) man-page for more details. */
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-13 22:17   ` Yonghong Song
  2022-08-15  1:01     ` Alexei Starovoitov
@ 2022-08-16  4:42     ` Andrii Nakryiko
  2022-08-18  3:40       ` Yonghong Song
  2022-08-16  5:25     ` Andrii Nakryiko
  2022-08-16 17:00     ` Kui-Feng Lee
  3 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2022-08-16  4:42 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team

On Sat, Aug 13, 2022 at 3:17 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> > Allow creating an iterator that loops through resources of one task/thread.
> >
> > People could only create iterators to loop through all resources of
> > files, vma, and tasks in the system, even though they were interested
> > in only the resources of a specific task or process.  Passing the
> > additional parameters, people can now create an iterator to go
> > through all resources or only the resources of a task.
> >
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >   include/linux/bpf.h            |  29 ++++++++
> >   include/uapi/linux/bpf.h       |   8 +++
> >   kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
> >   tools/include/uapi/linux/bpf.h |   8 +++
> >   4 files changed, 147 insertions(+), 24 deletions(-)
> >

[...]

> > +     struct {
> > +             __u32   tid;
> > +             __u32   tgid;
> > +             __u32   pid_fd;
>
> The above is a max of kernel and user space terminologies.
> tid/pid are user space concept and tgid is a kernel space
> concept.
>
> In bpf uapi header, we have
>
> struct bpf_pidns_info {
>          __u32 pid;
>          __u32 tgid;
> };
>
> which uses kernel terminologies.
>
> So I suggest the bpf_iter_link_info.task can also
> use pure kernel terminology pid/tgid/tgid_fd here.

Except tgid_fd is a made up terminology. It is called pidfd in
documentation and even if pidfd gains add support for threads (tasks),
it would still be created through pidfd_open() syscall, probably. So
it seems better to stick to "pidfd" here.

As for pid/tgid precedent. Are we referring to
bpf_get_current_pid_tgid() BPF helper and struct bpf_pidns_info? Those
are kernel-side BPF helper and kernel-side auxiliary type to describe
return results of another in-kernel helper, so I think it's a bit less
relevant here to set a precedent.

On the other hand, if we look at user-space-facing perf_event
subsystem UAPI, for example, it seems to be using pid/tid terminology
(and so does getpid()/gettid() syscall, etc). So I wonder if for a
user-space-facing API it's better to stick with user-space-facing
terminology?

>
> Alternative, using pure user space terminology
> can be tid/pid/pid_fd but seems the kernel terminology
> might be better since we already have precedence.
>
>
> > +     } task;
> >   };
> >

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-11  0:16 ` [PATCH bpf-next v5 1/3] bpf: " Kui-Feng Lee
                     ` (2 preceding siblings ...)
  2022-08-15 23:08   ` Hao Luo
@ 2022-08-16  5:02   ` Andrii Nakryiko
  2022-08-16 18:45     ` Kui-Feng Lee
  3 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2022-08-16  5:02 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, daniel, andrii, kernel-team, yhs

On Wed, Aug 10, 2022 at 5:17 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Allow creating an iterator that loops through resources of one task/thread.
>
> People could only create iterators to loop through all resources of
> files, vma, and tasks in the system, even though they were interested
> in only the resources of a specific task or process.  Passing the
> additional parameters, people can now create an iterator to go
> through all resources or only the resources of a task.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>  include/linux/bpf.h            |  29 ++++++++
>  include/uapi/linux/bpf.h       |   8 +++
>  kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
>  tools/include/uapi/linux/bpf.h |   8 +++
>  4 files changed, 147 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 11950029284f..6bbe53d06faa 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1716,8 +1716,37 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
>         extern int bpf_iter_ ## target(args);                   \
>         int __init bpf_iter_ ## target(args) { return 0; }
>
> +/*
> + * The task type of iterators.
> + *
> + * For BPF task iterators, they can be parameterized with various
> + * parameters to visit only some of tasks.
> + *
> + * BPF_TASK_ITER_ALL (default)
> + *     Iterate over resources of every task.
> + *
> + * BPF_TASK_ITER_TID
> + *     Iterate over resources of a task/tid.
> + *
> + * BPF_TASK_ITER_TGID
> + *     Iterate over reosurces of evevry task of a process / task group.
> + */
> +enum bpf_iter_task_type {
> +       BPF_TASK_ITER_ALL = 0,
> +       BPF_TASK_ITER_TID,
> +       BPF_TASK_ITER_TGID,
> +};
> +
>  struct bpf_iter_aux_info {
>         struct bpf_map *map;
> +       struct {
> +               enum bpf_iter_task_type type;
> +               union {
> +                       u32 tid;
> +                       u32 tgid;
> +                       u32 pid_fd;
> +               };
> +       } task;

You don't seem to use pid_fd in bpf_iter_aux_info at all, is that
right? Drop it? And for tid/tgid, I'd use kernel-side terminology for
this internal data structure and just have single u32 pid here. Then
type determines whether you are iterating tasks or task leaders
(processes), no ambiguity.

>  };
>
>  typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index ffcbf79a556b..6328aca0cf5c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -91,6 +91,14 @@ union bpf_iter_link_info {
>         struct {
>                 __u32   map_fd;
>         } map;
> +       /*
> +        * Parameters of task iterators.
> +        */
> +       struct {
> +               __u32   tid;
> +               __u32   tgid;
> +               __u32   pid_fd;
> +       } task;
>  };
>

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 3/3] selftests/bpf: Test parameterized task BPF iterators.
  2022-08-11  0:16 ` [PATCH bpf-next v5 3/3] selftests/bpf: Test " Kui-Feng Lee
  2022-08-13 22:50   ` Yonghong Song
@ 2022-08-16  5:15   ` Andrii Nakryiko
  1 sibling, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2022-08-16  5:15 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, daniel, andrii, kernel-team, yhs

On Wed, Aug 10, 2022 at 5:17 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Test iterators of vma, files, and tasks of tasks.
>
> Ensure the API works appropriately to visit all tasks,
> tasks in a process, or a particular task.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>  .../selftests/bpf/prog_tests/bpf_iter.c       | 204 ++++++++++++++++--
>  .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
>  .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
>  .../selftests/bpf/progs/bpf_iter_task_file.c  |   7 +
>  .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
>  5 files changed, 203 insertions(+), 25 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> index a33874b081b6..e66f1f3db562 100644
> --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> @@ -1,6 +1,9 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright (c) 2020 Facebook */
>  #include <test_progs.h>
> +#include <sys/syscall.h>
> +#include <unistd.h>
> +#include <signal.h>
>  #include "bpf_iter_ipv6_route.skel.h"
>  #include "bpf_iter_netlink.skel.h"
>  #include "bpf_iter_bpf_map.skel.h"
> @@ -42,13 +45,13 @@ static void test_btf_id_or_null(void)
>         }
>  }
>
> -static void do_dummy_read(struct bpf_program *prog)
> +static void do_dummy_read(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)

there are a bunch of uses of do_dummy_read that don't need to pass
opts and adding this argument causes unnecessary churn. why not add
do_dummy_read_opts() instead and only modify places that do need to
pass opts?

>  {
>         struct bpf_link *link;
>         char buf[16] = {};
>         int iter_fd, len;
>
> -       link = bpf_program__attach_iter(prog, NULL);
> +       link = bpf_program__attach_iter(prog, opts);
>         if (!ASSERT_OK_PTR(link, "attach_iter"))
>                 return;
>
> @@ -91,7 +94,7 @@ static void test_ipv6_route(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_ipv6_route__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_ipv6_route);
> +       do_dummy_read(skel->progs.dump_ipv6_route, NULL);
>
>         bpf_iter_ipv6_route__destroy(skel);
>  }
> @@ -104,7 +107,7 @@ static void test_netlink(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_netlink__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_netlink);
> +       do_dummy_read(skel->progs.dump_netlink, NULL);
>
>         bpf_iter_netlink__destroy(skel);
>  }
> @@ -117,24 +120,139 @@ static void test_bpf_map(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_map__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_bpf_map);
> +       do_dummy_read(skel->progs.dump_bpf_map, NULL);
>
>         bpf_iter_bpf_map__destroy(skel);
>  }
>
> -static void test_task(void)
> +static int pidfd_open(pid_t pid, unsigned int flags)
> +{
> +       return syscall(SYS_pidfd_open, pid, flags);
> +}
> +
> +static void check_bpf_link_info(const struct bpf_program *prog)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);

nit: prefer LIBBPF_OPTS, DECLARE_LIBBPF_OPTS is verbose legacy name

> +       union bpf_iter_link_info linfo;
> +       struct bpf_link_info info = {};
> +       __u32 info_len;
> +       struct bpf_link *link;
> +       int err;
> +
> +       memset(&linfo, 0, sizeof(linfo));
> +       linfo.task.tid = getpid();
> +       opts.link_info = &linfo;
> +       opts.link_info_len = sizeof(linfo);
> +
> +       link = bpf_program__attach_iter(prog, &opts);
> +       if (!ASSERT_OK_PTR(link, "attach_iter"))
> +               return;
> +
> +       info_len = sizeof(info);
> +       err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
> +       if (ASSERT_OK(err, "bpf_obj_get_info_by_fd"))
> +               ASSERT_EQ(info.iter.task.tid, getpid(), "check_task_tid");

nit: for checks that can't crash even if some error happened, it's
best to not nest them, so just:

ASSERT_OK(err, ...);
ASSERT_EQ(info.iter.task.tid, ...);

both will probably fail if err != 0, but that's fine

> +
> +       bpf_link__destroy(link);
> +}
> +
> +static pthread_mutex_t do_nothing_mutex;
> +
> +static void *do_nothing_wait(void *arg)
> +{
> +       pthread_mutex_lock(&do_nothing_mutex);
> +       pthread_mutex_unlock(&do_nothing_mutex);
> +

nice, clever!

> +       pthread_exit(arg);
> +}
> +
> +static void test_task_(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
>  {
>         struct bpf_iter_task *skel;
> +       pthread_t thread_id;
> +       void *ret;
>
>         skel = bpf_iter_task__open_and_load();
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_task);
> +       if (!ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL), "pthread_mutex_init"))
> +               goto done;
> +       if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
> +               goto done;
> +
> +       if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
> +                 "pthread_create"))
> +               goto done;
> +
> +
> +       skel->bss->tid = getpid();
> +
> +       do_dummy_read(skel->progs.dump_task, opts);
> +
> +       if (!ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock"))
> +               goto done;
> +
> +       if (num_unknown >= 0)
> +               ASSERT_EQ(skel->bss->num_unknown_tid, num_unknown, "check_num_unknown_tid");
> +       if (num_known >= 0)
> +               ASSERT_EQ(skel->bss->num_known_tid, num_known, "check_num_known_tid");
>
> +       ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
> +                    "pthread_join");

this will have better "debuggability" if expressed as more
semantically meaningful checks:

ASSERT_OK(pthread_join(...), ...);
ASSERT_NULL(ret, ...);

It's also easier to follow what is actually expected

> +
> +done:
>         bpf_iter_task__destroy(skel);
>  }
>
> +static void test_task(void)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +       union bpf_iter_link_info linfo;
> +
> +       memset(&linfo, 0, sizeof(linfo));
> +       linfo.task.tid = getpid();
> +       opts.link_info = &linfo;
> +       opts.link_info_len = sizeof(linfo);
> +
> +       test_task_(&opts, 0, 1);
> +
> +       test_task_(NULL, -1, 1);
> +}
> +
> +static void test_task_tgid(void)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +       union bpf_iter_link_info linfo;
> +
> +       memset(&linfo, 0, sizeof(linfo));
> +       linfo.task.tgid = getpid();
> +       opts.link_info = &linfo;
> +       opts.link_info_len = sizeof(linfo);
> +
> +       test_task_(&opts, 1, 1);
> +}
> +
> +static void test_task_pidfd(void)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +       union bpf_iter_link_info linfo;
> +       int pidfd;
> +
> +       pidfd = pidfd_open(getpid(), 0);
> +       if (!ASSERT_GE(pidfd, 0, "pidfd_open"))
> +               return;
> +
> +       memset(&linfo, 0, sizeof(linfo));
> +       linfo.task.pid_fd = pidfd;
> +       opts.link_info = &linfo;
> +       opts.link_info_len = sizeof(linfo);
> +
> +       test_task_(&opts, 1, 1);
> +
> +       close(pidfd);
> +}
> +
>  static void test_task_sleepable(void)
>  {
>         struct bpf_iter_task *skel;
> @@ -143,7 +261,7 @@ static void test_task_sleepable(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_task_sleepable);
> +       do_dummy_read(skel->progs.dump_task_sleepable, NULL);
>
>         ASSERT_GT(skel->bss->num_expected_failure_copy_from_user_task, 0,
>                   "num_expected_failure_copy_from_user_task");
> @@ -161,8 +279,8 @@ static void test_task_stack(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_task_stack__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_task_stack);
> -       do_dummy_read(skel->progs.get_task_user_stacks);
> +       do_dummy_read(skel->progs.dump_task_stack, NULL);
> +       do_dummy_read(skel->progs.get_task_user_stacks, NULL);
>
>         bpf_iter_task_stack__destroy(skel);
>  }
> @@ -174,7 +292,9 @@ static void *do_nothing(void *arg)
>
>  static void test_task_file(void)
>  {
> +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
>         struct bpf_iter_task_file *skel;
> +       union bpf_iter_link_info linfo;
>         pthread_t thread_id;
>         void *ret;
>
> @@ -188,15 +308,31 @@ static void test_task_file(void)
>                   "pthread_create"))
>                 goto done;
>
> -       do_dummy_read(skel->progs.dump_task_file);
> +       memset(&linfo, 0, sizeof(linfo));
> +       linfo.task.tid = getpid();
> +       opts.link_info = &linfo;
> +       opts.link_info_len = sizeof(linfo);
> +
> +       do_dummy_read(skel->progs.dump_task_file, &opts);
>
>         if (!ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
>                   "pthread_join"))
>                 goto done;
>
>         ASSERT_EQ(skel->bss->count, 0, "check_count");
> +       ASSERT_EQ(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
>
> -done:
> +       skel->bss->count = 0;
> +       skel->bss->unique_tgid_count = 0;
> +
> +       do_dummy_read(skel->progs.dump_task_file, NULL);
> +
> +       ASSERT_GE(skel->bss->count, 0, "check_count");
> +       ASSERT_GE(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
> +
> +       check_bpf_link_info(skel->progs.dump_task_file);
> +
> + done:
>         bpf_iter_task_file__destroy(skel);
>  }
>
> @@ -274,7 +410,7 @@ static void test_tcp4(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_tcp4__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_tcp4);
> +       do_dummy_read(skel->progs.dump_tcp4, NULL);
>
>         bpf_iter_tcp4__destroy(skel);
>  }
> @@ -287,7 +423,7 @@ static void test_tcp6(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_tcp6__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_tcp6);
> +       do_dummy_read(skel->progs.dump_tcp6, NULL);
>
>         bpf_iter_tcp6__destroy(skel);
>  }
> @@ -300,7 +436,7 @@ static void test_udp4(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_udp4__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_udp4);
> +       do_dummy_read(skel->progs.dump_udp4, NULL);
>
>         bpf_iter_udp4__destroy(skel);
>  }
> @@ -313,7 +449,7 @@ static void test_udp6(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_udp6__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_udp6);
> +       do_dummy_read(skel->progs.dump_udp6, NULL);
>
>         bpf_iter_udp6__destroy(skel);
>  }
> @@ -326,7 +462,7 @@ static void test_unix(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_unix__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_unix);
> +       do_dummy_read(skel->progs.dump_unix, NULL);
>
>         bpf_iter_unix__destroy(skel);
>  }
> @@ -988,7 +1124,7 @@ static void test_bpf_sk_storage_get(void)
>         if (!ASSERT_OK(err, "bpf_map_update_elem"))
>                 goto close_socket;
>
> -       do_dummy_read(skel->progs.fill_socket_owner);
> +       do_dummy_read(skel->progs.fill_socket_owner, NULL);
>
>         err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
>         if (CHECK(err || val != getpid(), "bpf_map_lookup_elem",
> @@ -996,7 +1132,7 @@ static void test_bpf_sk_storage_get(void)
>             getpid(), val, err))
>                 goto close_socket;
>
> -       do_dummy_read(skel->progs.negate_socket_local_storage);
> +       do_dummy_read(skel->progs.negate_socket_local_storage, NULL);
>
>         err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
>         CHECK(err || val != -getpid(), "bpf_map_lookup_elem",
> @@ -1116,7 +1252,7 @@ static void test_link_iter(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_link__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_bpf_link);
> +       do_dummy_read(skel->progs.dump_bpf_link, NULL);
>
>         bpf_iter_bpf_link__destroy(skel);
>  }
> @@ -1129,7 +1265,7 @@ static void test_ksym_iter(void)
>         if (!ASSERT_OK_PTR(skel, "bpf_iter_ksym__open_and_load"))
>                 return;
>
> -       do_dummy_read(skel->progs.dump_ksym);
> +       do_dummy_read(skel->progs.dump_ksym, NULL);
>
>         bpf_iter_ksym__destroy(skel);
>  }
> @@ -1154,7 +1290,7 @@ static void str_strip_first_line(char *str)
>         *dst = '\0';
>  }
>
> -static void test_task_vma(void)
> +static void test_task_vma_(struct bpf_iter_attach_opts *opts)
>  {
>         int err, iter_fd = -1, proc_maps_fd = -1;
>         struct bpf_iter_task_vma *skel;
> @@ -1166,13 +1302,14 @@ static void test_task_vma(void)
>                 return;
>
>         skel->bss->pid = getpid();
> +       skel->bss->one_task = opts ? 1 : 0;
>
>         err = bpf_iter_task_vma__load(skel);
>         if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
>                 goto out;
>
>         skel->links.proc_maps = bpf_program__attach_iter(
> -               skel->progs.proc_maps, NULL);
> +               skel->progs.proc_maps, opts);
>
>         if (!ASSERT_OK_PTR(skel->links.proc_maps, "bpf_program__attach_iter")) {
>                 skel->links.proc_maps = NULL;
> @@ -1211,12 +1348,29 @@ static void test_task_vma(void)
>         str_strip_first_line(proc_maps_output);
>
>         ASSERT_STREQ(task_vma_output, proc_maps_output, "compare_output");
> +
> +       check_bpf_link_info(skel->progs.proc_maps);
> +
>  out:
>         close(proc_maps_fd);
>         close(iter_fd);
>         bpf_iter_task_vma__destroy(skel);
>  }
>
> +static void test_task_vma(void)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +       union bpf_iter_link_info linfo;
> +
> +       memset(&linfo, 0, sizeof(linfo));
> +       linfo.task.tid = getpid();
> +       opts.link_info = &linfo;
> +       opts.link_info_len = sizeof(linfo);
> +
> +       test_task_vma_(&opts);
> +       test_task_vma_(NULL);
> +}
> +
>  void test_bpf_iter(void)
>  {
>         if (test__start_subtest("btf_id_or_null"))
> @@ -1229,6 +1383,10 @@ void test_bpf_iter(void)
>                 test_bpf_map();
>         if (test__start_subtest("task"))
>                 test_task();
> +       if (test__start_subtest("task_tgid"))
> +               test_task_tgid();
> +       if (test__start_subtest("task_pidfd"))
> +               test_task_pidfd();
>         if (test__start_subtest("task_sleepable"))
>                 test_task_sleepable();
>         if (test__start_subtest("task_stack"))
> diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
> index 5fce7008d1ff..32c34ce9cbeb 100644
> --- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
> +++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
> @@ -764,7 +764,7 @@ static void test_btf_dump_struct_data(struct btf *btf, struct btf_dump *d,
>
>         /* union with nested struct */
>         TEST_BTF_DUMP_DATA(btf, d, "union", str, union bpf_iter_link_info, BTF_F_COMPACT,
> -                          "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},}",
> +                          "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},.task = (struct){.tid = (__u32)1,},}",
>                            { .map = { .map_fd = 1 }});
>
>         /* struct skb with nested structs/unions; because type output is so
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
> index d22741272692..96131b9a1caa 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
> @@ -6,6 +6,10 @@
>
>  char _license[] SEC("license") = "GPL";
>
> +uint32_t tid = 0;
> +int num_unknown_tid = 0;
> +int num_known_tid = 0;
> +
>  SEC("iter/task")
>  int dump_task(struct bpf_iter__task *ctx)
>  {
> @@ -18,6 +22,11 @@ int dump_task(struct bpf_iter__task *ctx)
>                 return 0;
>         }
>
> +       if (task->pid != tid)
> +               num_unknown_tid++;
> +       else
> +               num_known_tid++;
> +
>         if (ctx->meta->seq_num == 0)
>                 BPF_SEQ_PRINTF(seq, "    tgid      gid\n");
>
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
> index 6e7b400888fe..031455ed8748 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
> @@ -7,6 +7,8 @@ char _license[] SEC("license") = "GPL";
>
>  int count = 0;
>  int tgid = 0;
> +int last_tgid = -1;
> +int unique_tgid_count = 0;
>
>  SEC("iter/task_file")
>  int dump_task_file(struct bpf_iter__task_file *ctx)
> @@ -27,6 +29,11 @@ int dump_task_file(struct bpf_iter__task_file *ctx)
>         if (tgid == task->tgid && task->tgid != task->pid)
>                 count++;
>
> +       if (last_tgid != task->tgid) {
> +               last_tgid = task->tgid;
> +               unique_tgid_count++;
> +       }
> +
>         BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
>                        (long)file->f_op);
>         return 0;
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> index 4ea6a37d1345..44f4a31c2ddd 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> @@ -20,6 +20,7 @@ char _license[] SEC("license") = "GPL";
>  #define D_PATH_BUF_SIZE 1024
>  char d_path_buf[D_PATH_BUF_SIZE] = {};
>  __u32 pid = 0;
> +__u32 one_task = 0;
>
>  SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
>  {
> @@ -33,8 +34,11 @@ SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
>                 return 0;
>
>         file = vma->vm_file;
> -       if (task->tgid != pid)
> +       if (task->tgid != pid) {
> +               if (one_task)
> +                       BPF_SEQ_PRINTF(seq, "unexpected task (%d != %d)", task->tgid, pid);
>                 return 0;
> +       }
>         perm_str[0] = (vma->vm_flags & VM_READ) ? 'r' : '-';
>         perm_str[1] = (vma->vm_flags & VM_WRITE) ? 'w' : '-';
>         perm_str[2] = (vma->vm_flags & VM_EXEC) ? 'x' : '-';
> --
> 2.30.2
>

It would be great to have a test that uses task_vma iterator to
calculate the same value that get_uprobe_offset() does through
/proc/self/maps. Can you add such test? I'd also calculate the number
of iterated tasks to make sure we only go through one task.

This is the most direct intended use case for parameterized task iter,
so would be nice to have a test proving it does work and it works as
expected. Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-13 22:17   ` Yonghong Song
  2022-08-15  1:01     ` Alexei Starovoitov
  2022-08-16  4:42     ` Andrii Nakryiko
@ 2022-08-16  5:25     ` Andrii Nakryiko
  2022-08-18  4:31       ` Yonghong Song
  2022-08-16 17:00     ` Kui-Feng Lee
  3 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2022-08-16  5:25 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team

On Sat, Aug 13, 2022 at 3:17 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> > Allow creating an iterator that loops through resources of one task/thread.
> >
> > People could only create iterators to loop through all resources of
> > files, vma, and tasks in the system, even though they were interested
> > in only the resources of a specific task or process.  Passing the
> > additional parameters, people can now create an iterator to go
> > through all resources or only the resources of a task.
> >
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >   include/linux/bpf.h            |  29 ++++++++
> >   include/uapi/linux/bpf.h       |   8 +++
> >   kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
> >   tools/include/uapi/linux/bpf.h |   8 +++
> >   4 files changed, 147 insertions(+), 24 deletions(-)
> >

Btw, Yonghong, I tried to figure it out myself, but got lost in all
the kernel functions that don't seem to be very well documented. Sorry
for being lazy and throwing this at you :)

Is it easy and efficient to iterate only processes using whatever
kernel helpers we have at our disposal? E.g., if I wanted to write an
iterator that would go only over processes (not individual threads,
just task leaders of each different process) within a cgroup, is that
possible?

I see task iterator as consisting of two different parts (and that
makes it a bit hard to define nice and clean interface, but if we can
crack this, we'd get an elegant and powerful construct):

1. What entity to iterate: threads or processes? (I'm ignoring
task_vma and task_files here, but one could task about files of each
thread or files of each process, but it's less practical, probably)

2. What's the scope of objects to iterate: just a thread by tid, just
a process by pid/pidfd, once cgroup iter lands, we'll be able to talk
about threads or processes within a cgroup or cgroup hierarchy (this
is where descendants_{pre,post}, cgroup_self_only and ancestors
ordering comes in as well).

Currently Kui-Feng is addressing first half of #2 (tid/pid/pidfd
parameters), we can use cgroup iter's parameters to define the scope
of tasks/processes by cgroup "filter" in a follow up (it naturally
extends what we have in this patch set).

So now I'm wondering if there is any benefit to also somehow
specifying threads vs processes as entities to iterate? And if we do
that, does kernel support efficient iteration of processes (as opposed
to threads).


To be clear, there is a lot of value in having just #2, but while we
are all at this topic, I thought I'd clarify for myself #1 as well.

Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-13 22:17   ` Yonghong Song
                       ` (2 preceding siblings ...)
  2022-08-16  5:25     ` Andrii Nakryiko
@ 2022-08-16 17:00     ` Kui-Feng Lee
  3 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-16 17:00 UTC (permalink / raw)
  To: daniel, Yonghong Song, Kernel Team, ast, andrii, bpf

On Sat, 2022-08-13 at 15:17 -0700, Yonghong Song wrote:
> 
> 
> On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> > Allow creating an iterator that loops through resources of one
> > task/thread.
> > 
> > People could only create iterators to loop through all resources of
> > files, vma, and tasks in the system, even though they were
> > interested
> > in only the resources of a specific task or process.  Passing the
> > additional parameters, people can now create an iterator to go
> > through all resources or only the resources of a task.
> > 
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >   include/linux/bpf.h            |  29 ++++++++
> >   include/uapi/linux/bpf.h       |   8 +++
> >   kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++--
> > -----
> >   tools/include/uapi/linux/bpf.h |   8 +++
> >   4 files changed, 147 insertions(+), 24 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 11950029284f..6bbe53d06faa 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1716,8 +1716,37 @@ int bpf_obj_get_user(const char __user
> > *pathname, int flags);
> >         extern int bpf_iter_ ## target(args);                   \
> >         int __init bpf_iter_ ## target(args) { return 0; }
> >   
> > +/*
> > + * The task type of iterators.
> > + *
> > + * For BPF task iterators, they can be parameterized with various
> > + * parameters to visit only some of tasks.
> > + *
> > + * BPF_TASK_ITER_ALL (default)
> > + *     Iterate over resources of every task.
> > + *
> > + * BPF_TASK_ITER_TID
> > + *     Iterate over resources of a task/tid.
> > + *
> > + * BPF_TASK_ITER_TGID
> > + *     Iterate over reosurces of evevry task of a process / task
> > group.
> > + */
> > +enum bpf_iter_task_type {
> > +       BPF_TASK_ITER_ALL = 0,
> > +       BPF_TASK_ITER_TID,
> > +       BPF_TASK_ITER_TGID,
> > +};
> > +
> >   struct bpf_iter_aux_info {
> >         struct bpf_map *map;
> > +       struct {
> > +               enum bpf_iter_task_type type;
> > +               union {
> > +                       u32 tid;
> > +                       u32 tgid;
> > +                       u32 pid_fd;
> > +               };
> > +       } task;
> >   };
> >   
> >   typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index ffcbf79a556b..6328aca0cf5c 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -91,6 +91,14 @@ union bpf_iter_link_info {
> >         struct {
> >                 __u32   map_fd;
> >         } map;
> > +       /*
> > +        * Parameters of task iterators.
> > +        */
> 
> The comment can be put into one line.
> 
> > +       struct {
> > +               __u32   tid;
> > +               __u32   tgid;
> > +               __u32   pid_fd;
> 
> The above is a max of kernel and user space terminologies.
> tid/pid are user space concept and tgid is a kernel space
> concept.
> 
> In bpf uapi header, we have
> 
> struct bpf_pidns_info {
>          __u32 pid;
>          __u32 tgid;
> };
> 
> which uses kernel terminologies.
> 
> So I suggest the bpf_iter_link_info.task can also
> use pure kernel terminology pid/tgid/tgid_fd here.
> 
> Alternative, using pure user space terminology
> can be tid/pid/pid_fd but seems the kernel terminology
> might be better since we already have precedence.
> 
> 
> > +       } task;
> >   };
> >   
> >   /* BPF syscall commands, see bpf(2) man-page for more details. */
> > diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> > index 8c921799def4..f2e21efe075d 100644
> > --- a/kernel/bpf/task_iter.c
> > +++ b/kernel/bpf/task_iter.c
> > @@ -12,6 +12,12 @@
> >   
> >   struct bpf_iter_seq_task_common {
> >         struct pid_namespace *ns;
> > +       enum bpf_iter_task_type type;
> > +       union {
> > +               u32 tid;
> > +               u32 tgid;
> > +               u32 pid_fd;
> > +       };
> >   };
> >   
> >   struct bpf_iter_seq_task_info {
> > @@ -22,24 +28,40 @@ struct bpf_iter_seq_task_info {
> >         u32 tid;
> >   };
> >   
> > -static struct task_struct *task_seq_get_next(struct pid_namespace
> > *ns,
> > +static struct task_struct *task_seq_get_next(struct
> > bpf_iter_seq_task_common *common,
> >                                              u32 *tid,
> >                                              bool
> > skip_if_dup_files)
> >   {
> >         struct task_struct *task = NULL;
> >         struct pid *pid;
> >   
> > +       if (common->type == BPF_TASK_ITER_TID) {
> > +               if (*tid && *tid != common->tid)
> > +                       return NULL;
> > +               rcu_read_lock();
> > +               pid = find_pid_ns(common->tid, common->ns);
> > +               if (pid) {
> > +                       task = get_pid_task(pid, PIDTYPE_PID);
> > +                       *tid = common->tid;
> > +               }
> > +               rcu_read_unlock();
> > +               return task;
> > +       }
> > +
> >         rcu_read_lock();
> >   retry:
> > -       pid = find_ge_pid(*tid, ns);
> > +       pid = find_ge_pid(*tid, common->ns);
> >         if (pid) {
> > -               *tid = pid_nr_ns(pid, ns);
> > +               *tid = pid_nr_ns(pid, common->ns);
> >                 task = get_pid_task(pid, PIDTYPE_PID);
> > +
> 
> This extra line is unnecessary.
> 
> >                 if (!task) {
> >                         ++*tid;
> >                         goto retry;
> > -               } else if (skip_if_dup_files &&
> > !thread_group_leader(task) &&
> > -                          task->files == task->group_leader-
> > >files) {
> > +               } else if ((skip_if_dup_files &&
> > !thread_group_leader(task) &&
> > +                           task->files == task->group_leader-
> > >files) ||
> > +                          (common->type == BPF_TASK_ITER_TGID &&
> > +                           __task_pid_nr_ns(task, PIDTYPE_TGID,
> > common->ns) != common->tgid)) {
> >                         put_task_struct(task);
> >                         task = NULL;
> >                         ++*tid;
> > @@ -56,7 +78,8 @@ static void *task_seq_start(struct seq_file *seq,
> > loff_t *pos)
> >         struct bpf_iter_seq_task_info *info = seq->private;
> >         struct task_struct *task;
> >   
> > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > false);
> > +       task = task_seq_get_next(&info->common, &info->tid, false);
> > +
> 
> Extra line?
> 
> >         if (!task)
> >                 return NULL;
> >   
> > @@ -73,7 +96,8 @@ static void *task_seq_next(struct seq_file *seq,
> > void *v, loff_t *pos)
> >         ++*pos;
> >         ++info->tid;
> >         put_task_struct((struct task_struct *)v);
> > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > false);
> > +
> 
> Extra line?
> 
> > +       task = task_seq_get_next(&info->common, &info->tid, false);
> >         if (!task)
> >                 return NULL;
> >   
> > @@ -117,6 +141,43 @@ static void task_seq_stop(struct seq_file
> > *seq, void *v)
> >                 put_task_struct((struct task_struct *)v);
> >   }
> >   
> > +static int bpf_iter_attach_task(struct bpf_prog *prog,
> > +                               union bpf_iter_link_info *linfo,
> > +                               struct bpf_iter_aux_info *aux)
> > +{
> > +       unsigned int flags;
> > +       struct pid_namespace *ns;
> > +       struct pid *pid;
> > +       pid_t tgid;
> 
> Follow reverse chrismas tree style?
> 
> > +
> > +       if (linfo->task.tid != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TID;
> > +               aux->task.tid = linfo->task.tid;
> > +       } else if (linfo->task.tgid != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TGID;
> > +               aux->task.tgid = linfo->task.tgid;
> > +       } else if (linfo->task.pid_fd != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TGID;
> > +               pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
> > +               if (IS_ERR(pid))
> > +                       return PTR_ERR(pid);
> > +
> > +               ns = task_active_pid_ns(current);
> > +               if (IS_ERR(ns))
> > +                       return PTR_ERR(ns);
> > +
> > +               tgid = pid_nr_ns(pid, ns);
> > +               if (tgid <= 0)
> > +                       return -EINVAL;
> 
> Is it possible that tgid <= 0? I think no, so
> the above two lines are unnecessary.
> 
> > +
> > +               aux->task.tgid = tgid;
> 
> We leaks the reference count for 'pid' here.
> We need to add
>                 put_pid(pid);
> to release the reference for pid.
>         
> > +       } else {
> > +               aux->task.type = BPF_TASK_ITER_ALL;
> > +       }
> 
> What will happen if two or all of task.tid, task.tgid and
> task.pid_fd non-zero? Should we fail here?
> 
> > +
> > +       return 0;
> > +}
> > +
> >   static const struct seq_operations task_seq_ops = {
> >         .start  = task_seq_start,
> >         .next   = task_seq_next,
> > @@ -137,8 +198,7 @@ struct bpf_iter_seq_task_file_info {
> >   static struct file *
> [...]
> >   
> > @@ -307,11 +381,10 @@ enum bpf_task_vma_iter_find_op {
> >   static struct vm_area_struct *
> >   task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
> >   {
> > -       struct pid_namespace *ns = info->common.ns;
> >         enum bpf_task_vma_iter_find_op op;
> >         struct vm_area_struct *curr_vma;
> >         struct task_struct *curr_task;
> > -       u32 curr_tid = info->tid;
> > +       u32 saved_tid = info->tid;
> >   
> >         /* If this function returns a non-NULL vma, it holds a
> > reference to
> >          * the task_struct, and holds read lock on vma->mm-
> > >mmap_lock.
> > @@ -371,14 +444,13 @@ task_vma_seq_get_next(struct
> > bpf_iter_seq_task_vma_info *info)
> >                 }
> >         } else {
> >   again:
> > -               curr_task = task_seq_get_next(ns, &curr_tid, true);
> > +               curr_task = task_seq_get_next(&info->common, &info-
> > >tid, true);
> >                 if (!curr_task) {
> > -                       info->tid = curr_tid + 1;
> > +                       info->tid++;
> >                         goto finish;
> >                 }
> >   
> > -               if (curr_tid != info->tid) {
> > -                       info->tid = curr_tid;
> > +               if (saved_tid != info->tid) {
> >                         /* new task, process the first vma */
> >                         op = task_vma_iter_first_vma;
> >                 } else {
> > @@ -430,9 +502,12 @@ task_vma_seq_get_next(struct
> > bpf_iter_seq_task_vma_info *info)
> >         return curr_vma;
> >   
> >   next_task:
> > +       if (info->common.type == BPF_TASK_ITER_TID)
> > +               goto finish;
> > +
> >         put_task_struct(curr_task);
> >         info->task = NULL;
> > -       curr_tid++;
> > +       info->tid++;
> 
> saved_tid = ++info->tid?

saved_tid is the value of info->tid when entering this funciton.
It is used to check if the current visiting task is the same one
entering this function.  For this purpose, updating saved_tid or not
will not change the result.  The value of info->tid will be different
from saved_tid after info->tid++ anyway, and it will show that the
current visiting task is not the one when entering this function.

> 
> >         goto again;
> >   
> >   finish:
> [...]


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-14 20:24   ` Jiri Olsa
@ 2022-08-16 17:21     ` Kui-Feng Lee
  0 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-16 17:21 UTC (permalink / raw)
  To: olsajiri; +Cc: daniel, Kernel Team, Yonghong Song, ast, andrii, bpf

On Sun, 2022-08-14 at 22:24 +0200, Jiri Olsa wrote:
> On Wed, Aug 10, 2022 at 05:16:52PM -0700, Kui-Feng Lee wrote:
> 
> SNIP
> 
> > +static int bpf_iter_attach_task(struct bpf_prog *prog,
> > +                               union bpf_iter_link_info *linfo,
> > +                               struct bpf_iter_aux_info *aux)
> > +{
> > +       unsigned int flags;
> > +       struct pid_namespace *ns;
> > +       struct pid *pid;
> > +       pid_t tgid;
> > +
> > +       if (linfo->task.tid != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TID;
> > +               aux->task.tid = linfo->task.tid;
> > +       } else if (linfo->task.tgid != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TGID;
> > +               aux->task.tgid = linfo->task.tgid;
> > +       } else if (linfo->task.pid_fd != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TGID;
> > +               pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
> > +               if (IS_ERR(pid))
> > +                       return PTR_ERR(pid);
> > +
> > +               ns = task_active_pid_ns(current);
> > +               if (IS_ERR(ns))
> > +                       return PTR_ERR(ns);
> > +
> > +               tgid = pid_nr_ns(pid, ns);
> > +               if (tgid <= 0)
> > +                       return -EINVAL;
> > +
> > +               aux->task.tgid = tgid;
> > +       } else {
> > +               aux->task.type = BPF_TASK_ITER_ALL;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static const struct seq_operations task_seq_ops = {
> >         .start  = task_seq_start,
> >         .next   = task_seq_next,
> > @@ -137,8 +198,7 @@ struct bpf_iter_seq_task_file_info {
> >  static struct file *
> >  task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
> >  {
> > -       struct pid_namespace *ns = info->common.ns;
> > -       u32 curr_tid = info->tid;
> > +       u32 saved_tid = info->tid;
> 
> we use 'curr_' prefix for other stuff in the function, like
> curr_task, curr_fd .. I think we should either change all of
> them or actually keep curr_tid, which seem better to me

The purpose of the variable is changed, so I think 'curr_tid' is not
feasible anymore.  It was the tid of the task that we are visiting. 
But, now, it is the tid of the task that the iterator was visiting
when/before entering the function.  In this patch, info->tid is always
the value of the current visiting task.

> 
> jirka
> 
> >         struct task_struct *curr_task;
> >         unsigned int curr_fd = info->fd;
> >  
> > @@ -151,21 +211,18 @@ task_file_seq_get_next(struct
> > bpf_iter_seq_task_file_info *info)
> >                 curr_task = info->task;
> >                 curr_fd = info->fd;
> >         } else {
> > -                curr_task = task_seq_get_next(ns, &curr_tid,
> > true);
> > +               curr_task = task_seq_get_next(&info->common, &info-
> > >tid, true);
> >                  if (!curr_task) {
> >                          info->task = NULL;
> > -                        info->tid = curr_tid;
> >                          return NULL;
> >                  }
> >  
> > -                /* set info->task and info->tid */
> > +               /* set info->task */
> >                 info->task = curr_task;
> > -               if (curr_tid == info->tid) {
> > +               if (saved_tid == info->tid)
> >                         curr_fd = info->fd;
> > -               } else {
> > -                       info->tid = curr_tid;
> > +               else
> >                         curr_fd = 0;
> > -               }
> >         }
> >  
> 
> SNIP


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-16  5:02   ` Andrii Nakryiko
@ 2022-08-16 18:45     ` Kui-Feng Lee
  0 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-16 18:45 UTC (permalink / raw)
  To: andrii.nakryiko; +Cc: daniel, Kernel Team, Yonghong Song, ast, andrii, bpf

On Mon, 2022-08-15 at 22:02 -0700, Andrii Nakryiko wrote:
> On Wed, Aug 10, 2022 at 5:17 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > 
> > Allow creating an iterator that loops through resources of one
> > task/thread.
> > 
> > People could only create iterators to loop through all resources of
> > files, vma, and tasks in the system, even though they were
> > interested
> > in only the resources of a specific task or process.  Passing the
> > additional parameters, people can now create an iterator to go
> > through all resources or only the resources of a task.
> > 
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >  include/linux/bpf.h            |  29 ++++++++
> >  include/uapi/linux/bpf.h       |   8 +++
> >  kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++---
> > ----
> >  tools/include/uapi/linux/bpf.h |   8 +++
> >  4 files changed, 147 insertions(+), 24 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 11950029284f..6bbe53d06faa 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1716,8 +1716,37 @@ int bpf_obj_get_user(const char __user
> > *pathname, int flags);
> >         extern int bpf_iter_ ## target(args);                   \
> >         int __init bpf_iter_ ## target(args) { return 0; }
> > 
> > +/*
> > + * The task type of iterators.
> > + *
> > + * For BPF task iterators, they can be parameterized with various
> > + * parameters to visit only some of tasks.
> > + *
> > + * BPF_TASK_ITER_ALL (default)
> > + *     Iterate over resources of every task.
> > + *
> > + * BPF_TASK_ITER_TID
> > + *     Iterate over resources of a task/tid.
> > + *
> > + * BPF_TASK_ITER_TGID
> > + *     Iterate over reosurces of evevry task of a process / task
> > group.
> > + */
> > +enum bpf_iter_task_type {
> > +       BPF_TASK_ITER_ALL = 0,
> > +       BPF_TASK_ITER_TID,
> > +       BPF_TASK_ITER_TGID,
> > +};
> > +
> >  struct bpf_iter_aux_info {
> >         struct bpf_map *map;
> > +       struct {
> > +               enum bpf_iter_task_type type;
> > +               union {
> > +                       u32 tid;
> > +                       u32 tgid;
> > +                       u32 pid_fd;
> > +               };
> > +       } task;
> 
> You don't seem to use pid_fd in bpf_iter_aux_info at all, is that
> right? Drop it? And for tid/tgid, I'd use kernel-side terminology for
> this internal data structure and just have single u32 pid here. Then
> type determines whether you are iterating tasks or task leaders
> (processes), no ambiguity.

Yes, it should be removed from struct bpf_iter_aux_info.

Using just single u32 pid here is also fine to distinguish field names
by the type of data instead of the purpose of data.

> 
> >  };
> > 
> >  typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index ffcbf79a556b..6328aca0cf5c 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -91,6 +91,14 @@ union bpf_iter_link_info {
> >         struct {
> >                 __u32   map_fd;
> >         } map;
> > +       /*
> > +        * Parameters of task iterators.
> > +        */
> > +       struct {
> > +               __u32   tid;
> > +               __u32   tgid;
> > +               __u32   pid_fd;
> > +       } task;
> >  };
> > 
> 
> [...]


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-15 23:08   ` Hao Luo
@ 2022-08-16 19:11     ` Kui-Feng Lee
  0 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-16 19:11 UTC (permalink / raw)
  To: haoluo; +Cc: daniel, Kernel Team, Yonghong Song, ast, andrii, bpf

On Mon, 2022-08-15 at 16:08 -0700, Hao Luo wrote:
> !-------------------------------------------------------------------|
>   This Message Is From an External Sender
> 
> > -------------------------------------------------------------------
> > !
> 
> Hi Kui-Feng,
> 
> On Wed, Aug 10, 2022 at 5:17 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > 
> > Allow creating an iterator that loops through resources of one
> > task/thread.
> > 
> > People could only create iterators to loop through all resources of
> > files, vma, and tasks in the system, even though they were
> > interested
> > in only the resources of a specific task or process.  Passing the
> > additional parameters, people can now create an iterator to go
> > through all resources or only the resources of a task.
> > 
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >  include/linux/bpf.h            |  29 ++++++++
> >  include/uapi/linux/bpf.h       |   8 +++
> >  kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++---
> > ----
> >  tools/include/uapi/linux/bpf.h |   8 +++
> >  4 files changed, 147 insertions(+), 24 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 11950029284f..6bbe53d06faa 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1716,8 +1716,37 @@ int bpf_obj_get_user(const char __user
> > *pathname, int flags);
> >         extern int bpf_iter_ ## target(args);                   \
> >         int __init bpf_iter_ ## target(args) { return 0; }
> > 
> > +/*
> > + * The task type of iterators.
> > + *
> > + * For BPF task iterators, they can be parameterized with various
> > + * parameters to visit only some of tasks.
> > + *
> > + * BPF_TASK_ITER_ALL (default)
> > + *     Iterate over resources of every task.
> > + *
> > + * BPF_TASK_ITER_TID
> > + *     Iterate over resources of a task/tid.
> > + *
> > + * BPF_TASK_ITER_TGID
> > + *     Iterate over reosurces of evevry task of a process / task
> > group.
> 
> typos: resources and every.
> 
> > + */
> > +enum bpf_iter_task_type {
> > +       BPF_TASK_ITER_ALL = 0,
> > +       BPF_TASK_ITER_TID,
> > +       BPF_TASK_ITER_TGID,
> > +};
> > +
> >  struct bpf_iter_aux_info {
> >         struct bpf_map *map;
> > +       struct {
> > +               enum bpf_iter_task_type type;
> > +               union {
> > +                       u32 tid;
> > +                       u32 tgid;
> > +                       u32 pid_fd;
> > +               };
> > +       } task;
> >  };
> > 
> >  typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index ffcbf79a556b..6328aca0cf5c 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -91,6 +91,14 @@ union bpf_iter_link_info {
> >         struct {
> >                 __u32   map_fd;
> >         } map;
> > +       /*
> > +        * Parameters of task iterators.
> > +        */
> 
> We could remove this particular comment. It is kind of obvious.
> 
> > +       struct {
> > +               __u32   tid;
> > +               __u32   tgid;
> > +               __u32   pid_fd;
> > +       } task;
> >  };
> > 
> >  /* BPF syscall commands, see bpf(2) man-page for more details. */
> > diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> > index 8c921799def4..f2e21efe075d 100644
> > --- a/kernel/bpf/task_iter.c
> > +++ b/kernel/bpf/task_iter.c
> > @@ -12,6 +12,12 @@
> > 
> >  struct bpf_iter_seq_task_common {
> >         struct pid_namespace *ns;
> > +       enum bpf_iter_task_type type;
> > +       union {
> > +               u32 tid;
> > +               u32 tgid;
> > +               u32 pid_fd;
> > +       };
> >  };
> > 
> >  struct bpf_iter_seq_task_info {
> > @@ -22,24 +28,40 @@ struct bpf_iter_seq_task_info {
> >         u32 tid;
> >  };
> > 
> > -static struct task_struct *task_seq_get_next(struct pid_namespace
> > *ns,
> > +static struct task_struct *task_seq_get_next(struct
> > bpf_iter_seq_task_common *common,
> >                                              u32 *tid,
> >                                              bool
> > skip_if_dup_files)
> >  {
> >         struct task_struct *task = NULL;
> >         struct pid *pid;
> > 
> > +       if (common->type == BPF_TASK_ITER_TID) {
> > +               if (*tid && *tid != common->tid)
> > +                       return NULL;
> > +               rcu_read_lock();
> > +               pid = find_pid_ns(common->tid, common->ns);
> > +               if (pid) {
> > +                       task = get_pid_task(pid, PIDTYPE_PID);
> > +                       *tid = common->tid;
> > +               }
> > +               rcu_read_unlock();
> 
> nit: this is ok. But I think the commonly used pattern (e.g.
> proc_pid_lookup) is
> 
>         rcu_read_lock();
>         task = find_task_by_pid_ns(tid, ns);
>         if (task)
>                 get_task_struct(task);
>         rcu_read_unlock();
> 
> > +               return task;
> > +       }
> > +
> >         rcu_read_lock();
> >  retry:
> > -       pid = find_ge_pid(*tid, ns);
> > +       pid = find_ge_pid(*tid, common->ns);
> >         if (pid) {
> > -               *tid = pid_nr_ns(pid, ns);
> > +               *tid = pid_nr_ns(pid, common->ns);
> >                 task = get_pid_task(pid, PIDTYPE_PID);
> > +
> >                 if (!task) {
> >                         ++*tid;
> >                         goto retry;
> > -               } else if (skip_if_dup_files &&
> > !thread_group_leader(task) &&
> > -                          task->files == task->group_leader-
> > >files) {
> > +               } else if ((skip_if_dup_files &&
> > !thread_group_leader(task) &&
> > +                           task->files == task->group_leader-
> > >files) ||
> > +                          (common->type == BPF_TASK_ITER_TGID &&
> > +                           __task_pid_nr_ns(task, PIDTYPE_TGID,
> > common->ns) != common->tgid)) {
> 
> Use task_tgid_nr_ns instead of __task_pid_nr_ns?
> 
> >                         put_task_struct(task);
> >                         task = NULL;
> >                         ++*tid;
> > @@ -56,7 +78,8 @@ static void *task_seq_start(struct seq_file *seq,
> > loff_t *pos)
> >         struct bpf_iter_seq_task_info *info = seq->private;
> >         struct task_struct *task;
> > 
> > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > false);
> > +       task = task_seq_get_next(&info->common, &info->tid, false);
> > +
> >         if (!task)
> >                 return NULL;
> > 
> > @@ -73,7 +96,8 @@ static void *task_seq_next(struct seq_file *seq,
> > void *v, loff_t *pos)
> >         ++*pos;
> >         ++info->tid;
> >         put_task_struct((struct task_struct *)v);
> > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > false);
> > +
> > +       task = task_seq_get_next(&info->common, &info->tid, false);
> >         if (!task)
> >                 return NULL;
> > 
> > @@ -117,6 +141,43 @@ static void task_seq_stop(struct seq_file
> > *seq, void *v)
> >                 put_task_struct((struct task_struct *)v);
> >  }
> > 
> > +static int bpf_iter_attach_task(struct bpf_prog *prog,
> > +                               union bpf_iter_link_info *linfo,
> > +                               struct bpf_iter_aux_info *aux)
> > +{
> > +       unsigned int flags;
> > +       struct pid_namespace *ns;
> > +       struct pid *pid;
> > +       pid_t tgid;
> > +
> > +       if (linfo->task.tid != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TID;
> > +               aux->task.tid = linfo->task.tid;
> > +       } else if (linfo->task.tgid != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TGID;
> > +               aux->task.tgid = linfo->task.tgid;
> > +       } else if (linfo->task.pid_fd != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TGID;
> > +               pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
> > +               if (IS_ERR(pid))
> > +                       return PTR_ERR(pid);
> > +
> > +               ns = task_active_pid_ns(current);
> > +               if (IS_ERR(ns))
> > +                       return PTR_ERR(ns);
> > +
> > +               tgid = pid_nr_ns(pid, ns);
> > +               if (tgid <= 0)
> > +                       return -EINVAL;
> 
> Is this just pid_vnr?
> 
> > +
> > +               aux->task.tgid = tgid;
> > +       } else {
> > +               aux->task.type = BPF_TASK_ITER_ALL;
> > +       }
> 
> The same question as Yonghong has. Do we need to enforce that at most
> one of {tid, tgid, pid_fd} is non-zero?
> 
> > +
> > +       return 0;
> > +}
> > +
> >  static const struct seq_operations task_seq_ops = {
> >         .start  = task_seq_start,
> >         .next   = task_seq_next,
> > @@ -137,8 +198,7 @@ struct bpf_iter_seq_task_file_info {
> >  static struct file *
> >  task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
> >  {
> > -       struct pid_namespace *ns = info->common.ns;
> > -       u32 curr_tid = info->tid;
> > +       u32 saved_tid = info->tid;
> >         struct task_struct *curr_task;
> >         unsigned int curr_fd = info->fd;
> > 
> > @@ -151,21 +211,18 @@ task_file_seq_get_next(struct
> > bpf_iter_seq_task_file_info *info)
> >                 curr_task = info->task;
> >                 curr_fd = info->fd;
> >         } else {
> > -                curr_task = task_seq_get_next(ns, &curr_tid,
> > true);
> > +               curr_task = task_seq_get_next(&info->common, &info-
> > >tid, true);
> >                  if (!curr_task) {
> >                          info->task = NULL;
> > -                        info->tid = curr_tid;
> >                          return NULL;
> >                  }
> > 
> > -                /* set info->task and info->tid */
> > +               /* set info->task */
> >                 info->task = curr_task;
> > -               if (curr_tid == info->tid) {
> > +               if (saved_tid == info->tid)
> >                         curr_fd = info->fd;
> > -               } else {
> > -                       info->tid = curr_tid;
> > +               else
> >                         curr_fd = 0;
> > -               }
> >         }
> > 
> >         rcu_read_lock();
> > @@ -186,9 +243,15 @@ task_file_seq_get_next(struct
> > bpf_iter_seq_task_file_info *info)
> >         /* the current task is done, go to the next task */
> >         rcu_read_unlock();
> >         put_task_struct(curr_task);
> > +
> > +       if (info->common.type == BPF_TASK_ITER_TID) {
> > +               info->task = NULL;
> > +               return NULL;
> > +       }
> > +
> 
> Do we need to set info->fd to 0? I am not sure if the caller reads
> info->fd anywhere. I think it would be good to do some refactoring on
> task_file_seq_get_next().
> 
> >         info->task = NULL;
> >         info->fd = 0;
> > -       curr_tid = ++(info->tid);
> > +       saved_tid = ++(info->tid);
> >         goto again;
> >  }
> > 
> > @@ -269,6 +332,17 @@ static int init_seq_pidns(void *priv_data,
> > struct bpf_iter_aux_info *aux)
> >         struct bpf_iter_seq_task_common *common = priv_data;
> > 
> >         common->ns = get_pid_ns(task_active_pid_ns(current));
> > +       common->type = aux->task.type;
> > +       switch (common->type) {
> > +       case BPF_TASK_ITER_TID:
> > +               common->tid = aux->task.tid;
> > +               break;
> > +       case BPF_TASK_ITER_TGID:
> > +               common->tgid = aux->task.tgid;
> > +               break;
> > +       default:
> 
> very nit: IMHO we could place a warning here.
> 
> > +               break;
> > +       }
> >         return 0;
> >  }
> > 
> > @@ -307,11 +381,10 @@ enum bpf_task_vma_iter_find_op {
> >  static struct vm_area_struct *
> >  task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
> >  {
> > -       struct pid_namespace *ns = info->common.ns;
> >         enum bpf_task_vma_iter_find_op op;
> >         struct vm_area_struct *curr_vma;
> >         struct task_struct *curr_task;
> > -       u32 curr_tid = info->tid;
> > +       u32 saved_tid = info->tid;
> > 
> 
> Why do we need to directly operate on info->tid while other task
> iters
> (e.g. vma_iter) uses curr_tid? IMHO, prefer staying using curr_tid if
> possible, for two reasons:
>  - consistent with other iters.
>  - decouple refactoring changes from the changes that introduce new
> features

These are reasonable considerations.  I chagned it to make it easy to
maintained the value of info->tid.  With curr_tid, we always put the
curr_tid's value back to info->tid before leaving the function, and it
is obfuscating and error-prone, IMHO.

I may add another patch for the purpose of factoring.


> 
> >         /* If this function returns a non-NULL vma, it holds a
> > reference to
> >          * the task_struct, and holds read lock on vma->mm-
> > >mmap_lock.
> > @@ -371,14 +444,13 @@ task_vma_seq_get_next(struct
> > bpf_iter_seq_task_vma_info *info)
> >                 }
> >         } else {
> >  again:
> > -               curr_task = task_seq_get_next(ns, &curr_tid, true);
> > +               curr_task = task_seq_get_next(&info->common, &info-
> > >tid, true);
> >                 if (!curr_task) {
> > -                       info->tid = curr_tid + 1;
> > +                       info->tid++;
> >                         goto finish;
> >                 }
> > 
> > -               if (curr_tid != info->tid) {
> > -                       info->tid = curr_tid;
> > +               if (saved_tid != info->tid) {
> >                         /* new task, process the first vma */
> >                         op = task_vma_iter_first_vma;
> >                 } else {
> > @@ -430,9 +502,12 @@ task_vma_seq_get_next(struct
> > bpf_iter_seq_task_vma_info *info)
> >         return curr_vma;
> > 
> >  next_task:
> > +       if (info->common.type == BPF_TASK_ITER_TID)
> > +               goto finish;
> > +
> >         put_task_struct(curr_task);
> >         info->task = NULL;
> > -       curr_tid++;
> > +       info->tid++;
> >         goto again;
> > 
> >  finish:
> > @@ -533,6 +608,7 @@ static const struct bpf_iter_seq_info
> > task_seq_info = {
> > 
> >  static struct bpf_iter_reg task_reg_info = {
> >         .target                 = "task",
> > +       .attach_target          = bpf_iter_attach_task,
> >         .feature                = BPF_ITER_RESCHED,
> >         .ctx_arg_info_size      = 1,
> >         .ctx_arg_info           = {
> > @@ -551,6 +627,7 @@ static const struct bpf_iter_seq_info
> > task_file_seq_info = {
> > 
> >  static struct bpf_iter_reg task_file_reg_info = {
> >         .target                 = "task_file",
> > +       .attach_target          = bpf_iter_attach_task,
> >         .feature                = BPF_ITER_RESCHED,
> >         .ctx_arg_info_size      = 2,
> >         .ctx_arg_info           = {
> > @@ -571,6 +648,7 @@ static const struct bpf_iter_seq_info
> > task_vma_seq_info = {
> > 
> >  static struct bpf_iter_reg task_vma_reg_info = {
> >         .target                 = "task_vma",
> > +       .attach_target          = bpf_iter_attach_task,
> >         .feature                = BPF_ITER_RESCHED,
> >         .ctx_arg_info_size      = 2,
> >         .ctx_arg_info           = {
> > diff --git a/tools/include/uapi/linux/bpf.h
> > b/tools/include/uapi/linux/bpf.h
> > index ffcbf79a556b..6328aca0cf5c 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -91,6 +91,14 @@ union bpf_iter_link_info {
> >         struct {
> >                 __u32   map_fd;
> >         } map;
> > +       /*
> > +        * Parameters of task iterators.
> > +        */
> > +       struct {
> > +               __u32   tid;
> > +               __u32   tgid;
> > +               __u32   pid_fd;
> > +       } task;
> >  };
> > 
> >  /* BPF syscall commands, see bpf(2) man-page for more details. */
> > --
> > 2.30.2
> > 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-16  4:42     ` Andrii Nakryiko
@ 2022-08-18  3:40       ` Yonghong Song
  0 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2022-08-18  3:40 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/15/22 9:42 PM, Andrii Nakryiko wrote:
> On Sat, Aug 13, 2022 at 3:17 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
>>> Allow creating an iterator that loops through resources of one task/thread.
>>>
>>> People could only create iterators to loop through all resources of
>>> files, vma, and tasks in the system, even though they were interested
>>> in only the resources of a specific task or process.  Passing the
>>> additional parameters, people can now create an iterator to go
>>> through all resources or only the resources of a task.
>>>
>>> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
>>> ---
>>>    include/linux/bpf.h            |  29 ++++++++
>>>    include/uapi/linux/bpf.h       |   8 +++
>>>    kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
>>>    tools/include/uapi/linux/bpf.h |   8 +++
>>>    4 files changed, 147 insertions(+), 24 deletions(-)
>>>
> 
> [...]
> 
>>> +     struct {
>>> +             __u32   tid;
>>> +             __u32   tgid;
>>> +             __u32   pid_fd;
>>
>> The above is a max of kernel and user space terminologies.
>> tid/pid are user space concept and tgid is a kernel space
>> concept.
>>
>> In bpf uapi header, we have
>>
>> struct bpf_pidns_info {
>>           __u32 pid;
>>           __u32 tgid;
>> };
>>
>> which uses kernel terminologies.
>>
>> So I suggest the bpf_iter_link_info.task can also
>> use pure kernel terminology pid/tgid/tgid_fd here.
> 
> Except tgid_fd is a made up terminology. It is called pidfd in
> documentation and even if pidfd gains add support for threads (tasks),
> it would still be created through pidfd_open() syscall, probably. So
> it seems better to stick to "pidfd" here.
> 
> As for pid/tgid precedent. Are we referring to
> bpf_get_current_pid_tgid() BPF helper and struct bpf_pidns_info? Those
> are kernel-side BPF helper and kernel-side auxiliary type to describe
> return results of another in-kernel helper, so I think it's a bit less
> relevant here to set a precedent.
> 
> On the other hand, if we look at user-space-facing perf_event
> subsystem UAPI, for example, it seems to be using pid/tid terminology
> (and so does getpid()/gettid() syscall, etc). So I wonder if for a
> user-space-facing API it's better to stick with user-space-facing
> terminology?

I don't have strong preferences here as long as all terminologies are
consistent. user-space-facing API is okay.

Currently, we only have pid_fd to traverse all tasks in a process.
Based on an early discussion, it is possible that
pidfd_open syscall might adapt to return a fd for a task
in the future if necessary.
So we might have tid_fd as well to traverse a single task.

> 
>>
>> Alternative, using pure user space terminology
>> can be tid/pid/pid_fd but seems the kernel terminology
>> might be better since we already have precedence.
>>
>>
>>> +     } task;
>>>    };
>>>
> 
> [...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-16  5:25     ` Andrii Nakryiko
@ 2022-08-18  4:31       ` Yonghong Song
  2022-08-25 17:50         ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2022-08-18  4:31 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/15/22 10:25 PM, Andrii Nakryiko wrote:
> On Sat, Aug 13, 2022 at 3:17 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
>>> Allow creating an iterator that loops through resources of one task/thread.
>>>
>>> People could only create iterators to loop through all resources of
>>> files, vma, and tasks in the system, even though they were interested
>>> in only the resources of a specific task or process.  Passing the
>>> additional parameters, people can now create an iterator to go
>>> through all resources or only the resources of a task.
>>>
>>> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
>>> ---
>>>    include/linux/bpf.h            |  29 ++++++++
>>>    include/uapi/linux/bpf.h       |   8 +++
>>>    kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
>>>    tools/include/uapi/linux/bpf.h |   8 +++
>>>    4 files changed, 147 insertions(+), 24 deletions(-)
>>>
> 
> Btw, Yonghong, I tried to figure it out myself, but got lost in all
> the kernel functions that don't seem to be very well documented. Sorry
> for being lazy and throwing this at you :)
> 
> Is it easy and efficient to iterate only processes using whatever
> kernel helpers we have at our disposal? E.g., if I wanted to write an
> iterator that would go only over processes (not individual threads,
> just task leaders of each different process) within a cgroup, is that
> possible?

To traverse processes in a cgroup, the best location is in
kernel/cgroup/cgroup.c where there exists a seq_ops to
traverse all processes in cgroup.procs file. If we try
to implement a bpf based iterator, we could reuse some
codes in that file.

> 
> I see task iterator as consisting of two different parts (and that
> makes it a bit hard to define nice and clean interface, but if we can
> crack this, we'd get an elegant and powerful construct):
> 
> 1. What entity to iterate: threads or processes? (I'm ignoring
> task_vma and task_files here, but one could task about files of each
> thread or files of each process, but it's less practical, probably)
> 
> 2. What's the scope of objects to iterate: just a thread by tid, just
> a process by pid/pidfd, once cgroup iter lands, we'll be able to talk
> about threads or processes within a cgroup or cgroup hierarchy (this
> is where descendants_{pre,post}, cgroup_self_only and ancestors
> ordering comes in as well).
> 
> Currently Kui-Feng is addressing first half of #2 (tid/pid/pidfd
> parameters), we can use cgroup iter's parameters to define the scope
> of tasks/processes by cgroup "filter" in a follow up (it naturally
> extends what we have in this patch set).

For #2 as well, it is also possible to have a complete new seq_ops
if the traversal is only once. That is why in Kui-Feng's patch,
there are a few special case w.r.t. TID. But current approach
is also okay.

> 
> So now I'm wondering if there is any benefit to also somehow
> specifying threads vs processes as entities to iterate? And if we do
> that, does kernel support efficient iteration of processes (as opposed
> to threads).

IIUC, I didn't find an efficient way to traverse processes only.
The current pid_ns.idr records all tasks so traversing processes
have to skip intermediate non-main-thread tasks.

> 
> 
> To be clear, there is a lot of value in having just #2, but while we
> are all at this topic, I thought I'd clarify for myself #1 as well.
> 
> Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 3/3] selftests/bpf: Test parameterized task BPF iterators.
  2022-08-13 22:50   ` Yonghong Song
@ 2022-08-18 17:24     ` Kui-Feng Lee
  0 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2022-08-18 17:24 UTC (permalink / raw)
  To: daniel, Yonghong Song, Kernel Team, ast, andrii, bpf

On Sat, 2022-08-13 at 15:50 -0700, Yonghong Song wrote:
> 
> 
> On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> > Test iterators of vma, files, and tasks of tasks.
> > 
> > Ensure the API works appropriately to visit all tasks,
> > tasks in a process, or a particular task.
> > 
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >   .../selftests/bpf/prog_tests/bpf_iter.c       | 204
> > ++++++++++++++++--
> >   .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
> >   .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
> >   .../selftests/bpf/progs/bpf_iter_task_file.c  |   7 +
> >   .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
> >   5 files changed, 203 insertions(+), 25 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> > b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> > index a33874b081b6..e66f1f3db562 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> > @@ -1,6 +1,9 @@
> >   // SPDX-License-Identifier: GPL-2.0
> >   /* Copyright (c) 2020 Facebook */
> >   #include <test_progs.h>
> > +#include <sys/syscall.h>
> > +#include <unistd.h>
> > +#include <signal.h>
> 
> do we need unistd.h and signal.h?
> 
> >   #include "bpf_iter_ipv6_route.skel.h"
> >   #include "bpf_iter_netlink.skel.h"
> >   #include "bpf_iter_bpf_map.skel.h"
> > @@ -42,13 +45,13 @@ static void test_btf_id_or_null(void)
> >         }
> >   }
> >   
> > -static void do_dummy_read(struct bpf_program *prog)
> > +static void do_dummy_read(struct bpf_program *prog, struct
> > bpf_iter_attach_opts *opts)
> >   {
> >         struct bpf_link *link;
> >         char buf[16] = {};
> >         int iter_fd, len;
> >   
> > -       link = bpf_program__attach_iter(prog, NULL);
> > +       link = bpf_program__attach_iter(prog, opts);
> >         if (!ASSERT_OK_PTR(link, "attach_iter"))
> >                 return;
> >   
> > @@ -91,7 +94,7 @@ static void test_ipv6_route(void)
> >         if (!ASSERT_OK_PTR(skel,
> > "bpf_iter_ipv6_route__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_ipv6_route);
> > +       do_dummy_read(skel->progs.dump_ipv6_route, NULL);
> >   
> >         bpf_iter_ipv6_route__destroy(skel);
> >   }
> > @@ -104,7 +107,7 @@ static void test_netlink(void)
> >         if (!ASSERT_OK_PTR(skel,
> > "bpf_iter_netlink__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_netlink);
> > +       do_dummy_read(skel->progs.dump_netlink, NULL);
> >   
> >         bpf_iter_netlink__destroy(skel);
> >   }
> > @@ -117,24 +120,139 @@ static void test_bpf_map(void)
> >         if (!ASSERT_OK_PTR(skel,
> > "bpf_iter_bpf_map__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_bpf_map);
> > +       do_dummy_read(skel->progs.dump_bpf_map, NULL);
> >   
> >         bpf_iter_bpf_map__destroy(skel);
> >   }
> >   
> > -static void test_task(void)
> > +static int pidfd_open(pid_t pid, unsigned int flags)
> > +{
> > +       return syscall(SYS_pidfd_open, pid, flags);
> > +}
> > +
> > +static void check_bpf_link_info(const struct bpf_program *prog)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > +       union bpf_iter_link_info linfo;
> > +       struct bpf_link_info info = {};
> > +       __u32 info_len;
> > +       struct bpf_link *link;
> > +       int err;
> 
> Reverse christmas tree style?
> 
> > +
> > +       memset(&linfo, 0, sizeof(linfo));
> > +       linfo.task.tid = getpid();
> > +       opts.link_info = &linfo;
> > +       opts.link_info_len = sizeof(linfo);
> > +
> > +       link = bpf_program__attach_iter(prog, &opts);
> > +       if (!ASSERT_OK_PTR(link, "attach_iter"))
> > +               return;
> > +
> > +       info_len = sizeof(info);
> > +       err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info,
> > &info_len);
> > +       if (ASSERT_OK(err, "bpf_obj_get_info_by_fd"))
> > +               ASSERT_EQ(info.iter.task.tid, getpid(),
> > "check_task_tid");
> > +
> > +       bpf_link__destroy(link);
> > +}
> > +
> > +static pthread_mutex_t do_nothing_mutex;
> > +
> > +static void *do_nothing_wait(void *arg)
> > +{
> > +       pthread_mutex_lock(&do_nothing_mutex);
> > +       pthread_mutex_unlock(&do_nothing_mutex);
> > +
> > +       pthread_exit(arg);
> > +}
> > +
> > +static void test_task_(struct bpf_iter_attach_opts *opts, int
> > num_unknown, int num_known)
> 
> The function test_task_ name is weird. Maybe test_task_common?
> 
> >   {
> >         struct bpf_iter_task *skel;
> > +       pthread_t thread_id;
> > +       void *ret;
> >   
> >         skel = bpf_iter_task__open_and_load();
> >         if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_task);
> > +       if (!ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL),
> > "pthread_mutex_init"))
> > +               goto done;
> > +       if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex),
> > "pthread_mutex_lock"))
> > +               goto done;
> > +
> > +       if (!ASSERT_OK(pthread_create(&thread_id, NULL,
> > &do_nothing_wait, NULL),
> > +                 "pthread_create"))
> > +               goto done;
> > +
> > +
> > +       skel->bss->tid = getpid();
> > +
> > +       do_dummy_read(skel->progs.dump_task, opts);
> > +
> > +       if (!ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex),
> > "pthread_mutex_unlock"))
> > +               goto done;
> > +
> > +       if (num_unknown >= 0)
> > +               ASSERT_EQ(skel->bss->num_unknown_tid, num_unknown,
> > "check_num_unknown_tid");
> > +       if (num_known >= 0)
> > +               ASSERT_EQ(skel->bss->num_known_tid, num_known,
> > "check_num_known_tid");
> >   
> > +       ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
> > +                    "pthread_join");
> > +
> > +done:
> >         bpf_iter_task__destroy(skel);
> >   }
> >   
> > +static void test_task(void)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > +       union bpf_iter_link_info linfo;
> > +
> > +       memset(&linfo, 0, sizeof(linfo));
> > +       linfo.task.tid = getpid();
> > +       opts.link_info = &linfo;
> > +       opts.link_info_len = sizeof(linfo);
> > +
> > +       test_task_(&opts, 0, 1);
> > +
> > +       test_task_(NULL, -1, 1);
> > +}
> > +
> > +static void test_task_tgid(void)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > +       union bpf_iter_link_info linfo;
> > +
> > +       memset(&linfo, 0, sizeof(linfo));
> > +       linfo.task.tgid = getpid();
> > +       opts.link_info = &linfo;
> > +       opts.link_info_len = sizeof(linfo);
> > +
> > +       test_task_(&opts, 1, 1);
> > +}
> > +
> > +static void test_task_pidfd(void)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > +       union bpf_iter_link_info linfo;
> > +       int pidfd;
> > +
> > +       pidfd = pidfd_open(getpid(), 0);
> > +       if (!ASSERT_GE(pidfd, 0, "pidfd_open"))
> > +               return;
> > +
> > +       memset(&linfo, 0, sizeof(linfo));
> > +       linfo.task.pid_fd = pidfd;
> 
> In kernel, pidfd has to be > 0 to be effective.
> So in the above, you should use ASSERT_GT instead of
> ASSERT_GE. For test_progs, pidfd == 0 won't happen
> since the program does not close stdin.
> 
> > +       opts.link_info = &linfo;
> > +       opts.link_info_len = sizeof(linfo);
> > +
> > +       test_task_(&opts, 1, 1);
> > +
> > +       close(pidfd);
> > +}
> > +
> >   static void test_task_sleepable(void)
> >   {
> >         struct bpf_iter_task *skel;
> > @@ -143,7 +261,7 @@ static void test_task_sleepable(void)
> >         if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_task_sleepable);
> > +       do_dummy_read(skel->progs.dump_task_sleepable, NULL);
> >   
> >         ASSERT_GT(skel->bss-
> > >num_expected_failure_copy_from_user_task, 0,
> >                   "num_expected_failure_copy_from_user_task");
> > @@ -161,8 +279,8 @@ static void test_task_stack(void)
> >         if (!ASSERT_OK_PTR(skel,
> > "bpf_iter_task_stack__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_task_stack);
> > -       do_dummy_read(skel->progs.get_task_user_stacks);
> > +       do_dummy_read(skel->progs.dump_task_stack, NULL);
> > +       do_dummy_read(skel->progs.get_task_user_stacks, NULL);
> >   
> >         bpf_iter_task_stack__destroy(skel);
> >   }
> > @@ -174,7 +292,9 @@ static void *do_nothing(void *arg)
> >   
> >   static void test_task_file(void)
> >   {
> > +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> >         struct bpf_iter_task_file *skel;
> > +       union bpf_iter_link_info linfo;
> >         pthread_t thread_id;
> >         void *ret;
> >   
> > @@ -188,15 +308,31 @@ static void test_task_file(void)
> >                   "pthread_create"))
> >                 goto done;
> >   
> > -       do_dummy_read(skel->progs.dump_task_file);
> > +       memset(&linfo, 0, sizeof(linfo));
> > +       linfo.task.tid = getpid();
> > +       opts.link_info = &linfo;
> > +       opts.link_info_len = sizeof(linfo);
> > +
> > +       do_dummy_read(skel->progs.dump_task_file, &opts);
> >   
> >         if (!ASSERT_FALSE(pthread_join(thread_id, &ret) || ret !=
> > NULL,
> >                   "pthread_join"))
> >                 goto done;
> >   
> >         ASSERT_EQ(skel->bss->count, 0, "check_count");
> > +       ASSERT_EQ(skel->bss->unique_tgid_count, 1,
> > "check_unique_tgid_count");
> >   
> > -done:
> > +       skel->bss->count = 0;
> > +       skel->bss->unique_tgid_count = 0;
> > +
> > +       do_dummy_read(skel->progs.dump_task_file, NULL);
> > +
> > +       ASSERT_GE(skel->bss->count, 0, "check_count");
> > +       ASSERT_GE(skel->bss->unique_tgid_count, 1,
> > "check_unique_tgid_count");
> 
> This is not precise. ASSERT_EQ will be better, right?
This test will visit every process in the system. So, it should be GE.
However, I should use GT instead since we expect to see more than one
process here, not just the test process itself.

> Maybe reset last_tgid as well?
> 
> > +
> > +       check_bpf_link_info(skel->progs.dump_task_file);
> > +
> > + done:
> >         bpf_iter_task_file__destroy(skel);
> >   }
> >   
> > @@ -274,7 +410,7 @@ static void test_tcp4(void)
> >         if (!ASSERT_OK_PTR(skel, "bpf_iter_tcp4__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_tcp4);
> > +       do_dummy_read(skel->progs.dump_tcp4, NULL);
> >   
> >         bpf_iter_tcp4__destroy(skel);
> >   }
> > @@ -287,7 +423,7 @@ static void test_tcp6(void)
> >         if (!ASSERT_OK_PTR(skel, "bpf_iter_tcp6__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_tcp6);
> > +       do_dummy_read(skel->progs.dump_tcp6, NULL);
> >   
> >         bpf_iter_tcp6__destroy(skel);
> >   }
> > @@ -300,7 +436,7 @@ static void test_udp4(void)
> >         if (!ASSERT_OK_PTR(skel, "bpf_iter_udp4__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_udp4);
> > +       do_dummy_read(skel->progs.dump_udp4, NULL);
> >   
> >         bpf_iter_udp4__destroy(skel);
> >   }
> > @@ -313,7 +449,7 @@ static void test_udp6(void)
> >         if (!ASSERT_OK_PTR(skel, "bpf_iter_udp6__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_udp6);
> > +       do_dummy_read(skel->progs.dump_udp6, NULL);
> >   
> >         bpf_iter_udp6__destroy(skel);
> >   }
> > @@ -326,7 +462,7 @@ static void test_unix(void)
> >         if (!ASSERT_OK_PTR(skel, "bpf_iter_unix__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_unix);
> > +       do_dummy_read(skel->progs.dump_unix, NULL);
> >   
> >         bpf_iter_unix__destroy(skel);
> >   }
> > @@ -988,7 +1124,7 @@ static void test_bpf_sk_storage_get(void)
> >         if (!ASSERT_OK(err, "bpf_map_update_elem"))
> >                 goto close_socket;
> >   
> > -       do_dummy_read(skel->progs.fill_socket_owner);
> > +       do_dummy_read(skel->progs.fill_socket_owner, NULL);
> >   
> >         err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
> >         if (CHECK(err || val != getpid(), "bpf_map_lookup_elem",
> > @@ -996,7 +1132,7 @@ static void test_bpf_sk_storage_get(void)
> >             getpid(), val, err))
> >                 goto close_socket;
> >   
> > -       do_dummy_read(skel->progs.negate_socket_local_storage);
> > +       do_dummy_read(skel->progs.negate_socket_local_storage,
> > NULL);
> >   
> >         err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
> >         CHECK(err || val != -getpid(), "bpf_map_lookup_elem",
> > @@ -1116,7 +1252,7 @@ static void test_link_iter(void)
> >         if (!ASSERT_OK_PTR(skel,
> > "bpf_iter_bpf_link__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_bpf_link);
> > +       do_dummy_read(skel->progs.dump_bpf_link, NULL);
> >   
> >         bpf_iter_bpf_link__destroy(skel);
> >   }
> > @@ -1129,7 +1265,7 @@ static void test_ksym_iter(void)
> >         if (!ASSERT_OK_PTR(skel, "bpf_iter_ksym__open_and_load"))
> >                 return;
> >   
> > -       do_dummy_read(skel->progs.dump_ksym);
> > +       do_dummy_read(skel->progs.dump_ksym, NULL);
> >   
> >         bpf_iter_ksym__destroy(skel);
> >   }
> > @@ -1154,7 +1290,7 @@ static void str_strip_first_line(char *str)
> >         *dst = '\0';
> >   }
> >   
> > -static void test_task_vma(void)
> > +static void test_task_vma_(struct bpf_iter_attach_opts *opts)
> 
> test_task_vma_common?
> 
> >   {
> >         int err, iter_fd = -1, proc_maps_fd = -1;
> >         struct bpf_iter_task_vma *skel;
> > @@ -1166,13 +1302,14 @@ static void test_task_vma(void)
> >                 return;
> >   
> >         skel->bss->pid = getpid();
> > +       skel->bss->one_task = opts ? 1 : 0;
> >   
> >         err = bpf_iter_task_vma__load(skel);
> >         if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
> >                 goto out;
> >   
> >         skel->links.proc_maps = bpf_program__attach_iter(
> > -               skel->progs.proc_maps, NULL);
> > +               skel->progs.proc_maps, opts);
> >   
> >         if (!ASSERT_OK_PTR(skel->links.proc_maps,
> > "bpf_program__attach_iter")) {
> >                 skel->links.proc_maps = NULL;
> > @@ -1211,12 +1348,29 @@ static void test_task_vma(void)
> >         str_strip_first_line(proc_maps_output);
> >   
> >         ASSERT_STREQ(task_vma_output, proc_maps_output,
> > "compare_output");
> > +
> > +       check_bpf_link_info(skel->progs.proc_maps);
> > +
> >   out:
> >         close(proc_maps_fd);
> >         close(iter_fd);
> >         bpf_iter_task_vma__destroy(skel);
> >   }
> >   
> > +static void test_task_vma(void)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > +       union bpf_iter_link_info linfo;
> > +
> > +       memset(&linfo, 0, sizeof(linfo));
> > +       linfo.task.tid = getpid();
> > +       opts.link_info = &linfo;
> > +       opts.link_info_len = sizeof(linfo);
> > +
> > +       test_task_vma_(&opts);
> > +       test_task_vma_(NULL);
> > +}
> > +
> [...]
> > diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> > b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> > index 4ea6a37d1345..44f4a31c2ddd 100644
> > --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> > +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> > @@ -20,6 +20,7 @@ char _license[] SEC("license") = "GPL";
> >   #define D_PATH_BUF_SIZE 1024
> >   char d_path_buf[D_PATH_BUF_SIZE] = {};
> >   __u32 pid = 0;
> > +__u32 one_task = 0;
> >   
> >   SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma
> > *ctx)
> >   {
> > @@ -33,8 +34,11 @@ SEC("iter/task_vma") int proc_maps(struct
> > bpf_iter__task_vma *ctx)
> >                 return 0;
> >   
> >         file = vma->vm_file;
> > -       if (task->tgid != pid)
> > +       if (task->tgid != pid) {
> > +               if (one_task)
> > +                       BPF_SEQ_PRINTF(seq, "unexpected task (%d !=
> > %d)", task->tgid, pid);
> 
> This doesn't sound good. Is it possible we add a global variable to 
> indicate this condition and do an ASSERT in bpf_iter.c file?
> 
> >                 return 0;
> > +       }
> >         perm_str[0] = (vma->vm_flags & VM_READ) ? 'r' : '-';
> >         perm_str[1] = (vma->vm_flags & VM_WRITE) ? 'w' : '-';
> >         perm_str[2] = (vma->vm_flags & VM_EXEC) ? 'x' : '-';


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: Parameterize task iterators.
  2022-08-18  4:31       ` Yonghong Song
@ 2022-08-25 17:50         ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2022-08-25 17:50 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team

On Wed, Aug 17, 2022 at 9:31 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 8/15/22 10:25 PM, Andrii Nakryiko wrote:
> > On Sat, Aug 13, 2022 at 3:17 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 8/10/22 5:16 PM, Kui-Feng Lee wrote:
> >>> Allow creating an iterator that loops through resources of one task/thread.
> >>>
> >>> People could only create iterators to loop through all resources of
> >>> files, vma, and tasks in the system, even though they were interested
> >>> in only the resources of a specific task or process.  Passing the
> >>> additional parameters, people can now create an iterator to go
> >>> through all resources or only the resources of a task.
> >>>
> >>> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> >>> ---
> >>>    include/linux/bpf.h            |  29 ++++++++
> >>>    include/uapi/linux/bpf.h       |   8 +++
> >>>    kernel/bpf/task_iter.c         | 126 ++++++++++++++++++++++++++-------
> >>>    tools/include/uapi/linux/bpf.h |   8 +++
> >>>    4 files changed, 147 insertions(+), 24 deletions(-)
> >>>
> >
> > Btw, Yonghong, I tried to figure it out myself, but got lost in all
> > the kernel functions that don't seem to be very well documented. Sorry
> > for being lazy and throwing this at you :)
> >
> > Is it easy and efficient to iterate only processes using whatever
> > kernel helpers we have at our disposal? E.g., if I wanted to write an
> > iterator that would go only over processes (not individual threads,
> > just task leaders of each different process) within a cgroup, is that
> > possible?
>
> To traverse processes in a cgroup, the best location is in
> kernel/cgroup/cgroup.c where there exists a seq_ops to
> traverse all processes in cgroup.procs file. If we try
> to implement a bpf based iterator, we could reuse some
> codes in that file.
>

yep

> >
> > I see task iterator as consisting of two different parts (and that
> > makes it a bit hard to define nice and clean interface, but if we can
> > crack this, we'd get an elegant and powerful construct):
> >
> > 1. What entity to iterate: threads or processes? (I'm ignoring
> > task_vma and task_files here, but one could task about files of each
> > thread or files of each process, but it's less practical, probably)
> >
> > 2. What's the scope of objects to iterate: just a thread by tid, just
> > a process by pid/pidfd, once cgroup iter lands, we'll be able to talk
> > about threads or processes within a cgroup or cgroup hierarchy (this
> > is where descendants_{pre,post}, cgroup_self_only and ancestors
> > ordering comes in as well).
> >
> > Currently Kui-Feng is addressing first half of #2 (tid/pid/pidfd
> > parameters), we can use cgroup iter's parameters to define the scope
> > of tasks/processes by cgroup "filter" in a follow up (it naturally
> > extends what we have in this patch set).
>
> For #2 as well, it is also possible to have a complete new seq_ops
> if the traversal is only once. That is why in Kui-Feng's patch,
> there are a few special case w.r.t. TID. But current approach
> is also okay.
>

sounds good!

> >
> > So now I'm wondering if there is any benefit to also somehow
> > specifying threads vs processes as entities to iterate? And if we do
> > that, does kernel support efficient iteration of processes (as opposed
> > to threads).
>
> IIUC, I didn't find an efficient way to traverse processes only.
> The current pid_ns.idr records all tasks so traversing processes
> have to skip intermediate non-main-thread tasks.
>

I see, too bad, but thanks for checking!

> >
> >
> > To be clear, there is a lot of value in having just #2, but while we
> > are all at this topic, I thought I'd clarify for myself #1 as well.
> >
> > Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-08-25 17:50 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-11  0:16 [PATCH bpf-next v5 0/3] Parameterize task iterators Kui-Feng Lee
2022-08-11  0:16 ` [PATCH bpf-next v5 1/3] bpf: " Kui-Feng Lee
2022-08-13 22:17   ` Yonghong Song
2022-08-15  1:01     ` Alexei Starovoitov
2022-08-16  4:42     ` Andrii Nakryiko
2022-08-18  3:40       ` Yonghong Song
2022-08-16  5:25     ` Andrii Nakryiko
2022-08-18  4:31       ` Yonghong Song
2022-08-25 17:50         ` Andrii Nakryiko
2022-08-16 17:00     ` Kui-Feng Lee
2022-08-14 20:24   ` Jiri Olsa
2022-08-16 17:21     ` Kui-Feng Lee
2022-08-15 23:08   ` Hao Luo
2022-08-16 19:11     ` Kui-Feng Lee
2022-08-16  5:02   ` Andrii Nakryiko
2022-08-16 18:45     ` Kui-Feng Lee
2022-08-11  0:16 ` [PATCH bpf-next v5 2/3] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
2022-08-13 22:23   ` Yonghong Song
2022-08-11  0:16 ` [PATCH bpf-next v5 3/3] selftests/bpf: Test " Kui-Feng Lee
2022-08-13 22:50   ` Yonghong Song
2022-08-18 17:24     ` Kui-Feng Lee
2022-08-16  5:15   ` Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.