bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v6 0/4] Parameterize task iterators.
@ 2022-08-19 22:09 Kui-Feng Lee
  2022-08-19 22:09 ` [PATCH bpf-next v6 1/4] bpf: " Kui-Feng Lee
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Kui-Feng Lee @ 2022-08-19 22:09 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Allow creating an iterator that loops through resources of one task/thread.

People could only create iterators to loop through all resources of
files, vma, and tasks in the system, even though they were interested in only the
resources of a specific task or process.  Passing the addintional
parameters, people can now create an iterator to go through all
resources or only the resources of a task.

Major Changes:

 - Add new parameters in bpf_iter_link_info to indicate to go through
   all tasks or to go through a specific task.

 - Change the implementations of BPF iterators of vma, files, and
   tasks to allow going through only the resources of a specific task.

 - Provide the arguments of parameterized task iterators in
   bpf_link_info.

Differences from v5:

 - Use user-speace tid/pid terminologies in bpf_iter_link_info *** BLURB HERE ***
   bpf_link_info.

 - Fix reference count

 - Merge all variants to one 'u32 pid' in internal structs.
   (bpf_iter_aux_info and bpf_iter_seq_task_common)

 - Compare the result of get_uprobe_offset() with the implementation
   with the vma iterators.

 - Implement show_fdinfo.

Differences from v4:

 - Remove 'type' from bpf_iter_link_info and bpf_link_info.

v5: https://lore.kernel.org/bpf/20220811001654.1316689-1-kuifeng@fb.com/
v4: https://lore.kernel.org/bpf/20220809195429.1043220-1-kuifeng@fb.com/
v3: https://lore.kernel.org/bpf/20220809063501.667610-1-kuifeng@fb.com/
v2: https://lore.kernel.org/bpf/20220801232649.2306614-1-kuifeng@fb.com/
v1: https://lore.kernel.org/bpf/20220726051713.840431-1-kuifeng@fb.com/

Kui-Feng Lee (4):
  bpf: Parameterize task iterators.
  bpf: Handle bpf_link_info for the parameterized task BPF iterators.
  bpf: Handle show_fdinfo for the parameterized task BPF iterators
  selftests/bpf: Test parameterized task BPF iterators.

 include/linux/bpf.h                           |  25 ++
 include/uapi/linux/bpf.h                      |  12 +
 kernel/bpf/task_iter.c                        | 142 +++++++--
 tools/include/uapi/linux/bpf.h                |  12 +
 .../selftests/bpf/prog_tests/bpf_iter.c       | 284 +++++++++++++++++-
 .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
 .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
 .../selftests/bpf/progs/bpf_iter_task_file.c  |   9 +-
 .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
 .../bpf/progs/bpf_iter_uprobe_offset.c        |  35 +++
 10 files changed, 493 insertions(+), 43 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH bpf-next v6 1/4] bpf: Parameterize task iterators.
  2022-08-19 22:09 [PATCH bpf-next v6 0/4] Parameterize task iterators Kui-Feng Lee
@ 2022-08-19 22:09 ` Kui-Feng Lee
  2022-08-24 19:33   ` Yonghong Song
  2022-08-24 22:20   ` Andrii Nakryiko
  2022-08-19 22:09 ` [PATCH bpf-next v6 2/4] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Kui-Feng Lee @ 2022-08-19 22:09 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Allow creating an iterator that loops through resources of one task/thread.

People could only create iterators to loop through all resources of
files, vma, and tasks in the system, even though they were interested
in only the resources of a specific task or process.  Passing the
additional parameters, people can now create an iterator to go
through all resources or only the resources of a task.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 include/linux/bpf.h            |  25 +++++++
 include/uapi/linux/bpf.h       |   6 ++
 kernel/bpf/task_iter.c         | 116 ++++++++++++++++++++++++++-------
 tools/include/uapi/linux/bpf.h |   6 ++
 4 files changed, 129 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 39bd36359c1e..59712dd917d8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
 	extern int bpf_iter_ ## target(args);			\
 	int __init bpf_iter_ ## target(args) { return 0; }
 
+/*
+ * The task type of iterators.
+ *
+ * For BPF task iterators, they can be parameterized with various
+ * parameters to visit only some of tasks.
+ *
+ * BPF_TASK_ITER_ALL (default)
+ *	Iterate over resources of every task.
+ *
+ * BPF_TASK_ITER_TID
+ *	Iterate over resources of a task/tid.
+ *
+ * BPF_TASK_ITER_TGID
+ *	Iterate over reosurces of evevry task of a process / task group.
+ */
+enum bpf_iter_task_type {
+	BPF_TASK_ITER_ALL = 0,
+	BPF_TASK_ITER_TID,
+	BPF_TASK_ITER_TGID,
+};
+
 struct bpf_iter_aux_info {
 	struct bpf_map *map;
+	struct {
+		enum bpf_iter_task_type	type;
+		u32 pid;
+	} task;
 };
 
 typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 934a2a8beb87..778fbf11aa00 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -91,6 +91,12 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	/* Parameters of task iterators. */
+	struct {
+		__u32	tid;
+		__u32	pid;
+		__u32	pid_fd;
+	} task;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 8c921799def4..2f5fc6602917 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -12,6 +12,8 @@
 
 struct bpf_iter_seq_task_common {
 	struct pid_namespace *ns;
+	enum bpf_iter_task_type	type;
+	u32 pid;
 };
 
 struct bpf_iter_seq_task_info {
@@ -22,24 +24,39 @@ struct bpf_iter_seq_task_info {
 	u32 tid;
 };
 
-static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
+static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *common,
 					     u32 *tid,
 					     bool skip_if_dup_files)
 {
 	struct task_struct *task = NULL;
 	struct pid *pid;
 
+	if (common->type == BPF_TASK_ITER_TID) {
+		if (*tid && *tid != common->pid)
+			return NULL;
+		rcu_read_lock();
+		pid = find_pid_ns(common->pid, common->ns);
+		if (pid) {
+			task = get_pid_task(pid, PIDTYPE_PID);
+			*tid = common->pid;
+		}
+		rcu_read_unlock();
+		return task;
+	}
+
 	rcu_read_lock();
 retry:
-	pid = find_ge_pid(*tid, ns);
+	pid = find_ge_pid(*tid, common->ns);
 	if (pid) {
-		*tid = pid_nr_ns(pid, ns);
+		*tid = pid_nr_ns(pid, common->ns);
 		task = get_pid_task(pid, PIDTYPE_PID);
 		if (!task) {
 			++*tid;
 			goto retry;
-		} else if (skip_if_dup_files && !thread_group_leader(task) &&
-			   task->files == task->group_leader->files) {
+		} else if ((skip_if_dup_files && !thread_group_leader(task) &&
+			    task->files == task->group_leader->files) ||
+			   (common->type == BPF_TASK_ITER_TGID &&
+			    __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) != common->pid)) {
 			put_task_struct(task);
 			task = NULL;
 			++*tid;
@@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos)
 	struct bpf_iter_seq_task_info *info = seq->private;
 	struct task_struct *task;
 
-	task = task_seq_get_next(info->common.ns, &info->tid, false);
+	task = task_seq_get_next(&info->common, &info->tid, false);
 	if (!task)
 		return NULL;
 
@@ -73,7 +90,7 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	++*pos;
 	++info->tid;
 	put_task_struct((struct task_struct *)v);
-	task = task_seq_get_next(info->common.ns, &info->tid, false);
+	task = task_seq_get_next(&info->common, &info->tid, false);
 	if (!task)
 		return NULL;
 
@@ -117,6 +134,48 @@ static void task_seq_stop(struct seq_file *seq, void *v)
 		put_task_struct((struct task_struct *)v);
 }
 
+static int bpf_iter_attach_task(struct bpf_prog *prog,
+				union bpf_iter_link_info *linfo,
+				struct bpf_iter_aux_info *aux)
+{
+	unsigned int flags;
+	struct pid_namespace *ns;
+	struct pid *pid;
+	pid_t tgid;
+
+	aux->task.type = BPF_TASK_ITER_ALL;
+	if (linfo->task.tid != 0) {
+		aux->task.type = BPF_TASK_ITER_TID;
+		aux->task.pid = linfo->task.tid;
+	}
+	if (linfo->task.pid != 0) {
+		if (aux->task.type != BPF_TASK_ITER_ALL)
+			return -EINVAL;
+
+		aux->task.type = BPF_TASK_ITER_TGID;
+		aux->task.pid = linfo->task.pid;
+	}
+	if (linfo->task.pid_fd != 0) {
+		if (aux->task.type != BPF_TASK_ITER_ALL)
+			return -EINVAL;
+
+		aux->task.type = BPF_TASK_ITER_TGID;
+		ns = task_active_pid_ns(current);
+		if (IS_ERR(ns))
+			return PTR_ERR(ns);
+
+		pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
+		if (IS_ERR(pid))
+			return PTR_ERR(pid);
+
+		tgid = pid_nr_ns(pid, ns);
+		aux->task.pid = tgid;
+		put_pid(pid);
+	}
+
+	return 0;
+}
+
 static const struct seq_operations task_seq_ops = {
 	.start	= task_seq_start,
 	.next	= task_seq_next,
@@ -137,8 +196,7 @@ struct bpf_iter_seq_task_file_info {
 static struct file *
 task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 {
-	struct pid_namespace *ns = info->common.ns;
-	u32 curr_tid = info->tid;
+	u32 saved_tid = info->tid;
 	struct task_struct *curr_task;
 	unsigned int curr_fd = info->fd;
 
@@ -151,21 +209,18 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 		curr_task = info->task;
 		curr_fd = info->fd;
 	} else {
-                curr_task = task_seq_get_next(ns, &curr_tid, true);
+		curr_task = task_seq_get_next(&info->common, &info->tid, true);
                 if (!curr_task) {
                         info->task = NULL;
-                        info->tid = curr_tid;
                         return NULL;
                 }
 
-                /* set info->task and info->tid */
+		/* set info->task */
 		info->task = curr_task;
-		if (curr_tid == info->tid) {
+		if (saved_tid == info->tid)
 			curr_fd = info->fd;
-		} else {
-			info->tid = curr_tid;
+		else
 			curr_fd = 0;
-		}
 	}
 
 	rcu_read_lock();
@@ -186,9 +241,15 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 	/* the current task is done, go to the next task */
 	rcu_read_unlock();
 	put_task_struct(curr_task);
+
+	if (info->common.type == BPF_TASK_ITER_TID) {
+		info->task = NULL;
+		return NULL;
+	}
+
 	info->task = NULL;
 	info->fd = 0;
-	curr_tid = ++(info->tid);
+	saved_tid = ++(info->tid);
 	goto again;
 }
 
@@ -269,6 +330,9 @@ static int init_seq_pidns(void *priv_data, struct bpf_iter_aux_info *aux)
 	struct bpf_iter_seq_task_common *common = priv_data;
 
 	common->ns = get_pid_ns(task_active_pid_ns(current));
+	common->type = aux->task.type;
+	common->pid = aux->task.pid;
+
 	return 0;
 }
 
@@ -307,11 +371,10 @@ enum bpf_task_vma_iter_find_op {
 static struct vm_area_struct *
 task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 {
-	struct pid_namespace *ns = info->common.ns;
 	enum bpf_task_vma_iter_find_op op;
 	struct vm_area_struct *curr_vma;
 	struct task_struct *curr_task;
-	u32 curr_tid = info->tid;
+	u32 saved_tid = info->tid;
 
 	/* If this function returns a non-NULL vma, it holds a reference to
 	 * the task_struct, and holds read lock on vma->mm->mmap_lock.
@@ -371,14 +434,13 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 		}
 	} else {
 again:
-		curr_task = task_seq_get_next(ns, &curr_tid, true);
+		curr_task = task_seq_get_next(&info->common, &info->tid, true);
 		if (!curr_task) {
-			info->tid = curr_tid + 1;
+			info->tid++;
 			goto finish;
 		}
 
-		if (curr_tid != info->tid) {
-			info->tid = curr_tid;
+		if (saved_tid != info->tid) {
 			/* new task, process the first vma */
 			op = task_vma_iter_first_vma;
 		} else {
@@ -430,9 +492,12 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 	return curr_vma;
 
 next_task:
+	if (info->common.type == BPF_TASK_ITER_TID)
+		goto finish;
+
 	put_task_struct(curr_task);
 	info->task = NULL;
-	curr_tid++;
+	info->tid++;
 	goto again;
 
 finish:
@@ -533,6 +598,7 @@ static const struct bpf_iter_seq_info task_seq_info = {
 
 static struct bpf_iter_reg task_reg_info = {
 	.target			= "task",
+	.attach_target		= bpf_iter_attach_task,
 	.feature		= BPF_ITER_RESCHED,
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
@@ -551,6 +617,7 @@ static const struct bpf_iter_seq_info task_file_seq_info = {
 
 static struct bpf_iter_reg task_file_reg_info = {
 	.target			= "task_file",
+	.attach_target		= bpf_iter_attach_task,
 	.feature		= BPF_ITER_RESCHED,
 	.ctx_arg_info_size	= 2,
 	.ctx_arg_info		= {
@@ -571,6 +638,7 @@ static const struct bpf_iter_seq_info task_vma_seq_info = {
 
 static struct bpf_iter_reg task_vma_reg_info = {
 	.target			= "task_vma",
+	.attach_target		= bpf_iter_attach_task,
 	.feature		= BPF_ITER_RESCHED,
 	.ctx_arg_info_size	= 2,
 	.ctx_arg_info		= {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1d6085e15fc8..7a0268749a48 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -91,6 +91,12 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	/* Parameters of task iterators. */
+	struct {
+		__u32	tid;
+		__u32	pid;
+		__u32	pid_fd;
+	} task;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next v6 2/4] bpf: Handle bpf_link_info for the parameterized task BPF iterators.
  2022-08-19 22:09 [PATCH bpf-next v6 0/4] Parameterize task iterators Kui-Feng Lee
  2022-08-19 22:09 ` [PATCH bpf-next v6 1/4] bpf: " Kui-Feng Lee
@ 2022-08-19 22:09 ` Kui-Feng Lee
  2022-08-24 19:41   ` Yonghong Song
  2022-08-19 22:09 ` [PATCH bpf-next v6 3/4] bpf: Handle show_fdinfo " Kui-Feng Lee
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Kui-Feng Lee @ 2022-08-19 22:09 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Add new fields to bpf_link_info that users can query it through
bpf_obj_get_info_by_fd().

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 include/uapi/linux/bpf.h       |  6 ++++++
 kernel/bpf/task_iter.c         | 18 ++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  6 ++++++
 3 files changed, 30 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 778fbf11aa00..6647e052dd00 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6170,6 +6170,12 @@ struct bpf_link_info {
 					__u32 map_id;
 				} map;
 			};
+			union {
+				struct {
+					__u32 tid;
+					__u32 pid;
+				} task;
+			};
 		} iter;
 		struct  {
 			__u32 netns_ino;
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 2f5fc6602917..927b3a1cf354 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -596,6 +596,21 @@ static const struct bpf_iter_seq_info task_seq_info = {
 	.seq_priv_size		= sizeof(struct bpf_iter_seq_task_info),
 };
 
+static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info)
+{
+	switch (aux->task.type) {
+	case BPF_TASK_ITER_TID:
+		info->iter.task.tid = aux->task.pid;
+		break;
+	case BPF_TASK_ITER_TGID:
+		info->iter.task.pid = aux->task.pid;
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static struct bpf_iter_reg task_reg_info = {
 	.target			= "task",
 	.attach_target		= bpf_iter_attach_task,
@@ -606,6 +621,7 @@ static struct bpf_iter_reg task_reg_info = {
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
 	.seq_info		= &task_seq_info,
+	.fill_link_info		= bpf_iter_fill_link_info,
 };
 
 static const struct bpf_iter_seq_info task_file_seq_info = {
@@ -627,6 +643,7 @@ static struct bpf_iter_reg task_file_reg_info = {
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
 	.seq_info		= &task_file_seq_info,
+	.fill_link_info		= bpf_iter_fill_link_info,
 };
 
 static const struct bpf_iter_seq_info task_vma_seq_info = {
@@ -648,6 +665,7 @@ static struct bpf_iter_reg task_vma_reg_info = {
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
 	.seq_info		= &task_vma_seq_info,
+	.fill_link_info		= bpf_iter_fill_link_info,
 };
 
 BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7a0268749a48..177722c5dd62 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6170,6 +6170,12 @@ struct bpf_link_info {
 					__u32 map_id;
 				} map;
 			};
+			union {
+				struct {
+					__u32 tid;
+					__u32 pid;
+				} task;
+			};
 		} iter;
 		struct  {
 			__u32 netns_ino;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next v6 3/4] bpf: Handle show_fdinfo for the parameterized task BPF iterators
  2022-08-19 22:09 [PATCH bpf-next v6 0/4] Parameterize task iterators Kui-Feng Lee
  2022-08-19 22:09 ` [PATCH bpf-next v6 1/4] bpf: " Kui-Feng Lee
  2022-08-19 22:09 ` [PATCH bpf-next v6 2/4] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
@ 2022-08-19 22:09 ` Kui-Feng Lee
  2022-08-24 19:49   ` Yonghong Song
  2022-08-19 22:09 ` [PATCH bpf-next v6 4/4] selftests/bpf: Test " Kui-Feng Lee
  2022-08-24 22:06 ` [PATCH bpf-next v6 0/4] Parameterize task iterators Andrii Nakryiko
  4 siblings, 1 reply; 17+ messages in thread
From: Kui-Feng Lee @ 2022-08-19 22:09 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Show information of iterators in the respective files under
/proc/<pid>/fdinfo/.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 kernel/bpf/task_iter.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 927b3a1cf354..5303eddb264b 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -611,6 +611,11 @@ static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct b
 	return 0;
 }
 
+static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
+{
+	seq_printf(seq, "task_type:\t%d\npid:\t%d\n", aux->task.type, aux->task.pid);
+}
+
 static struct bpf_iter_reg task_reg_info = {
 	.target			= "task",
 	.attach_target		= bpf_iter_attach_task,
@@ -622,6 +627,7 @@ static struct bpf_iter_reg task_reg_info = {
 	},
 	.seq_info		= &task_seq_info,
 	.fill_link_info		= bpf_iter_fill_link_info,
+	.show_fdinfo		= bpf_iter_task_show_fdinfo,
 };
 
 static const struct bpf_iter_seq_info task_file_seq_info = {
@@ -644,6 +650,7 @@ static struct bpf_iter_reg task_file_reg_info = {
 	},
 	.seq_info		= &task_file_seq_info,
 	.fill_link_info		= bpf_iter_fill_link_info,
+	.show_fdinfo		= bpf_iter_task_show_fdinfo,
 };
 
 static const struct bpf_iter_seq_info task_vma_seq_info = {
@@ -666,6 +673,7 @@ static struct bpf_iter_reg task_vma_reg_info = {
 	},
 	.seq_info		= &task_vma_seq_info,
 	.fill_link_info		= bpf_iter_fill_link_info,
+	.show_fdinfo		= bpf_iter_task_show_fdinfo,
 };
 
 BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH bpf-next v6 4/4] selftests/bpf: Test parameterized task BPF iterators.
  2022-08-19 22:09 [PATCH bpf-next v6 0/4] Parameterize task iterators Kui-Feng Lee
                   ` (2 preceding siblings ...)
  2022-08-19 22:09 ` [PATCH bpf-next v6 3/4] bpf: Handle show_fdinfo " Kui-Feng Lee
@ 2022-08-19 22:09 ` Kui-Feng Lee
  2022-08-24 20:50   ` Yonghong Song
  2022-08-24 22:30   ` Andrii Nakryiko
  2022-08-24 22:06 ` [PATCH bpf-next v6 0/4] Parameterize task iterators Andrii Nakryiko
  4 siblings, 2 replies; 17+ messages in thread
From: Kui-Feng Lee @ 2022-08-19 22:09 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii, kernel-team, yhs; +Cc: Kui-Feng Lee

Test iterators of vma, files, and tasks of tasks.

Ensure the API works appropriately to visit all tasks,
tasks in a process, or a particular task.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 284 +++++++++++++++++-
 .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
 .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
 .../selftests/bpf/progs/bpf_iter_task_file.c  |   9 +-
 .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
 .../bpf/progs/bpf_iter_uprobe_offset.c        |  35 +++
 6 files changed, 326 insertions(+), 19 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index e89685bd587c..c1ef7ffa6a43 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2020 Facebook */
 #include <test_progs.h>
+#include <sys/syscall.h>
 #include "bpf_iter_ipv6_route.skel.h"
 #include "bpf_iter_netlink.skel.h"
 #include "bpf_iter_bpf_map.skel.h"
@@ -14,6 +15,7 @@
 #include "bpf_iter_udp4.skel.h"
 #include "bpf_iter_udp6.skel.h"
 #include "bpf_iter_unix.skel.h"
+#include "bpf_iter_uprobe_offset.skel.h"
 #include "bpf_iter_test_kern1.skel.h"
 #include "bpf_iter_test_kern2.skel.h"
 #include "bpf_iter_test_kern3.skel.h"
@@ -43,13 +45,13 @@ static void test_btf_id_or_null(void)
 	}
 }
 
-static void do_dummy_read(struct bpf_program *prog)
+static void do_dummy_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
 {
 	struct bpf_link *link;
 	char buf[16] = {};
 	int iter_fd, len;
 
-	link = bpf_program__attach_iter(prog, NULL);
+	link = bpf_program__attach_iter(prog, opts);
 	if (!ASSERT_OK_PTR(link, "attach_iter"))
 		return;
 
@@ -68,6 +70,11 @@ static void do_dummy_read(struct bpf_program *prog)
 	bpf_link__destroy(link);
 }
 
+static void do_dummy_read(struct bpf_program *prog)
+{
+	do_dummy_read_opts(prog, NULL);
+}
+
 static void do_read_map_iter_fd(struct bpf_object_skeleton **skel, struct bpf_program *prog,
 				struct bpf_map *map)
 {
@@ -167,19 +174,149 @@ static void test_bpf_map(void)
 	bpf_iter_bpf_map__destroy(skel);
 }
 
-static void test_task(void)
+static int pidfd_open(pid_t pid, unsigned int flags)
+{
+	return syscall(SYS_pidfd_open, pid, flags);
+}
+
+static void check_bpf_link_info(const struct bpf_program *prog)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	struct bpf_link_info info = {};
+	struct bpf_link *link;
+	__u32 info_len;
+	int err;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	link = bpf_program__attach_iter(prog, &opts);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	info_len = sizeof(info);
+	err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
+	ASSERT_OK(err, "bpf_obj_get_info_by_fd");
+	ASSERT_EQ(info.iter.task.tid, getpid(), "check_task_tid");
+
+	bpf_link__destroy(link);
+}
+
+static pthread_mutex_t do_nothing_mutex;
+
+static void *do_nothing_wait(void *arg)
+{
+	pthread_mutex_lock(&do_nothing_mutex);
+	pthread_mutex_unlock(&do_nothing_mutex);
+
+	pthread_exit(arg);
+}
+
+static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
+				     int *num_unknown, int *num_known)
 {
 	struct bpf_iter_task *skel;
+	pthread_t thread_id;
+	bool locked = false;
+	void *ret;
 
 	skel = bpf_iter_task__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_task);
+	if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
+		goto done;
+	locked = true;
+
+	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
+		  "pthread_create"))
+		goto done;
+
+
+	skel->bss->tid = getpid();
+
+	do_dummy_read_opts(skel->progs.dump_task, opts);
+
+	*num_unknown = skel->bss->num_unknown_tid;
+	*num_known = skel->bss->num_known_tid;
+
+	ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
+	locked = false;
+	ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
+		     "pthread_join");
 
+done:
+	if (locked)
+		ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
 	bpf_iter_task__destroy(skel);
 }
 
+static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
+{
+	int num_unknown_tid, num_known_tid;
+
+	test_task_common_nocheck(opts, &num_unknown_tid, &num_known_tid);
+	ASSERT_EQ(num_unknown_tid, num_unknown, "check_num_unknown_tid");
+	ASSERT_EQ(num_known_tid, num_known, "check_num_known_tid");
+}
+
+static void test_task(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	int num_unknown_tid, num_known_tid;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	test_task_common(&opts, 0, 1);
+
+	linfo.task.tid = 0;
+	linfo.task.pid = getpid();
+	test_task_common(&opts, 1, 1);
+
+	test_task_common_nocheck(NULL, &num_unknown_tid, &num_known_tid);
+	ASSERT_GT(num_unknown_tid, 1, "check_num_unknown_tid");
+	ASSERT_EQ(num_known_tid, 1, "check_num_known_tid");
+}
+
+static void test_task_tgid(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.pid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_common(&opts, 1, 1);
+}
+
+static void test_task_pidfd(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	int pidfd;
+
+	pidfd = pidfd_open(getpid(), 0);
+	if (!ASSERT_GT(pidfd, 0, "pidfd_open"))
+		return;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.pid_fd = pidfd;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_common(&opts, 1, 1);
+
+	close(pidfd);
+}
+
 static void test_task_sleepable(void)
 {
 	struct bpf_iter_task *skel;
@@ -212,15 +349,13 @@ static void test_task_stack(void)
 	bpf_iter_task_stack__destroy(skel);
 }
 
-static void *do_nothing(void *arg)
-{
-	pthread_exit(arg);
-}
-
 static void test_task_file(void)
 {
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
 	struct bpf_iter_task_file *skel;
+	union bpf_iter_link_info linfo;
 	pthread_t thread_id;
+	bool locked = false;
 	void *ret;
 
 	skel = bpf_iter_task_file__open_and_load();
@@ -229,19 +364,43 @@ static void test_task_file(void)
 
 	skel->bss->tgid = getpid();
 
-	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing, NULL),
+	if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
+		goto done;
+	locked = true;
+
+	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
 		  "pthread_create"))
 		goto done;
 
-	do_dummy_read(skel->progs.dump_task_file);
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
 
-	if (!ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
-		  "pthread_join"))
-		goto done;
+	do_dummy_read_opts(skel->progs.dump_task_file, &opts);
+
+	ASSERT_EQ(skel->bss->count, 0, "check_count");
+	ASSERT_EQ(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
+
+	skel->bss->last_tgid = 0;
+	skel->bss->count = 0;
+	skel->bss->unique_tgid_count = 0;
+
+	do_dummy_read(skel->progs.dump_task_file);
 
 	ASSERT_EQ(skel->bss->count, 0, "check_count");
+	ASSERT_GT(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
+
+	check_bpf_link_info(skel->progs.dump_task_file);
+
+	ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
+	locked = false;
+	ASSERT_OK(pthread_join(thread_id, &ret), "pthread_join");
+	ASSERT_NULL(ret, "phtread_join");
 
 done:
+	if (locked)
+		ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
 	bpf_iter_task_file__destroy(skel);
 }
 
@@ -1249,7 +1408,7 @@ static void str_strip_first_line(char *str)
 	*dst = '\0';
 }
 
-static void test_task_vma(void)
+static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
 {
 	int err, iter_fd = -1, proc_maps_fd = -1;
 	struct bpf_iter_task_vma *skel;
@@ -1261,13 +1420,14 @@ static void test_task_vma(void)
 		return;
 
 	skel->bss->pid = getpid();
+	skel->bss->one_task = opts ? 1 : 0;
 
 	err = bpf_iter_task_vma__load(skel);
 	if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
 		goto out;
 
 	skel->links.proc_maps = bpf_program__attach_iter(
-		skel->progs.proc_maps, NULL);
+		skel->progs.proc_maps, opts);
 
 	if (!ASSERT_OK_PTR(skel->links.proc_maps, "bpf_program__attach_iter")) {
 		skel->links.proc_maps = NULL;
@@ -1306,6 +1466,9 @@ static void test_task_vma(void)
 	str_strip_first_line(proc_maps_output);
 
 	ASSERT_STREQ(task_vma_output, proc_maps_output, "compare_output");
+
+	check_bpf_link_info(skel->progs.proc_maps);
+
 out:
 	close(proc_maps_fd);
 	close(iter_fd);
@@ -1325,8 +1488,91 @@ void test_bpf_sockmap_map_iter_fd(void)
 	bpf_iter_sockmap__destroy(skel);
 }
 
+static void test_task_vma(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_vma_common(&opts);
+	test_task_vma_common(NULL);
+}
+
+/* uprobe attach point */
+static noinline int trigger_func(int arg)
+{
+	asm volatile ("");
+	return arg + 1;
+}
+
+static void test_task_uprobe_offset_common(struct bpf_iter_attach_opts *opts, bool one_proc)
+{
+	struct bpf_iter_uprobe_offset *skel;
+	struct bpf_link *link;
+	char buf[16] = {};
+	int iter_fd, len;
+
+	skel = bpf_iter_uprobe_offset__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_uprobe_offset__open_and_load"))
+		return;
+
+	skel->bss->pid = getpid();
+	skel->bss->address = (uintptr_t)trigger_func;
+
+	link = bpf_program__attach_iter(skel->progs.get_uprobe_offset, opts);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GT(iter_fd, 0, "create_iter"))
+		goto exit;
+
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	CHECK(len < 0, "read", "read failed: %s\n", strerror(errno));
+	buf[15] = 0;
+	ASSERT_EQ(strcmp(buf, "OK\n"), 0, "strcmp");
+
+	ASSERT_EQ(skel->bss->offset, get_uprobe_offset(trigger_func), "offset");
+	if (one_proc)
+		ASSERT_EQ(skel->bss->unique_tgid_cnt, 1, "unique_tgid_count");
+	else
+		ASSERT_GT(skel->bss->unique_tgid_cnt, 1, "unique_tgid_count");
+
+	close(iter_fd);
+
+exit:
+	bpf_link__destroy(link);
+}
+
+static void test_task_uprobe_offset(void)
+{
+	LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.pid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_uprobe_offset_common(&opts, true);
+
+	linfo.task.pid = 0;
+	linfo.task.tid = getpid();
+	test_task_uprobe_offset_common(&opts, true);
+
+	test_task_uprobe_offset_common(NULL, false);
+}
+
 void test_bpf_iter(void)
 {
+	if (!ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL), "pthread_mutex_init"))
+		return;
+
 	if (test__start_subtest("btf_id_or_null"))
 		test_btf_id_or_null();
 	if (test__start_subtest("ipv6_route"))
@@ -1337,6 +1583,10 @@ void test_bpf_iter(void)
 		test_bpf_map();
 	if (test__start_subtest("task"))
 		test_task();
+	if (test__start_subtest("task_tgid"))
+		test_task_tgid();
+	if (test__start_subtest("task_pidfd"))
+		test_task_pidfd();
 	if (test__start_subtest("task_sleepable"))
 		test_task_sleepable();
 	if (test__start_subtest("task_stack"))
@@ -1397,4 +1647,6 @@ void test_bpf_iter(void)
 		test_ksym_iter();
 	if (test__start_subtest("bpf_sockmap_map_iter_fd"))
 		test_bpf_sockmap_map_iter_fd();
+	if (test__start_subtest("uprobe_offset"))
+		test_task_uprobe_offset();
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
index 5fce7008d1ff..32c34ce9cbeb 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
@@ -764,7 +764,7 @@ static void test_btf_dump_struct_data(struct btf *btf, struct btf_dump *d,
 
 	/* union with nested struct */
 	TEST_BTF_DUMP_DATA(btf, d, "union", str, union bpf_iter_link_info, BTF_F_COMPACT,
-			   "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},}",
+			   "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},.task = (struct){.tid = (__u32)1,},}",
 			   { .map = { .map_fd = 1 }});
 
 	/* struct skb with nested structs/unions; because type output is so
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
index d22741272692..96131b9a1caa 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_task.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
@@ -6,6 +6,10 @@
 
 char _license[] SEC("license") = "GPL";
 
+uint32_t tid = 0;
+int num_unknown_tid = 0;
+int num_known_tid = 0;
+
 SEC("iter/task")
 int dump_task(struct bpf_iter__task *ctx)
 {
@@ -18,6 +22,11 @@ int dump_task(struct bpf_iter__task *ctx)
 		return 0;
 	}
 
+	if (task->pid != tid)
+		num_unknown_tid++;
+	else
+		num_known_tid++;
+
 	if (ctx->meta->seq_num == 0)
 		BPF_SEQ_PRINTF(seq, "    tgid      gid\n");
 
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
index 6e7b400888fe..b0255080662d 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
@@ -7,14 +7,16 @@ char _license[] SEC("license") = "GPL";
 
 int count = 0;
 int tgid = 0;
+int last_tgid = 0;
+int unique_tgid_count = 0;
 
 SEC("iter/task_file")
 int dump_task_file(struct bpf_iter__task_file *ctx)
 {
 	struct seq_file *seq = ctx->meta->seq;
 	struct task_struct *task = ctx->task;
-	__u32 fd = ctx->fd;
 	struct file *file = ctx->file;
+	__u32 fd = ctx->fd;
 
 	if (task == (void *)0 || file == (void *)0)
 		return 0;
@@ -27,6 +29,11 @@ int dump_task_file(struct bpf_iter__task_file *ctx)
 	if (tgid == task->tgid && task->tgid != task->pid)
 		count++;
 
+	if (last_tgid != task->tgid) {
+		last_tgid = task->tgid;
+		unique_tgid_count++;
+	}
+
 	BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
 		       (long)file->f_op);
 	return 0;
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
index 4ea6a37d1345..44f4a31c2ddd 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
@@ -20,6 +20,7 @@ char _license[] SEC("license") = "GPL";
 #define D_PATH_BUF_SIZE 1024
 char d_path_buf[D_PATH_BUF_SIZE] = {};
 __u32 pid = 0;
+__u32 one_task = 0;
 
 SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
 {
@@ -33,8 +34,11 @@ SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
 		return 0;
 
 	file = vma->vm_file;
-	if (task->tgid != pid)
+	if (task->tgid != pid) {
+		if (one_task)
+			BPF_SEQ_PRINTF(seq, "unexpected task (%d != %d)", task->tgid, pid);
 		return 0;
+	}
 	perm_str[0] = (vma->vm_flags & VM_READ) ? 'r' : '-';
 	perm_str[1] = (vma->vm_flags & VM_WRITE) ? 'w' : '-';
 	perm_str[2] = (vma->vm_flags & VM_EXEC) ? 'x' : '-';
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c b/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
new file mode 100644
index 000000000000..825ca86678bd
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+__u32 unique_tgid_cnt = 0;
+uintptr_t address = 0;
+uintptr_t offset = 0;
+__u32 last_tgid = 0;
+__u32 pid = 0;
+
+SEC("iter/task_vma") int get_uprobe_offset(struct bpf_iter__task_vma *ctx)
+{
+	struct vm_area_struct *vma = ctx->vma;
+	struct seq_file *seq = ctx->meta->seq;
+	struct task_struct *task = ctx->task;
+
+	if (task == NULL || vma == NULL)
+		return 0;
+
+	if (last_tgid != task->tgid)
+		unique_tgid_cnt++;
+	last_tgid = task->tgid;
+
+	if (task->tgid != pid)
+		return 0;
+
+	if (vma->vm_start <= address && vma->vm_end > address) {
+		offset = address - vma->vm_start + (vma->vm_pgoff << 12);
+		BPF_SEQ_PRINTF(seq, "OK\n");
+	}
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 1/4] bpf: Parameterize task iterators.
  2022-08-19 22:09 ` [PATCH bpf-next v6 1/4] bpf: " Kui-Feng Lee
@ 2022-08-24 19:33   ` Yonghong Song
  2022-08-24 22:20   ` Andrii Nakryiko
  1 sibling, 0 replies; 17+ messages in thread
From: Yonghong Song @ 2022-08-24 19:33 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/19/22 3:09 PM, Kui-Feng Lee wrote:
> Allow creating an iterator that loops through resources of one task/thread.

one thread/process.

> 
> People could only create iterators to loop through all resources of
> files, vma, and tasks in the system, even though they were interested
> in only the resources of a specific task or process.  Passing the
> additional parameters, people can now create an iterator to go
> through all resources or only the resources of a task.
> 
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 2/4] bpf: Handle bpf_link_info for the parameterized task BPF iterators.
  2022-08-19 22:09 ` [PATCH bpf-next v6 2/4] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
@ 2022-08-24 19:41   ` Yonghong Song
  0 siblings, 0 replies; 17+ messages in thread
From: Yonghong Song @ 2022-08-24 19:41 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/19/22 3:09 PM, Kui-Feng Lee wrote:
> Add new fields to bpf_link_info that users can query it through
> bpf_obj_get_info_by_fd().
> 
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>

Acked-by: Yonghong Song <yhs@fb.com>

But we are missing bpftool implementation which actually dumps the
tid/pid here. So you need another bpftool patch to do this.
See file tools/bpf/bpftool/link.c to see how map_id is
printed.

> ---
>   include/uapi/linux/bpf.h       |  6 ++++++
>   kernel/bpf/task_iter.c         | 18 ++++++++++++++++++
>   tools/include/uapi/linux/bpf.h |  6 ++++++
>   3 files changed, 30 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 778fbf11aa00..6647e052dd00 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6170,6 +6170,12 @@ struct bpf_link_info {
>   					__u32 map_id;
>   				} map;
>   			};
> +			union {
> +				struct {
> +					__u32 tid;
> +					__u32 pid;
> +				} task;
> +			};
>   		} iter;
>   		struct  {
>   			__u32 netns_ino;
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 2f5fc6602917..927b3a1cf354 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -596,6 +596,21 @@ static const struct bpf_iter_seq_info task_seq_info = {
>   	.seq_priv_size		= sizeof(struct bpf_iter_seq_task_info),
>   };
>   
> +static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info)
> +{
> +	switch (aux->task.type) {
> +	case BPF_TASK_ITER_TID:
> +		info->iter.task.tid = aux->task.pid;
> +		break;
> +	case BPF_TASK_ITER_TGID:
> +		info->iter.task.pid = aux->task.pid;
> +		break;
> +	default:
> +		break;
> +	}
> +	return 0;
> +}
> +
>   static struct bpf_iter_reg task_reg_info = {
>   	.target			= "task",
>   	.attach_target		= bpf_iter_attach_task,
> @@ -606,6 +621,7 @@ static struct bpf_iter_reg task_reg_info = {
>   		  PTR_TO_BTF_ID_OR_NULL },
>   	},
>   	.seq_info		= &task_seq_info,
> +	.fill_link_info		= bpf_iter_fill_link_info,
>   };
>   
>   static const struct bpf_iter_seq_info task_file_seq_info = {
> @@ -627,6 +643,7 @@ static struct bpf_iter_reg task_file_reg_info = {
>   		  PTR_TO_BTF_ID_OR_NULL },
>   	},
>   	.seq_info		= &task_file_seq_info,
> +	.fill_link_info		= bpf_iter_fill_link_info,
>   };
>   
>   static const struct bpf_iter_seq_info task_vma_seq_info = {
> @@ -648,6 +665,7 @@ static struct bpf_iter_reg task_vma_reg_info = {
>   		  PTR_TO_BTF_ID_OR_NULL },
>   	},
>   	.seq_info		= &task_vma_seq_info,
> +	.fill_link_info		= bpf_iter_fill_link_info,
>   };
>   
>   BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 7a0268749a48..177722c5dd62 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -6170,6 +6170,12 @@ struct bpf_link_info {
>   					__u32 map_id;
>   				} map;
>   			};
> +			union {
> +				struct {
> +					__u32 tid;
> +					__u32 pid;
> +				} task;
> +			};
>   		} iter;
>   		struct  {
>   			__u32 netns_ino;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 3/4] bpf: Handle show_fdinfo for the parameterized task BPF iterators
  2022-08-19 22:09 ` [PATCH bpf-next v6 3/4] bpf: Handle show_fdinfo " Kui-Feng Lee
@ 2022-08-24 19:49   ` Yonghong Song
  0 siblings, 0 replies; 17+ messages in thread
From: Yonghong Song @ 2022-08-24 19:49 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/19/22 3:09 PM, Kui-Feng Lee wrote:
> Show information of iterators in the respective files under
> /proc/<pid>/fdinfo/.

Please show more information about what are dumped in
the commit message.

> 
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>   kernel/bpf/task_iter.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 927b3a1cf354..5303eddb264b 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -611,6 +611,11 @@ static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct b
>   	return 0;
>   }
>   
> +static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
> +{
> +	seq_printf(seq, "task_type:\t%d\npid:\t%d\n", aux->task.type, aux->task.pid);

It would be good if we can print either 'tid: <number>' or 'pid: 
<number>' instead of just 'pid: <number>' in all cases.
Also you don't need to print pid if it is 0 (to traverse all tasks).

We should use a string instead of an int for aux->task.type so user 
doesn't need to look at kernel source which they may not have.

> +}
> +
>   static struct bpf_iter_reg task_reg_info = {
>   	.target			= "task",
>   	.attach_target		= bpf_iter_attach_task,
> @@ -622,6 +627,7 @@ static struct bpf_iter_reg task_reg_info = {
>   	},
>   	.seq_info		= &task_seq_info,
>   	.fill_link_info		= bpf_iter_fill_link_info,
> +	.show_fdinfo		= bpf_iter_task_show_fdinfo,
>   };
>   
>   static const struct bpf_iter_seq_info task_file_seq_info = {
> @@ -644,6 +650,7 @@ static struct bpf_iter_reg task_file_reg_info = {
>   	},
>   	.seq_info		= &task_file_seq_info,
>   	.fill_link_info		= bpf_iter_fill_link_info,
> +	.show_fdinfo		= bpf_iter_task_show_fdinfo,
>   };
>   
>   static const struct bpf_iter_seq_info task_vma_seq_info = {
> @@ -666,6 +673,7 @@ static struct bpf_iter_reg task_vma_reg_info = {
>   	},
>   	.seq_info		= &task_vma_seq_info,
>   	.fill_link_info		= bpf_iter_fill_link_info,
> +	.show_fdinfo		= bpf_iter_task_show_fdinfo,
>   };
>   
>   BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 4/4] selftests/bpf: Test parameterized task BPF iterators.
  2022-08-19 22:09 ` [PATCH bpf-next v6 4/4] selftests/bpf: Test " Kui-Feng Lee
@ 2022-08-24 20:50   ` Yonghong Song
  2022-08-24 22:30   ` Andrii Nakryiko
  1 sibling, 0 replies; 17+ messages in thread
From: Yonghong Song @ 2022-08-24 20:50 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, daniel, andrii, kernel-team



On 8/19/22 3:09 PM, Kui-Feng Lee wrote:
> Test iterators of vma, files, and tasks of tasks.
> 
> Ensure the API works appropriately to visit all tasks,
> tasks in a process, or a particular task.
> 
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>   .../selftests/bpf/prog_tests/bpf_iter.c       | 284 +++++++++++++++++-
>   .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
>   .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
>   .../selftests/bpf/progs/bpf_iter_task_file.c  |   9 +-
>   .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
>   .../bpf/progs/bpf_iter_uprobe_offset.c        |  35 +++
>   6 files changed, 326 insertions(+), 19 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
> 
[...]
> +
> +static pthread_mutex_t do_nothing_mutex;
> +
> +static void *do_nothing_wait(void *arg)
> +{
> +	pthread_mutex_lock(&do_nothing_mutex);
> +	pthread_mutex_unlock(&do_nothing_mutex);
> +
> +	pthread_exit(arg);
> +}
> +
> +static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
> +				     int *num_unknown, int *num_known)
>   {
>   	struct bpf_iter_task *skel;
> +	pthread_t thread_id;
> +	bool locked = false;

We can have a more 'kernel'-way implementation than using
'locked'.
	if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), 
"pthread_mutex_lock"))
		goto done;
	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, 
NULL), "pthread_create"))
		goto unlock;

	...
unlock:
	ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
done:
	bpf_iter_task__destroy(skel);


> +	void *ret;
>   
>   	skel = bpf_iter_task__open_and_load();
>   	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
>   		return;
>   
> -	do_dummy_read(skel->progs.dump_task);
> +	if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
> +		goto done;
> +	locked = true;
> +
> +	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
> +		  "pthread_create"))
> +		goto done;
> +
> +
> +	skel->bss->tid = getpid();
> +
> +	do_dummy_read_opts(skel->progs.dump_task, opts);
> +
> +	*num_unknown = skel->bss->num_unknown_tid;
> +	*num_known = skel->bss->num_known_tid;
> +
> +	ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
> +	locked = false;
> +	ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
> +		     "pthread_join");
>   
> +done:
> +	if (locked)
> +		ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
>   	bpf_iter_task__destroy(skel);
>   }
>   
> +static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
> +{
> +	int num_unknown_tid, num_known_tid;
> +
> +	test_task_common_nocheck(opts, &num_unknown_tid, &num_known_tid);
> +	ASSERT_EQ(num_unknown_tid, num_unknown, "check_num_unknown_tid");
> +	ASSERT_EQ(num_known_tid, num_known, "check_num_known_tid");
> +}
> +
> +static void test_task(void)

test_task_tid?

> +{
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +	union bpf_iter_link_info linfo;
> +	int num_unknown_tid, num_known_tid;
> +
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.tid = getpid();
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +	test_task_common(&opts, 0, 1);
> +
> +	linfo.task.tid = 0;
> +	linfo.task.pid = getpid();
> +	test_task_common(&opts, 1, 1);
> +
> +	test_task_common_nocheck(NULL, &num_unknown_tid, &num_known_tid);
> +	ASSERT_GT(num_unknown_tid, 1, "check_num_unknown_tid");
> +	ASSERT_EQ(num_known_tid, 1, "check_num_known_tid");
> +}
> +
> +static void test_task_tgid(void)

test_task_pid?

> +{
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +	union bpf_iter_link_info linfo;
> +
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.pid = getpid();
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +
> +	test_task_common(&opts, 1, 1);
> +}
> +
> +static void test_task_pidfd(void)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +	union bpf_iter_link_info linfo;
> +	int pidfd;
> +
> +	pidfd = pidfd_open(getpid(), 0);
> +	if (!ASSERT_GT(pidfd, 0, "pidfd_open"))
> +		return;
> +
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.pid_fd = pidfd;
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
> +
> +	test_task_common(&opts, 1, 1);
> +
> +	close(pidfd);
> +}
> +
>   static void test_task_sleepable(void)
>   {
>   	struct bpf_iter_task *skel;
> @@ -212,15 +349,13 @@ static void test_task_stack(void)
>   	bpf_iter_task_stack__destroy(skel);
>   }
>   
> -static void *do_nothing(void *arg)
> -{
> -	pthread_exit(arg);
> -}
> -
>   static void test_task_file(void)
>   {
> +	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
>   	struct bpf_iter_task_file *skel;
> +	union bpf_iter_link_info linfo;
>   	pthread_t thread_id;
> +	bool locked = false;

similar to the above, 'locked' variable can be removed
by implementing an alternative approach.

>   	void *ret;
>   
>   	skel = bpf_iter_task_file__open_and_load();
> @@ -229,19 +364,43 @@ static void test_task_file(void)
>   
>   	skel->bss->tgid = getpid();
>   
> -	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing, NULL),
> +	if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
> +		goto done;
> +	locked = true;
> +
> +	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
>   		  "pthread_create"))
>   		goto done;
>   
> -	do_dummy_read(skel->progs.dump_task_file);
> +	memset(&linfo, 0, sizeof(linfo));
> +	linfo.task.tid = getpid();
> +	opts.link_info = &linfo;
> +	opts.link_info_len = sizeof(linfo);
>   
> -	if (!ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
> -		  "pthread_join"))
> -		goto done;
> +	do_dummy_read_opts(skel->progs.dump_task_file, &opts);
> +
> +	ASSERT_EQ(skel->bss->count, 0, "check_count");
> +	ASSERT_EQ(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
> +
> +	skel->bss->last_tgid = 0;
> +	skel->bss->count = 0;
> +	skel->bss->unique_tgid_count = 0;
> +
> +	do_dummy_read(skel->progs.dump_task_file);
>   
>   	ASSERT_EQ(skel->bss->count, 0, "check_count");
> +	ASSERT_GT(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
> +
> +	check_bpf_link_info(skel->progs.dump_task_file);
> +
> +	ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
> +	locked = false;
> +	ASSERT_OK(pthread_join(thread_id, &ret), "pthread_join");
> +	ASSERT_NULL(ret, "phtread_join");
>   
>   done:
> +	if (locked)
> +		ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
>   	bpf_iter_task_file__destroy(skel);
>   }
>   
> @@ -1249,7 +1408,7 @@ static void str_strip_first_line(char *str)
>   	*dst = '\0';
>   }
>   
[...
> +
>   void test_bpf_iter(void)
>   {
> +	if (!ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL), "pthread_mutex_init"))
> +		return;
> +
>   	if (test__start_subtest("btf_id_or_null"))
>   		test_btf_id_or_null();
>   	if (test__start_subtest("ipv6_route"))
> @@ -1337,6 +1583,10 @@ void test_bpf_iter(void)
>   		test_bpf_map();
>   	if (test__start_subtest("task"))
>   		test_task();
> +	if (test__start_subtest("task_tgid"))
> +		test_task_tgid();
> +	if (test__start_subtest("task_pidfd"))
> +		test_task_pidfd();
>   	if (test__start_subtest("task_sleepable"))
>   		test_task_sleepable();
>   	if (test__start_subtest("task_stack"))
> @@ -1397,4 +1647,6 @@ void test_bpf_iter(void)
>   		test_ksym_iter();
>   	if (test__start_subtest("bpf_sockmap_map_iter_fd"))
>   		test_bpf_sockmap_map_iter_fd();
> +	if (test__start_subtest("uprobe_offset"))
> +		test_task_uprobe_offset();

uprobe_offset -> vma_offset? See below.

>   }
> diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
> index 5fce7008d1ff..32c34ce9cbeb 100644
> --- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
> +++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
> @@ -764,7 +764,7 @@ static void test_btf_dump_struct_data(struct btf *btf, struct btf_dump *d,
>   
>   	/* union with nested struct */
>   	TEST_BTF_DUMP_DATA(btf, d, "union", str, union bpf_iter_link_info, BTF_F_COMPACT,
> -			   "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},}",
> +			   "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},.task = (struct){.tid = (__u32)1,},}",
>   			   { .map = { .map_fd = 1 }});
>   
>   	/* struct skb with nested structs/unions; because type output is so
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
> index d22741272692..96131b9a1caa 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
> @@ -6,6 +6,10 @@
>   
>   char _license[] SEC("license") = "GPL";
>   
> +uint32_t tid = 0;
> +int num_unknown_tid = 0;
> +int num_known_tid = 0;
> +
>   SEC("iter/task")
>   int dump_task(struct bpf_iter__task *ctx)
>   {
> @@ -18,6 +22,11 @@ int dump_task(struct bpf_iter__task *ctx)
>   		return 0;
>   	}
>   
> +	if (task->pid != tid)
> +		num_unknown_tid++;
> +	else
> +		num_known_tid++;
> +
>   	if (ctx->meta->seq_num == 0)
>   		BPF_SEQ_PRINTF(seq, "    tgid      gid\n");
>   
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
> index 6e7b400888fe..b0255080662d 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
> @@ -7,14 +7,16 @@ char _license[] SEC("license") = "GPL";
>   
>   int count = 0;
>   int tgid = 0;
> +int last_tgid = 0;
> +int unique_tgid_count = 0;
>   
>   SEC("iter/task_file")
>   int dump_task_file(struct bpf_iter__task_file *ctx)
>   {
>   	struct seq_file *seq = ctx->meta->seq;
>   	struct task_struct *task = ctx->task;
> -	__u32 fd = ctx->fd;
>   	struct file *file = ctx->file;
> +	__u32 fd = ctx->fd;
>   
>   	if (task == (void *)0 || file == (void *)0)
>   		return 0;
> @@ -27,6 +29,11 @@ int dump_task_file(struct bpf_iter__task_file *ctx)
>   	if (tgid == task->tgid && task->tgid != task->pid)
>   		count++;
>   
> +	if (last_tgid != task->tgid) {
> +		last_tgid = task->tgid;
> +		unique_tgid_count++;
> +	}
> +
>   	BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
>   		       (long)file->f_op);
>   	return 0;
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> index 4ea6a37d1345..44f4a31c2ddd 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
> @@ -20,6 +20,7 @@ char _license[] SEC("license") = "GPL";
>   #define D_PATH_BUF_SIZE 1024
>   char d_path_buf[D_PATH_BUF_SIZE] = {};
>   __u32 pid = 0;
> +__u32 one_task = 0;
>   
>   SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
>   {
> @@ -33,8 +34,11 @@ SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx)
>   		return 0;
>   
>   	file = vma->vm_file;
> -	if (task->tgid != pid)
> +	if (task->tgid != pid) {
> +		if (one_task)
> +			BPF_SEQ_PRINTF(seq, "unexpected task (%d != %d)", task->tgid, pid);

Let us change this to an error code like
__u32 one_task_error = 0;
...
if (task->tgid != pid) {
	if (one_task)
		one_task_error = 1;
	return 0;
}
In bpf_iter.c, we can assert one_task_error must be 0?

>   		return 0;
> +	}
>   	perm_str[0] = (vma->vm_flags & VM_READ) ? 'r' : '-';
>   	perm_str[1] = (vma->vm_flags & VM_WRITE) ? 'w' : '-';
>   	perm_str[2] = (vma->vm_flags & VM_EXEC) ? 'x' : '-';
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c b/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
> new file mode 100644
> index 000000000000..825ca86678bd
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c

Maybe change file name to bpf_iter_vma_offset so we know the test
is related 'iter/task_vma'? the offset can be used by uprobe, but
it can be used for other purposes, e.g., symbolization.

> @@ -0,0 +1,35 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#include "bpf_iter.h"
> +#include <bpf/bpf_helpers.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +__u32 unique_tgid_cnt = 0;
> +uintptr_t address = 0;
> +uintptr_t offset = 0;
> +__u32 last_tgid = 0;
> +__u32 pid = 0;
> +
> +SEC("iter/task_vma") int get_uprobe_offset(struct bpf_iter__task_vma *ctx)

get_vma_offset?

> +{
> +	struct vm_area_struct *vma = ctx->vma;
> +	struct seq_file *seq = ctx->meta->seq;
> +	struct task_struct *task = ctx->task;
> +
> +	if (task == NULL || vma == NULL)
> +		return 0;
> +
> +	if (last_tgid != task->tgid)
> +		unique_tgid_cnt++;
> +	last_tgid = task->tgid;
> +
> +	if (task->tgid != pid)
> +		return 0;
> +
> +	if (vma->vm_start <= address && vma->vm_end > address) {
> +		offset = address - vma->vm_start + (vma->vm_pgoff << 12);
> +		BPF_SEQ_PRINTF(seq, "OK\n");
> +	}
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 0/4] Parameterize task iterators.
  2022-08-19 22:09 [PATCH bpf-next v6 0/4] Parameterize task iterators Kui-Feng Lee
                   ` (3 preceding siblings ...)
  2022-08-19 22:09 ` [PATCH bpf-next v6 4/4] selftests/bpf: Test " Kui-Feng Lee
@ 2022-08-24 22:06 ` Andrii Nakryiko
  4 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2022-08-24 22:06 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, daniel, andrii, kernel-team, yhs

On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Allow creating an iterator that loops through resources of one task/thread.
>
> People could only create iterators to loop through all resources of
> files, vma, and tasks in the system, even though they were interested in only the
> resources of a specific task or process.  Passing the addintional
> parameters, people can now create an iterator to go through all
> resources or only the resources of a task.
>
> Major Changes:
>
>  - Add new parameters in bpf_iter_link_info to indicate to go through
>    all tasks or to go through a specific task.
>
>  - Change the implementations of BPF iterators of vma, files, and
>    tasks to allow going through only the resources of a specific task.
>
>  - Provide the arguments of parameterized task iterators in
>    bpf_link_info.
>
> Differences from v5:
>
>  - Use user-speace tid/pid terminologies in bpf_iter_link_info *** BLURB HERE ***

BLURB HERE :)

>    bpf_link_info.
>
>  - Fix reference count
>
>  - Merge all variants to one 'u32 pid' in internal structs.
>    (bpf_iter_aux_info and bpf_iter_seq_task_common)
>
>  - Compare the result of get_uprobe_offset() with the implementation
>    with the vma iterators.
>
>  - Implement show_fdinfo.
>
> Differences from v4:
>
>  - Remove 'type' from bpf_iter_link_info and bpf_link_info.
>

We normally carry over entire change history across revisions, so we
don't have to jump through multiple links to see what happened
earlier. Please consider preserving all the changes in one place for
next revision.


> v5: https://lore.kernel.org/bpf/20220811001654.1316689-1-kuifeng@fb.com/
> v4: https://lore.kernel.org/bpf/20220809195429.1043220-1-kuifeng@fb.com/
> v3: https://lore.kernel.org/bpf/20220809063501.667610-1-kuifeng@fb.com/
> v2: https://lore.kernel.org/bpf/20220801232649.2306614-1-kuifeng@fb.com/
> v1: https://lore.kernel.org/bpf/20220726051713.840431-1-kuifeng@fb.com/
>
> Kui-Feng Lee (4):
>   bpf: Parameterize task iterators.
>   bpf: Handle bpf_link_info for the parameterized task BPF iterators.
>   bpf: Handle show_fdinfo for the parameterized task BPF iterators
>   selftests/bpf: Test parameterized task BPF iterators.
>
>  include/linux/bpf.h                           |  25 ++
>  include/uapi/linux/bpf.h                      |  12 +
>  kernel/bpf/task_iter.c                        | 142 +++++++--
>  tools/include/uapi/linux/bpf.h                |  12 +
>  .../selftests/bpf/prog_tests/bpf_iter.c       | 284 +++++++++++++++++-
>  .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
>  .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
>  .../selftests/bpf/progs/bpf_iter_task_file.c  |   9 +-
>  .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
>  .../bpf/progs/bpf_iter_uprobe_offset.c        |  35 +++
>  10 files changed, 493 insertions(+), 43 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
>
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 1/4] bpf: Parameterize task iterators.
  2022-08-19 22:09 ` [PATCH bpf-next v6 1/4] bpf: " Kui-Feng Lee
  2022-08-24 19:33   ` Yonghong Song
@ 2022-08-24 22:20   ` Andrii Nakryiko
  2022-08-25  0:16     ` Kui-Feng Lee
  1 sibling, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2022-08-24 22:20 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, daniel, andrii, kernel-team, yhs

On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Allow creating an iterator that loops through resources of one task/thread.
>
> People could only create iterators to loop through all resources of
> files, vma, and tasks in the system, even though they were interested
> in only the resources of a specific task or process.  Passing the
> additional parameters, people can now create an iterator to go
> through all resources or only the resources of a task.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>  include/linux/bpf.h            |  25 +++++++
>  include/uapi/linux/bpf.h       |   6 ++
>  kernel/bpf/task_iter.c         | 116 ++++++++++++++++++++++++++-------
>  tools/include/uapi/linux/bpf.h |   6 ++
>  4 files changed, 129 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 39bd36359c1e..59712dd917d8 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
>         extern int bpf_iter_ ## target(args);                   \
>         int __init bpf_iter_ ## target(args) { return 0; }
>
> +/*
> + * The task type of iterators.
> + *
> + * For BPF task iterators, they can be parameterized with various
> + * parameters to visit only some of tasks.
> + *
> + * BPF_TASK_ITER_ALL (default)
> + *     Iterate over resources of every task.
> + *
> + * BPF_TASK_ITER_TID
> + *     Iterate over resources of a task/tid.
> + *
> + * BPF_TASK_ITER_TGID
> + *     Iterate over reosurces of evevry task of a process / task group.

typos: resources, every

> + */
> +enum bpf_iter_task_type {
> +       BPF_TASK_ITER_ALL = 0,
> +       BPF_TASK_ITER_TID,
> +       BPF_TASK_ITER_TGID,
> +};
> +

[...]

>         rcu_read_lock();
>  retry:
> -       pid = find_ge_pid(*tid, ns);
> +       pid = find_ge_pid(*tid, common->ns);
>         if (pid) {
> -               *tid = pid_nr_ns(pid, ns);
> +               *tid = pid_nr_ns(pid, common->ns);
>                 task = get_pid_task(pid, PIDTYPE_PID);
>                 if (!task) {
>                         ++*tid;
>                         goto retry;
> -               } else if (skip_if_dup_files && !thread_group_leader(task) &&
> -                          task->files == task->group_leader->files) {
> +               } else if ((skip_if_dup_files && !thread_group_leader(task) &&
> +                           task->files == task->group_leader->files) ||
> +                          (common->type == BPF_TASK_ITER_TGID &&
> +                           __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) != common->pid)) {

it gets super hard to follow this logic, would a simple helper
function to calculate this condition (and maybe some comments to
explain the logic behind these checks?) make it a bit more readable?

>                         put_task_struct(task);
>                         task = NULL;
>                         ++*tid;
> @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos)
>         struct bpf_iter_seq_task_info *info = seq->private;
>         struct task_struct *task;
>
> -       task = task_seq_get_next(info->common.ns, &info->tid, false);
> +       task = task_seq_get_next(&info->common, &info->tid, false);
>         if (!task)
>                 return NULL;
>
> @@ -73,7 +90,7 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>         ++*pos;
>         ++info->tid;
>         put_task_struct((struct task_struct *)v);
> -       task = task_seq_get_next(info->common.ns, &info->tid, false);
> +       task = task_seq_get_next(&info->common, &info->tid, false);
>         if (!task)
>                 return NULL;
>
> @@ -117,6 +134,48 @@ static void task_seq_stop(struct seq_file *seq, void *v)
>                 put_task_struct((struct task_struct *)v);
>  }
>
> +static int bpf_iter_attach_task(struct bpf_prog *prog,
> +                               union bpf_iter_link_info *linfo,
> +                               struct bpf_iter_aux_info *aux)
> +{
> +       unsigned int flags;
> +       struct pid_namespace *ns;
> +       struct pid *pid;
> +       pid_t tgid;

it seems it would be simpler to first check that at most one of
tid/pid/pid_fd is set instead of making sure that aux->task.type
wasn't already set.

How about

if (!!linfo->task.tid + !!linfo->task.pid + !!linfo->task.pid_fd > 1)
    return -EINVAL;

?

> +
> +       aux->task.type = BPF_TASK_ITER_ALL;
> +       if (linfo->task.tid != 0) {
> +               aux->task.type = BPF_TASK_ITER_TID;
> +               aux->task.pid = linfo->task.tid;
> +       }
> +       if (linfo->task.pid != 0) {
> +               if (aux->task.type != BPF_TASK_ITER_ALL)
> +                       return -EINVAL;
> +
> +               aux->task.type = BPF_TASK_ITER_TGID;
> +               aux->task.pid = linfo->task.pid;
> +       }
> +       if (linfo->task.pid_fd != 0) {
> +               if (aux->task.type != BPF_TASK_ITER_ALL)
> +                       return -EINVAL;
> +
> +               aux->task.type = BPF_TASK_ITER_TGID;
> +               ns = task_active_pid_ns(current);
> +               if (IS_ERR(ns))
> +                       return PTR_ERR(ns);
> +
> +               pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
> +               if (IS_ERR(pid))
> +                       return PTR_ERR(pid);
> +
> +               tgid = pid_nr_ns(pid, ns);
> +               aux->task.pid = tgid;
> +               put_pid(pid);
> +       }
> +
> +       return 0;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 4/4] selftests/bpf: Test parameterized task BPF iterators.
  2022-08-19 22:09 ` [PATCH bpf-next v6 4/4] selftests/bpf: Test " Kui-Feng Lee
  2022-08-24 20:50   ` Yonghong Song
@ 2022-08-24 22:30   ` Andrii Nakryiko
  2022-08-25  1:07     ` Kui-Feng Lee
  1 sibling, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2022-08-24 22:30 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, daniel, andrii, kernel-team, yhs

On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Test iterators of vma, files, and tasks of tasks.
>
> Ensure the API works appropriately to visit all tasks,
> tasks in a process, or a particular task.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>  .../selftests/bpf/prog_tests/bpf_iter.c       | 284 +++++++++++++++++-
>  .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
>  .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
>  .../selftests/bpf/progs/bpf_iter_task_file.c  |   9 +-
>  .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
>  .../bpf/progs/bpf_iter_uprobe_offset.c        |  35 +++
>  6 files changed, 326 insertions(+), 19 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
>

[...]

> -       do_dummy_read(skel->progs.dump_task);
> +       if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
> +               goto done;
> +       locked = true;
> +
> +       if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
> +                 "pthread_create"))
> +               goto done;
> +
> +

extra empty line

> +       skel->bss->tid = getpid();
> +
> +       do_dummy_read_opts(skel->progs.dump_task, opts);
> +
> +       *num_unknown = skel->bss->num_unknown_tid;
> +       *num_known = skel->bss->num_known_tid;
> +
> +       ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
> +       locked = false;
> +       ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
> +                    "pthread_join");
>
> +done:
> +       if (locked)

it's a bit of an overkill to expect and handle that
pthread_mutex_lock() might fail, I'd remove those asserts and locked
flag, just assume that lock works (if it's not, it's either test bug
and would be caught early, or something is very broken in the system
anyway)

> +               ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
>         bpf_iter_task__destroy(skel);
>  }
>

[...]

>  static void test_task_sleepable(void)
>  {
>         struct bpf_iter_task *skel;
> @@ -212,15 +349,13 @@ static void test_task_stack(void)
>         bpf_iter_task_stack__destroy(skel);
>  }
>
> -static void *do_nothing(void *arg)
> -{
> -       pthread_exit(arg);
> -}
> -
>  static void test_task_file(void)
>  {
> +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);

DECLARE_LIBBPF_OPTS is discouraged, best to use shorter LIBBPF_OPTS

>         struct bpf_iter_task_file *skel;
> +       union bpf_iter_link_info linfo;
>         pthread_t thread_id;
> +       bool locked = false;
>         void *ret;
>
>         skel = bpf_iter_task_file__open_and_load();
> @@ -229,19 +364,43 @@ static void test_task_file(void)
>
>         skel->bss->tgid = getpid();
>
> -       if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing, NULL),
> +       if (!ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"))
> +               goto done;
> +       locked = true;

same about failing mutex_lock, it shouldn't and it's fair to expect
that it won't

> +
> +       if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
>                   "pthread_create"))
>                 goto done;
>
> -       do_dummy_read(skel->progs.dump_task_file);
> +       memset(&linfo, 0, sizeof(linfo));
> +       linfo.task.tid = getpid();
> +       opts.link_info = &linfo;
> +       opts.link_info_len = sizeof(linfo);

[...]

> +       link = bpf_program__attach_iter(skel->progs.get_uprobe_offset, opts);
> +       if (!ASSERT_OK_PTR(link, "attach_iter"))
> +               return;
> +
> +       iter_fd = bpf_iter_create(bpf_link__fd(link));
> +       if (!ASSERT_GT(iter_fd, 0, "create_iter"))
> +               goto exit;
> +
> +       while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
> +               ;
> +       CHECK(len < 0, "read", "read failed: %s\n", strerror(errno));

no checks, please

> +       buf[15] = 0;
> +       ASSERT_EQ(strcmp(buf, "OK\n"), 0, "strcmp");
> +
> +       ASSERT_EQ(skel->bss->offset, get_uprobe_offset(trigger_func), "offset");
> +       if (one_proc)
> +               ASSERT_EQ(skel->bss->unique_tgid_cnt, 1, "unique_tgid_count");
> +       else
> +               ASSERT_GT(skel->bss->unique_tgid_cnt, 1, "unique_tgid_count");
> +
> +       close(iter_fd);
> +
> +exit:
> +       bpf_link__destroy(link);
> +}
> +
> +static void test_task_uprobe_offset(void)
> +{
> +       LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +       union bpf_iter_link_info linfo;
> +
> +       memset(&linfo, 0, sizeof(linfo));
> +       linfo.task.pid = getpid();
> +       opts.link_info = &linfo;
> +       opts.link_info_len = sizeof(linfo);
> +
> +       test_task_uprobe_offset_common(&opts, true);
> +
> +       linfo.task.pid = 0;
> +       linfo.task.tid = getpid();
> +       test_task_uprobe_offset_common(&opts, true);
> +
> +       test_task_uprobe_offset_common(NULL, false);
> +}
> +
>  void test_bpf_iter(void)
>  {
> +       if (!ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL), "pthread_mutex_init"))
> +               return;
> +

ditto, too paranoid, IMO

>         if (test__start_subtest("btf_id_or_null"))
>                 test_btf_id_or_null();
>         if (test__start_subtest("ipv6_route"))

[...]

> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c b/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
> new file mode 100644
> index 000000000000..825ca86678bd
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
> @@ -0,0 +1,35 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#include "bpf_iter.h"
> +#include <bpf/bpf_helpers.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +__u32 unique_tgid_cnt = 0;
> +uintptr_t address = 0;
> +uintptr_t offset = 0;
> +__u32 last_tgid = 0;
> +__u32 pid = 0;
> +
> +SEC("iter/task_vma") int get_uprobe_offset(struct bpf_iter__task_vma *ctx)

please keep SEC() on separate line

> +{
> +       struct vm_area_struct *vma = ctx->vma;
> +       struct seq_file *seq = ctx->meta->seq;
> +       struct task_struct *task = ctx->task;
> +
> +       if (task == NULL || vma == NULL)
> +               return 0;
> +
> +       if (last_tgid != task->tgid)
> +               unique_tgid_cnt++;
> +       last_tgid = task->tgid;
> +
> +       if (task->tgid != pid)
> +               return 0;
> +
> +       if (vma->vm_start <= address && vma->vm_end > address) {
> +               offset = address - vma->vm_start + (vma->vm_pgoff << 12);

it's best not to assume page_size is 4K, you can pass actual value
through global variable from user-space (we've previously fixed a
bunch of tests with fixed page_size assumption as they break some
platforms, let's not regress that)

> +               BPF_SEQ_PRINTF(seq, "OK\n");
> +       }
> +       return 0;
> +}
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 1/4] bpf: Parameterize task iterators.
  2022-08-24 22:20   ` Andrii Nakryiko
@ 2022-08-25  0:16     ` Kui-Feng Lee
  2022-08-25 21:13       ` Andrii Nakryiko
  0 siblings, 1 reply; 17+ messages in thread
From: Kui-Feng Lee @ 2022-08-25  0:16 UTC (permalink / raw)
  To: andrii.nakryiko; +Cc: daniel, Kernel Team, Yonghong Song, ast, andrii, bpf

On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote:
> On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > 
> > Allow creating an iterator that loops through resources of one
> > task/thread.
> > 
> > People could only create iterators to loop through all resources of
> > files, vma, and tasks in the system, even though they were
> > interested
> > in only the resources of a specific task or process.  Passing the
> > additional parameters, people can now create an iterator to go
> > through all resources or only the resources of a task.
> > 
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >  include/linux/bpf.h            |  25 +++++++
> >  include/uapi/linux/bpf.h       |   6 ++
> >  kernel/bpf/task_iter.c         | 116 ++++++++++++++++++++++++++---
> > ----
> >  tools/include/uapi/linux/bpf.h |   6 ++
> >  4 files changed, 129 insertions(+), 24 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 39bd36359c1e..59712dd917d8 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user
> > *pathname, int flags);
> >         extern int bpf_iter_ ## target(args);                   \
> >         int __init bpf_iter_ ## target(args) { return 0; }
> > 
> > +/*
> > + * The task type of iterators.
> > + *
> > + * For BPF task iterators, they can be parameterized with various
> > + * parameters to visit only some of tasks.
> > + *
> > + * BPF_TASK_ITER_ALL (default)
> > + *     Iterate over resources of every task.
> > + *
> > + * BPF_TASK_ITER_TID
> > + *     Iterate over resources of a task/tid.
> > + *
> > + * BPF_TASK_ITER_TGID
> > + *     Iterate over reosurces of evevry task of a process / task
> > group.
> 
> typos: resources, every
> 
> > + */
> > +enum bpf_iter_task_type {
> > +       BPF_TASK_ITER_ALL = 0,
> > +       BPF_TASK_ITER_TID,
> > +       BPF_TASK_ITER_TGID,
> > +};
> > +
> 
> [...]
> 
> >         rcu_read_lock();
> >  retry:
> > -       pid = find_ge_pid(*tid, ns);
> > +       pid = find_ge_pid(*tid, common->ns);
> >         if (pid) {
> > -               *tid = pid_nr_ns(pid, ns);
> > +               *tid = pid_nr_ns(pid, common->ns);
> >                 task = get_pid_task(pid, PIDTYPE_PID);
> >                 if (!task) {
> >                         ++*tid;
> >                         goto retry;
> > -               } else if (skip_if_dup_files &&
> > !thread_group_leader(task) &&
> > -                          task->files == task->group_leader-
> > >files) {
> > +               } else if ((skip_if_dup_files &&
> > !thread_group_leader(task) &&
> > +                           task->files == task->group_leader-
> > >files) ||
> > +                          (common->type == BPF_TASK_ITER_TGID &&
> > +                           __task_pid_nr_ns(task, PIDTYPE_TGID,
> > common->ns) != common->pid)) {
> 
> it gets super hard to follow this logic, would a simple helper
> function to calculate this condition (and maybe some comments to
> explain the logic behind these checks?) make it a bit more readable?

!matched_task(task, common, skip_if_dup_file)?

bool matched_task(struct task_struct *task, 
                  struct bpf_iter_seq_task_common *common,
                  bool skip_if_dup_file) {
        /* Should not have the same 'files' if skip_if_dup_file is true
*/
        bool diff_files_if =
                !skip_if_dup_file ||
                (thread_group_leader(task) &&
                task->file != task->gorup_leader->fies);
        /* Should have the given tgid if the type is BPF_TASK_ITER_TGI
*/
        bool have_tgid_if =
                common->type != BPF_TASK_ITER_TGID ||
                __task_pid_nr_ns(task, PIDTYPE_TGID,
                common->ns) == common->pid;
        return diff_files_if && have_tgid_if;
}

How about this?

> 
> >                         put_task_struct(task);
> >                         task = NULL;
> >                         ++*tid;
> > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq,
> > loff_t *pos)
> >         struct bpf_iter_seq_task_info *info = seq->private;
> >         struct task_struct *task;
> > 
> > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > false);
> > +       task = task_seq_get_next(&info->common, &info->tid, false);
> >         if (!task)
> >                 return NULL;
> > 
> > @@ -73,7 +90,7 @@ static void *task_seq_next(struct seq_file *seq,
> > void *v, loff_t *pos)
> >         ++*pos;
> >         ++info->tid;
> >         put_task_struct((struct task_struct *)v);
> > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > false);
> > +       task = task_seq_get_next(&info->common, &info->tid, false);
> >         if (!task)
> >                 return NULL;
> > 
> > @@ -117,6 +134,48 @@ static void task_seq_stop(struct seq_file
> > *seq, void *v)
> >                 put_task_struct((struct task_struct *)v);
> >  }
> > 
> > +static int bpf_iter_attach_task(struct bpf_prog *prog,
> > +                               union bpf_iter_link_info *linfo,
> > +                               struct bpf_iter_aux_info *aux)
> > +{
> > +       unsigned int flags;
> > +       struct pid_namespace *ns;
> > +       struct pid *pid;
> > +       pid_t tgid;
> 
> it seems it would be simpler to first check that at most one of
> tid/pid/pid_fd is set instead of making sure that aux->task.type
> wasn't already set.
> 
> How about
> 
> if (!!linfo->task.tid + !!linfo->task.pid + !!linfo->task.pid_fd > 1)
>     return -EINVAL;
> 
> ?

Agree

> 
> > +
> > +       aux->task.type = BPF_TASK_ITER_ALL;
> > +       if (linfo->task.tid != 0) {
> > +               aux->task.type = BPF_TASK_ITER_TID;
> > +               aux->task.pid = linfo->task.tid;
> > +       }
> > +       if (linfo->task.pid != 0) {
> > +               if (aux->task.type != BPF_TASK_ITER_ALL)
> > +                       return -EINVAL;
> > +
> > +               aux->task.type = BPF_TASK_ITER_TGID;
> > +               aux->task.pid = linfo->task.pid;
> > +       }
> > +       if (linfo->task.pid_fd != 0) {
> > +               if (aux->task.type != BPF_TASK_ITER_ALL)
> > +                       return -EINVAL;
> > +
> > +               aux->task.type = BPF_TASK_ITER_TGID;
> > +               ns = task_active_pid_ns(current);
> > +               if (IS_ERR(ns))
> > +                       return PTR_ERR(ns);
> > +
> > +               pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
> > +               if (IS_ERR(pid))
> > +                       return PTR_ERR(pid);
> > +
> > +               tgid = pid_nr_ns(pid, ns);
> > +               aux->task.pid = tgid;
> > +               put_pid(pid);
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> 
> [...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 4/4] selftests/bpf: Test parameterized task BPF iterators.
  2022-08-24 22:30   ` Andrii Nakryiko
@ 2022-08-25  1:07     ` Kui-Feng Lee
  0 siblings, 0 replies; 17+ messages in thread
From: Kui-Feng Lee @ 2022-08-25  1:07 UTC (permalink / raw)
  To: andrii.nakryiko; +Cc: daniel, Kernel Team, Yonghong Song, ast, andrii, bpf

On Wed, 2022-08-24 at 15:30 -0700, Andrii Nakryiko wrote:
> On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > 
> > Test iterators of vma, files, and tasks of tasks.
> > 
> > Ensure the API works appropriately to visit all tasks,
> > tasks in a process, or a particular task.
> > 
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> >  .../selftests/bpf/prog_tests/bpf_iter.c       | 284
> > +++++++++++++++++-
> >  .../selftests/bpf/prog_tests/btf_dump.c       |   2 +-
> >  .../selftests/bpf/progs/bpf_iter_task.c       |   9 +
> >  .../selftests/bpf/progs/bpf_iter_task_file.c  |   9 +-
> >  .../selftests/bpf/progs/bpf_iter_task_vma.c   |   6 +-
> >  .../bpf/progs/bpf_iter_uprobe_offset.c        |  35 +++
> >  6 files changed, 326 insertions(+), 19 deletions(-)
> >  create mode 100644
> > tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
> 
[...]
> > +
> >  void test_bpf_iter(void)
> >  {
> > +       if (!ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL),
> > "pthread_mutex_init"))
> > +               return;
> > +
> 
> ditto, too paranoid, IMO

Right, for test cases, we can ease checks.
I would like to have something like

    INFALLIBLE(pthread_mutex_init(......)...);

It will crash at the first point.

> 
> >         if (test__start_subtest("btf_id_or_null"))
> >                 test_btf_id_or_null();
> >         if (test__start_subtest("ipv6_route"))
> 
> [...]
> 
> > diff --git
> > a/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
> > b/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
> > new file mode 100644
> > index 000000000000..825ca86678bd
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/bpf_iter_uprobe_offset.c
> > @@ -0,0 +1,35 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +#include "bpf_iter.h"
> > +#include <bpf/bpf_helpers.h>
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +__u32 unique_tgid_cnt = 0;
> > +uintptr_t address = 0;
> > +uintptr_t offset = 0;
> > +__u32 last_tgid = 0;
> > +__u32 pid = 0;
> > +
> > +SEC("iter/task_vma") int get_uprobe_offset(struct
> > bpf_iter__task_vma *ctx)
> 
> please keep SEC() on separate line
> 
> > +{
> > +       struct vm_area_struct *vma = ctx->vma;
> > +       struct seq_file *seq = ctx->meta->seq;
> > +       struct task_struct *task = ctx->task;
> > +
> > +       if (task == NULL || vma == NULL)
> > +               return 0;
> > +
> > +       if (last_tgid != task->tgid)
> > +               unique_tgid_cnt++;
> > +       last_tgid = task->tgid;
> > +
> > +       if (task->tgid != pid)
> > +               return 0;
> > +
> > +       if (vma->vm_start <= address && vma->vm_end > address) {
> > +               offset = address - vma->vm_start + (vma->vm_pgoff
> > << 12);
> 
> it's best not to assume page_size is 4K, you can pass actual value
> through global variable from user-space (we've previously fixed a
> bunch of tests with fixed page_size assumption as they break some
> platforms, let's not regress that)

A getpagesize() helper would help :)

> 
> > +               BPF_SEQ_PRINTF(seq, "OK\n");
> > +       }
> > +       return 0;
> > +}
> > --
> > 2.30.2
> > 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 1/4] bpf: Parameterize task iterators.
  2022-08-25  0:16     ` Kui-Feng Lee
@ 2022-08-25 21:13       ` Andrii Nakryiko
  2022-08-26 14:49         ` Kui-Feng Lee
  0 siblings, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2022-08-25 21:13 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: daniel, Kernel Team, Yonghong Song, ast, andrii, bpf

On Wed, Aug 24, 2022 at 5:16 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote:
> > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > >
> > > Allow creating an iterator that loops through resources of one
> > > task/thread.
> > >
> > > People could only create iterators to loop through all resources of
> > > files, vma, and tasks in the system, even though they were
> > > interested
> > > in only the resources of a specific task or process.  Passing the
> > > additional parameters, people can now create an iterator to go
> > > through all resources or only the resources of a task.
> > >
> > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > > ---
> > >  include/linux/bpf.h            |  25 +++++++
> > >  include/uapi/linux/bpf.h       |   6 ++
> > >  kernel/bpf/task_iter.c         | 116 ++++++++++++++++++++++++++---
> > > ----
> > >  tools/include/uapi/linux/bpf.h |   6 ++
> > >  4 files changed, 129 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 39bd36359c1e..59712dd917d8 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user
> > > *pathname, int flags);
> > >         extern int bpf_iter_ ## target(args);                   \
> > >         int __init bpf_iter_ ## target(args) { return 0; }
> > >
> > > +/*
> > > + * The task type of iterators.
> > > + *
> > > + * For BPF task iterators, they can be parameterized with various
> > > + * parameters to visit only some of tasks.
> > > + *
> > > + * BPF_TASK_ITER_ALL (default)
> > > + *     Iterate over resources of every task.
> > > + *
> > > + * BPF_TASK_ITER_TID
> > > + *     Iterate over resources of a task/tid.
> > > + *
> > > + * BPF_TASK_ITER_TGID
> > > + *     Iterate over reosurces of evevry task of a process / task
> > > group.
> >
> > typos: resources, every
> >
> > > + */
> > > +enum bpf_iter_task_type {
> > > +       BPF_TASK_ITER_ALL = 0,
> > > +       BPF_TASK_ITER_TID,
> > > +       BPF_TASK_ITER_TGID,
> > > +};
> > > +
> >
> > [...]
> >
> > >         rcu_read_lock();
> > >  retry:
> > > -       pid = find_ge_pid(*tid, ns);
> > > +       pid = find_ge_pid(*tid, common->ns);
> > >         if (pid) {
> > > -               *tid = pid_nr_ns(pid, ns);
> > > +               *tid = pid_nr_ns(pid, common->ns);
> > >                 task = get_pid_task(pid, PIDTYPE_PID);
> > >                 if (!task) {
> > >                         ++*tid;
> > >                         goto retry;
> > > -               } else if (skip_if_dup_files &&
> > > !thread_group_leader(task) &&
> > > -                          task->files == task->group_leader-
> > > >files) {
> > > +               } else if ((skip_if_dup_files &&
> > > !thread_group_leader(task) &&
> > > +                           task->files == task->group_leader-
> > > >files) ||
> > > +                          (common->type == BPF_TASK_ITER_TGID &&
> > > +                           __task_pid_nr_ns(task, PIDTYPE_TGID,
> > > common->ns) != common->pid)) {
> >
> > it gets super hard to follow this logic, would a simple helper
> > function to calculate this condition (and maybe some comments to
> > explain the logic behind these checks?) make it a bit more readable?
>
> !matched_task(task, common, skip_if_dup_file)?
>
> bool matched_task(struct task_struct *task,
>                   struct bpf_iter_seq_task_common *common,
>                   bool skip_if_dup_file) {
>         /* Should not have the same 'files' if skip_if_dup_file is true
> */
>         bool diff_files_if =
>                 !skip_if_dup_file ||
>                 (thread_group_leader(task) &&
>                 task->file != task->gorup_leader->fies);
>         /* Should have the given tgid if the type is BPF_TASK_ITER_TGI
> */
>         bool have_tgid_if =
>                 common->type != BPF_TASK_ITER_TGID ||
>                 __task_pid_nr_ns(task, PIDTYPE_TGID,
>                 common->ns) == common->pid;
>         return diff_files_if && have_tgid_if;
> }
>
> How about this?
>

Hm... "matched_task" doesn't mean much, tbh, so not really. I wanted
to suggest having a separate helper just for your TGID check and call
it something more meaningful like "task_belongs_to_tgid". Can't come
up with a good name for existing dup_file check, so I'd probably keep
it as is. But also seems like there is same_thread_group() helper in
include/linux/sched/signal.h, so let's look if we can use it, it seems
like it's just comparing signal pointers (probably quite faster than
what you are doing right now).

But looking at this some more made me realize that even if we specify
pid (tgid in kernel terms) we are still going to iterate through all
the tasks, essentially. Is that right? So TGID mode isn't great for
speeding up, we should point out to users that if they want to iterate
files of the process, they probably want to use TID mode and set tid
to pid to use the early termination condition in TID.

It wasn't obvious to me until I re-read this patch like 3 times and
wrote three different replies here :)

But then I also went looking at what procfs doing for
/proc/<pid/task/* dirs. It does seem like there are faster ways to
iterate all threads of a process. See next_tid() which uses
next_thread(), etc. Can you please check those and see if we can have
faster in-process iteration?


> >
> > >                         put_task_struct(task);
> > >                         task = NULL;
> > >                         ++*tid;
> > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq,
> > > loff_t *pos)
> > >         struct bpf_iter_seq_task_info *info = seq->private;
> > >         struct task_struct *task;
> > >
> > > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > > false);
> > > +       task = task_seq_get_next(&info->common, &info->tid, false);
> > >         if (!task)
> > >                 return NULL;
> > >

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 1/4] bpf: Parameterize task iterators.
  2022-08-25 21:13       ` Andrii Nakryiko
@ 2022-08-26 14:49         ` Kui-Feng Lee
  2022-08-27  4:50           ` Andrii Nakryiko
  0 siblings, 1 reply; 17+ messages in thread
From: Kui-Feng Lee @ 2022-08-26 14:49 UTC (permalink / raw)
  To: andrii.nakryiko; +Cc: daniel, Kernel Team, Yonghong Song, ast, andrii, bpf

On Thu, 2022-08-25 at 14:13 -0700, Andrii Nakryiko wrote:
> On Wed, Aug 24, 2022 at 5:16 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > 
> > On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote:
> > > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com>
> > > wrote:
> > > > 
> > > > Allow creating an iterator that loops through resources of one
> > > > task/thread.
> > > > 
> > > > People could only create iterators to loop through all
> > > > resources of
> > > > files, vma, and tasks in the system, even though they were
> > > > interested
> > > > in only the resources of a specific task or process.  Passing
> > > > the
> > > > additional parameters, people can now create an iterator to go
> > > > through all resources or only the resources of a task.
> > > > 
> > > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > > > ---
> > > >  include/linux/bpf.h            |  25 +++++++
> > > >  include/uapi/linux/bpf.h       |   6 ++
> > > >  kernel/bpf/task_iter.c         | 116
> > > > ++++++++++++++++++++++++++---
> > > > ----
> > > >  tools/include/uapi/linux/bpf.h |   6 ++
> > > >  4 files changed, 129 insertions(+), 24 deletions(-)
> > > > 
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 39bd36359c1e..59712dd917d8 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user
> > > > *pathname, int flags);
> > > >         extern int bpf_iter_ ## target(args);                  
> > > > \
> > > >         int __init bpf_iter_ ## target(args) { return 0; }
> > > > 
> > > > +/*
> > > > + * The task type of iterators.
> > > > + *
> > > > + * For BPF task iterators, they can be parameterized with
> > > > various
> > > > + * parameters to visit only some of tasks.
> > > > + *
> > > > + * BPF_TASK_ITER_ALL (default)
> > > > + *     Iterate over resources of every task.
> > > > + *
> > > > + * BPF_TASK_ITER_TID
> > > > + *     Iterate over resources of a task/tid.
> > > > + *
> > > > + * BPF_TASK_ITER_TGID
> > > > + *     Iterate over reosurces of evevry task of a process /
> > > > task
> > > > group.
> > > 
> > > typos: resources, every
> > > 
> > > > + */
> > > > +enum bpf_iter_task_type {
> > > > +       BPF_TASK_ITER_ALL = 0,
> > > > +       BPF_TASK_ITER_TID,
> > > > +       BPF_TASK_ITER_TGID,
> > > > +};
> > > > +
> > > 
> > > [...]
> > > 
> > > >         rcu_read_lock();
> > > >  retry:
> > > > -       pid = find_ge_pid(*tid, ns);
> > > > +       pid = find_ge_pid(*tid, common->ns);
> > > >         if (pid) {
> > > > -               *tid = pid_nr_ns(pid, ns);
> > > > +               *tid = pid_nr_ns(pid, common->ns);
> > > >                 task = get_pid_task(pid, PIDTYPE_PID);
> > > >                 if (!task) {
> > > >                         ++*tid;
> > > >                         goto retry;
> > > > -               } else if (skip_if_dup_files &&
> > > > !thread_group_leader(task) &&
> > > > -                          task->files == task->group_leader-
> > > > > files) {
> > > > +               } else if ((skip_if_dup_files &&
> > > > !thread_group_leader(task) &&
> > > > +                           task->files == task->group_leader-
> > > > > files) ||
> > > > +                          (common->type == BPF_TASK_ITER_TGID
> > > > &&
> > > > +                           __task_pid_nr_ns(task,
> > > > PIDTYPE_TGID,
> > > > common->ns) != common->pid)) {
> > > 
> > > it gets super hard to follow this logic, would a simple helper
> > > function to calculate this condition (and maybe some comments to
> > > explain the logic behind these checks?) make it a bit more
> > > readable?
> > 
> > !matched_task(task, common, skip_if_dup_file)?
> > 
> > bool matched_task(struct task_struct *task,
> >                   struct bpf_iter_seq_task_common *common,
> >                   bool skip_if_dup_file) {
> >         /* Should not have the same 'files' if skip_if_dup_file is
> > true
> > */
> >         bool diff_files_if =
> >                 !skip_if_dup_file ||
> >                 (thread_group_leader(task) &&
> >                 task->file != task->gorup_leader->fies);
> >         /* Should have the given tgid if the type is
> > BPF_TASK_ITER_TGI
> > */
> >         bool have_tgid_if =
> >                 common->type != BPF_TASK_ITER_TGID ||
> >                 __task_pid_nr_ns(task, PIDTYPE_TGID,
> >                 common->ns) == common->pid;
> >         return diff_files_if && have_tgid_if;
> > }
> > 
> > How about this?
> > 
> 
> Hm... "matched_task" doesn't mean much, tbh, so not really. I wanted
> to suggest having a separate helper just for your TGID check and call
> it something more meaningful like "task_belongs_to_tgid". Can't come
> up with a good name for existing dup_file check, so I'd probably keep
> it as is. But also seems like there is same_thread_group() helper in
> include/linux/sched/signal.h, so let's look if we can use it, it
> seems
> like it's just comparing signal pointers (probably quite faster than
> what you are doing right now).
> 
> But looking at this some more made me realize that even if we specify
> pid (tgid in kernel terms) we are still going to iterate through all
> the tasks, essentially. Is that right? So TGID mode isn't great for
> speeding up, we should point out to users that if they want to
> iterate
> files of the process, they probably want to use TID mode and set tid
> to pid to use the early termination condition in TID.
> 
> It wasn't obvious to me until I re-read this patch like 3 times and
> wrote three different replies here :)
> 
> But then I also went looking at what procfs doing for
> /proc/<pid/task/* dirs. It does seem like there are faster ways to
> iterate all threads of a process. See next_tid() which uses
> next_thread(), etc. Can you please check those and see if we can have
> faster in-process iteration?
> 

I haven't notice this message until now.
It looks like very promise.  I will add it to next version.

> 
> > > 
> > > >                         put_task_struct(task);
> > > >                         task = NULL;
> > > >                         ++*tid;
> > > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file
> > > > *seq,
> > > > loff_t *pos)
> > > >         struct bpf_iter_seq_task_info *info = seq->private;
> > > >         struct task_struct *task;
> > > > 
> > > > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > > > false);
> > > > +       task = task_seq_get_next(&info->common, &info->tid,
> > > > false);
> > > >         if (!task)
> > > >                 return NULL;
> > > > 
> 
> [...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next v6 1/4] bpf: Parameterize task iterators.
  2022-08-26 14:49         ` Kui-Feng Lee
@ 2022-08-27  4:50           ` Andrii Nakryiko
  0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2022-08-27  4:50 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: daniel, Kernel Team, Yonghong Song, ast, andrii, bpf

On Fri, Aug 26, 2022 at 7:49 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> On Thu, 2022-08-25 at 14:13 -0700, Andrii Nakryiko wrote:
> > On Wed, Aug 24, 2022 at 5:16 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > >
> > > On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote:
> > > > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@fb.com>
> > > > wrote:
> > > > >
> > > > > Allow creating an iterator that loops through resources of one
> > > > > task/thread.
> > > > >
> > > > > People could only create iterators to loop through all
> > > > > resources of
> > > > > files, vma, and tasks in the system, even though they were
> > > > > interested
> > > > > in only the resources of a specific task or process.  Passing
> > > > > the
> > > > > additional parameters, people can now create an iterator to go
> > > > > through all resources or only the resources of a task.
> > > > >
> > > > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > > > > ---
> > > > >  include/linux/bpf.h            |  25 +++++++
> > > > >  include/uapi/linux/bpf.h       |   6 ++
> > > > >  kernel/bpf/task_iter.c         | 116
> > > > > ++++++++++++++++++++++++++---
> > > > > ----
> > > > >  tools/include/uapi/linux/bpf.h |   6 ++
> > > > >  4 files changed, 129 insertions(+), 24 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index 39bd36359c1e..59712dd917d8 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user
> > > > > *pathname, int flags);
> > > > >         extern int bpf_iter_ ## target(args);
> > > > > \
> > > > >         int __init bpf_iter_ ## target(args) { return 0; }
> > > > >
> > > > > +/*
> > > > > + * The task type of iterators.
> > > > > + *
> > > > > + * For BPF task iterators, they can be parameterized with
> > > > > various
> > > > > + * parameters to visit only some of tasks.
> > > > > + *
> > > > > + * BPF_TASK_ITER_ALL (default)
> > > > > + *     Iterate over resources of every task.
> > > > > + *
> > > > > + * BPF_TASK_ITER_TID
> > > > > + *     Iterate over resources of a task/tid.
> > > > > + *
> > > > > + * BPF_TASK_ITER_TGID
> > > > > + *     Iterate over reosurces of evevry task of a process /
> > > > > task
> > > > > group.
> > > >
> > > > typos: resources, every
> > > >
> > > > > + */
> > > > > +enum bpf_iter_task_type {
> > > > > +       BPF_TASK_ITER_ALL = 0,
> > > > > +       BPF_TASK_ITER_TID,
> > > > > +       BPF_TASK_ITER_TGID,
> > > > > +};
> > > > > +
> > > >
> > > > [...]
> > > >
> > > > >         rcu_read_lock();
> > > > >  retry:
> > > > > -       pid = find_ge_pid(*tid, ns);
> > > > > +       pid = find_ge_pid(*tid, common->ns);
> > > > >         if (pid) {
> > > > > -               *tid = pid_nr_ns(pid, ns);
> > > > > +               *tid = pid_nr_ns(pid, common->ns);
> > > > >                 task = get_pid_task(pid, PIDTYPE_PID);
> > > > >                 if (!task) {
> > > > >                         ++*tid;
> > > > >                         goto retry;
> > > > > -               } else if (skip_if_dup_files &&
> > > > > !thread_group_leader(task) &&
> > > > > -                          task->files == task->group_leader-
> > > > > > files) {
> > > > > +               } else if ((skip_if_dup_files &&
> > > > > !thread_group_leader(task) &&
> > > > > +                           task->files == task->group_leader-
> > > > > > files) ||
> > > > > +                          (common->type == BPF_TASK_ITER_TGID
> > > > > &&
> > > > > +                           __task_pid_nr_ns(task,
> > > > > PIDTYPE_TGID,
> > > > > common->ns) != common->pid)) {
> > > >
> > > > it gets super hard to follow this logic, would a simple helper
> > > > function to calculate this condition (and maybe some comments to
> > > > explain the logic behind these checks?) make it a bit more
> > > > readable?
> > >
> > > !matched_task(task, common, skip_if_dup_file)?
> > >
> > > bool matched_task(struct task_struct *task,
> > >                   struct bpf_iter_seq_task_common *common,
> > >                   bool skip_if_dup_file) {
> > >         /* Should not have the same 'files' if skip_if_dup_file is
> > > true
> > > */
> > >         bool diff_files_if =
> > >                 !skip_if_dup_file ||
> > >                 (thread_group_leader(task) &&
> > >                 task->file != task->gorup_leader->fies);
> > >         /* Should have the given tgid if the type is
> > > BPF_TASK_ITER_TGI
> > > */
> > >         bool have_tgid_if =
> > >                 common->type != BPF_TASK_ITER_TGID ||
> > >                 __task_pid_nr_ns(task, PIDTYPE_TGID,
> > >                 common->ns) == common->pid;
> > >         return diff_files_if && have_tgid_if;
> > > }
> > >
> > > How about this?
> > >
> >
> > Hm... "matched_task" doesn't mean much, tbh, so not really. I wanted
> > to suggest having a separate helper just for your TGID check and call
> > it something more meaningful like "task_belongs_to_tgid". Can't come
> > up with a good name for existing dup_file check, so I'd probably keep
> > it as is. But also seems like there is same_thread_group() helper in
> > include/linux/sched/signal.h, so let's look if we can use it, it
> > seems
> > like it's just comparing signal pointers (probably quite faster than
> > what you are doing right now).
> >
> > But looking at this some more made me realize that even if we specify
> > pid (tgid in kernel terms) we are still going to iterate through all
> > the tasks, essentially. Is that right? So TGID mode isn't great for
> > speeding up, we should point out to users that if they want to
> > iterate
> > files of the process, they probably want to use TID mode and set tid
> > to pid to use the early termination condition in TID.
> >
> > It wasn't obvious to me until I re-read this patch like 3 times and
> > wrote three different replies here :)
> >
> > But then I also went looking at what procfs doing for
> > /proc/<pid/task/* dirs. It does seem like there are faster ways to
> > iterate all threads of a process. See next_tid() which uses
> > next_thread(), etc. Can you please check those and see if we can have
> > faster in-process iteration?
> >
>
> I haven't notice this message until now.
> It looks like very promise.  I will add it to next version.

No worries. Thanks! Hopefully there is a way to make efficient
per-process file and vma iteration.

>
> >
> > > >
> > > > >                         put_task_struct(task);
> > > > >                         task = NULL;
> > > > >                         ++*tid;
> > > > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file
> > > > > *seq,
> > > > > loff_t *pos)
> > > > >         struct bpf_iter_seq_task_info *info = seq->private;
> > > > >         struct task_struct *task;
> > > > >
> > > > > -       task = task_seq_get_next(info->common.ns, &info->tid,
> > > > > false);
> > > > > +       task = task_seq_get_next(&info->common, &info->tid,
> > > > > false);
> > > > >         if (!task)
> > > > >                 return NULL;
> > > > >
> >
> > [...]
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-08-27  4:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-19 22:09 [PATCH bpf-next v6 0/4] Parameterize task iterators Kui-Feng Lee
2022-08-19 22:09 ` [PATCH bpf-next v6 1/4] bpf: " Kui-Feng Lee
2022-08-24 19:33   ` Yonghong Song
2022-08-24 22:20   ` Andrii Nakryiko
2022-08-25  0:16     ` Kui-Feng Lee
2022-08-25 21:13       ` Andrii Nakryiko
2022-08-26 14:49         ` Kui-Feng Lee
2022-08-27  4:50           ` Andrii Nakryiko
2022-08-19 22:09 ` [PATCH bpf-next v6 2/4] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
2022-08-24 19:41   ` Yonghong Song
2022-08-19 22:09 ` [PATCH bpf-next v6 3/4] bpf: Handle show_fdinfo " Kui-Feng Lee
2022-08-24 19:49   ` Yonghong Song
2022-08-19 22:09 ` [PATCH bpf-next v6 4/4] selftests/bpf: Test " Kui-Feng Lee
2022-08-24 20:50   ` Yonghong Song
2022-08-24 22:30   ` Andrii Nakryiko
2022-08-25  1:07     ` Kui-Feng Lee
2022-08-24 22:06 ` [PATCH bpf-next v6 0/4] Parameterize task iterators Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).