linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] bpf: task_group_seq_get_next: use __next_thread()
@ 2023-08-25 16:18 Oleg Nesterov
  2023-08-25 16:19 ` [PATCH 1/6] bpf: task_group_seq_get_next: cleanup the usage of next_thread() Oleg Nesterov
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-25 16:18 UTC (permalink / raw)
  To: Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel

Compile tested, 1-5 need the review from bpf maintainers, quite possibly
I did some silly mistakes. I tried to cleanup this code because I could
not look at it, but it has other problems and imo should be rewritten.

6/6 obviously depends on

	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/

which was not merged yet.

To simplify the review, this is the code after 6/6:

	static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_common *common,
							   u32 *tid,
							   bool skip_if_dup_files)
	{
		struct task_struct *task;
		struct pid *pid;
		u32 next_tid;

		if (!*tid) {
			/* The first time, the iterator calls this function. */
			pid = find_pid_ns(common->pid, common->ns);
			task = get_pid_task(pid, PIDTYPE_TGID);
			if (!task)
				return NULL;

			*tid = common->pid;
			common->pid_visiting = common->pid;

			return task;
		}

		/* If the control returns to user space and comes back to the
		 * kernel again, *tid and common->pid_visiting should be the
		 * same for task_seq_start() to pick up the correct task.
		 */
		if (*tid == common->pid_visiting) {
			pid = find_pid_ns(common->pid_visiting, common->ns);
			task = get_pid_task(pid, PIDTYPE_PID);

			return task;
		}

		task = find_task_by_pid_ns(common->pid_visiting, common->ns);
		if (!task)
			return NULL;

	retry:
		task = __next_thread(task);
		if (!task)
			return NULL;

		next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
		if (!next_tid)
			goto retry;

		if (skip_if_dup_files && task->files == task->group_leader->files)
			goto retry;

		*tid = common->pid_visiting = next_tid;
		get_task_struct(task);
		return task;
	}

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/6] bpf: task_group_seq_get_next: cleanup the usage of next_thread()
  2023-08-25 16:18 [PATCH 0/6] bpf: task_group_seq_get_next: use __next_thread() Oleg Nesterov
@ 2023-08-25 16:19 ` Oleg Nesterov
  2023-08-25 22:45   ` Yonghong Song
  2023-08-25 16:19 ` [PATCH 2/6] bpf: task_group_seq_get_next: cleanup the usage of get/put_task_struct Oleg Nesterov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-25 16:19 UTC (permalink / raw)
  To: Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel

1. find_pid_ns() + get_pid_task() under rcu_read_lock() guarantees that we
   can safely iterate the task->thread_group list. Even if this task exits
   right after get_pid_task() (or goto retry) and pid_alive() returns 0.

   Kill the unnecessary pid_alive() check.

2. next_thread() simply can't return NULL, kill the bogus "if (!next_task)"
   check.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/bpf/task_iter.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index c4ab9d6cdbe9..4d1125108014 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -75,15 +75,8 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 		return NULL;
 
 retry:
-	if (!pid_alive(task)) {
-		put_task_struct(task);
-		return NULL;
-	}
-
 	next_task = next_thread(task);
 	put_task_struct(task);
-	if (!next_task)
-		return NULL;
 
 	saved_tid = *tid;
 	*tid = __task_pid_nr_ns(next_task, PIDTYPE_PID, common->ns);
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/6] bpf: task_group_seq_get_next: cleanup the usage of get/put_task_struct
  2023-08-25 16:18 [PATCH 0/6] bpf: task_group_seq_get_next: use __next_thread() Oleg Nesterov
  2023-08-25 16:19 ` [PATCH 1/6] bpf: task_group_seq_get_next: cleanup the usage of next_thread() Oleg Nesterov
@ 2023-08-25 16:19 ` Oleg Nesterov
  2023-08-25 16:19 ` [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check Oleg Nesterov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-25 16:19 UTC (permalink / raw)
  To: Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel

get_pid_task() makes no sense, the code does put_task_struct() soon after.
Use find_task_by_pid_ns() instead of find_pid_ns + get_pid_task and kill
put_task_struct(), this allows to do get_task_struct() only once before
return.

While at it, kill the unnecessary "if (!pid)" check in the "if (!*tid)"
block, this matches the next usage of find_pid_ns() + get_pid_task() in
this function.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/task_iter.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 4d1125108014..1589ec3faded 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -42,9 +42,6 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 	if (!*tid) {
 		/* The first time, the iterator calls this function. */
 		pid = find_pid_ns(common->pid, common->ns);
-		if (!pid)
-			return NULL;
-
 		task = get_pid_task(pid, PIDTYPE_TGID);
 		if (!task)
 			return NULL;
@@ -66,17 +63,12 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 		return task;
 	}
 
-	pid = find_pid_ns(common->pid_visiting, common->ns);
-	if (!pid)
-		return NULL;
-
-	task = get_pid_task(pid, PIDTYPE_PID);
+	task = find_task_by_pid_ns(common->pid_visiting, common->ns);
 	if (!task)
 		return NULL;
 
 retry:
 	next_task = next_thread(task);
-	put_task_struct(task);
 
 	saved_tid = *tid;
 	*tid = __task_pid_nr_ns(next_task, PIDTYPE_PID, common->ns);
@@ -88,7 +80,6 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 		return NULL;
 	}
 
-	get_task_struct(next_task);
 	common->pid_visiting = *tid;
 
 	if (skip_if_dup_files && task->files == task->group_leader->files) {
@@ -96,6 +87,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 		goto retry;
 	}
 
+	get_task_struct(next_task);
 	return next_task;
 }
 
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-25 16:18 [PATCH 0/6] bpf: task_group_seq_get_next: use __next_thread() Oleg Nesterov
  2023-08-25 16:19 ` [PATCH 1/6] bpf: task_group_seq_get_next: cleanup the usage of next_thread() Oleg Nesterov
  2023-08-25 16:19 ` [PATCH 2/6] bpf: task_group_seq_get_next: cleanup the usage of get/put_task_struct Oleg Nesterov
@ 2023-08-25 16:19 ` Oleg Nesterov
  2023-08-25 17:04   ` Oleg Nesterov
  2023-08-25 22:49   ` Yonghong Song
  2023-08-25 16:19 ` [PATCH 4/6] bpf: task_group_seq_get_next: kill next_task Oleg Nesterov
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-25 16:19 UTC (permalink / raw)
  To: Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel

Unless I am notally confused it is wrong. We are going to return or
skip next_task so we need to check next_task-files, not task->files.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/bpf/task_iter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 1589ec3faded..2264870ae3fc 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 
 	common->pid_visiting = *tid;
 
-	if (skip_if_dup_files && task->files == task->group_leader->files) {
+	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
 		task = next_task;
 		goto retry;
 	}
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] bpf: task_group_seq_get_next: kill next_task
  2023-08-25 16:18 [PATCH 0/6] bpf: task_group_seq_get_next: use __next_thread() Oleg Nesterov
                   ` (2 preceding siblings ...)
  2023-08-25 16:19 ` [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check Oleg Nesterov
@ 2023-08-25 16:19 ` Oleg Nesterov
  2023-08-25 22:55   ` Yonghong Song
  2023-08-25 16:19 ` [PATCH 5/6] bpf: task_group_seq_get_next: simplify the "next tid" logic Oleg Nesterov
  2023-08-25 16:19 ` [PATCH 6/6] bpf: task_group_seq_get_next: use __next_thread() rather than next_thread() Oleg Nesterov
  5 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-25 16:19 UTC (permalink / raw)
  To: Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel

It only adds the unnecessary confusion and compicates the "retry" code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/bpf/task_iter.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 2264870ae3fc..f51f476ec679 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -35,7 +35,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 						   u32 *tid,
 						   bool skip_if_dup_files)
 {
-	struct task_struct *task, *next_task;
+	struct task_struct *task;
 	struct pid *pid;
 	u32 saved_tid;
 
@@ -68,10 +68,10 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 		return NULL;
 
 retry:
-	next_task = next_thread(task);
+	task = next_thread(task);
 
 	saved_tid = *tid;
-	*tid = __task_pid_nr_ns(next_task, PIDTYPE_PID, common->ns);
+	*tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
 	if (!*tid || *tid == common->pid) {
 		/* Run out of tasks of a process.  The tasks of a
 		 * thread_group are linked as circular linked list.
@@ -82,13 +82,11 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 
 	common->pid_visiting = *tid;
 
-	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
-		task = next_task;
+	if (skip_if_dup_files && task->files == task->group_leader->files)
 		goto retry;
-	}
 
-	get_task_struct(next_task);
-	return next_task;
+	get_task_struct(task);
+	return task;
 }
 
 static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *common,
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] bpf: task_group_seq_get_next: simplify the "next tid" logic
  2023-08-25 16:18 [PATCH 0/6] bpf: task_group_seq_get_next: use __next_thread() Oleg Nesterov
                   ` (3 preceding siblings ...)
  2023-08-25 16:19 ` [PATCH 4/6] bpf: task_group_seq_get_next: kill next_task Oleg Nesterov
@ 2023-08-25 16:19 ` Oleg Nesterov
  2023-08-25 22:57   ` Yonghong Song
  2023-08-25 16:19 ` [PATCH 6/6] bpf: task_group_seq_get_next: use __next_thread() rather than next_thread() Oleg Nesterov
  5 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-25 16:19 UTC (permalink / raw)
  To: Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel

Kill saved_tid. It looks ugly to update *tid and then restore the
previous value if __task_pid_nr_ns() returns 0. Change this code
to update *tid and common->pid_visiting once before return.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/bpf/task_iter.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index f51f476ec679..7473068ed313 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -37,7 +37,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 {
 	struct task_struct *task;
 	struct pid *pid;
-	u32 saved_tid;
+	u32 next_tid;
 
 	if (!*tid) {
 		/* The first time, the iterator calls this function. */
@@ -70,21 +70,18 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 retry:
 	task = next_thread(task);
 
-	saved_tid = *tid;
-	*tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
-	if (!*tid || *tid == common->pid) {
+	next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
+	if (!next_tid || next_tid == common->pid) {
 		/* Run out of tasks of a process.  The tasks of a
 		 * thread_group are linked as circular linked list.
 		 */
-		*tid = saved_tid;
 		return NULL;
 	}
 
-	common->pid_visiting = *tid;
-
 	if (skip_if_dup_files && task->files == task->group_leader->files)
 		goto retry;
 
+	*tid = common->pid_visiting = next_tid;
 	get_task_struct(task);
 	return task;
 }
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] bpf: task_group_seq_get_next: use __next_thread() rather than next_thread()
  2023-08-25 16:18 [PATCH 0/6] bpf: task_group_seq_get_next: use __next_thread() Oleg Nesterov
                   ` (4 preceding siblings ...)
  2023-08-25 16:19 ` [PATCH 5/6] bpf: task_group_seq_get_next: simplify the "next tid" logic Oleg Nesterov
@ 2023-08-25 16:19 ` Oleg Nesterov
  5 siblings, 0 replies; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-25 16:19 UTC (permalink / raw)
  To: Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel

Lockless use of next_thread() should be avoided, task_group_seq_get_next()
is the last user, it too can return the group leader twice if it races with
mt-thread exec which changes the group->leader's pid.

Change the main loop to use __next_thread(), kill "next_tid == common->pid"
check.

__next_thread() can't loop forever, so we can also change this code to retry
if next_tid == 0.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/bpf/task_iter.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 7473068ed313..8c847d91cdd9 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -68,15 +68,13 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
 		return NULL;
 
 retry:
-	task = next_thread(task);
+	task = __next_thread(task);
+	if (!task)
+		return NULL;
 
 	next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
-	if (!next_tid || next_tid == common->pid) {
-		/* Run out of tasks of a process.  The tasks of a
-		 * thread_group are linked as circular linked list.
-		 */
-		return NULL;
-	}
+	if (!next_tid)
+		goto retry;
 
 	if (skip_if_dup_files && task->files == task->group_leader->files)
 		goto retry;
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-25 16:19 ` [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check Oleg Nesterov
@ 2023-08-25 17:04   ` Oleg Nesterov
  2023-08-25 22:52     ` Yonghong Song
  2023-08-25 22:49   ` Yonghong Song
  1 sibling, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-25 17:04 UTC (permalink / raw)
  To: Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel

Forgot to mention in the changelog...

In any case this doesn't look right. ->group_leader can exit before other
threads, call exit_files(), and in this case task_group_seq_get_next() will
check task->files == NULL.

On 08/25, Oleg Nesterov wrote:
>
> Unless I am notally confused it is wrong. We are going to return or
> skip next_task so we need to check next_task-files, not task->files.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/bpf/task_iter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 1589ec3faded..2264870ae3fc 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>
>  	common->pid_visiting = *tid;
>
> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>  		task = next_task;
>  		goto retry;
>  	}
> --
> 2.25.1.362.g51ebf55


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] bpf: task_group_seq_get_next: cleanup the usage of next_thread()
  2023-08-25 16:19 ` [PATCH 1/6] bpf: task_group_seq_get_next: cleanup the usage of next_thread() Oleg Nesterov
@ 2023-08-25 22:45   ` Yonghong Song
  0 siblings, 0 replies; 20+ messages in thread
From: Yonghong Song @ 2023-08-25 22:45 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel



On 8/25/23 9:19 AM, Oleg Nesterov wrote:
> 1. find_pid_ns() + get_pid_task() under rcu_read_lock() guarantees that we
>     can safely iterate the task->thread_group list. Even if this task exits
>     right after get_pid_task() (or goto retry) and pid_alive() returns 0.
> 
>     Kill the unnecessary pid_alive() check.
> 
> 2. next_thread() simply can't return NULL, kill the bogus "if (!next_task)"
>     check.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

__unhash_process() tries to detach_pid() (pid_alive() may become false)
and ist_del_rcu(&p->thread_group). So yes, next_thread() should be safe
to grab next_task since &task->thread_group is rcu protected. So

Acked-by: Yonghong Song <yonghong.song@linux.dev>

> ---
>   kernel/bpf/task_iter.c | 7 -------
>   1 file changed, 7 deletions(-)
> 
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index c4ab9d6cdbe9..4d1125108014 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -75,15 +75,8 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>   		return NULL;
>   
>   retry:
> -	if (!pid_alive(task)) {
> -		put_task_struct(task);
> -		return NULL;
> -	}
> -
>   	next_task = next_thread(task);
>   	put_task_struct(task);
> -	if (!next_task)
> -		return NULL;
>   
>   	saved_tid = *tid;
>   	*tid = __task_pid_nr_ns(next_task, PIDTYPE_PID, common->ns);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-25 16:19 ` [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check Oleg Nesterov
  2023-08-25 17:04   ` Oleg Nesterov
@ 2023-08-25 22:49   ` Yonghong Song
  1 sibling, 0 replies; 20+ messages in thread
From: Yonghong Song @ 2023-08-25 22:49 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel



On 8/25/23 9:19 AM, Oleg Nesterov wrote:
> Unless I am notally confused it is wrong. We are going to return or
> skip next_task so we need to check next_task-files, not task->files.

Thanks for capturing this. This is indeed an oversight.

Acked-by: Yonghong Song <yonghong.song@linux.dev>

> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>   kernel/bpf/task_iter.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 1589ec3faded..2264870ae3fc 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>   
>   	common->pid_visiting = *tid;
>   
> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>   		task = next_task;
>   		goto retry;
>   	}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-25 17:04   ` Oleg Nesterov
@ 2023-08-25 22:52     ` Yonghong Song
  2023-08-27 20:19       ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Yonghong Song @ 2023-08-25 22:52 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel



On 8/25/23 10:04 AM, Oleg Nesterov wrote:
> Forgot to mention in the changelog...
> 
> In any case this doesn't look right. ->group_leader can exit before other
> threads, call exit_files(), and in this case task_group_seq_get_next() will
> check task->files == NULL.

It is okay. This won't be affecting correctness. We will end with
calling bpf program for 'next_task'.

> 
> On 08/25, Oleg Nesterov wrote:
>>
>> Unless I am notally confused it is wrong. We are going to return or
>> skip next_task so we need to check next_task-files, not task->files.
>>
>> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>> ---
>>   kernel/bpf/task_iter.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 1589ec3faded..2264870ae3fc 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -82,7 +82,7 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>>
>>   	common->pid_visiting = *tid;
>>
>> -	if (skip_if_dup_files && task->files == task->group_leader->files) {
>> +	if (skip_if_dup_files && next_task->files == next_task->group_leader->files) {
>>   		task = next_task;
>>   		goto retry;
>>   	}
>> --
>> 2.25.1.362.g51ebf55
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] bpf: task_group_seq_get_next: kill next_task
  2023-08-25 16:19 ` [PATCH 4/6] bpf: task_group_seq_get_next: kill next_task Oleg Nesterov
@ 2023-08-25 22:55   ` Yonghong Song
  0 siblings, 0 replies; 20+ messages in thread
From: Yonghong Song @ 2023-08-25 22:55 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel



On 8/25/23 9:19 AM, Oleg Nesterov wrote:
> It only adds the unnecessary confusion and compicates the "retry" code.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Yonghong Song <yonghong.song@linux.dev>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/6] bpf: task_group_seq_get_next: simplify the "next tid" logic
  2023-08-25 16:19 ` [PATCH 5/6] bpf: task_group_seq_get_next: simplify the "next tid" logic Oleg Nesterov
@ 2023-08-25 22:57   ` Yonghong Song
  0 siblings, 0 replies; 20+ messages in thread
From: Yonghong Song @ 2023-08-25 22:57 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton, Yonghong Song
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Borkmann, Kui-Feng Lee,
	Andrii Nakryiko, Martin KaFai Lau, bpf, linux-kernel



On 8/25/23 9:19 AM, Oleg Nesterov wrote:
> Kill saved_tid. It looks ugly to update *tid and then restore the
> previous value if __task_pid_nr_ns() returns 0. Change this code
> to update *tid and common->pid_visiting once before return.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Yonghong Song <yonghong.song@linux.dev>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-25 22:52     ` Yonghong Song
@ 2023-08-27 20:19       ` Oleg Nesterov
  2023-08-28  1:18         ` Yonghong Song
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-27 20:19 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrew Morton, Yonghong Song, Eric W. Biederman, Linus Torvalds,
	Daniel Borkmann, Kui-Feng Lee, Andrii Nakryiko, Martin KaFai Lau,
	bpf, linux-kernel

On 08/25, Yonghong Song wrote:
>
> On 8/25/23 10:04 AM, Oleg Nesterov wrote:
> >Forgot to mention in the changelog...
> >
> >In any case this doesn't look right. ->group_leader can exit before other
> >threads, call exit_files(), and in this case task_group_seq_get_next() will
> >check task->files == NULL.
>
> It is okay. This won't be affecting correctness. We will end with
> calling bpf program for 'next_task'.

Well, I didn't mean it is necessarily wrong, I simply do not know.

But let's suppose that we have a thread group with the main thread M + 1000
sub-threads. In the likely case they all have the same ->files, CLONE_THREAD
without CLONE_FILES is not that common.

Let's assume the BPF_TASK_ITER_TGID case for simplicity.

Now lets look at task_file_seq_get_next() which passes skip_if_dup_files == 1
to task_seq_get_next() and thus to task_group_seq_get_next().

Now, in this case task_seq_get_next() will return non-NULL only once (OK, unless
task_file_seq_ops.stop() was called), it will return the group leader M first,
then after task_file_seq_get_next() "reports" all the fd's of M and increments
info->tid, the next task_seq_get_next(&info->tid, true) should return NULL because
of the skip_if_dup_files check in task_group_seq_get_next().

Right?

But. if the group leader M exits then M->files == NULL. And in this case
task_seq_get_next() will need to "inspect" all the sub-threads even if they all
have the same ->files pointer.

No?

Again, I am not saying this is a bug and quite possibly I misread this code, but
in any case the skip_if_dup_files logic looks sub-optimal and confusing to me.

Nevermind, please forget. This is minor even if I am right.

Thanks for rewiev!

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-27 20:19       ` Oleg Nesterov
@ 2023-08-28  1:18         ` Yonghong Song
  2023-08-28 10:54           ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Yonghong Song @ 2023-08-28  1:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Yonghong Song, Eric W. Biederman, Linus Torvalds,
	Daniel Borkmann, Kui-Feng Lee, Andrii Nakryiko, Martin KaFai Lau,
	bpf, linux-kernel



On 8/27/23 1:19 PM, Oleg Nesterov wrote:
> On 08/25, Yonghong Song wrote:
>>
>> On 8/25/23 10:04 AM, Oleg Nesterov wrote:
>>> Forgot to mention in the changelog...
>>>
>>> In any case this doesn't look right. ->group_leader can exit before other
>>> threads, call exit_files(), and in this case task_group_seq_get_next() will
>>> check task->files == NULL.
>>
>> It is okay. This won't be affecting correctness. We will end with
>> calling bpf program for 'next_task'.
> 
> Well, I didn't mean it is necessarily wrong, I simply do not know.
> 
> But let's suppose that we have a thread group with the main thread M + 1000
> sub-threads. In the likely case they all have the same ->files, CLONE_THREAD
> without CLONE_FILES is not that common.
> 
> Let's assume the BPF_TASK_ITER_TGID case for simplicity.
> 
> Now lets look at task_file_seq_get_next() which passes skip_if_dup_files == 1
> to task_seq_get_next() and thus to task_group_seq_get_next().
> 
> Now, in this case task_seq_get_next() will return non-NULL only once (OK, unless
> task_file_seq_ops.stop() was called), it will return the group leader M first,
> then after task_file_seq_get_next() "reports" all the fd's of M and increments
> info->tid, the next task_seq_get_next(&info->tid, true) should return NULL because
> of the skip_if_dup_files check in task_group_seq_get_next().
> 
> Right?
> 
> But. if the group leader M exits then M->files == NULL. And in this case
> task_seq_get_next() will need to "inspect" all the sub-threads even if they all
> have the same ->files pointer.

That is correct. I do not have practical experience on how much
possibility this scenario may happen. I assume it should be very low.
If this is not the case, we might need to revisit.

> 
> No?
> 
> Again, I am not saying this is a bug and quite possibly I misread this code, but
> in any case the skip_if_dup_files logic looks sub-optimal and confusing to me.
> 
> Nevermind, please forget. This is minor even if I am right.
> 
> Thanks for rewiev!
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-28  1:18         ` Yonghong Song
@ 2023-08-28 10:54           ` Oleg Nesterov
  2023-08-29  0:30             ` Yonghong Song
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-28 10:54 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrew Morton, Yonghong Song, Eric W. Biederman, Linus Torvalds,
	Daniel Borkmann, Kui-Feng Lee, Andrii Nakryiko, Martin KaFai Lau,
	bpf, linux-kernel

On 08/27, Yonghong Song wrote:
>
> On 8/27/23 1:19 PM, Oleg Nesterov wrote:
> >
> >But. if the group leader M exits then M->files == NULL. And in this case
> >task_seq_get_next() will need to "inspect" all the sub-threads even if they all
> >have the same ->files pointer.
>
> That is correct. I do not have practical experience on how much
> possibility this scenario may happen. I assume it should be very low.

Yes. I just tried to explain why the ->files check looks confusing to me.
Nevermind.

Could you review 6/6 as well?

Should I fold 1-5 into a single patch? I tried to document every change
and simplify the review, but I do not want to blow the git history.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-28 10:54           ` Oleg Nesterov
@ 2023-08-29  0:30             ` Yonghong Song
  2023-08-30 23:54               ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Yonghong Song @ 2023-08-29  0:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Yonghong Song, Eric W. Biederman, Linus Torvalds,
	Daniel Borkmann, Kui-Feng Lee, Andrii Nakryiko, Martin KaFai Lau,
	bpf, linux-kernel



On 8/28/23 3:54 AM, Oleg Nesterov wrote:
> On 08/27, Yonghong Song wrote:
>>
>> On 8/27/23 1:19 PM, Oleg Nesterov wrote:
>>>
>>> But. if the group leader M exits then M->files == NULL. And in this case
>>> task_seq_get_next() will need to "inspect" all the sub-threads even if they all
>>> have the same ->files pointer.
>>
>> That is correct. I do not have practical experience on how much
>> possibility this scenario may happen. I assume it should be very low.
> 
> Yes. I just tried to explain why the ->files check looks confusing to me.
> Nevermind.
> 
> Could you review 6/6 as well?

I think we can wait patch 6/6 after
    https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
is merged.

> 
> Should I fold 1-5 into a single patch? I tried to document every change
> and simplify the review, but I do not want to blow the git history.

Currently, because patch 6, the whole patch set cannot be tested by
bpf CI since it has a build failure:
   https://github.com/kernel-patches/bpf/pull/5580
I suggest you get patch 1-5 and resubmit with tag like
   "bpf-next v2"
   [Patch bpf-next v2 x/5] ...
so CI can build with different architectures and compilers to
ensure everything builds and runs fine.

> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-29  0:30             ` Yonghong Song
@ 2023-08-30 23:54               ` Oleg Nesterov
  2023-08-31 11:29                 ` Yonghong Song
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-30 23:54 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrew Morton, Yonghong Song, Eric W. Biederman, Linus Torvalds,
	Daniel Borkmann, Kui-Feng Lee, Andrii Nakryiko, Martin KaFai Lau,
	bpf, linux-kernel

On 08/28, Yonghong Song wrote:
>
> On 8/28/23 3:54 AM, Oleg Nesterov wrote:
> >
> >Could you review 6/6 as well?
>
> I think we can wait patch 6/6 after
>    https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> is merged.

OK.

> >Should I fold 1-5 into a single patch? I tried to document every change
> >and simplify the review, but I do not want to blow the git history.
>
> Currently, because patch 6, the whole patch set cannot be tested by
> bpf CI since it has a build failure:
>   https://github.com/kernel-patches/bpf/pull/5580

Heh. I thought this is obvious. I thought you can test 1-5 without 6/6
and _review_ 6/6.

I simply can't understand how can this pull/5580 come when I specially
mentioned

	> 6/6 obviously depends on
	>
	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
	>
	> which was not merged yet.

in 0/6.

> I suggest you get patch 1-5 and resubmit with tag like
>   "bpf-next v2"
>   [Patch bpf-next v2 x/5] ...
> so CI can build with different architectures and compilers to
> ensure everything builds and runs fine.

I think we can wait for

	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/

as you suggest above, then I'll send the s/next_thread/__next_thread/
oneliner without 1-5. I no longer think it makes sense to try to cleanup
the poor task_group_seq_get_next() when IMHO the whole task_iter logic
needs the complete rewrite. Yes, yes, I know, it is very easy to blame
someone else's code, sorry can't resist ;)

The only "fix" in this series is 3/6, but this code has more serious
bugs, so I guess we can forget it.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-30 23:54               ` Oleg Nesterov
@ 2023-08-31 11:29                 ` Yonghong Song
  2023-08-31 12:06                   ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Yonghong Song @ 2023-08-31 11:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Yonghong Song, Eric W. Biederman, Linus Torvalds,
	Daniel Borkmann, Kui-Feng Lee, Andrii Nakryiko, Martin KaFai Lau,
	bpf, linux-kernel



On 8/30/23 7:54 PM, Oleg Nesterov wrote:
> On 08/28, Yonghong Song wrote:
>>
>> On 8/28/23 3:54 AM, Oleg Nesterov wrote:
>>>
>>> Could you review 6/6 as well?
>>
>> I think we can wait patch 6/6 after
>>     https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
>> is merged.
> 
> OK.
> 
>>> Should I fold 1-5 into a single patch? I tried to document every change
>>> and simplify the review, but I do not want to blow the git history.
>>
>> Currently, because patch 6, the whole patch set cannot be tested by
>> bpf CI since it has a build failure:
>>    https://github.com/kernel-patches/bpf/pull/5580
> 
> Heh. I thought this is obvious. I thought you can test 1-5 without 6/6
> and _review_ 6/6.
> 
> I simply can't understand how can this pull/5580 come when I specially
> mentioned
> 
> 	> 6/6 obviously depends on
> 	>
> 	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
> 	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> 	>
> 	> which was not merged yet.
> 
> in 0/6.

The process in CI for testing is fully automated, and it does
not look at commit message. That is why it takes the whole
series. This is true for all other patch set.

> 
>> I suggest you get patch 1-5 and resubmit with tag like
>>    "bpf-next v2"
>>    [Patch bpf-next v2 x/5] ...
>> so CI can build with different architectures and compilers to
>> ensure everything builds and runs fine.
> 
> I think we can wait for
> 
> 	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> 
> as you suggest above, then I'll send the s/next_thread/__next_thread/
> oneliner without 1-5. I no longer think it makes sense to try to cleanup
> the poor task_group_seq_get_next() when IMHO the whole task_iter logic
> needs the complete rewrite. Yes, yes, I know, it is very easy to blame
> someone else's code, sorry can't resist ;)
> 
> The only "fix" in this series is 3/6, but this code has more serious
> bugs, so I guess we can forget it.
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check
  2023-08-31 11:29                 ` Yonghong Song
@ 2023-08-31 12:06                   ` Oleg Nesterov
  0 siblings, 0 replies; 20+ messages in thread
From: Oleg Nesterov @ 2023-08-31 12:06 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrew Morton, Yonghong Song, Eric W. Biederman, Linus Torvalds,
	Daniel Borkmann, Kui-Feng Lee, Andrii Nakryiko, Martin KaFai Lau,
	bpf, linux-kernel

On 08/31, Yonghong Song wrote:
>
> On 8/30/23 7:54 PM, Oleg Nesterov wrote:
> >
> >I simply can't understand how can this pull/5580 come when I specially
> >mentioned
> >
> >	> 6/6 obviously depends on
> >	>
> >	>	[PATCH 1/2] introduce __next_thread(), fix next_tid() vs exec() race
> >	>	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/
> >	>
> >	> which was not merged yet.
> >
> >in 0/6.
>
> The process in CI for testing is fully automated,

Ah, OK, sorry then.

> >>I suggest you get patch 1-5 and resubmit with tag like
> >>   "bpf-next v2"
> >>   [Patch bpf-next v2 x/5] ...
> >>so CI can build with different architectures and compilers to
> >>ensure everything builds and runs fine.

OK, will do when I have time.

Thanks,

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-08-31 12:07 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-25 16:18 [PATCH 0/6] bpf: task_group_seq_get_next: use __next_thread() Oleg Nesterov
2023-08-25 16:19 ` [PATCH 1/6] bpf: task_group_seq_get_next: cleanup the usage of next_thread() Oleg Nesterov
2023-08-25 22:45   ` Yonghong Song
2023-08-25 16:19 ` [PATCH 2/6] bpf: task_group_seq_get_next: cleanup the usage of get/put_task_struct Oleg Nesterov
2023-08-25 16:19 ` [PATCH 3/6] bpf: task_group_seq_get_next: fix the skip_if_dup_files check Oleg Nesterov
2023-08-25 17:04   ` Oleg Nesterov
2023-08-25 22:52     ` Yonghong Song
2023-08-27 20:19       ` Oleg Nesterov
2023-08-28  1:18         ` Yonghong Song
2023-08-28 10:54           ` Oleg Nesterov
2023-08-29  0:30             ` Yonghong Song
2023-08-30 23:54               ` Oleg Nesterov
2023-08-31 11:29                 ` Yonghong Song
2023-08-31 12:06                   ` Oleg Nesterov
2023-08-25 22:49   ` Yonghong Song
2023-08-25 16:19 ` [PATCH 4/6] bpf: task_group_seq_get_next: kill next_task Oleg Nesterov
2023-08-25 22:55   ` Yonghong Song
2023-08-25 16:19 ` [PATCH 5/6] bpf: task_group_seq_get_next: simplify the "next tid" logic Oleg Nesterov
2023-08-25 22:57   ` Yonghong Song
2023-08-25 16:19 ` [PATCH 6/6] bpf: task_group_seq_get_next: use __next_thread() rather than next_thread() Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).