linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
@ 2021-03-09 20:39 Jim Newsome
  2021-03-10 17:27 ` Oleg Nesterov
  2021-03-10 22:40 ` Eric W. Biederman
  0 siblings, 2 replies; 9+ messages in thread
From: Jim Newsome @ 2021-03-09 20:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Eric W . Biederman, Christian Brauner,
	linux-kernel, Jim Newsome

do_wait is an internal function used to implement waitpid, waitid,
wait4, etc. To handle the general case, it does an O(n) linear scan of
the thread group's children and tracees.

This patch adds a special-case when waiting on a pid to skip these scans
and instead do an O(1) lookup. This improves performance when waiting on
a pid from a thread group with many children and/or tracees.

Signed-off-by: James Newsome <jnewsome@torproject.org>
---
 kernel/exit.c | 53 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 04029e35e69a..c2438d4ba262 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1439,9 +1439,34 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
 			   TASK_INTERRUPTIBLE, p);
 }
 
+// Optimization for waiting on PIDTYPE_PID. No need to iterate through child
+// and tracee lists to find the target task.
+static int do_wait_pid(struct wait_opts *wo)
+{
+	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
+	int retval;
+
+	if (!target)
+		return 0;
+	if (current == target->real_parent ||
+	    (!(wo->wo_flags & __WNOTHREAD) &&
+	     same_thread_group(current, target->real_parent))) {
+		retval = wait_consider_task(wo, /* ptrace= */ 0, target);
+		if (retval)
+			return retval;
+	}
+	if (target->ptrace && (current == target->parent ||
+			       (!(wo->wo_flags & __WNOTHREAD) &&
+				same_thread_group(current, target->parent)))) {
+		retval = wait_consider_task(wo, /* ptrace= */ 1, target);
+		if (retval)
+			return retval;
+	}
+	return 0;
+}
+
 static long do_wait(struct wait_opts *wo)
 {
-	struct task_struct *tsk;
 	int retval;
 
 	trace_sched_process_wait(wo->wo_pid);
@@ -1463,19 +1488,27 @@ static long do_wait(struct wait_opts *wo)
 
 	set_current_state(TASK_INTERRUPTIBLE);
 	read_lock(&tasklist_lock);
-	tsk = current;
-	do {
-		retval = do_wait_thread(wo, tsk);
-		if (retval)
-			goto end;
 
-		retval = ptrace_do_wait(wo, tsk);
+	if (wo->wo_type == PIDTYPE_PID) {
+		retval = do_wait_pid(wo);
 		if (retval)
 			goto end;
+	} else {
+		struct task_struct *tsk = current;
 
-		if (wo->wo_flags & __WNOTHREAD)
-			break;
-	} while_each_thread(current, tsk);
+		do {
+			retval = do_wait_thread(wo, tsk);
+			if (retval)
+				goto end;
+
+			retval = ptrace_do_wait(wo, tsk);
+			if (retval)
+				goto end;
+
+			if (wo->wo_flags & __WNOTHREAD)
+				break;
+		} while_each_thread(current, tsk);
+	}
 	read_unlock(&tasklist_lock);
 
 notask:
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-09 20:39 [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Jim Newsome
@ 2021-03-10 17:27 ` Oleg Nesterov
  2021-03-10 22:40 ` Eric W. Biederman
  1 sibling, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2021-03-10 17:27 UTC (permalink / raw)
  To: Jim Newsome
  Cc: Andrew Morton, Eric W . Biederman, Christian Brauner, linux-kernel

On 03/09, Jim Newsome wrote:
>
> do_wait is an internal function used to implement waitpid, waitid,
> wait4, etc. To handle the general case, it does an O(n) linear scan of
> the thread group's children and tracees.
>
> This patch adds a special-case when waiting on a pid to skip these scans
> and instead do an O(1) lookup. This improves performance when waiting on
> a pid from a thread group with many children and/or tracees.
>
> Signed-off-by: James Newsome <jnewsome@torproject.org>

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-09 20:39 [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Jim Newsome
  2021-03-10 17:27 ` Oleg Nesterov
@ 2021-03-10 22:40 ` Eric W. Biederman
  2021-03-11  0:14   ` Jim Newsome
  2021-03-11 15:08   ` Oleg Nesterov
  1 sibling, 2 replies; 9+ messages in thread
From: Eric W. Biederman @ 2021-03-10 22:40 UTC (permalink / raw)
  To: Jim Newsome; +Cc: Andrew Morton, Oleg Nesterov, Christian Brauner, linux-kernel

Jim Newsome <jnewsome@torproject.org> writes:

> do_wait is an internal function used to implement waitpid, waitid,
> wait4, etc. To handle the general case, it does an O(n) linear scan of
> the thread group's children and tracees.
>
> This patch adds a special-case when waiting on a pid to skip these scans
> and instead do an O(1) lookup. This improves performance when waiting on
> a pid from a thread group with many children and/or tracees.
>
> Signed-off-by: James Newsome <jnewsome@torproject.org>
> ---
>  kernel/exit.c | 53 +++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 43 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 04029e35e69a..c2438d4ba262 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1439,9 +1439,34 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
>  			   TASK_INTERRUPTIBLE, p);
>  }
>  
> +// Optimization for waiting on PIDTYPE_PID. No need to iterate through child
> +// and tracee lists to find the target task.

Minor nit:  C++ style comments look very out of place in this file
            which uses old school C /* */ comment delimiters for
            all of it's block comments.
                      
> +static int do_wait_pid(struct wait_opts *wo)
> +{
> +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is subtle change in behavior.

Today on the task->children list we only place thread group leaders.

Which means that your do_wait_pid wait for thread of someone else's
process and that is a change in behavior.

So the code either needs a thread_group_leader filter on target before
the ptrace=0 case or we need to use "pid_task(wo->wo_pid, PIDTYPE_TGID)"
and "pid_task(wo->wo_pid, PIDTYPE_PID)" for the "ptrace=1" case.

I would like to make thread_group_leaders go away so I would favor two
pid_task calls.  But either will work right now.

Eric

                                     
> +	int retval;
> +
> +	if (!target)
> +		return 0;
> +	if (current == target->real_parent ||
> +	    (!(wo->wo_flags & __WNOTHREAD) &&
> +	     same_thread_group(current, target->real_parent))) {
> +		retval = wait_consider_task(wo, /* ptrace= */ 0, target);
> +		if (retval)
> +			return retval;
> +	}
> +	if (target->ptrace && (current == target->parent ||
> +			       (!(wo->wo_flags & __WNOTHREAD) &&
> +				same_thread_group(current, target->parent)))) {
> +		retval = wait_consider_task(wo, /* ptrace= */ 1, target);
> +		if (retval)
> +			return retval;
> +	}
> +	return 0;
> +}
> +
>  static long do_wait(struct wait_opts *wo)
>  {
> -	struct task_struct *tsk;
>  	int retval;
>  
>  	trace_sched_process_wait(wo->wo_pid);
> @@ -1463,19 +1488,27 @@ static long do_wait(struct wait_opts *wo)
>  
>  	set_current_state(TASK_INTERRUPTIBLE);
>  	read_lock(&tasklist_lock);
> -	tsk = current;
> -	do {
> -		retval = do_wait_thread(wo, tsk);
> -		if (retval)
> -			goto end;
>  
> -		retval = ptrace_do_wait(wo, tsk);
> +	if (wo->wo_type == PIDTYPE_PID) {
> +		retval = do_wait_pid(wo);
>  		if (retval)
>  			goto end;
> +	} else {
> +		struct task_struct *tsk = current;
>  
> -		if (wo->wo_flags & __WNOTHREAD)
> -			break;
> -	} while_each_thread(current, tsk);
> +		do {
> +			retval = do_wait_thread(wo, tsk);
> +			if (retval)
> +				goto end;
> +
> +			retval = ptrace_do_wait(wo, tsk);
> +			if (retval)
> +				goto end;
> +
> +			if (wo->wo_flags & __WNOTHREAD)
> +				break;
> +		} while_each_thread(current, tsk);
> +	}
>  	read_unlock(&tasklist_lock);
>  
>  notask:

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-10 22:40 ` Eric W. Biederman
@ 2021-03-11  0:14   ` Jim Newsome
  2021-03-11 15:15     ` Oleg Nesterov
  2021-03-11 16:30     ` Eric W. Biederman
  2021-03-11 15:08   ` Oleg Nesterov
  1 sibling, 2 replies; 9+ messages in thread
From: Jim Newsome @ 2021-03-11  0:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Oleg Nesterov, Christian Brauner, linux-kernel

On 3/10/21 16:40, Eric W. Biederman wrote:
>> +// Optimization for waiting on PIDTYPE_PID. No need to iterate
through child
>> +// and tracee lists to find the target task.
>
> Minor nit:  C++ style comments look very out of place in this file
>             which uses old school C /* */ comment delimiters for
>             all of it's block comments.

Will do

>> +static int do_wait_pid(struct wait_opts *wo)
>> +{
>> +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This is subtle change in behavior.
> 
> Today on the task->children list we only place thread group leaders.

Shouldn't we allow waiting on clone children if __WALL or __WCLONE is set?

This is already checked later in `eligible_child`, called from
`wait_consider_task`, so I *think* the current form should already do
the right thing. Now I'm confused though how the general path (through
`do_wait_thread`) works if clone children aren't on the task->children
list...?

(In any case it seems this will need another version with at least an
explanatory comment here)

Thanks!
-Jim

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-10 22:40 ` Eric W. Biederman
  2021-03-11  0:14   ` Jim Newsome
@ 2021-03-11 15:08   ` Oleg Nesterov
  2021-03-11 16:37     ` Eric W. Biederman
  1 sibling, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2021-03-11 15:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jim Newsome, Andrew Morton, Christian Brauner, linux-kernel

On 03/10, Eric W. Biederman wrote:
>
> Jim Newsome <jnewsome@torproject.org> writes:
>
> > +static int do_wait_pid(struct wait_opts *wo)
> > +{
> > +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This is subtle change in behavior.
>
> Today on the task->children list we only place thread group leaders.

Aaah, yes, thanks Eric!

> So the code either needs a thread_group_leader filter on target before
> the ptrace=0 case or we need to use "pid_task(wo->wo_pid, PIDTYPE_TGID)"
> and "pid_task(wo->wo_pid, PIDTYPE_PID)" for the "ptrace=1" case.

Agreed,

> I would like to make thread_group_leaders go away

Hmm, why?

Oleg.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-11  0:14   ` Jim Newsome
@ 2021-03-11 15:15     ` Oleg Nesterov
  2021-03-11 16:26       ` Jim Newsome
  2021-03-11 16:30     ` Eric W. Biederman
  1 sibling, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2021-03-11 15:15 UTC (permalink / raw)
  To: Jim Newsome
  Cc: Eric W. Biederman, Andrew Morton, Christian Brauner, linux-kernel

On 03/10, Jim Newsome wrote:
>
> On 3/10/21 16:40, Eric W. Biederman wrote:
>
> >> +static int do_wait_pid(struct wait_opts *wo)
> >> +{
> >> +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
> >                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > This is subtle change in behavior.
> >
> > Today on the task->children list we only place thread group leaders.
>
> Shouldn't we allow waiting on clone children if __WALL or __WCLONE is set?

Don't confuse clone child with child's sub-thread.

Oleg.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-11 15:15     ` Oleg Nesterov
@ 2021-03-11 16:26       ` Jim Newsome
  0 siblings, 0 replies; 9+ messages in thread
From: Jim Newsome @ 2021-03-11 16:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Andrew Morton, Christian Brauner, linux-kernel


On 3/11/21 09:15, Oleg Nesterov wrote:
> On 03/10, Jim Newsome wrote:
>> On 3/10/21 16:40, Eric W. Biederman wrote:
>>
>>>> +static int do_wait_pid(struct wait_opts *wo)
>>>> +{
>>>> +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
>>>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> This is subtle change in behavior.
>>>
>>> Today on the task->children list we only place thread group leaders.
>> Shouldn't we allow waiting on clone children if __WALL or __WCLONE is set?
> Don't confuse clone child with child's sub-thread.

Oops! Thanks; got it. v4 coming shortly



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-11  0:14   ` Jim Newsome
  2021-03-11 15:15     ` Oleg Nesterov
@ 2021-03-11 16:30     ` Eric W. Biederman
  1 sibling, 0 replies; 9+ messages in thread
From: Eric W. Biederman @ 2021-03-11 16:30 UTC (permalink / raw)
  To: Jim Newsome; +Cc: Andrew Morton, Oleg Nesterov, Christian Brauner, linux-kernel

Jim Newsome <jnewsome@torproject.org> writes:

> On 3/10/21 16:40, Eric W. Biederman wrote:
>>> +// Optimization for waiting on PIDTYPE_PID. No need to iterate
> through child
>>> +// and tracee lists to find the target task.
>>
>> Minor nit:  C++ style comments look very out of place in this file
>>             which uses old school C /* */ comment delimiters for
>>             all of it's block comments.
>
> Will do
>
>>> +static int do_wait_pid(struct wait_opts *wo)
>>> +{
>>> +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
>>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> This is subtle change in behavior.
>> 
>> Today on the task->children list we only place thread group leaders.
>
> Shouldn't we allow waiting on clone children if __WALL or __WCLONE is set?
>
> This is already checked later in `eligible_child`, called from
> `wait_consider_task`, so I *think* the current form should already do
> the right thing. Now I'm confused though how the general path (through
> `do_wait_thread`) works if clone children aren't on the task->children
> list...?
>
> (In any case it seems this will need another version with at least an
> explanatory comment here)

What I am worried about are not clone children.  AKA ordinary children
that have a different exit signal but CLONE_THREAD children that are
never put on the children list so are naturally excluded from today's
do_wait (except in the case of ptrace). These are also known as threads.

Maybe I am missing it but I don't see anything in wait_consider_task
or in the way that you are calling it that would exclude CLONE_THREAD
children for the non-ptrace case.

Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-11 15:08   ` Oleg Nesterov
@ 2021-03-11 16:37     ` Eric W. Biederman
  0 siblings, 0 replies; 9+ messages in thread
From: Eric W. Biederman @ 2021-03-11 16:37 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Jim Newsome, Andrew Morton, Christian Brauner, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> On 03/10, Eric W. Biederman wrote:
>>
>> Jim Newsome <jnewsome@torproject.org> writes:
>>
>> > +static int do_wait_pid(struct wait_opts *wo)
>> > +{
>> > +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
>>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> This is subtle change in behavior.
>>
>> Today on the task->children list we only place thread group leaders.
>
> Aaah, yes, thanks Eric!
>
>> So the code either needs a thread_group_leader filter on target before
>> the ptrace=0 case or we need to use "pid_task(wo->wo_pid, PIDTYPE_TGID)"
>> and "pid_task(wo->wo_pid, PIDTYPE_PID)" for the "ptrace=1" case.
>
> Agreed,
>
>> I would like to make thread_group_leaders go away
>
> Hmm, why?

Mostly because we have class of very nasty bugs to fix because code
thinks one thread is special.

There has been and I think still is code that mishandles zombie thread
group leaders.

Particularly nasty are zombie thread group leaders after userspace has
called setresuid in a way that changes signal permissions.

Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-03-11 16:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-09 20:39 [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Jim Newsome
2021-03-10 17:27 ` Oleg Nesterov
2021-03-10 22:40 ` Eric W. Biederman
2021-03-11  0:14   ` Jim Newsome
2021-03-11 15:15     ` Oleg Nesterov
2021-03-11 16:26       ` Jim Newsome
2021-03-11 16:30     ` Eric W. Biederman
2021-03-11 15:08   ` Oleg Nesterov
2021-03-11 16:37     ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).