LKML Archive on lore.kernel.org
* [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes
@ 2019-01-10 17:52 Andrei Vagin
  2019-01-10 23:37 ` Andrew Morton
  2019-02-26  9:19 ` Jiri Slaby
  0 siblings, 2 replies; 6+ messages in thread
From: Andrei Vagin @ 2019-01-10 17:52 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov
  Cc: linux-kernel, Andrei Vagin, Eric W. Biederman

Currently, exit_ptrace() adds all ptraced tasks to a dead list, then
zap_pid_ns_processes() waits for all tasks in the current pidns, and only
then are tasks from the dead list released.

zap_pid_ns_processes() can get stuck waiting for tasks from the dead list.
In this case, we will have one unkillable process with one or more dead
children.

Thanks to Oleg for the advice to release tasks in find_child_reaper().

Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---

v2: Oleg showed that ptraced tasks can be released in
find_child_reaper(). This avoids an additional
write_lock/unlock(tasklist), and another list_for_each_entry_safe(dead)
loop runs only if it is actually needed.
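
The ordering problem described in the changelog can be pictured with a toy
userspace model. This is only a sketch, not kernel code: every name in it
(toy_ns, toy_release_dead, toy_zap_can_finish, toy_init_exit) is
hypothetical, and it assumes that the wait in zap_pid_ns_processes() can
complete only once every task other than init has been released.

```c
#include <stdbool.h>

/* Toy userspace model, NOT kernel code; all names are illustrative. */
struct toy_ns {
	int nr_tasks;	/* tasks still accounted to the namespace, init included */
	int nr_dead;	/* zombies parked on the "dead" list, not yet released */
};

/* Stands in for calling release_task() on every entry of the dead list. */
static void toy_release_dead(struct toy_ns *ns)
{
	ns->nr_tasks -= ns->nr_dead;
	ns->nr_dead = 0;
}

/*
 * Stands in for the wait in zap_pid_ns_processes(): assumed to finish
 * only when init is the last task left. Returns false where the real
 * code would keep sleeping.
 */
static bool toy_zap_can_finish(const struct toy_ns *ns)
{
	return ns->nr_tasks == 1;
}

/* Returns true if the exiting init gets past the wait. */
static bool toy_init_exit(struct toy_ns ns, bool release_before_zap)
{
	if (release_before_zap)
		toy_release_dead(&ns);
	return toy_zap_can_finish(&ns);
}
```

With init plus two entries on the dead list (nr_tasks = 3, nr_dead = 2),
toy_init_exit() reports that the wait cannot finish unless the dead list
is released first, which is the reordering this patch makes.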

 kernel/exit.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 2d14979577ee..5df787a497f5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -558,12 +558,14 @@ static struct task_struct *find_alive_thread(struct task_struct *p)
 	return NULL;
 }
 
-static struct task_struct *find_child_reaper(struct task_struct *father)
+static struct task_struct *find_child_reaper(struct task_struct *father,
+						struct list_head *dead)
 	__releases(&tasklist_lock)
 	__acquires(&tasklist_lock)
 {
 	struct pid_namespace *pid_ns = task_active_pid_ns(father);
 	struct task_struct *reaper = pid_ns->child_reaper;
+	struct task_struct *p, *n;
 
 	if (likely(reaper != father))
 		return reaper;
@@ -579,6 +581,12 @@ static struct task_struct *find_child_reaper(struct task_struct *father)
 		panic("Attempted to kill init! exitcode=0x%08x\n",
 			father->signal->group_exit_code ?: father->exit_code);
 	}
+
+	list_for_each_entry_safe(p, n, dead, ptrace_entry) {
+		list_del_init(&p->ptrace_entry);
+		release_task(p);
+	}
+
 	zap_pid_ns_processes(pid_ns);
 	write_lock_irq(&tasklist_lock);
 
@@ -668,7 +676,7 @@ static void forget_original_parent(struct task_struct *father,
 		exit_ptrace(father, dead);
 
 	/* Can drop and reacquire tasklist_lock */
-	reaper = find_child_reaper(father);
+	reaper = find_child_reaper(father, dead);
 	if (list_empty(&father->children))
 		return;
 
-- 
2.17.2



* Re: [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes
  2019-01-10 17:52 [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes Andrei Vagin
@ 2019-01-10 23:37 ` Andrew Morton
  2019-01-11 15:39   ` Oleg Nesterov
  2019-02-26  9:19 ` Jiri Slaby
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2019-01-10 23:37 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: Oleg Nesterov, linux-kernel, Eric W. Biederman

On Thu, 10 Jan 2019 09:52:00 -0800 Andrei Vagin <avagin@gmail.com> wrote:

> Currently, exit_ptrace() adds all ptraced tasks to a dead list, then
> zap_pid_ns_processes() waits for all tasks in the current pidns, and only
> then are tasks from the dead list released.
> 
> zap_pid_ns_processes() can get stuck waiting for tasks from the dead list.
> In this case, we will have one unkillable process with one or more dead
> children.
> 
> Thanks to Oleg for the advice to release tasks in find_child_reaper().
> 
> Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
> 
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Andrei Vagin <avagin@gmail.com>

Does this warrant a -stable backport?  7c8bd2322c7f was 4+ years ago...


* Re: [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes
  2019-01-10 23:37 ` Andrew Morton
@ 2019-01-11 15:39   ` Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2019-01-11 15:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrei Vagin, linux-kernel, Eric W. Biederman

On 01/10, Andrew Morton wrote:
>
> > Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
> >
> > Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Andrei Vagin <avagin@gmail.com>
>
> Does this warrant a -stable backport?  7c8bd2322c7f was 4+ years ago...

Agreed, the problem is trivially reproducible.

Oleg.



* Re: [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes
  2019-01-10 17:52 [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes Andrei Vagin
  2019-01-10 23:37 ` Andrew Morton
@ 2019-02-26  9:19 ` Jiri Slaby
  2019-02-26 15:30   ` Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Jiri Slaby @ 2019-02-26  9:19 UTC (permalink / raw)
  To: Andrei Vagin, Andrew Morton, Oleg Nesterov
  Cc: linux-kernel, Eric W. Biederman

On 10. 01. 19, 18:52, Andrei Vagin wrote:
> Currently, exit_ptrace() adds all ptraced tasks to a dead list, then
> zap_pid_ns_processes() waits for all tasks in the current pidns, and only
> then are tasks from the dead list released.
> 
> zap_pid_ns_processes() can get stuck waiting for tasks from the dead list.
> In this case, we will have one unkillable process with one or more dead
> children.
> 
> Thanks to Oleg for the advice to release tasks in find_child_reaper().
> 
> Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
> 
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Andrei Vagin <avagin@gmail.com>
> ---
> 
> v2: Oleg showed that ptraced tasks can be released in
> find_child_reaper(). This avoids an additional
> write_lock/unlock(tasklist), and another list_for_each_entry_safe(dead)
> loop runs only if it is actually needed.
> 
>  kernel/exit.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 2d14979577ee..5df787a497f5 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -558,12 +558,14 @@ static struct task_struct *find_alive_thread(struct task_struct *p)
>  	return NULL;
>  }
>  
> -static struct task_struct *find_child_reaper(struct task_struct *father)
> +static struct task_struct *find_child_reaper(struct task_struct *father,
> +						struct list_head *dead)
>  	__releases(&tasklist_lock)
>  	__acquires(&tasklist_lock)
>  {
>  	struct pid_namespace *pid_ns = task_active_pid_ns(father);
>  	struct task_struct *reaper = pid_ns->child_reaper;
> +	struct task_struct *p, *n;
>  
>  	if (likely(reaper != father))
>  		return reaper;
> @@ -579,6 +581,12 @@ static struct task_struct *find_child_reaper(struct task_struct *father)
>  		panic("Attempted to kill init! exitcode=0x%08x\n",
>  			father->signal->group_exit_code ?: father->exit_code);
>  	}
> +
> +	list_for_each_entry_safe(p, n, dead, ptrace_entry) {
> +		list_del_init(&p->ptrace_entry);
> +		release_task(p);
> +	}
> +

Hi,

from our (SUSE) QA we received a report that this patch causes a
performance decline in libmicro pthread_* benchmark as reported in:
https://bugzilla.suse.com/show_bug.cgi?id=1126762

I tried myself from the repo:
https://github.com/redhat-performance/libMicro

I ran
pthread_create -B 8 -C 200 -S

and with the patch applied:
# STATISTICS       usecs/call (raw)          usecs/call (outliers removed)
#                   mean     23.38611                17.29311

Without:
#                   mean     41.36539                39.21347

The values vary, but they are around 23 and 42, respectively.

The benchmark seems to create 8 (-B above) pthreads, does lock/unlock in
them and then the threads exit. The benchmark reaps the threads via
pthread_join. This all happens 200 times (-C above).
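
The workload as described above can be approximated with a short pthread
program. This is a rough sketch based only on that description, not the
libMicro source: BATCH, COUNT, worker and run_workload are hypothetical
names, and no timing is done here.

```c
#include <pthread.h>
#include <stddef.h>

#define BATCH 8		/* threads per round, after -B 8 above */
#define COUNT 200	/* number of rounds, after -C 200 above */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread takes and drops the mutex once, then exits. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Returns the number of threads created and joined, or -1 on error. */
static int run_workload(void)
{
	pthread_t tids[BATCH];
	int joined = 0;

	for (int i = 0; i < COUNT; i++) {
		int started = 0;

		for (int j = 0; j < BATCH; j++) {
			if (pthread_create(&tids[j], NULL, worker, NULL) != 0)
				break;
			started++;
		}
		/* Reap every thread that was actually started. */
		for (int j = 0; j < started; j++) {
			if (pthread_join(tids[j], NULL) != 0)
				return -1;
			joined++;
		}
		if (started != BATCH)
			return -1;
	}
	return joined;
}
```

Every thread here exits inside an existing thread group and the test never
creates a new pid namespace, which matters for judging whether the new
loop in find_child_reaper() can be reached at all.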

Any idea how to restore the performance close to the previous state?

>  	zap_pid_ns_processes(pid_ns);
>  	write_lock_irq(&tasklist_lock);
>  
> @@ -668,7 +676,7 @@ static void forget_original_parent(struct task_struct *father,
>  		exit_ptrace(father, dead);
>  
>  	/* Can drop and reacquire tasklist_lock */
> -	reaper = find_child_reaper(father);
> +	reaper = find_child_reaper(father, dead);
>  	if (list_empty(&father->children))
>  		return;

thanks,
-- 
js
suse labs


* Re: [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes
  2019-02-26  9:19 ` Jiri Slaby
@ 2019-02-26 15:30   ` Oleg Nesterov
  2019-02-27  8:03     ` Jiri Slaby
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2019-02-26 15:30 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Andrei Vagin, Andrew Morton, linux-kernel, Eric W. Biederman

On 02/26, Jiri Slaby wrote:
>
> On 10. 01. 19, 18:52, Andrei Vagin wrote:
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -558,12 +558,14 @@ static struct task_struct *find_alive_thread(struct task_struct *p)
> >  	return NULL;
> >  }
> >
> > -static struct task_struct *find_child_reaper(struct task_struct *father)
> > +static struct task_struct *find_child_reaper(struct task_struct *father,
> > +						struct list_head *dead)
> >  	__releases(&tasklist_lock)
> >  	__acquires(&tasklist_lock)
> >  {
> >  	struct pid_namespace *pid_ns = task_active_pid_ns(father);
> >  	struct task_struct *reaper = pid_ns->child_reaper;
> > +	struct task_struct *p, *n;
> >
> >  	if (likely(reaper != father))
> >  		return reaper;
> > @@ -579,6 +581,12 @@ static struct task_struct *find_child_reaper(struct task_struct *father)
> >  		panic("Attempted to kill init! exitcode=0x%08x\n",
> >  			father->signal->group_exit_code ?: father->exit_code);
> >  	}
> > +
> > +	list_for_each_entry_safe(p, n, dead, ptrace_entry) {
> > +		list_del_init(&p->ptrace_entry);
> > +		release_task(p);
> > +	}
> > +
>
> Hi,
>
> from our (SUSE) QA we received a report that this patch causes a
> performance decline in libmicro pthread_* benchmark as reported in:
> https://bugzilla.suse.com/show_bug.cgi?id=1126762

Access Denied

> I tried myself from the repo:
> https://github.com/redhat-performance/libMicro
>
> I ran
> pthread_create -B 8 -C 200 -S
>
> and with the patch applied:
> # STATISTICS       usecs/call (raw)          usecs/call (outliers removed)
> #                   mean     23.38611                17.29311
>
> Without:
> #                   mean     41.36539                39.21347

can't reproduce, I see the same numbers with or without this patch.
However, I did "./bin/pthread_create -B 8 -C 200 -S" under KVM.

> The benchmark seems to create 8 (-B above) pthreads, does lock/unlock in
> them and then the threads exit. The benchmark reaps the threads via
> pthread_join. This all happens 200 times (-C above).

Given that this test-case doesn't use CLONE_NEWPID, I fail to understand how
this patch can make any noticeable difference performance-wise...

With this patch forget_original_parent() just passes the additional argument
to find_child_reaper(), nothing else.

The extra list_for_each_entry_safe/release_task loop can't happen, and even
if it could, it shouldn't cause any performance regression either.
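
The control flow behind this argument can be sketched in userspace. This is
a simplified model, not the kernel function: all names (toy_task, toy_pidns,
toy_released, toy_find_child_reaper) are hypothetical, and the
find_alive_thread() and panic() steps of the real find_child_reaper() are
left out.

```c
/* Toy userspace model, NOT kernel code; all names are illustrative. */
struct toy_task {
	int id;
};

struct toy_pidns {
	struct toy_task *child_reaper;	/* the namespace's init */
};

/* Counts how many dead-list entries the model has "released". */
static int toy_released;

static struct toy_task *toy_find_child_reaper(struct toy_pidns *ns,
					      struct toy_task *father,
					      int nr_dead)
{
	struct toy_task *reaper = ns->child_reaper;

	/* Common case: the exiting task is not the namespace's init. */
	if (reaper != father)
		return reaper;

	/* Only an exiting namespace init reaches the release loop. */
	toy_released += nr_dead;
	return father;
}
```

When father is an ordinary thread, the function returns at the first check
and toy_released stays untouched; the counter changes only when the exiting
task is the namespace's own child reaper.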

> Any idea how to restore the performance close to the previous state?

Maybe you can try perf to find out where this difference comes from?

Oleg.



* Re: [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes
  2019-02-26 15:30   ` Oleg Nesterov
@ 2019-02-27  8:03     ` Jiri Slaby
  0 siblings, 0 replies; 6+ messages in thread
From: Jiri Slaby @ 2019-02-27  8:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrei Vagin, Andrew Morton, linux-kernel, Eric W. Biederman

On 26. 02. 19, 16:30, Oleg Nesterov wrote:
>> from our (SUSE) QA we received a report that this patch causes a
>> performance decline in libmicro pthread_* benchmark as reported in:
>> https://bugzilla.suse.com/show_bug.cgi?id=1126762
> 
> Access Denied
> 
>> I tried myself from the repo:
>> https://github.com/redhat-performance/libMicro
>>
>> I ran
>> pthread_create -B 8 -C 200 -S
>>
>> and with the patch applied:
>> # STATISTICS       usecs/call (raw)          usecs/call (outliers removed)
>> #                   mean     23.38611                17.29311
>>
>> Without:
>> #                   mean     41.36539                39.21347
> 
> can't reproduce, I see the same numbers with or without this patch.
> However, I did "./bin/pthread_create -B 8 -C 200 -S" under KVM.

Correct. I also ran the tests under KVM. The above difference appears
when you compare (by stupid mistake) a LOCKDEP and a non-LOCKDEP kernel;
sorry. The proper results are effectively the same, with the difference
at noise level. So their setup must be broken too.

Thanks for the input anyway.

-- 
js
suse labs
