linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Mike Galbraith <efault@gmx.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Louis Rilling <louis.rilling@kerlabs.com>
Subject: Re: [PATCH]  Re: [RFC PATCH] namespaces: fix leak on fork() failure
Date: Fri, 04 May 2012 13:29:14 -0700	[thread overview]
Message-ID: <m1fwbfd71h.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <1336150643.7502.4.camel@marge.simpson.net> (Mike Galbraith's message of "Fri, 04 May 2012 18:57:23 +0200")

Mike Galbraith <efault@gmx.de> writes:

> On Fri, 2012-05-04 at 08:36 -0700, Eric W. Biederman wrote: 
>> Mike Galbraith <efault@gmx.de> writes:
>> 
>> > On Fri, 2012-05-04 at 07:13 -0700, Eric W. Biederman wrote: 
>> >> Mike Galbraith <efault@gmx.de> writes:
>> 
>> >> Did you have HZ=100 in that kernel?  400 tasks at 100Hz all serialized
>> >> somehow and then doing synchronize_rcu at a jiffy each would account
>> >> for 4 seconds.  And the nsproxy certainly has a synchronize_rcu call.
>> >
>> > HZ=250
>> 
>> Rats.  Then non of my theories even approaches holding water.
>> 
>> >> The network namespace is comparatively heavy weight, at least in the
>> >> amount of code and other things it has to go through, so that would be
>> >> my prime suspect for those 29 seconds.  There are 2-4 synchronize_rcu
>> >> calls needed to put the loopback device.  Still we use
>> >> synchronize_rcu_expedited and that work should be out of line and all of
>> >> those calls should batch.
>> >> 
>> >> Mike is this something you are looking at a pursuing farther?
>> >
>> > Not really, but I can put it on my good intentions list.
>> 
>> About what I expected.  I just wanted to make certain I understood the
>> situation.
>> 
>> I will remember this as something weird and when I have time perhaps
>> I will investigate and track it.
>> 
>> >> I want to guess the serialization comes from waiting on children to be
>> >> reaped but the namespaces are all cleaned up in exit_notify() called
>> >> from do_exit() so that theory doesn't hold water.  The worst case
>> >> I can see is detach_pid from exit_signal running under the task list lock.
>> >> but nothing sleeps under that lock.  :(
>> >
>> > I'm up to my ears in zombies with several instances of the testcase
>> > running in parallel, so I imagine it's the same with hackbench.
>> 
>> Oh interesting.
>> 
>> > marge:/usr/local/tmp/starvation # taskset -c 3 ./hackbench -namespace& for i in 1 2 3 4 5 6 7 ; do ps ax|grep defunct|wc -l;sleep 1; done
>> > [1] 29985
>> > Running with 10*40 (== 400) tasks.
>> > 1
>> > 397
>> > 327
>> > 261
>> > 199
>> > 135
>> > 72
>> > marge:/usr/local/tmp/starvation # Time: 7.675
>> 
>> So if I read your output right the first second is spent running the
>> code and the rest of the time is spent reaping zombies.
>
> The distance between these is mighty fishy.

Yes.  1 to 2 jiffiers per iteration.

That probably puts us in:
do_wait()
   do_wait_thread()
      wait_consider_task()
         wait_task_zombie()
            release_task()

The only parts that I see that are clearly outside of the tasklist_lock are:
put_user in wait_task_zombie 
proc_flush_task in release_task
release_thread in release_task

Of those if I had to take a blind guess I would guess something in
proc_flush_task possibly kern_unmount.  That is the only bit that
should be namespace unique.

But shrug.  I have looked and I don't see anything obvious in those
code paths.

The only other possibility are schedule and signal deliver in the
syscall return path.  Perhaps there is kernel thread or a work queue
or something running on the same cpu and using all of the time and
our reaper thread only gets scheduled occasionally.  Or perhaps
it is something peculiar with the signal delivery logic.

Shrug.  I have skimmed through all of that code and I don't see anything
obvious.  I guess it would take a few more data points to figure out
where we are sleeping for a jiffy or two while we are reaping children.

Eric

> marge:~ # grep 'signalfd_cleanup ' /trace2
>           vsftpd-9628  [003] ....   712.571961: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.575717: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.579698: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.587734: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.591671: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.595695: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.599685: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.603680: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.607682: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.611692: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.615740: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.619705: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.623730: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.627748: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.631712: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.635741: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.643683: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.647685: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.651691: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.655742: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.659738: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.663738: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.667756: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.671693: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.679682: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.683694: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.687750: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.691738: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.695751: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.699740: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.703736: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.707757: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.711685: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.715689: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.719694: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.723742: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.727752: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.731695: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.739687: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.743688: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.747697: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.751689: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.755688: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.759699: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.763705: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.767754: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.771702: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.775749: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.775884: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.783754: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.787754: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.791763: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.795764: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.799755: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.807768: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.835723: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.843695: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.847752: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.851694: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.855711: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.859704: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.863751: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.867754: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.871753: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.875765: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.879706: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.883696: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.887697: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.891711: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.898493: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.911740: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.927755: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.955754: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.975771: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   712.995826: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.003739: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.003920: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.011710: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.015831: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.023827: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.031694: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.035715: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.039714: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.043816: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.047726: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.051818: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.055724: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.059814: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.063725: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.067824: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.071825: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.075726: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.079709: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.083814: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.087850: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.095859: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.099826: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.103830: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.107726: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.111723: signalfd_cleanup <-__cleanup_sighand
>           vsftpd-9628  [003] d...   713.115874: signalfd_cleanup <-__cleanup_sighand

  reply	other threads:[~2012-05-04 20:25 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-28  9:19 [RFC PATCH] namespaces: fix leak on fork() failure Mike Galbraith
2012-04-28 14:26 ` Oleg Nesterov
2012-04-29  4:13   ` Mike Galbraith
2012-04-29  7:57   ` Eric W. Biederman
2012-04-29  9:49     ` Mike Galbraith
2012-04-29 16:58     ` Oleg Nesterov
2012-04-30  2:59       ` Eric W. Biederman
2012-04-30  3:25         ` Mike Galbraith
2012-05-02 12:40         ` Oleg Nesterov
2012-05-02 17:37           ` Eric W. Biederman
2012-04-30  3:01       ` [PATCH] " Mike Galbraith
     [not found]         ` <m1zk9rmyh4.fsf@fess.ebiederm.org>
2012-05-01 20:42           ` Andrew Morton
2012-05-03  3:12             ` Mike Galbraith
2012-05-03 14:56               ` Mike Galbraith
2012-05-04  4:27                 ` Mike Galbraith
2012-05-04  7:55                   ` Eric W. Biederman
2012-05-04  8:34                     ` Mike Galbraith
2012-05-04  9:45                     ` Mike Galbraith
2012-05-04 14:13                       ` Eric W. Biederman
2012-05-04 14:49                         ` Mike Galbraith
2012-05-04 15:36                           ` Eric W. Biederman
2012-05-04 16:57                             ` Mike Galbraith
2012-05-04 20:29                               ` Eric W. Biederman [this message]
2012-05-05  5:56                                 ` Mike Galbraith
2012-05-05  6:08                                   ` Mike Galbraith
2012-05-05  7:12                                     ` Mike Galbraith
2012-05-05 11:37                                       ` Eric W. Biederman
2012-05-07 21:51                                       ` [PATCH] vfs: Speed up deactivate_super for non-modular filesystems Eric W. Biederman
2012-05-07 22:17                                         ` Al Viro
2012-05-07 23:56                                           ` Paul E. McKenney
2012-05-08  1:07                                             ` Eric W. Biederman
2012-05-08  4:53                                               ` Mike Galbraith
2012-05-09  7:55                                               ` Nick Piggin
2012-05-09 11:02                                                 ` Eric W. Biederman
2012-05-15  8:40                                                   ` Nick Piggin
2012-05-16  0:34                                                     ` Eric W. Biederman
2012-05-09 13:59                                                 ` Paul E. McKenney
2012-05-04  8:03                 ` [PATCH] Re: [RFC PATCH] namespaces: fix leak on fork() failure Eric W. Biederman
2012-05-04  8:19                   ` Mike Galbraith
2012-05-04  8:54                     ` Mike Galbraith
2012-05-07  0:32             ` [PATCH 0/3] pidns: Closing the pid namespace exit race Eric W. Biederman
2012-05-07  0:33               ` [PATCH 1/3] pidns: Use task_active_pid_ns in do_notify_parent Eric W. Biederman
2012-05-07  0:35               ` [PATCH 2/3] pidns: Guarantee that the pidns init will be the last pidns process reaped Eric W. Biederman
2012-05-08 22:50                 ` Andrew Morton
2012-05-16 18:39                 ` Oleg Nesterov
2012-05-16 19:34                   ` Oleg Nesterov
2012-05-16 20:54                   ` Eric W. Biederman
2012-05-17 17:00                     ` Oleg Nesterov
2012-05-17 21:46                       ` Eric W. Biederman
2012-05-18 12:39                         ` Oleg Nesterov
2012-05-19  0:03                           ` Eric W. Biederman
2012-05-21 12:44                             ` Oleg Nesterov
2012-05-22  0:16                               ` Eric W. Biederman
2012-05-22  0:20                               ` [PATCH] pidns: Guarantee that the pidns init will be the last pidns process reaped. v2 Eric W. Biederman
2012-05-22 16:54                                 ` Oleg Nesterov
2012-05-22 19:23                                 ` Andrew Morton
2012-05-23 14:52                                   ` Oleg Nesterov
2012-05-25 15:15                                     ` [PATCH -mm] pidns-guarantee-that-the-pidns-init-will-be-the-last-pidns-process-r eaped-v2-fix-fix Oleg Nesterov
2012-05-25 15:59                                       ` [PATCH -mm 0/1] pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper Oleg Nesterov
2012-05-25 16:00                                         ` [PATCH -mm 1/1] " Oleg Nesterov
2012-05-25 21:43                                           ` Eric W. Biederman
2012-05-27 19:10                                             ` [PATCH v2 -mm 0/1] " Oleg Nesterov
2012-05-27 19:11                                               ` [PATCH v2 -mm 1/1] " Oleg Nesterov
2012-05-29  6:34                                                 ` Eric W. Biederman
2012-05-25 21:25                                       ` [PATCH -mm] pidns-guarantee-that-the-pidns-init-will-be-the-last-pidns-process-r eaped-v2-fix-fix Eric W. Biederman
2012-05-27 18:41                                         ` [PATCH -mm v2] " Oleg Nesterov
2012-05-07  0:35               ` [PATCH 3/3] pidns: Make killed children autoreap Eric W. Biederman
2012-05-08 22:51                 ` Andrew Morton
2012-04-30 13:57 ` [RFC PATCH] namespaces: fix leak on fork() failure Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1fwbfd71h.fsf@fess.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=efault@gmx.de \
    --cc=gorcunov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=louis.rilling@kerlabs.com \
    --cc=oleg@redhat.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).