From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Aaron Tomlin <atomlin@redhat.com>,
Pavel Emelyanov <xemul@parallels.com>,
Serge Hallyn <serge.hallyn@ubuntu.com>,
Sterling Alexander <stalexan@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] exit: reparent: fix/update the comments in zap_pid_ns_processes()
Date: Tue, 25 Nov 2014 17:57:45 +0100 [thread overview]
Message-ID: <20141125165745.GB28913@redhat.com> (raw)
In-Reply-To: <874mtodzhd.fsf@x220.int.ebiederm.org>
On 11/24, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > The comments in zap_pid_ns_processes() are simply wrong, we need
> > to explain how this code actually works.
> >
> > 1. "Ignore SIGCHLD" looks like optimization but it is not, we also
> > need this for correctness.
> >
> > 2. The comment above sys_wait4() could be more clear.
> >
> > 3. The comment about TASK_DEAD children is outdated. Contrary to
> > what it says we do not need to make sure they all go away.
>
> We absolutely do need to wait until they all go away.
I can easily miss something, and that is why I asked you to review this
change. But if I missed something, perhaps we should update the comments?
This "wait until TASK_DEAD children" loop was added to ensure that
child_reaper can't be reaped before other tasks in this pid namespace.
6347e90091041e "pidns: guarantee that the pidns init will be the last pidns
process reaped".
But this was needed because proc_flush_task(pid == 1) called
kern_unmount(proc_mnt). After 0a01f2cc390e10633 "pidns: Make the pidns
proc mount/umount logic obvious" we can rely on schedule_work() in
free_pid(nr_hashed == 0).
So in any case I think that the current comment is outdated.
> - rusage will be wrong if we don't wait.
I already answered in 0/2, let me quote myself:
Do you mean cstime/cutime/c* accounting?
Firstly it is not clear what makes child_reaper special in _this_ sense, but
this doesn't matter at all.
The auotoreaping/EXIT_DEAD children are not accounted, only wait_task_zombie()
accumulates these counters. (just in case, accounting in __exit_signal() is
another thing).
> - We won't wait for an injected process in a pid namespace,
> or a processes debugged with gdb to be reaped before the pid
> init process exits if we don't wait.
Yes, and I do not see why this is bad, but this is off-topic.
Again, lets discuss this in another thread. This patch doesn't try to
document the desired semantics, it only tries to explain why zap_pid_ns
_has_ to wait until EXIT_DEAD (and in fact EXIT_ZOMBIE) tasks go away.
> The user visible semantics go from weird to completely insane if we
> relax the rule that the init process is the last process in a pid
> namespace.
>
> I don't see anything approaching a good reason for changing the user
> space semantics.
See above.
> > @@ -190,7 +190,11 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> > /* Don't allow any more processes into the pid namespace */
> > disable_pid_allocation(pid_ns);
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> The above line disables injections of new processes in this pid
> namespace.
Yes, sure. See below.
> > - /* Ignore SIGCHLD causing any terminated children to autoreap */
> > + /*
> > + * Ignore SIGCHLD causing any terminated children to autoreap.
> > + * This speeds up the namespace shutdown, plus see the comment
> > + * below.
> > + */
>
> This speeds up the namespace shutdown and ensures that after we
> have waited for all existing zombies there will be no more
> zombies to wait for.
Afaics, no, see below.
> > - * sys_wait4() above can't reap the TASK_DEAD children.
> > - * Make sure they all go away, see free_pid().
> > + * sys_wait4() above can't reap the TASK_DEAD children but we do not
> > + * really care, we could reparent them to the global init.
>
> We do care.
See above. I only meant that nothing bad can happen from the kernel
perspective.
> > + * But this namespace can also have other tasks injected by setns().
> > + * Again, we do not really need to wait until they are all reaped,
>
> We do, and setns does not matter here. Injection was stopped way up above.
I think that setns() does matter.
Yes injection was stopped. But a task T can be already injected before
disable_pid_allocation() was called.
Now it is killed (or it could even exit before). To simplify, suppose it
is already EXIT_ZOMBIE, although this doesn't really matter.
The sys_wait4() loop can't see it, it is not our child.
Now suppose that its parent doesn't do wait() but exits. This means that
the exiting parent will try to reparent T to pid_ns->child_reaper.
child_reaper already sleeps in "wait for nr_hashed == init_pids" loop, it
can do nothing with T.
So we rely on autoreaping (forced by ignored SIGCHLD), and this is the
(technical) reason why child_reaper can't exit: pid_ns->child_reaper should
be valid.
No?
> > + * We rely on ignored SIGCHLD, an injected zombie must be autoreaped
> > + * if reparented.
>
> Your new comment is about 90% wrong.
See above.
Eric. I'd really ask you to take another look. But in any case: thanks for
looking at this.
Oleg.
next prev parent reply other threads:[~2014-11-25 16:57 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-07 20:14 [PATCH 0/4] proc: deuglify task_state() Oleg Nesterov
2014-11-07 20:14 ` [PATCH 1/4] proc: task_state: read cred->group_info outside of task_lock() Oleg Nesterov
2014-11-07 20:14 ` [PATCH 2/4] proc: task_state: deuglify the max_fds calculation Oleg Nesterov
2014-11-07 20:14 ` [PATCH 3/4] proc: task_state: move the main seq_printf() outside of rcu_read_lock() Oleg Nesterov
2014-11-13 18:04 ` Paul E. McKenney
2014-11-07 20:14 ` [PATCH 4/4] proc: task_state: ptrace_parent() doesn't need pid_alive() check Oleg Nesterov
2014-11-10 21:59 ` [PATCH 0/5] exit: reparent zombie fix + cleanups/optimizations Oleg Nesterov
2014-11-10 22:00 ` [PATCH 1/5] sched_show_task: fix unsafe usage of ->real_parent Oleg Nesterov
2014-11-11 10:39 ` Peter Zijlstra
2014-11-10 22:00 ` [PATCH 2/5] exit: reparent: use ->ptrace_entry rather than ->sibling for EXIT_DEAD tasks Oleg Nesterov
2014-11-10 22:00 ` [PATCH 3/5] exit: reparent: cleanup the changing of ->parent Oleg Nesterov
2014-11-10 22:00 ` [PATCH 4/5] exit: reparent: cleanup the usage of reparent_leader() Oleg Nesterov
2014-11-10 22:00 ` [PATCH 5/5] exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent() Oleg Nesterov
2014-11-14 1:37 ` [PATCH 0/5] exit: more cleanups/optimizations Oleg Nesterov
2014-11-14 1:38 ` [PATCH 1/5] exit: wait: cleanup the ptrace_reparented() checks Oleg Nesterov
2014-11-14 1:38 ` [PATCH 2/5] exit: wait: don't use zombie->real_parent Oleg Nesterov
2014-11-14 1:38 ` [PATCH 3/5] exit: wait: drop tasklist_lock before psig->c* accounting Oleg Nesterov
2014-11-14 1:38 ` [PATCH 4/5] exit: release_task: fix the comment about group leader accounting Oleg Nesterov
2014-11-14 1:38 ` [PATCH 5/5] exit: proc: don't try to flush /proc/tgid/task/tgid Oleg Nesterov
2014-11-18 21:29 ` [PATCH 0/6] exit: find_new_reaper() fixes/cleanups Oleg Nesterov
2014-11-18 21:30 ` [PATCH 1/6] exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting Oleg Nesterov
2014-11-18 21:30 ` [PATCH 2/6] exit: reparent: fix the cross-namespace " Oleg Nesterov
2014-11-18 21:30 ` [PATCH 3/6] exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() Oleg Nesterov
2014-11-18 21:30 ` [PATCH 4/6] exit: reparent: document the ->has_child_subreaper checks Oleg Nesterov
2014-11-18 21:30 ` [PATCH 5/6] exit: reparent: introduce find_child_reaper() Oleg Nesterov
2014-11-18 21:30 ` [PATCH 6/6] exit: reparent: introduce find_alive_thread() Oleg Nesterov
2014-11-20 18:34 ` [PATCH 0/3] exit: avoid O(n ** 2) thread-list scan on group-exit if possible Oleg Nesterov
2014-11-20 18:34 ` [PATCH -mm 1/3] exit: reparent: avoid find_new_reaper() if no children Oleg Nesterov
2014-11-20 22:37 ` Andrew Morton
2014-11-21 20:01 ` Oleg Nesterov
2014-11-20 18:34 ` [PATCH -mm 2/3] exit: reparent: call forget_original_parent() under tasklist_lock Oleg Nesterov
2014-11-20 18:34 ` [PATCH -mm 3/3] exit: exit_notify: re-use "dead" list to autoreap current Oleg Nesterov
2014-11-24 20:06 ` [PATCH 0/2] exit/pid_ns: comments + simple fix Oleg Nesterov
2014-11-24 20:06 ` [PATCH 1/2] exit: reparent: fix/update the comments in zap_pid_ns_processes() Oleg Nesterov
2014-11-24 20:14 ` Oleg Nesterov
2014-11-24 22:07 ` Eric W. Biederman
2014-11-25 16:57 ` Oleg Nesterov [this message]
2014-11-25 17:17 ` Oleg Nesterov
2014-11-24 20:06 ` [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting Oleg Nesterov
2014-11-24 21:46 ` Eric W. Biederman
2014-11-25 17:07 ` Oleg Nesterov
2014-11-25 17:50 ` Eric W. Biederman
2014-11-25 18:15 ` Oleg Nesterov
2014-11-25 18:43 ` Eric W. Biederman
2014-11-25 18:59 ` Oleg Nesterov
2014-11-24 21:27 ` [PATCH 0/2] exit/pid_ns: comments + simple fix Eric W. Biederman
2014-11-24 21:38 ` Oleg Nesterov
2014-11-24 21:48 ` Eric W. Biederman
2014-11-25 16:57 ` Oleg Nesterov
2014-11-26 23:54 ` [PATCH v2 " Oleg Nesterov
2014-11-26 23:54 ` [PATCH v2 1/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting Oleg Nesterov
2014-11-27 15:44 ` Eric W. Biederman
2014-11-26 23:54 ` [PATCH v2 2/2] exit: pidns: fix/update the comments in zap_pid_ns_processes() Oleg Nesterov
2014-12-01 22:39 ` Andrew Morton
2014-12-01 23:24 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141125165745.GB28913@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=atomlin@redhat.com \
--cc=ebiederm@xmission.com \
--cc=linux-kernel@vger.kernel.org \
--cc=serge.hallyn@ubuntu.com \
--cc=stalexan@redhat.com \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).