All of lore.kernel.org
 help / color / mirror / Atom feed
From: Louis Rilling <louis.rilling@kerlabs.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: "\\\"Eric W. Biederman\\\"" <ebiederm@xmission.com>,
	Pavel Emelyanov <xemul@openvz.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Containers <containers@lists.osdl.org>,
	linux-kernel@vger.kernel.org,
	Louis Rilling <louis.rilling@kerlabs.com>
Subject: [PATCH] pid_ns: Fix proc_flush_task() accessing freed proc_mnt
Date: Thu, 24 Jun 2010 10:37:43 +0200	[thread overview]
Message-ID: <1277368663-32705-1-git-send-email-louis.rilling@kerlabs.com> (raw)
In-Reply-To: <20100623203652.GA25298@redhat.com>

On 06/19, Oleg Nesterov wrote:
> And the last one on top of this series, before I go away from this
> thread ;)
>
> Since my further fixes were not approved, I think at least it makes
> sense to cleanup pid_ns_release_proc() logic a bit.

It's completely untested and could be split into three patches. But I think that
it solves the issues found so far, and that it will work with Eric's
unshare(CLONE_NEWPID) too.

What do you think about this approach?

Thanks,

Louis

On 20/06/10 20:06 +0200, Oleg Nesterov wrote:
> pid_namespace holds ns->proc_mnt, while this vfsmount has a referene to
> the namespace via PROC_I(sb->s_root->d_inode)->pid. To break this circle
> /sbin/init does mntput() in pid_ns_release_proc(). See 6f4e6433.
>
> But we have the following problems:
>
>         - Nobody does mntput() if copy_process() fails after
>           pid_ns_prepare_proc().
>
>         - proc_flush_task() checks upid->nr == 1 to verify we are init,
>           this is wrong if a multi-threaded init does exec.
>
>         - As Louis pointed out, this namespace can have the detached
>           EXIT_DEAD tasks which can use ns->proc_mnt after this mntput().

This patch does four things:
- defer pid_ns_release_proc()->mntput() to a worqueue context, so that
  pid_ns_release_proc() can be called in atomic context;
- introduce pid_ns->nr_pids, so that we can count the number of pids
  allocated by alloc_pidmap();
- move the call to pid_ns_prepare_proc() to alloc_pid(), where we know
  when the first pid of a namespace is allocated;
- move the call to pid_ns_release_proc() to free_pid(), where we are now
  able to know when the last pid of a namespace is detached.

This solves the missing mntput() in copy_process() cleanup path, since
free_pid() is called to cleanup alloc_pid().

This solves the multi-threaded init doing exec issue, since all
sub-threads including former leader have called proc_flush_task() when the
last pid is detached.

This solves the EXIT_DEAD tasks issue for the same reason.

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
---
 fs/proc/base.c                |    4 ----
 fs/proc/root.c                |   11 ++++++++++-
 include/linux/pid_namespace.h |    3 +++
 kernel/fork.c                 |    6 ------
 kernel/pid.c                  |   16 +++++++++++++---
 kernel/pid_namespace.c        |    1 +
 6 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index acb7ef8..455b109 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2745,10 +2745,6 @@ void proc_flush_task(struct task_struct *task)
 		proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
 					tgid->numbers[i].nr);
 	}
-
-	upid = &pid->numbers[pid->level];
-	if (upid->nr == 1)
-		pid_ns_release_proc(upid->ns);
 }
 
 static struct dentry *proc_pid_instantiate(struct inode *dir,
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 4258384..9876cd9 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -215,7 +215,16 @@ int pid_ns_prepare_proc(struct pid_namespace *ns)
 	return 0;
 }
 
-void pid_ns_release_proc(struct pid_namespace *ns)
+static void do_pid_ns_release_proc(struct work_struct *work)
 {
+	struct pid_namespace *ns;
+
+	ns = container_of(work, struct pid_namespace, release_proc_work);
 	mntput(ns->proc_mnt);
 }
+
+void pid_ns_release_proc(struct pid_namespace *ns)
+{
+	INIT_WORK(&ns->release_proc_work, do_pid_ns_release_proc);
+	schedule_work(&ns->release_proc_work);
+}
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 38d1032..1010733 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -4,6 +4,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/threads.h>
+#include <linux/workqueue.h>
 #include <linux/nsproxy.h>
 #include <linux/kref.h>
 
@@ -19,6 +20,7 @@ struct bsd_acct_struct;
 struct pid_namespace {
 	struct kref kref;
 	struct pidmap pidmap[PIDMAP_ENTRIES];
+	atomic_t nr_pids;
 	int last_pid;
 	struct task_struct *child_reaper;
 	struct kmem_cache *pid_cachep;
@@ -26,6 +28,7 @@ struct pid_namespace {
 	struct pid_namespace *parent;
 #ifdef CONFIG_PROC_FS
 	struct vfsmount *proc_mnt;
+	struct work_struct release_proc_work;
 #endif
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct bsd_acct_struct *bacct;
diff --git a/kernel/fork.c b/kernel/fork.c
index b6cce14..b063a9c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1154,12 +1154,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		pid = alloc_pid(p->nsproxy->pid_ns);
 		if (!pid)
 			goto bad_fork_cleanup_io;
-
-		if (clone_flags & CLONE_NEWPID) {
-			retval = pid_ns_prepare_proc(p->nsproxy->pid_ns);
-			if (retval < 0)
-				goto bad_fork_free_pid;
-		}
 	}
 
 	p->pid = pid_nr(pid);
diff --git a/kernel/pid.c b/kernel/pid.c
index e9fd8c1..fdb73e1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -34,6 +34,7 @@
 #include <linux/bootmem.h>
 #include <linux/hash.h>
 #include <linux/pid_namespace.h>
+#include <linux/proc_fs.h>
 #include <linux/init_task.h>
 #include <linux/syscalls.h>
 
@@ -112,7 +113,7 @@ EXPORT_SYMBOL(is_container_init);
 
 static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock);
 
-static void free_pidmap(struct upid *upid)
+static bool free_pidmap(struct upid *upid)
 {
 	int nr = upid->nr;
 	struct pidmap *map = upid->ns->pidmap + nr / BITS_PER_PAGE;
@@ -120,6 +121,7 @@ static void free_pidmap(struct upid *upid)
 
 	clear_bit(offset, map->page);
 	atomic_inc(&map->nr_free);
+	return atomic_dec_and_test(&upid->ns->nr_pids);
 }
 
 static int alloc_pidmap(struct pid_namespace *pid_ns)
@@ -154,6 +156,7 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
 			do {
 				if (!test_and_set_bit(offset, map->page)) {
 					atomic_dec(&map->nr_free);
+					atomic_inc(&pid_ns->nr_pids);
 					pid_ns->last_pid = pid;
 					return pid;
 				}
@@ -226,6 +229,7 @@ static void delayed_put_pid(struct rcu_head *rhp)
 void free_pid(struct pid *pid)
 {
 	/* We can be called with write_lock_irq(&tasklist_lock) held */
+	struct upid *upid;
 	int i;
 	unsigned long flags;
 
@@ -234,8 +238,11 @@ void free_pid(struct pid *pid)
 		hlist_del_rcu(&pid->numbers[i].pid_chain);
 	spin_unlock_irqrestore(&pidmap_lock, flags);
 
-	for (i = 0; i <= pid->level; i++)
-		free_pidmap(pid->numbers + i);
+	for (i = 0; i <= pid->level; i++) {
+		upid = pid->numbers + i;
+		if (free_pidmap(upid))
+			pid_ns_release_proc(upid->ns);
+	}
 
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
@@ -276,6 +283,9 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 				&pid_hash[pid_hashfn(upid->nr, upid->ns)]);
 	spin_unlock_irq(&pidmap_lock);
 
+	if (pid->numbers[pid->level].nr == 1)
+		pid_ns_prepare_proc(ns);
+
 out:
 	return pid;
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a5aff94..beba2b4 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -92,6 +92,7 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p
 
 	set_bit(0, ns->pidmap[0].page);
 	atomic_set(&ns->pidmap[0].nr_free, BITS_PER_PAGE - 1);
+	atomic_set(&ns->nr_pids, 0);
 
 	for (i = 1; i < PIDMAP_ENTRIES; i++)
 		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
-- 
1.5.6.5

  parent reply	other threads:[~2010-06-24  8:37 UTC|newest]

Thread overview: 93+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-16 16:34 [PATCH] procfs: Do not release pid_ns->proc_mnt too early Louis Rilling
     [not found] ` <1276706068-18567-1-git-send-email-louis.rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org>
2010-06-17  9:53   ` Pavel Emelyanov
2010-06-17  9:53     ` Pavel Emelyanov
2010-06-17 13:41     ` Eric W. Biederman
2010-06-17 14:20       ` Louis Rilling
2010-06-17 21:36       ` Oleg Nesterov
2010-06-18  8:27         ` Louis Rilling
2010-06-18 16:27           ` Oleg Nesterov
2010-06-21 11:11             ` Louis Rilling
2010-06-21 12:58               ` Eric W. Biederman
2010-06-21 14:15                 ` Louis Rilling
2010-06-21 14:26                   ` Eric W. Biederman
     [not found]           ` <20100618082738.GE16877-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2010-06-18 16:27             ` Oleg Nesterov
2010-06-17 21:20 ` Oleg Nesterov
2010-06-18  8:20   ` Louis Rilling
2010-06-18 11:15     ` Oleg Nesterov
2010-06-18 16:08       ` Oleg Nesterov
2010-06-18 16:08         ` Oleg Nesterov
2010-06-18 17:33         ` Louis Rilling
2010-06-18 17:55           ` Oleg Nesterov
2010-06-18 17:55             ` Oleg Nesterov
2010-06-18 21:23             ` Oleg Nesterov
2010-06-18 21:23               ` Oleg Nesterov
2010-06-19 19:08               ` [PATCH 0/4] pid_ns_prepare_proc/unshare cleanups Oleg Nesterov
2010-06-19 19:09                 ` [PATCH 1/4] procfs: proc_get_sb: consolidate/cleanup root_inode->pid logic Oleg Nesterov
2010-06-19 19:10                 ` [PATCH 2/4] procfs: kill the global proc_mnt variable Oleg Nesterov
2010-06-19 19:10                 ` [PATCH 3/4] procfs: move pid_ns_prepare_proc() from copy_process() to create_pid_namespace() Oleg Nesterov
2010-06-19 19:11                 ` [PATCH RESEND 4/4] sys_unshare: simplify the not-really-implemented CLONE_THREAD/SIGHAND/VM code Oleg Nesterov
2010-06-20  8:42                 ` [PATCH 0/6] Unshare support for the pid namespace Eric W. Biederman
2010-06-20  8:44                   ` [PATCH 1/6] pid: Remove the child_reaper special case in init/main.c Eric W. Biederman
     [not found]                     ` <m1ljaaqejm.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-06-20 18:29                       ` Oleg Nesterov
2010-06-20 18:29                         ` Oleg Nesterov
2010-06-20 20:27                         ` Oleg Nesterov
2010-06-20  8:45                   ` [PATCH 2/6] pidns: Call pid_ns_prepare_proc from create_pid_namespace Eric W. Biederman
     [not found]                     ` <m1hbkyqeib.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-06-20 18:19                       ` Oleg Nesterov
2010-06-20 18:19                         ` Oleg Nesterov
2010-06-20  8:45                   ` [PATCH 3/6] procfs: kill the global proc_mnt variable Eric W. Biederman
2010-06-20  8:47                   ` [PATCH 4/6] pidns: Don't allow new pids after the namespace is dead Eric W. Biederman
2010-06-20 18:44                     ` Oleg Nesterov
2010-06-20  8:48                   ` [PATCH 5/6] pidns: Use task_active_pid_ns where appropriate Eric W. Biederman
2010-06-20  8:49                   ` [PATCH 6/6] pidns: Support unsharing the pid namespace Eric W. Biederman
2010-06-20 20:14                     ` Oleg Nesterov
2010-06-20 20:42                       ` Oleg Nesterov
2010-06-21  1:53                       ` Eric W. Biederman
2010-06-20 18:03                   ` [PATCH 0/6] Unshare support for " Oleg Nesterov
2010-06-20 18:05                     ` [PATCH 0/2] pid_ns_release_proc() fixes Oleg Nesterov
2010-06-20 18:06                       ` [PATCH 1/2] pid_ns: move destroy_pid_namespace() into workqueue context Oleg Nesterov
2010-06-20 18:06                       ` [PATCH 2/2] pid_ns: refactor the buggy pid_ns_release_proc() logic Oleg Nesterov
2010-06-20 21:00                     ` [PATCH 0/6] Unshare support for the pid namespace Eric W. Biederman
2010-06-20 21:48                       ` Oleg Nesterov
     [not found]                       ` <m14ogxctd6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-06-20 21:56                         ` Oleg Nesterov
2010-06-20 21:56                           ` Oleg Nesterov
2011-01-26 15:57                   ` Daniel Lezcano
2010-06-23 20:36                 ` [PATCH 0/1] pid_ns: move pid_ns_release_proc() from proc_flush_task() to zap_pid_ns_processes() Oleg Nesterov
     [not found]                   ` <20100623203652.GA25298-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2010-06-23 20:37                     ` [PATCH 1/1] " Oleg Nesterov
2010-06-23 20:37                       ` Oleg Nesterov
2010-06-24  6:36                       ` Sukadev Bhattiprolu
2010-06-24 12:59                         ` Oleg Nesterov
2010-06-24  7:06                       ` Eric W. Biederman
2010-06-24 13:01                         ` Oleg Nesterov
2010-06-24  8:37                   ` Louis Rilling [this message]
2010-06-24 17:08                   ` [RESEND PATCH] pid_ns: Fix proc_flush_task() accessing freed proc_mnt Louis Rilling
2010-06-24 19:18                     ` Oleg Nesterov
2010-06-25 10:23                       ` Louis Rilling
     [not found]                         ` <20100625102303.GG3773-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2010-06-25 12:21                           ` Oleg Nesterov
2010-06-25 12:21                             ` Oleg Nesterov
2010-06-25 18:37                           ` Sukadev Bhattiprolu
2010-06-25 18:37                         ` Sukadev Bhattiprolu
2010-06-25 19:29                           ` Oleg Nesterov
2010-06-25 21:26                             ` Sukadev Bhattiprolu
2010-06-25 21:27                               ` Oleg Nesterov
2010-06-25 22:07                                 ` Sukadev Bhattiprolu
2010-07-09  4:36                                   ` [RFC][PATCH 1/2] pidns: Add a flag to indicate a pid namespace is dead Eric W. Biederman
2010-07-09  4:39                                     ` [RFC][PATCH 2/2] pidns: Remove proc flush races when a pid namespaces are exiting Eric W. Biederman
2010-07-09 12:14                                       ` Louis Rilling
2010-07-09 13:05                                         ` Eric W. Biederman
2010-07-09 14:13                                           ` Louis Rilling
2010-07-09 15:58                                             ` [PATCH 01/24] pidns: Remove races by stopping the caching of proc_mnt Eric W. Biederman
2010-07-09 22:13                                               ` Serge E. Hallyn
2010-07-11 14:14                                               ` Louis Rilling
2010-07-11 14:25                                                 ` Eric W. Biederman
2010-07-12 18:09                                                 ` [PATCH] pidns: Fix wait for zombies to be reaped in zap_pid_ns_processes Eric W. Biederman
2010-07-13 21:42                                                   ` Louis Rilling
     [not found]                                                     ` <20100713214234.GA21042-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2010-07-13 22:34                                                       ` Serge E. Hallyn
2010-07-13 22:34                                                     ` Serge E. Hallyn
2010-07-14  1:47                                                     ` Eric W. Biederman
     [not found]                                                       ` <m1oceakf5x.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-10-30  7:07                                                         ` Sukadev Bhattiprolu
2010-10-30  7:07                                                           ` Sukadev Bhattiprolu
2010-07-14 20:53                                                   ` Sukadev Bhattiprolu
2010-07-14 21:35                                                     ` Eric W. Biederman
2010-06-21 11:09             ` [PATCH] procfs: Do not release pid_ns->proc_mnt too early Louis Rilling
2010-06-21 11:15             ` Louis Rilling
2010-06-21 14:38               ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1277368663-32705-1-git-send-email-louis.rilling@kerlabs.com \
    --to=louis.rilling@kerlabs.com \
    --cc=akpm@linux-foundation.org \
    --cc=containers@lists.osdl.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=xemul@openvz.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.