linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: rjw@sisk.pl, oleg@redhat.com
Cc: linux-kernel@vger.kernel.org, lizefan@huawei.com,
	containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
	Tejun Heo <tj@kernel.org>,
	stable@vger.kernel.org
Subject: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
Date: Tue, 16 Oct 2012 15:28:40 -0700	[thread overview]
Message-ID: <1350426526-14254-2-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1350426526-14254-1-git-send-email-tj@kernel.org>

cgroup core has a bug which violates a basic rule about event
notifications - when a new entity needs to be added, you add that to
the notification list first and then make the new entity conform to
the current state.  If done in the reverse order, an event happening
inbetween will be lost.

cgroup_subsys->fork() is invoked way before the new task is added to
the css_set.  Currently, cgroup_freezer is the only user of ->fork()
and uses it to make new tasks conform to the current state of the
freezer.  If FROZEN state is requested while fork is in progress
between cgroup_fork_callbacks() and cgroup_post_fork(), the child
could escape freezing - the cgroup isn't frozen when ->fork() is
called and the freezer couldn't see the new task on the css_set.

This patch moves cgroup_subsys->fork() invocation to
cgroup_post_fork() after the new task is added to the css_set.
cgroup_fork_callbacks() is removed.

Because now a task may be migrated during cgroup_subsys->fork(),
freezer_fork() is updated so that it adheres to the usual RCU locking
and the rather pointless comment on why locking can be different there
is removed (if it doesn't make anything simpler, why even bother?).

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
---
 include/linux/cgroup.h  |    1 -
 kernel/cgroup.c         |   62 ++++++++++++++++++++++------------------------
 kernel/cgroup_freezer.c |   13 +++-------
 kernel/fork.c           |    9 +------
 4 files changed, 35 insertions(+), 50 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f8a030c..4cd1d0f 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -34,7 +34,6 @@ extern int cgroup_lock_is_held(void);
 extern bool cgroup_lock_live_group(struct cgroup *cgrp);
 extern void cgroup_unlock(void);
 extern void cgroup_fork(struct task_struct *p);
-extern void cgroup_fork_callbacks(struct task_struct *p);
 extern void cgroup_post_fork(struct task_struct *p);
 extern void cgroup_exit(struct task_struct *p, int run_callbacks);
 extern int cgroupstats_build(struct cgroupstats *stats,
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 13774b3..b7a0171 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4844,44 +4844,19 @@ void cgroup_fork(struct task_struct *child)
 }
 
 /**
- * cgroup_fork_callbacks - run fork callbacks
- * @child: the new task
- *
- * Called on a new task very soon before adding it to the
- * tasklist. No need to take any locks since no-one can
- * be operating on this task.
- */
-void cgroup_fork_callbacks(struct task_struct *child)
-{
-	if (need_forkexit_callback) {
-		int i;
-		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
-			struct cgroup_subsys *ss = subsys[i];
-
-			/*
-			 * forkexit callbacks are only supported for
-			 * builtin subsystems.
-			 */
-			if (!ss || ss->module)
-				continue;
-
-			if (ss->fork)
-				ss->fork(child);
-		}
-	}
-}
-
-/**
  * cgroup_post_fork - called on a new task after adding it to the task list
  * @child: the task in question
  *
- * Adds the task to the list running through its css_set if necessary.
- * Has to be after the task is visible on the task list in case we race
- * with the first call to cgroup_iter_start() - to guarantee that the
- * new task ends up on its list.
+ * Adds the task to the list running through its css_set if necessary and
+ * call the subsystem fork() callbacks.  Has to be after the task is
+ * visible on the task list in case we race with the first call to
+ * cgroup_iter_start() - to guarantee that the new task ends up on its
+ * list.
  */
 void cgroup_post_fork(struct task_struct *child)
 {
+	int i;
+
 	/*
 	 * use_task_css_set_links is set to 1 before we walk the tasklist
 	 * under the tasklist_lock and we read it here after we added the child
@@ -4910,7 +4885,30 @@ void cgroup_post_fork(struct task_struct *child)
 		}
 		write_unlock(&css_set_lock);
 	}
+
+	/*
+	 * Call ss->fork().  This must happen after @child is linked on
+	 * css_set; otherwise, @child might change state between ->fork()
+	 * and addition to css_set.
+	 */
+	if (need_forkexit_callback) {
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+			struct cgroup_subsys *ss = subsys[i];
+
+			/*
+			 * fork/exit callbacks are supported only for
+			 * builtin subsystems and we don't need further
+			 * synchronization as they never go away.
+			 */
+			if (!ss || ss->module)
+				continue;
+
+			if (ss->fork)
+				ss->fork(child);
+		}
+	}
 }
+
 /**
  * cgroup_exit - detach cgroup from exiting task
  * @tsk: pointer to task_struct of exiting process
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index b1724ce..12bfedb 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -186,23 +186,15 @@ static void freezer_fork(struct task_struct *task)
 {
 	struct freezer *freezer;
 
-	/*
-	 * No lock is needed, since the task isn't on tasklist yet,
-	 * so it can't be moved to another cgroup, which means the
-	 * freezer won't be removed and will be valid during this
-	 * function call.  Nevertheless, apply RCU read-side critical
-	 * section to suppress RCU lockdep false positives.
-	 */
 	rcu_read_lock();
 	freezer = task_freezer(task);
-	rcu_read_unlock();
 
 	/*
 	 * The root cgroup is non-freezable, so we can skip the
 	 * following check.
 	 */
 	if (!freezer->css.cgroup->parent)
-		return;
+		goto out;
 
 	spin_lock_irq(&freezer->lock);
 	BUG_ON(freezer->state == CGROUP_FROZEN);
@@ -210,7 +202,10 @@ static void freezer_fork(struct task_struct *task)
 	/* Locking avoids race with FREEZING -> THAWED transitions. */
 	if (freezer->state == CGROUP_FREEZING)
 		freeze_task(task);
+
 	spin_unlock_irq(&freezer->lock);
+out:
+	rcu_read_unlock();
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index 8b20ab7..acc4cb6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1135,7 +1135,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 {
 	int retval;
 	struct task_struct *p;
-	int cgroup_callbacks_done = 0;
 
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
 		return ERR_PTR(-EINVAL);
@@ -1393,12 +1392,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
 
-	/* Now that the task is set up, run cgroup callbacks if
-	 * necessary. We need to run them before the task is visible
-	 * on the tasklist. */
-	cgroup_fork_callbacks(p);
-	cgroup_callbacks_done = 1;
-
 	/* Need tasklist lock for parent etc handling! */
 	write_lock_irq(&tasklist_lock);
 
@@ -1503,7 +1496,7 @@ bad_fork_cleanup_cgroup:
 #endif
 	if (clone_flags & CLONE_THREAD)
 		threadgroup_change_end(current);
-	cgroup_exit(p, cgroup_callbacks_done);
+	cgroup_exit(p, 0);
 	delayacct_tsk_free(p);
 	module_put(task_thread_info(p)->exec_domain->module);
 bad_fork_cleanup_count:
-- 
1.7.7.3


  reply	other threads:[~2012-10-16 22:28 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-16 22:28 [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking Tejun Heo
2012-10-16 22:28 ` Tejun Heo [this message]
2012-10-17  8:28   ` [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set Li Zefan
2012-10-18  1:25     ` Li Zefan
2012-10-21 19:11   ` Oleg Nesterov
2012-10-21 19:22     ` Tejun Heo
2012-10-22 18:04       ` Oleg Nesterov
2012-10-22 21:16         ` Tejun Heo
2012-10-23 15:51           ` Oleg Nesterov
2012-10-24 19:04             ` Tejun Heo
2012-10-25 17:42               ` Oleg Nesterov
2012-12-20  5:25   ` Herton Ronaldo Krzesinski
2012-12-28 21:22     ` [PATCH] cgroup: remove unused dummy cgroup_fork_callbacks() Tejun Heo
2012-10-16 22:28 ` [PATCH 2/7] freezer: add missing mb's to freezer_count() and freezer_should_skip() Tejun Heo
2012-10-22 17:44   ` Oleg Nesterov
2012-10-22 21:13     ` Tejun Heo
2012-10-23 15:39       ` Oleg Nesterov
2012-10-24 18:57         ` Tejun Heo
2012-10-25 16:39           ` [PATCH 0/1] (Was: freezer: add missing mb's to freezer_count() and freezer_should_skip()) Oleg Nesterov
2012-10-25 16:39             ` [PATCH 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule() Oleg Nesterov
2012-10-25 17:18               ` Tejun Heo
2012-10-25 17:34                 ` Oleg Nesterov
2012-10-25 17:36                   ` Tejun Heo
2012-10-26 17:45                     ` [PATCH v2 0/1] " Oleg Nesterov
2012-10-26 17:46                       ` [PATCH v2 1/1] " Oleg Nesterov
2012-10-26 17:52                         ` Tejun Heo
2012-10-26 18:01                           ` Oleg Nesterov
2012-10-26 21:14                             ` Rafael J. Wysocki
2012-10-26 21:29                               ` Rafael J. Wysocki
2012-10-26 21:29                                 ` Tejun Heo
2012-10-28  0:16                                   ` Rafael J. Wysocki
2012-10-27 22:22                         ` Ben Hutchings
2012-10-28 13:45                           ` Oleg Nesterov
2012-10-16 22:28 ` [PATCH 3/7] cgroup_freezer: make it official that writes to freezer.state don't fail Tejun Heo
2012-10-16 22:28 ` [PATCH 4/7] cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks Tejun Heo
2012-10-22 18:34   ` Oleg Nesterov
2012-10-22 21:18     ` Tejun Heo
2012-10-23 15:55       ` Oleg Nesterov
2012-10-24 19:06         ` Tejun Heo
2012-10-25 17:12           ` [PATCH 0/1] (Was: cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks) Oleg Nesterov
2012-10-25 17:12             ` [PATCH 1/1] freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD Oleg Nesterov
2012-10-25 17:20               ` Tejun Heo
2012-10-25 17:37                 ` Oleg Nesterov
2012-10-25 17:37                   ` Tejun Heo
2012-10-25 20:13                     ` Rafael J. Wysocki
2012-10-16 22:28 ` [PATCH 5/7] cgroup_freezer: allow moving tasks in and out of a frozen cgroup Tejun Heo
2012-10-22 19:25   ` Oleg Nesterov
2012-10-22 21:25     ` Tejun Heo
2012-10-23 16:14       ` Oleg Nesterov
2012-10-16 22:28 ` [PATCH 6/7] cgroup_freezer: prepare update_if_frozen() for locking change Tejun Heo
2012-10-16 22:28 ` [PATCH 7/7] cgroup_freezer: don't use cgroup_lock_live_group() Tejun Heo
2012-10-17 19:16 ` [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking Matt Helsley
2012-10-18 21:14   ` Tejun Heo
2012-10-18 22:21     ` Matt Helsley
2012-10-18 22:35       ` Tejun Heo
2012-10-18 23:47         ` Matt Helsley
2012-10-19  0:01           ` Tejun Heo
2012-10-19  1:29             ` Matt Helsley
2012-10-19 20:02               ` Tejun Heo
2012-10-19 16:54 ` Rafael J. Wysocki
2012-10-19 20:04   ` Tejun Heo
2012-10-21 19:18     ` Oleg Nesterov
2012-10-21 19:24       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1350426526-14254-2-git-send-email-tj@kernel.org \
    --to=tj@kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=oleg@redhat.com \
    --cc=rjw@sisk.pl \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).