* [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
From: Tejun Heo @ 2012-10-16 22:28 UTC
  To: rjw, oleg; +Cc: linux-kernel, lizefan, containers, cgroups

Hello,

This patchset updates cgroup_freezer so that

* Unfreezable kernel tasks don't prevent a cgroup from transitioning
  into FROZEN from FREEZING.  There's nothing userland can do with or
  about such tasks.

* Tasks can be moved in and out of a frozen cgroup.  Tasks are made to
  conform to the state of the new cgroup during migration.  This
  behavior makes a lot more sense and removes the use of
  ->can_attach() which makes co-mounting difficult.

* cgroup_lock_live_group() is no longer used.  Grabbing cgroup_lock
  from outside cgroup proper creates a painful locking dependency and
  is being phased out.  With the above behavior change, removing the
  dependency on cgroup_lock is pretty easy.  IMHO, the old behavior was
  simply the wrong one to implement, and it forced the wrong
  implementation.
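
As a rough illustration of the second point (this is not code from the
patchset), migration can simply make a task conform to whatever state
its destination group is in, so nothing needs to veto the move.  The
sketch below is a userspace model; model_group, model_task and
model_migrate() are hypothetical names, and real freezing is
asynchronous rather than a flag flip.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state { MODEL_THAWED, MODEL_FROZEN };

struct model_group {
	enum model_state state;
};

struct model_task {
	struct model_group *group;
	bool frozen;
};

/*
 * Move @t into @dst unconditionally, then make @t match @dst's state.
 * Nothing here can fail, so the model needs no veto hook.
 */
void model_migrate(struct model_task *t, struct model_group *dst)
{
	t->group = dst;
	t->frozen = (dst->state == MODEL_FROZEN);
}
```

Moving a task into a frozen group freezes it in this model, and moving
it back out thaws it again.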

This patchset contains the following seven patches.

 0001-cgroup-cgroup_subsys-fork-should-be-called-after-the.patch
 0002-freezer-add-missing-mb-s-to-freezer_count-and-freeze.patch
 0003-cgroup_freezer-make-it-official-that-writes-to-freez.patch
 0004-cgroup_freezer-don-t-stall-transition-to-FROZEN-for-.patch
 0005-cgroup_freezer-allow-moving-tasks-in-and-out-of-a-fr.patch
 0006-cgroup_freezer-prepare-update_if_frozen-for-locking-.patch
 0007-cgroup_freezer-don-t-use-cgroup_lock_live_group.patch

0001 is a fix for a rather embarrassing bug in cgroup core.  It does
things in the wrong order leaving a window for racing during fork.

0002 adds a missing mb() around freezing condition updates / checks.

0003-0004 make cgroup_freezer ignore unfreezable kernel tasks and
handle PF_FREEZER_SKIP correctly.

0005 allows migrating tasks in and out of a frozen cgroup.

0006-0007 remove the use of cgroup_lock_live_group().

This patchset is on top of v3.7-rc1 and available in the following git
branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup_freezer-locking

 include/linux/cgroup.h  |    1 
 include/linux/freezer.h |   50 +++++++++--
 kernel/cgroup.c         |   62 ++++++--------
 kernel/cgroup_freezer.c |  210 ++++++++++++++++--------------------------------
 kernel/fork.c           |    9 --
 5 files changed, 147 insertions(+), 185 deletions(-)

Thanks.

--
tejun

* [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
From: Tejun Heo @ 2012-10-16 22:28 UTC
  To: rjw, oleg; +Cc: linux-kernel, lizefan, containers, cgroups, Tejun Heo, stable

cgroup core has a bug which violates a basic rule about event
notifications - when a new entity needs to be added, you add that to
the notification list first and then make the new entity conform to
the current state.  If done in the reverse order, an event happening
in between will be lost.

cgroup_subsys->fork() is invoked way before the new task is added to
the css_set.  Currently, cgroup_freezer is the only user of ->fork()
and uses it to make new tasks conform to the current state of the
freezer.  If FROZEN state is requested while fork is in progress
between cgroup_fork_callbacks() and cgroup_post_fork(), the child
could escape freezing - the cgroup isn't frozen when ->fork() is
called and the freezer can't see the new task on the css_set.

This patch moves cgroup_subsys->fork() invocation to
cgroup_post_fork() after the new task is added to the css_set.
cgroup_fork_callbacks() is removed.
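
The ordering rule can be sketched in a small userspace model (an
illustration only, not kernel code).  sim_set stands in for the
css_set, sim_conform() for the ->fork() callback, and sim_freeze() for
a freeze request; all of these names are hypothetical.  sim_fork()
replays a fork with the freeze request arriving at a chosen point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_TASKS 4

/* Hypothetical stand-ins for a task and the css_set it is linked on. */
struct sim_task { bool frozen; };
struct sim_set {
	bool freezing;                          /* current group state */
	struct sim_task *tasks[SIM_MAX_TASKS];  /* the "notification list" */
	size_t nr;
};

static void sim_link(struct sim_set *s, struct sim_task *t)
{
	assert(s->nr < SIM_MAX_TASKS);
	s->tasks[s->nr++] = t;
}

/* Freeze request: record the state, then notify every listed task. */
static void sim_freeze(struct sim_set *s)
{
	s->freezing = true;
	for (size_t i = 0; i < s->nr; i++)
		s->tasks[i]->frozen = true;
}

/* What a fork callback does: conform the new task to the current state. */
static void sim_conform(const struct sim_set *s, struct sim_task *t)
{
	if (s->freezing)
		t->frozen = true;
}

/*
 * Fork a child with the freeze request arriving before step @freeze_at
 * (0 = before both steps, 1 = between them, 2 = after both).
 * Returns whether the child ends up frozen.
 */
bool sim_fork(bool link_first, int freeze_at)
{
	struct sim_set set = { 0 };
	struct sim_task child = { 0 };

	for (int step = 0; step <= 2; step++) {
		if (step == freeze_at)
			sim_freeze(&set);
		if (step == 2)
			break;
		if ((step == 0) == link_first)
			sim_link(&set, &child);
		else
			sim_conform(&set, &child);
	}
	return child.frozen;
}
```

In this model the child is frozen for every arrival point when it is
linked first, while conforming first loses the request that arrives
between the two steps.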

Because now a task may be migrated during cgroup_subsys->fork(),
freezer_fork() is updated so that it adheres to the usual RCU locking
and the rather pointless comment on why locking can be different there
is removed (if it doesn't make anything simpler, why even bother?).

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
---
 include/linux/cgroup.h  |    1 -
 kernel/cgroup.c         |   62 ++++++++++++++++++++++------------------------
 kernel/cgroup_freezer.c |   13 +++-------
 kernel/fork.c           |    9 +------
 4 files changed, 35 insertions(+), 50 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f8a030c..4cd1d0f 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -34,7 +34,6 @@ extern int cgroup_lock_is_held(void);
 extern bool cgroup_lock_live_group(struct cgroup *cgrp);
 extern void cgroup_unlock(void);
 extern void cgroup_fork(struct task_struct *p);
-extern void cgroup_fork_callbacks(struct task_struct *p);
 extern void cgroup_post_fork(struct task_struct *p);
 extern void cgroup_exit(struct task_struct *p, int run_callbacks);
 extern int cgroupstats_build(struct cgroupstats *stats,
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 13774b3..b7a0171 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4844,44 +4844,19 @@ void cgroup_fork(struct task_struct *child)
 }
 
 /**
- * cgroup_fork_callbacks - run fork callbacks
- * @child: the new task
- *
- * Called on a new task very soon before adding it to the
- * tasklist. No need to take any locks since no-one can
- * be operating on this task.
- */
-void cgroup_fork_callbacks(struct task_struct *child)
-{
-	if (need_forkexit_callback) {
-		int i;
-		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
-			struct cgroup_subsys *ss = subsys[i];
-
-			/*
-			 * forkexit callbacks are only supported for
-			 * builtin subsystems.
-			 */
-			if (!ss || ss->module)
-				continue;
-
-			if (ss->fork)
-				ss->fork(child);
-		}
-	}
-}
-
-/**
  * cgroup_post_fork - called on a new task after adding it to the task list
  * @child: the task in question
  *
- * Adds the task to the list running through its css_set if necessary.
- * Has to be after the task is visible on the task list in case we race
- * with the first call to cgroup_iter_start() - to guarantee that the
- * new task ends up on its list.
+ * Adds the task to the list running through its css_set if necessary and
+ * call the subsystem fork() callbacks.  Has to be after the task is
+ * visible on the task list in case we race with the first call to
+ * cgroup_iter_start() - to guarantee that the new task ends up on its
+ * list.
  */
 void cgroup_post_fork(struct task_struct *child)
 {
+	int i;
+
 	/*
 	 * use_task_css_set_links is set to 1 before we walk the tasklist
 	 * under the tasklist_lock and we read it here after we added the child
@@ -4910,7 +4885,30 @@ void cgroup_post_fork(struct task_struct *child)
 		}
 		write_unlock(&css_set_lock);
 	}
+
+	/*
+	 * Call ss->fork().  This must happen after @child is linked on
+	 * css_set; otherwise, @child might change state between ->fork()
+	 * and addition to css_set.
+	 */
+	if (need_forkexit_callback) {
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+			struct cgroup_subsys *ss = subsys[i];
+
+			/*
+			 * fork/exit callbacks are supported only for
+			 * builtin subsystems and we don't need further
+			 * synchronization as they never go away.
+			 */
+			if (!ss || ss->module)
+				continue;
+
+			if (ss->fork)
+				ss->fork(child);
+		}
+	}
 }
+
 /**
  * cgroup_exit - detach cgroup from exiting task
  * @tsk: pointer to task_struct of exiting process
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index b1724ce..12bfedb 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -186,23 +186,15 @@ static void freezer_fork(struct task_struct *task)
 {
 	struct freezer *freezer;
 
-	/*
-	 * No lock is needed, since the task isn't on tasklist yet,
-	 * so it can't be moved to another cgroup, which means the
-	 * freezer won't be removed and will be valid during this
-	 * function call.  Nevertheless, apply RCU read-side critical
-	 * section to suppress RCU lockdep false positives.
-	 */
 	rcu_read_lock();
 	freezer = task_freezer(task);
-	rcu_read_unlock();
 
 	/*
 	 * The root cgroup is non-freezable, so we can skip the
 	 * following check.
 	 */
 	if (!freezer->css.cgroup->parent)
-		return;
+		goto out;
 
 	spin_lock_irq(&freezer->lock);
 	BUG_ON(freezer->state == CGROUP_FROZEN);
@@ -210,7 +202,10 @@ static void freezer_fork(struct task_struct *task)
 	/* Locking avoids race with FREEZING -> THAWED transitions. */
 	if (freezer->state == CGROUP_FREEZING)
 		freeze_task(task);
+
 	spin_unlock_irq(&freezer->lock);
+out:
+	rcu_read_unlock();
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index 8b20ab7..acc4cb6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1135,7 +1135,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 {
 	int retval;
 	struct task_struct *p;
-	int cgroup_callbacks_done = 0;
 
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
 		return ERR_PTR(-EINVAL);
@@ -1393,12 +1392,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
 
-	/* Now that the task is set up, run cgroup callbacks if
-	 * necessary. We need to run them before the task is visible
-	 * on the tasklist. */
-	cgroup_fork_callbacks(p);
-	cgroup_callbacks_done = 1;
-
 	/* Need tasklist lock for parent etc handling! */
 	write_lock_irq(&tasklist_lock);
 
@@ -1503,7 +1496,7 @@ bad_fork_cleanup_cgroup:
 #endif
 	if (clone_flags & CLONE_THREAD)
 		threadgroup_change_end(current);
-	cgroup_exit(p, cgroup_callbacks_done);
+	cgroup_exit(p, 0);
 	delayacct_tsk_free(p);
 	module_put(task_thread_info(p)->exec_domain->module);
 bad_fork_cleanup_count:
-- 
1.7.7.3

* [PATCH 2/7] freezer: add missing mb's to freezer_count() and freezer_should_skip()
From: Tejun Heo @ 2012-10-16 22:28 UTC
  To: rjw, oleg; +Cc: linux-kernel, lizefan, containers, cgroups, Tejun Heo, stable

A task is considered frozen enough between freezer_do_not_count() and
freezer_count() and freezers use freezer_should_skip() to test this
condition.  This supposedly works because freezer_count() always calls
try_to_freeze() after clearing %PF_FREEZER_SKIP.

However, there currently is nothing which guarantees that
freezer_count() sees %true freezing() after clearing %PF_FREEZER_SKIP
when freezing is in progress, and vice-versa.  A task can escape a
freezing condition that is in effect if freezer_count() sees
!freezing() while freezer_should_skip() still sees %PF_FREEZER_SKIP.

This patch adds smp_mb()'s to freezer_count() and
freezer_should_skip() such that either %true freezing() is visible to
freezer_count() or !PF_FREEZER_SKIP is visible to
freezer_should_skip().
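
The guarantee can be illustrated with a tiny userspace model (a sketch
under simplifying assumptions, not kernel code).  Each side performs
one store and one load.  With the barriers, each side's store is taken
to be ordered before its own load; without them, the model lets the
load move ahead of the store.  mb_count_escapes() and the other names
are hypothetical, and the model ignores everything else a real memory
model allows.  It enumerates the possible orders and counts those in
which the task sees !freezing() while the freezer still sees the skip
flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; these are not kernel data structures. */
enum mb_op {
	TASK_CLEAR_SKIP,	/* freezer_count() clears the skip flag */
	TASK_READ_FREEZING,	/* freezer_count() then tests freezing() */
	FRZ_SET_FREEZING,	/* the freezer makes freezing() true */
	FRZ_READ_SKIP,		/* the freezer then reads the skip flag */
	MB_NR_OPS
};

struct mb_model {
	bool skip, freezing;
	bool task_saw_freezing, freezer_saw_skip;
};

static void mb_run(struct mb_model *m, int op)
{
	switch (op) {
	case TASK_CLEAR_SKIP:    m->skip = false; break;
	case TASK_READ_FREEZING: m->task_saw_freezing = m->freezing; break;
	case FRZ_SET_FREEZING:   m->freezing = true; break;
	case FRZ_READ_SKIP:      m->freezer_saw_skip = m->skip; break;
	}
}

/* Extend seq[0..n) with every allowed next op; count escaping orders. */
static int mb_walk(int *seq, int n, unsigned int used, bool ordered)
{
	int escapes = 0;

	if (n == MB_NR_OPS) {
		struct mb_model m = { .skip = true };

		for (int i = 0; i < MB_NR_OPS; i++)
			mb_run(&m, seq[i]);
		return !m.task_saw_freezing && m.freezer_saw_skip;
	}
	for (int op = 0; op < MB_NR_OPS; op++) {
		if (used & (1u << op))
			continue;
		/* With barriers, a side's load can't pass its own store. */
		if (ordered && op == TASK_READ_FREEZING &&
		    !(used & (1u << TASK_CLEAR_SKIP)))
			continue;
		if (ordered && op == FRZ_READ_SKIP &&
		    !(used & (1u << FRZ_SET_FREEZING)))
			continue;
		seq[n] = op;
		escapes += mb_walk(seq, n + 1, used | (1u << op), ordered);
	}
	return escapes;
}

int mb_count_escapes(bool ordered)
{
	int seq[MB_NR_OPS];

	return mb_walk(seq, 0, 0, ordered);
}
```

In this model, mb_count_escapes(true) finds no escaping order, while
mb_count_escapes(false) finds some, which is the window the paired
smp_mb()'s are meant to close.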

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
---
 include/linux/freezer.h |   50 +++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index d09af4b..ee89932 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -75,28 +75,62 @@ static inline bool cgroup_freezing(struct task_struct *task)
  */
 
 
-/* Tell the freezer not to count the current task as freezable. */
+/**
+ * freezer_do_not_count - tell freezer to ignore %current
+ *
+ * Tell freezers to ignore the current task when determining whether the
+ * target frozen state is reached.  IOW, the current task will be
+ * considered frozen enough by freezers.
+ *
+ * The caller shouldn't do anything which isn't allowed for a frozen task
+ * until freezer_count() is called.  Usually, freezer[_do_not]_count() pair
+ * wrap a scheduling operation and nothing much else.
+ */
 static inline void freezer_do_not_count(void)
 {
 	current->flags |= PF_FREEZER_SKIP;
 }
 
-/*
- * Tell the freezer to count the current task as freezable again and try to
- * freeze it.
+/**
+ * freezer_count - tell freezer to stop ignoring %current
+ *
+ * Undo freezer_do_not_count().  It tells freezers that %current should be
+ * considered again and tries to freeze if freezing condition is already in
+ * effect.
  */
 static inline void freezer_count(void)
 {
 	current->flags &= ~PF_FREEZER_SKIP;
+	/*
+	 * If freezing is in progress, the following paired with smp_mb()
+	 * in freezer_should_skip() ensures that either we see %true
+	 * freezing() or freezer_should_skip() sees !PF_FREEZER_SKIP.
+	 */
+	smp_mb();
 	try_to_freeze();
 }
 
-/*
- * Check if the task should be counted as freezable by the freezer
+/**
+ * freezer_should_skip - whether to skip a task when determining frozen
+ *			 state is reached
+ * @p: task in question
+ *
+ * This function is used by freezers after establishing %true freezing() to
+ * test whether a task should be skipped when determining the target frozen
+ * state is reached.  IOW, if this function returns %true, @p is considered
+ * frozen enough.
  */
-static inline int freezer_should_skip(struct task_struct *p)
+static inline bool freezer_should_skip(struct task_struct *p)
 {
-	return !!(p->flags & PF_FREEZER_SKIP);
+	/*
+	 * The following smp_mb() paired with the one in freezer_count()
+	 * ensures that either freezer_count() sees %true freezing() or we
+	 * see cleared %PF_FREEZER_SKIP and return %false.  This makes it
+	 * impossible for a task to slip frozen state testing after
+	 * clearing %PF_FREEZER_SKIP.
+	 */
+	smp_mb();
+	return p->flags & PF_FREEZER_SKIP;
 }
 
 /*
-- 
1.7.7.3

* [PATCH 3/7] cgroup_freezer: make it official that writes to freezer.state don't fail
From: Tejun Heo @ 2012-10-16 22:28 UTC
  To: rjw, oleg; +Cc: linux-kernel, lizefan, containers, cgroups, Tejun Heo

try_to_freeze_cgroup() has condition checks which are intended to fail
the write operation to freezer.state if there are tasks which can't be
frozen.  The condition checks have been broken for quite some time
now.  freeze_task() returns %false if the target task can't be frozen,
so num_cant_freeze_now is never incremented.
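
Why the counter can never move can be seen in a simplified userspace
model of the removed loop (a sketch, not the kernel code).  fz_task and
the fz_*() helpers are hypothetical; fz_freeze_task() is only assumed
to behave as described above, returning false when the task can't be
frozen, and the frozen-enough test is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task model; not the kernel's task_struct. */
struct fz_task {
	bool freezable;	/* false for e.g. an unfreezable kernel thread */
	bool frozen;
	bool skip;	/* models PF_FREEZER_SKIP */
};

/* Model assumption: the group is freezing, so freezable tasks are too. */
static bool fz_freezing(const struct fz_task *t)
{
	return t->freezable;
}

/* Model assumption: false when the task can't be (or already is) frozen. */
static bool fz_freeze_task(struct fz_task *t)
{
	return fz_freezing(t) && !t->frozen;
}

/* Mirrors the shape of the loop removed from try_to_freeze_cgroup(). */
unsigned int fz_old_count(struct fz_task *tasks, size_t n)
{
	unsigned int num_cant_freeze_now = 0;

	for (size_t i = 0; i < n; i++) {
		struct fz_task *t = &tasks[i];

		if (!fz_freeze_task(t))
			continue;	/* unfreezable tasks leave here */
		if (t->frozen)
			continue;
		/* fz_freeze_task() succeeded, so fz_freezing() is true. */
		if (!fz_freezing(t) && !t->skip)
			num_cant_freeze_now++;
	}
	return num_cant_freeze_now;
}
```

Whatever mix of tasks is passed in, the only tasks that reach the last
test are ones for which fz_freezing() is true, so the model always
returns 0.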

In addition, strangely, cgroup freezing proceeds even after the write
has failed, which is rather broken.

This patch rips out the non-working code intended to fail the write to
freezer.state when the cgroup contains non-freezable tasks and makes
it official that writes to freezer.state succeed whether there are
non-freezable tasks in the cgroup or not.

This leaves is_task_frozen_enough() with only one user -
update_if_frozen().  Collapse it into the caller.  Note that this
removes an extra call to freezing().

This doesn't cause any userland behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/cgroup_freezer.c |   43 +++++++++++--------------------------------
 1 files changed, 11 insertions(+), 32 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 12bfedb..05d5218 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -150,13 +150,6 @@ static void freezer_destroy(struct cgroup *cgroup)
 	kfree(freezer);
 }
 
-/* task is frozen or will freeze immediately when next it gets woken */
-static bool is_task_frozen_enough(struct task_struct *task)
-{
-	return frozen(task) ||
-		(task_is_stopped_or_traced(task) && freezing(task));
-}
-
 /*
  * The call to cgroup_lock() in the freezer.state write method prevents
  * a write to that file racing against an attach, and hence the
@@ -222,7 +215,8 @@ static void update_if_frozen(struct cgroup *cgroup,
 	cgroup_iter_start(cgroup, &it);
 	while ((task = cgroup_iter_next(cgroup, &it))) {
 		ntotal++;
-		if (freezing(task) && is_task_frozen_enough(task))
+		if (freezing(task) && (frozen(task) ||
+				       task_is_stopped_or_traced(task)))
 			nfrozen++;
 	}
 
@@ -264,24 +258,15 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
 	return 0;
 }
 
-static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
+static void freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
 {
 	struct cgroup_iter it;
 	struct task_struct *task;
-	unsigned int num_cant_freeze_now = 0;
 
 	cgroup_iter_start(cgroup, &it);
-	while ((task = cgroup_iter_next(cgroup, &it))) {
-		if (!freeze_task(task))
-			continue;
-		if (is_task_frozen_enough(task))
-			continue;
-		if (!freezing(task) && !freezer_should_skip(task))
-			num_cant_freeze_now++;
-	}
+	while ((task = cgroup_iter_next(cgroup, &it)))
+		freeze_task(task);
 	cgroup_iter_end(cgroup, &it);
-
-	return num_cant_freeze_now ? -EBUSY : 0;
 }
 
 static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
@@ -295,13 +280,10 @@ static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
 	cgroup_iter_end(cgroup, &it);
 }
 
-static int freezer_change_state(struct cgroup *cgroup,
-				enum freezer_state goal_state)
+static void freezer_change_state(struct cgroup *cgroup,
+				 enum freezer_state goal_state)
 {
-	struct freezer *freezer;
-	int retval = 0;
-
-	freezer = cgroup_freezer(cgroup);
+	struct freezer *freezer = cgroup_freezer(cgroup);
 
 	spin_lock_irq(&freezer->lock);
 
@@ -318,22 +300,19 @@ static int freezer_change_state(struct cgroup *cgroup,
 		if (freezer->state == CGROUP_THAWED)
 			atomic_inc(&system_freezing_cnt);
 		freezer->state = CGROUP_FREEZING;
-		retval = try_to_freeze_cgroup(cgroup, freezer);
+		freeze_cgroup(cgroup, freezer);
 		break;
 	default:
 		BUG();
 	}
 
 	spin_unlock_irq(&freezer->lock);
-
-	return retval;
 }
 
 static int freezer_write(struct cgroup *cgroup,
 			 struct cftype *cft,
 			 const char *buffer)
 {
-	int retval;
 	enum freezer_state goal_state;
 
 	if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
@@ -345,9 +324,9 @@ static int freezer_write(struct cgroup *cgroup,
 
 	if (!cgroup_lock_live_group(cgroup))
 		return -ENODEV;
-	retval = freezer_change_state(cgroup, goal_state);
+	freezer_change_state(cgroup, goal_state);
 	cgroup_unlock();
-	return retval;
+	return 0;
 }
 
 static struct cftype files[] = {
-- 
1.7.7.3


* [PATCH 3/7] cgroup_freezer: make it official that writes to freezer.state don't fail
@ 2012-10-16 22:28     ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-16 22:28 UTC (permalink / raw)
  To: rjw, oleg; +Cc: linux-kernel, lizefan, containers, cgroups, Tejun Heo

try_to_freeze_cgroup() has condition checks which are intended to fail
the write operation to freezer.state if there are tasks which can't be
frozen.  The condition checks have been broken for quite some time
now.  freeze_task() returns %false if the target task can't be frozen,
so num_cant_freeze_now is never incremented.

In addition, strangely, cgroup freezing proceeds even after the write
has failed, which is rather broken.

This patch rips out the non-working code intended to fail the write to
freezer.state when the cgroup contains non-freezable tasks and makes
it official that writes to freezer.state succeed whether there are
non-freezable tasks in the cgroup or not.

This leaves is_task_frozen_enough() with only one user -
update_if_frozen().  Collapse it into the caller.  Note that this
removes an extra call to freezing().

This doesn't cause any userland behavior changes.
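
As a rough illustration of why the counter could never move, here is
a minimal userspace model of the removed loop.  struct model_task and
model_freeze_task() are hypothetical stand-ins, not kernel code.  The
model assumes that freeze_task() only returns true for a task that is
subject to freezing and not yet frozen, and it leaves out the
stopped/traced and freezer_should_skip() tests for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only what the old loop inspected. */
struct model_task {
	bool freezing;	/* models freezing(task) */
	bool frozen;	/* models frozen(task) */
};

/*
 * Assumed contract of freeze_task(): false when the task is not subject
 * to freezing or is already frozen, true once a freeze was requested.
 */
static bool model_freeze_task(const struct model_task *t)
{
	return t->freezing && !t->frozen;
}

/* The counting loop removed by this patch, minus iterator and locking. */
static unsigned int old_num_cant_freeze_now(const struct model_task *tasks,
					    size_t n)
{
	unsigned int num_cant_freeze_now = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct model_task *t = &tasks[i];

		if (!model_freeze_task(t))
			continue;	/* unfreezable tasks leave the loop here */
		if (t->frozen)
			continue;
		if (!t->freezing)	/* never true: a true return implies freezing */
			num_cant_freeze_now++;
	}
	return num_cant_freeze_now;
}
```

Under these assumptions the function returns 0 for any mix of tasks,
which matches the observation that the -EBUSY path was dead.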

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/cgroup_freezer.c |   43 +++++++++++--------------------------------
 1 files changed, 11 insertions(+), 32 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 12bfedb..05d5218 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -150,13 +150,6 @@ static void freezer_destroy(struct cgroup *cgroup)
 	kfree(freezer);
 }
 
-/* task is frozen or will freeze immediately when next it gets woken */
-static bool is_task_frozen_enough(struct task_struct *task)
-{
-	return frozen(task) ||
-		(task_is_stopped_or_traced(task) && freezing(task));
-}
-
 /*
  * The call to cgroup_lock() in the freezer.state write method prevents
  * a write to that file racing against an attach, and hence the
@@ -222,7 +215,8 @@ static void update_if_frozen(struct cgroup *cgroup,
 	cgroup_iter_start(cgroup, &it);
 	while ((task = cgroup_iter_next(cgroup, &it))) {
 		ntotal++;
-		if (freezing(task) && is_task_frozen_enough(task))
+		if (freezing(task) && (frozen(task) ||
+				       task_is_stopped_or_traced(task)))
 			nfrozen++;
 	}
 
@@ -264,24 +258,15 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
 	return 0;
 }
 
-static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
+static void freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
 {
 	struct cgroup_iter it;
 	struct task_struct *task;
-	unsigned int num_cant_freeze_now = 0;
 
 	cgroup_iter_start(cgroup, &it);
-	while ((task = cgroup_iter_next(cgroup, &it))) {
-		if (!freeze_task(task))
-			continue;
-		if (is_task_frozen_enough(task))
-			continue;
-		if (!freezing(task) && !freezer_should_skip(task))
-			num_cant_freeze_now++;
-	}
+	while ((task = cgroup_iter_next(cgroup, &it)))
+		freeze_task(task);
 	cgroup_iter_end(cgroup, &it);
-
-	return num_cant_freeze_now ? -EBUSY : 0;
 }
 
 static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
@@ -295,13 +280,10 @@ static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
 	cgroup_iter_end(cgroup, &it);
 }
 
-static int freezer_change_state(struct cgroup *cgroup,
-				enum freezer_state goal_state)
+static void freezer_change_state(struct cgroup *cgroup,
+				 enum freezer_state goal_state)
 {
-	struct freezer *freezer;
-	int retval = 0;
-
-	freezer = cgroup_freezer(cgroup);
+	struct freezer *freezer = cgroup_freezer(cgroup);
 
 	spin_lock_irq(&freezer->lock);
 
@@ -318,22 +300,19 @@ static int freezer_change_state(struct cgroup *cgroup,
 		if (freezer->state == CGROUP_THAWED)
 			atomic_inc(&system_freezing_cnt);
 		freezer->state = CGROUP_FREEZING;
-		retval = try_to_freeze_cgroup(cgroup, freezer);
+		freeze_cgroup(cgroup, freezer);
 		break;
 	default:
 		BUG();
 	}
 
 	spin_unlock_irq(&freezer->lock);
-
-	return retval;
 }
 
 static int freezer_write(struct cgroup *cgroup,
 			 struct cftype *cft,
 			 const char *buffer)
 {
-	int retval;
 	enum freezer_state goal_state;
 
 	if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
@@ -345,9 +324,9 @@ static int freezer_write(struct cgroup *cgroup,
 
 	if (!cgroup_lock_live_group(cgroup))
 		return -ENODEV;
-	retval = freezer_change_state(cgroup, goal_state);
+	freezer_change_state(cgroup, goal_state);
 	cgroup_unlock();
-	return retval;
+	return 0;
 }
 
 static struct cftype files[] = {
-- 
1.7.7.3



* [PATCH 4/7] cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks
  2012-10-16 22:28 ` Tejun Heo
@ 2012-10-16 22:28     ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-16 22:28 UTC (permalink / raw)
  To: rjw-KKrjLPT3xs0, oleg-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

cgroup_freezer doesn't transition from FREEZING to FROZEN if the
cgroup contains PF_NOFREEZE tasks or tasks sleeping with
PF_FREEZER_SKIP set.

Only kernel tasks can be non-freezable (PF_NOFREEZE) and there's
nothing cgroup_freezer or userland can do about or to them.  It's
pointless to stall the transition for PF_NOFREEZE tasks.

PF_FREEZER_SKIP indicates that the task can be skipped when
determining whether frozen state is reached.  A task with
PF_FREEZER_SKIP is guaranteed to perform try_to_freeze() after it
wakes up and can be considered frozen much like stopped or traced
tasks.  Note that a vfork parent uses PF_FREEZER_SKIP while waiting
for the child.

This updates update_if_frozen() such that it only considers freezable
tasks and treats tasks for which freezer_should_skip() returns %true
as frozen.

This allows cgroups with kthreads and vfork parents to successfully
reach the FROZEN state.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
---
 kernel/cgroup_freezer.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 05d5218..557f367 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -214,10 +214,18 @@ static void update_if_frozen(struct cgroup *cgroup,
 
 	cgroup_iter_start(cgroup, &it);
 	while ((task = cgroup_iter_next(cgroup, &it))) {
-		ntotal++;
-		if (freezing(task) && (frozen(task) ||
-				       task_is_stopped_or_traced(task)))
-			nfrozen++;
+		if (freezing(task)) {
+			ntotal++;
+			/*
+			 * freezer_should_skip() indicates that the task
+			 * should be skipped when determining freezing
+			 * completion.  Consider it frozen in addition to
+			 * the usual frozen condition.
+			 */
+			if (frozen(task) || task_is_stopped_or_traced(task) ||
+			    freezer_should_skip(task))
+				nfrozen++;
+		}
 	}
 
 	if (old_state == CGROUP_THAWED) {
-- 
1.7.7.3


* [PATCH 4/7] cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks
@ 2012-10-16 22:28     ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-16 22:28 UTC (permalink / raw)
  To: rjw, oleg; +Cc: linux-kernel, lizefan, containers, cgroups, Tejun Heo

cgroup_freezer doesn't transition from FREEZING to FROZEN if the
cgroup contains PF_NOFREEZE tasks or tasks sleeping with
PF_FREEZER_SKIP set.

Only kernel tasks can be non-freezable (PF_NOFREEZE) and there's
nothing cgroup_freezer or userland can do about or to them.  It's
pointless to stall the transition for PF_NOFREEZE tasks.

PF_FREEZER_SKIP indicates that the task can be skipped when
determining whether frozen state is reached.  A task with
PF_FREEZER_SKIP is guaranteed to perform try_to_freeze() after it
wakes up and can be considered frozen much like stopped or traced
tasks.  Note that a vfork parent uses PF_FREEZER_SKIP while waiting
for the child.

This updates update_if_frozen() such that it only considers freezable
tasks and treats tasks for which freezer_should_skip() returns %true
as frozen.

This allows cgroups with kthreads and vfork parents to successfully
reach the FROZEN state.
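
The difference between the old and the new completion test can be
sketched in userspace as follows.  This is a simplified model, not
the kernel code: struct model_task and both functions are
hypothetical, and the per-task predicates are reduced to plain
booleans.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task predicates used by the check. */
struct model_task {
	bool freezing;		/* freezing(task): false for PF_NOFREEZE kthreads */
	bool frozen;		/* frozen(task) */
	bool stopped_or_traced;	/* task_is_stopped_or_traced(task) */
	bool should_skip;	/* freezer_should_skip(task), e.g. a vfork parent */
};

/* Check before this patch: every task counts towards the total. */
static bool old_all_frozen(const struct model_task *tasks, size_t n)
{
	size_t i, ntotal = 0, nfrozen = 0;

	for (i = 0; i < n; i++) {
		ntotal++;
		if (tasks[i].freezing &&
		    (tasks[i].frozen || tasks[i].stopped_or_traced))
			nfrozen++;
	}
	return nfrozen == ntotal;
}

/* Check after this patch: only freezing tasks count, skippable ones pass. */
static bool new_all_frozen(const struct model_task *tasks, size_t n)
{
	size_t i, ntotal = 0, nfrozen = 0;

	for (i = 0; i < n; i++) {
		if (!tasks[i].freezing)
			continue;	/* unfreezable tasks are ignored */
		ntotal++;
		if (tasks[i].frozen || tasks[i].stopped_or_traced ||
		    tasks[i].should_skip)
			nfrozen++;
	}
	return nfrozen == ntotal;
}
```

In this model a cgroup holding one kthread, one vfork parent and one
frozen task never satisfies old_all_frozen() but does satisfy
new_all_frozen(), while a freezable task that is still running keeps
both checks false.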

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
---
 kernel/cgroup_freezer.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 05d5218..557f367 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -214,10 +214,18 @@ static void update_if_frozen(struct cgroup *cgroup,
 
 	cgroup_iter_start(cgroup, &it);
 	while ((task = cgroup_iter_next(cgroup, &it))) {
-		ntotal++;
-		if (freezing(task) && (frozen(task) ||
-				       task_is_stopped_or_traced(task)))
-			nfrozen++;
+		if (freezing(task)) {
+			ntotal++;
+			/*
+			 * freezer_should_skip() indicates that the task
+			 * should be skipped when determining freezing
+			 * completion.  Consider it frozen in addition to
+			 * the usual frozen condition.
+			 */
+			if (frozen(task) || task_is_stopped_or_traced(task) ||
+			    freezer_should_skip(task))
+				nfrozen++;
+		}
 	}
 
 	if (old_state == CGROUP_THAWED) {
-- 
1.7.7.3



* [PATCH 5/7] cgroup_freezer: allow moving tasks in and out of a frozen cgroup
       [not found] ` <1350426526-14254-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (3 preceding siblings ...)
  2012-10-16 22:28     ` Tejun Heo
@ 2012-10-16 22:28   ` Tejun Heo
  2012-10-16 22:28     ` Tejun Heo
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-16 22:28 UTC (permalink / raw)
  To: rjw-KKrjLPT3xs0, oleg-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

cgroup_freezer is one of the few users of cgroup_subsys->can_attach()
and uses it to prevent tasks from being migrated into or out of a
frozen cgroup.  This makes cgroup_freezer cumbersome to use especially
when co-mounted with other controllers.

->can_attach() is problematic in general as it can make co-mounting
multiple cgroups difficult - migrating tasks may fail for reasons
completely irrelevant for other controllers.  freezer_can_attach() in
particular is more problematic because it messes with cgroup internal
locking to ensure that the state verification performed at
freezer_can_attach() stays valid until migration is complete.

This patch replaces freezer_can_attach() with freezer_attach() so that
tasks are always allowed to migrate - they are nudged into the
conforming state from freezer_attach().  This means that there can be
tasks which are being migrated which don't conform to the current
cgroup_freezer state until freezer_attach() is complete.  Under the
current locking scheme, the only place which can observe such tasks is
freezer_fork(), which is updated to handle such a window.

While this patch doesn't remove the use of internal cgroup locking
from freezer_read/write() paths, it removes the requirement to keep
the freezer state constant while migrating and makes such a change
possible.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup_freezer.c |   51 ++++++++++++++++++++++++++++------------------
 1 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 557f367..0b0e105 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -152,27 +152,38 @@ static void freezer_destroy(struct cgroup *cgroup)
 
 /*
  * The call to cgroup_lock() in the freezer.state write method prevents
- * a write to that file racing against an attach, and hence the
- * can_attach() result will remain valid until the attach completes.
+ * a write to that file racing against an attach, and hence we don't need
+ * to worry about racing against migration.
  */
-static int freezer_can_attach(struct cgroup *new_cgroup,
-			      struct cgroup_taskset *tset)
+static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset)
 {
-	struct freezer *freezer;
+	struct freezer *freezer = cgroup_freezer(new_cgrp);
 	struct task_struct *task;
 
+	spin_lock_irq(&freezer->lock);
+
 	/*
-	 * Anything frozen can't move or be moved to/from.
+	 * Make the new tasks conform to the current state of @new_cgrp.
+	 * For simplicity, when migrating any task to a FROZEN cgroup, we
+	 * revert it to FREEZING and let update_if_frozen() determine the
+	 * correct state later.
+	 *
+	 * Tasks in @tset are on @new_cgrp but may not conform to its
+	 * current state before executing the following - !frozen tasks may
+	 * be visible in a FROZEN cgroup and frozen tasks in a THAWED one.
+	 * This means that, to determine whether to freeze, one should test
+	 * whether the state equals THAWED.
 	 */
-	cgroup_taskset_for_each(task, new_cgroup, tset)
-		if (cgroup_freezing(task))
-			return -EBUSY;
-
-	freezer = cgroup_freezer(new_cgroup);
-	if (freezer->state != CGROUP_THAWED)
-		return -EBUSY;
+	cgroup_taskset_for_each(task, new_cgrp, tset) {
+		if (freezer->state == CGROUP_THAWED) {
+			__thaw_task(task);
+		} else {
+			freeze_task(task);
+			freezer->state = CGROUP_FREEZING;
+		}
+	}
 
-	return 0;
+	spin_unlock_irq(&freezer->lock);
 }
 
 static void freezer_fork(struct task_struct *task)
@@ -190,12 +201,12 @@ static void freezer_fork(struct task_struct *task)
 		goto out;
 
 	spin_lock_irq(&freezer->lock);
-	BUG_ON(freezer->state == CGROUP_FROZEN);
-
-	/* Locking avoids race with FREEZING -> THAWED transitions. */
-	if (freezer->state == CGROUP_FREEZING)
+	/*
+	 * @task might have been just migrated into a FROZEN cgroup.  Test
+	 * equality with THAWED.  Read the comment in freezer_attach().
+	 */
+	if (freezer->state != CGROUP_THAWED)
 		freeze_task(task);
-
 	spin_unlock_irq(&freezer->lock);
 out:
 	rcu_read_unlock();
@@ -352,7 +363,7 @@ struct cgroup_subsys freezer_subsys = {
 	.create		= freezer_create,
 	.destroy	= freezer_destroy,
 	.subsys_id	= freezer_subsys_id,
-	.can_attach	= freezer_can_attach,
+	.attach		= freezer_attach,
 	.fork		= freezer_fork,
 	.base_cftypes	= files,
 
-- 
1.7.7.3


* [PATCH 5/7] cgroup_freezer: allow moving tasks in and out of a frozen cgroup
  2012-10-16 22:28 ` Tejun Heo
  (?)
@ 2012-10-16 22:28 ` Tejun Heo
       [not found]   ` <1350426526-14254-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2012-10-22 19:25     ` Oleg Nesterov
  -1 siblings, 2 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-16 22:28 UTC (permalink / raw)
  To: rjw, oleg; +Cc: linux-kernel, lizefan, containers, cgroups, Tejun Heo

cgroup_freezer is one of the few users of cgroup_subsys->can_attach()
and uses it to prevent tasks from being migrated into or out of a
frozen cgroup.  This makes cgroup_freezer cumbersome to use especially
when co-mounted with other controllers.

->can_attach() is problematic in general as it can make co-mounting
multiple cgroups difficult - migrating tasks may fail for reasons
completely irrelevant for other controllers.  freezer_can_attach() in
particular is more problematic because it messes with cgroup internal
locking to ensure that the state verification performed at
freezer_can_attach() stays valid until migration is complete.

This patch replaces freezer_can_attach() with freezer_attach() so that
tasks are always allowed to migrate - they are nudged into the
conforming state from freezer_attach().  This means that there can be
tasks which are being migrated which don't conform to the current
cgroup_freezer state until freezer_attach() is complete.  Under the
current locking scheme, the only place which can observe such tasks is
freezer_fork(), which is updated to handle such a window.

While this patch doesn't remove the use of internal cgroup locking
from freezer_read/write() paths, it removes the requirement to keep
the freezer state constant while migrating and makes such a change
possible.
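
The conforming rule described above can be sketched as a small
userspace state model.  All names below (model_state, model_task,
model_freezer, model_attach, model_fork) are hypothetical, and the
model ignores freezer->lock and the taskset iterator; it only shows
how a migrated or forked task is brought in line with the cgroup
state.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_THAWED, MODEL_FREEZING, MODEL_FROZEN };

/* Hypothetical task: one flag standing in for "a freeze was requested". */
struct model_task {
	bool freeze_requested;
};

struct model_freezer {
	enum model_state state;
};

/* Models freezer_attach(): make every incoming task match the cgroup. */
static void model_attach(struct model_freezer *f, struct model_task *tset,
			 size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (f->state == MODEL_THAWED) {
			tset[i].freeze_requested = false;  /* __thaw_task() */
		} else {
			tset[i].freeze_requested = true;   /* freeze_task() */
			f->state = MODEL_FREEZING;  /* FROZEN drops back */
		}
	}
}

/* Models the freezer_fork() test: anything but THAWED means freeze. */
static void model_fork(const struct model_freezer *f, struct model_task *child)
{
	if (f->state != MODEL_THAWED)
		child->freeze_requested = true;
}
```

Moving a task into a FROZEN group leaves the group in FREEZING in
this model, which mirrors the comment in freezer_attach() about
letting update_if_frozen() determine the correct state later.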

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup_freezer.c |   51 ++++++++++++++++++++++++++++------------------
 1 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 557f367..0b0e105 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -152,27 +152,38 @@ static void freezer_destroy(struct cgroup *cgroup)
 
 /*
  * The call to cgroup_lock() in the freezer.state write method prevents
- * a write to that file racing against an attach, and hence the
- * can_attach() result will remain valid until the attach completes.
+ * a write to that file racing against an attach, and hence we don't need
+ * to worry about racing against migration.
  */
-static int freezer_can_attach(struct cgroup *new_cgroup,
-			      struct cgroup_taskset *tset)
+static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset)
 {
-	struct freezer *freezer;
+	struct freezer *freezer = cgroup_freezer(new_cgrp);
 	struct task_struct *task;
 
+	spin_lock_irq(&freezer->lock);
+
 	/*
-	 * Anything frozen can't move or be moved to/from.
+	 * Make the new tasks conform to the current state of @new_cgrp.
+	 * For simplicity, when migrating any task to a FROZEN cgroup, we
+	 * revert it to FREEZING and let update_if_frozen() determine the
+	 * correct state later.
+	 *
+	 * Tasks in @tset are on @new_cgrp but may not conform to its
+	 * current state before executing the following - !frozen tasks may
+	 * be visible in a FROZEN cgroup and frozen tasks in a THAWED one.
+	 * This means that, to determine whether to freeze, one should test
+	 * whether the state equals THAWED.
 	 */
-	cgroup_taskset_for_each(task, new_cgroup, tset)
-		if (cgroup_freezing(task))
-			return -EBUSY;
-
-	freezer = cgroup_freezer(new_cgroup);
-	if (freezer->state != CGROUP_THAWED)
-		return -EBUSY;
+	cgroup_taskset_for_each(task, new_cgrp, tset) {
+		if (freezer->state == CGROUP_THAWED) {
+			__thaw_task(task);
+		} else {
+			freeze_task(task);
+			freezer->state = CGROUP_FREEZING;
+		}
+	}
 
-	return 0;
+	spin_unlock_irq(&freezer->lock);
 }
 
 static void freezer_fork(struct task_struct *task)
@@ -190,12 +201,12 @@ static void freezer_fork(struct task_struct *task)
 		goto out;
 
 	spin_lock_irq(&freezer->lock);
-	BUG_ON(freezer->state == CGROUP_FROZEN);
-
-	/* Locking avoids race with FREEZING -> THAWED transitions. */
-	if (freezer->state == CGROUP_FREEZING)
+	/*
+	 * @task might have been just migrated into a FROZEN cgroup.  Test
+	 * equality with THAWED.  Read the comment in freezer_attach().
+	 */
+	if (freezer->state != CGROUP_THAWED)
 		freeze_task(task);
-
 	spin_unlock_irq(&freezer->lock);
 out:
 	rcu_read_unlock();
@@ -352,7 +363,7 @@ struct cgroup_subsys freezer_subsys = {
 	.create		= freezer_create,
 	.destroy	= freezer_destroy,
 	.subsys_id	= freezer_subsys_id,
-	.can_attach	= freezer_can_attach,
+	.attach		= freezer_attach,
 	.fork		= freezer_fork,
 	.base_cftypes	= files,
 
-- 
1.7.7.3



* [PATCH 6/7] cgroup_freezer: prepare update_if_frozen() for locking change
  2012-10-16 22:28 ` Tejun Heo
@ 2012-10-16 22:28     ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-16 22:28 UTC (permalink / raw)
  To: rjw-KKrjLPT3xs0, oleg-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Locking will change such that migration can happen while
freezer_read/write() is in progress.  This means that
update_if_frozen() can no longer assume that all tasks in the cgroup
conform to the current freezer state - newly migrated tasks which
haven't finished freezer_attach() yet might be in any state.

This patch updates update_if_frozen() such that it no longer verifies
task states against freezer state.  It now simply decides whether
the FREEZING stage is complete.

This removal of verification makes it meaningless to call
update_if_frozen() from freezer_change_state().  Drop the call and
move the fast exit test from freezer_read() - the only remaining
caller - to update_if_frozen().

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup_freezer.c |   43 +++++++++++++++++--------------------------
 1 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 0b0e105..3d45503 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -213,41 +213,39 @@ out:
 }
 
 /*
- * caller must hold freezer->lock
+ * We change from FREEZING to FROZEN lazily if the cgroup was only
+ * partially frozen when we exitted write.  Caller must hold freezer->lock.
+ *
+ * Task states and freezer state might disagree while tasks are being
+ * migrated into @cgroup, so we can't verify task states against @freezer
+ * state here.  See freezer_attach() for details.
  */
-static void update_if_frozen(struct cgroup *cgroup,
-				 struct freezer *freezer)
+static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer)
 {
 	struct cgroup_iter it;
 	struct task_struct *task;
-	unsigned int nfrozen = 0, ntotal = 0;
-	enum freezer_state old_state = freezer->state;
+
+	if (freezer->state != CGROUP_FREEZING)
+		return;
 
 	cgroup_iter_start(cgroup, &it);
+
 	while ((task = cgroup_iter_next(cgroup, &it))) {
 		if (freezing(task)) {
-			ntotal++;
 			/*
 			 * freezer_should_skip() indicates that the task
 			 * should be skipped when determining freezing
 			 * completion.  Consider it frozen in addition to
 			 * the usual frozen condition.
 			 */
-			if (frozen(task) || task_is_stopped_or_traced(task) ||
-			    freezer_should_skip(task))
-				nfrozen++;
+			if (!frozen(task) && !task_is_stopped_or_traced(task) &&
+			    !freezer_should_skip(task))
+				goto notyet;
 		}
 	}
 
-	if (old_state == CGROUP_THAWED) {
-		BUG_ON(nfrozen > 0);
-	} else if (old_state == CGROUP_FREEZING) {
-		if (nfrozen == ntotal)
-			freezer->state = CGROUP_FROZEN;
-	} else { /* old_state == CGROUP_FROZEN */
-		BUG_ON(nfrozen != ntotal);
-	}
-
+	freezer->state = CGROUP_FROZEN;
+notyet:
 	cgroup_iter_end(cgroup, &it);
 }
 
@@ -262,13 +260,8 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
 
 	freezer = cgroup_freezer(cgroup);
 	spin_lock_irq(&freezer->lock);
+	update_if_frozen(cgroup, freezer);
 	state = freezer->state;
-	if (state == CGROUP_FREEZING) {
-		/* We change from FREEZING to FROZEN lazily if the cgroup was
-		 * only partially frozen when we exitted write. */
-		update_if_frozen(cgroup, freezer);
-		state = freezer->state;
-	}
 	spin_unlock_irq(&freezer->lock);
 	cgroup_unlock();
 
@@ -306,8 +299,6 @@ static void freezer_change_state(struct cgroup *cgroup,
 
 	spin_lock_irq(&freezer->lock);
 
-	update_if_frozen(cgroup, freezer);
-
 	switch (goal_state) {
 	case CGROUP_THAWED:
 		if (freezer->state != CGROUP_THAWED)
-- 
1.7.7.3


* [PATCH 6/7] cgroup_freezer: prepare update_if_frozen() for locking change
@ 2012-10-16 22:28     ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-16 22:28 UTC (permalink / raw)
  To: rjw, oleg; +Cc: linux-kernel, lizefan, containers, cgroups, Tejun Heo

Locking will change such that migration can happen while
freezer_read/write() is in progress.  This means that
update_if_frozen() can no longer assume that all tasks in the cgroup
conform to the current freezer state - newly migrated tasks which
haven't finished freezer_attach() yet might be in any state.

This patch updates update_if_frozen() such that it no longer verifies
task states against freezer state.  It now simply decides whether
the FREEZING stage is complete.

This removal of verification makes it meaningless to call
update_if_frozen() from freezer_change_state().  Drop the call and
move the fast exit test from freezer_read() - the only remaining
caller - to update_if_frozen().
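
A minimal userspace sketch of the reduced function, assuming a
hypothetical struct model_task whose "settled" flag stands for the
combined frozen / stopped-or-traced / freezer_should_skip() test:

```c
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_THAWED, MODEL_FREEZING, MODEL_FROZEN };

/* Hypothetical task: "settled" folds the three frozen-like tests into one. */
struct model_task {
	bool freezing;	/* models freezing(task) */
	bool settled;	/* frozen, stopped/traced, or skippable */
};

/* Only decides whether FREEZING is complete; verifies nothing else. */
static void model_update_if_frozen(enum model_state *state,
				   const struct model_task *tasks, size_t n)
{
	size_t i;

	if (*state != MODEL_FREEZING)
		return;		/* fast exit moved here from the read path */

	for (i = 0; i < n; i++) {
		if (tasks[i].freezing && !tasks[i].settled)
			return;	/* corresponds to "goto notyet" */
	}
	*state = MODEL_FROZEN;
}
```

Note that a THAWED group containing a frozen-looking task is simply
left alone here, where the previous version would have hit a
BUG_ON().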

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup_freezer.c |   43 +++++++++++++++++--------------------------
 1 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 0b0e105..3d45503 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -213,41 +213,39 @@ out:
 }
 
 /*
- * caller must hold freezer->lock
+ * We change from FREEZING to FROZEN lazily if the cgroup was only
+ * partially frozen when we exitted write.  Caller must hold freezer->lock.
+ *
+ * Task states and freezer state might disagree while tasks are being
+ * migrated into @cgroup, so we can't verify task states against @freezer
+ * state here.  See freezer_attach() for details.
  */
-static void update_if_frozen(struct cgroup *cgroup,
-				 struct freezer *freezer)
+static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer)
 {
 	struct cgroup_iter it;
 	struct task_struct *task;
-	unsigned int nfrozen = 0, ntotal = 0;
-	enum freezer_state old_state = freezer->state;
+
+	if (freezer->state != CGROUP_FREEZING)
+		return;
 
 	cgroup_iter_start(cgroup, &it);
+
 	while ((task = cgroup_iter_next(cgroup, &it))) {
 		if (freezing(task)) {
-			ntotal++;
 			/*
 			 * freezer_should_skip() indicates that the task
 			 * should be skipped when determining freezing
 			 * completion.  Consider it frozen in addition to
 			 * the usual frozen condition.
 			 */
-			if (frozen(task) || task_is_stopped_or_traced(task) ||
-			    freezer_should_skip(task))
-				nfrozen++;
+			if (!frozen(task) && !task_is_stopped_or_traced(task) &&
+			    !freezer_should_skip(task))
+				goto notyet;
 		}
 	}
 
-	if (old_state == CGROUP_THAWED) {
-		BUG_ON(nfrozen > 0);
-	} else if (old_state == CGROUP_FREEZING) {
-		if (nfrozen == ntotal)
-			freezer->state = CGROUP_FROZEN;
-	} else { /* old_state == CGROUP_FROZEN */
-		BUG_ON(nfrozen != ntotal);
-	}
-
+	freezer->state = CGROUP_FROZEN;
+notyet:
 	cgroup_iter_end(cgroup, &it);
 }
 
@@ -262,13 +260,8 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
 
 	freezer = cgroup_freezer(cgroup);
 	spin_lock_irq(&freezer->lock);
+	update_if_frozen(cgroup, freezer);
 	state = freezer->state;
-	if (state == CGROUP_FREEZING) {
-		/* We change from FREEZING to FROZEN lazily if the cgroup was
-		 * only partially frozen when we exitted write. */
-		update_if_frozen(cgroup, freezer);
-		state = freezer->state;
-	}
 	spin_unlock_irq(&freezer->lock);
 	cgroup_unlock();
 
@@ -306,8 +299,6 @@ static void freezer_change_state(struct cgroup *cgroup,
 
 	spin_lock_irq(&freezer->lock);
 
-	update_if_frozen(cgroup, freezer);
-
 	switch (goal_state) {
 	case CGROUP_THAWED:
 		if (freezer->state != CGROUP_THAWED)
-- 
1.7.7.3



* [PATCH 7/7] cgroup_freezer: don't use cgroup_lock_live_group()
  2012-10-16 22:28 ` Tejun Heo
@ 2012-10-16 22:28     ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-16 22:28 UTC (permalink / raw)
  To: rjw-KKrjLPT3xs0, oleg-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

freezer_read/write() used cgroup_lock_live_group() to synchronize
against task migration into and out of the target cgroup.
cgroup_lock_live_group() grabs the internal cgroup lock and using it
from outside cgroup core leads to complex and fragile locking
dependency issues which are difficult to resolve.

Now that freezer_can_attach() has been replaced with freezer_attach()
and update_if_frozen() has been updated, nothing requires excluding
migration during freezer state reads and changes.

This patch removes cgroup_lock_live_group() and the matching
cgroup_unlock() usages.  The prone-to-bitrot, already outdated and
unnecessary global lock hierarchy documentation is replaced with
documentation in local scope.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup_freezer.c |   66 +++++++---------------------------------------
 1 files changed, 10 insertions(+), 56 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 3d45503..8a92b0e 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -84,50 +84,6 @@ static const char *freezer_state_strs[] = {
 
 struct cgroup_subsys freezer_subsys;
 
-/* Locks taken and their ordering
- * ------------------------------
- * cgroup_mutex (AKA cgroup_lock)
- * freezer->lock
- * css_set_lock
- * task->alloc_lock (AKA task_lock)
- * task->sighand->siglock
- *
- * cgroup code forces css_set_lock to be taken before task->alloc_lock
- *
- * freezer_create(), freezer_destroy():
- * cgroup_mutex [ by cgroup core ]
- *
- * freezer_can_attach():
- * cgroup_mutex (held by caller of can_attach)
- *
- * freezer_fork() (preserving fork() performance means can't take cgroup_mutex):
- * freezer->lock
- *  sighand->siglock (if the cgroup is freezing)
- *
- * freezer_read():
- * cgroup_mutex
- *  freezer->lock
- *   write_lock css_set_lock (cgroup iterator start)
- *    task->alloc_lock
- *   read_lock css_set_lock (cgroup iterator start)
- *
- * freezer_write() (freeze):
- * cgroup_mutex
- *  freezer->lock
- *   write_lock css_set_lock (cgroup iterator start)
- *    task->alloc_lock
- *   read_lock css_set_lock (cgroup iterator start)
- *    sighand->siglock (fake signal delivery inside freeze_task())
- *
- * freezer_write() (unfreeze):
- * cgroup_mutex
- *  freezer->lock
- *   write_lock css_set_lock (cgroup iterator start)
- *    task->alloc_lock
- *   read_lock css_set_lock (cgroup iterator start)
- *    task->alloc_lock (inside __thaw_task(), prevents race with refrigerator())
- *     sighand->siglock
- */
 static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
 {
 	struct freezer *freezer;
@@ -151,9 +107,13 @@ static void freezer_destroy(struct cgroup *cgroup)
 }
 
 /*
- * The call to cgroup_lock() in the freezer.state write method prevents
- * a write to that file racing against an attach, and hence we don't need
- * to worry about racing against migration.
+ * Tasks can be migrated into a different freezer anytime regardless of its
+ * current state.  freezer_attach() is responsible for making new tasks
+ * conform to the current state.
+ *
+ * Freezer state changes and task migration are synchronized via
+ * @freezer->lock.  freezer_attach() makes the new tasks conform to the
+ * current state and all following state changes can see the new tasks.
  */
 static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset)
 {
@@ -217,8 +177,8 @@ out:
  * partially frozen when we exitted write.  Caller must hold freezer->lock.
  *
  * Task states and freezer state might disagree while tasks are being
- * migrated into @cgroup, so we can't verify task states against @freezer
- * state here.  See freezer_attach() for details.
+ * migrated into or out of @cgroup, so we can't verify task states against
+ * @freezer state here.  See freezer_attach() for details.
  */
 static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer)
 {
@@ -255,15 +215,11 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
 	struct freezer *freezer;
 	enum freezer_state state;
 
-	if (!cgroup_lock_live_group(cgroup))
-		return -ENODEV;
-
 	freezer = cgroup_freezer(cgroup);
 	spin_lock_irq(&freezer->lock);
 	update_if_frozen(cgroup, freezer);
 	state = freezer->state;
 	spin_unlock_irq(&freezer->lock);
-	cgroup_unlock();
 
 	seq_puts(m, freezer_state_strs[state]);
 	seq_putc(m, '\n');
@@ -297,6 +253,7 @@ static void freezer_change_state(struct cgroup *cgroup,
 {
 	struct freezer *freezer = cgroup_freezer(cgroup);
 
+	/* also synchronizes against task migration, see freezer_attach() */
 	spin_lock_irq(&freezer->lock);
 
 	switch (goal_state) {
@@ -332,10 +289,7 @@ static int freezer_write(struct cgroup *cgroup,
 	else
 		return -EINVAL;
 
-	if (!cgroup_lock_live_group(cgroup))
-		return -ENODEV;
 	freezer_change_state(cgroup, goal_state);
-	cgroup_unlock();
 	return 0;
 }
 
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
  2012-10-16 22:28     ` Tejun Heo
@ 2012-10-17  8:28         ` Li Zefan
  -1 siblings, 0 replies; 149+ messages in thread
From: Li Zefan @ 2012-10-17  8:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers, oleg, stable, linux-kernel, rjw, cgroups

On 2012/10/17 6:28, Tejun Heo wrote:
> cgroup core has a bug which violates a basic rule about event
> notifications - when a new entity needs to be added, you add that to
> the notification list first and then make the new entity conform to
> the current state.  If done in the reverse order, an event happening
> inbetween will be lost.
> 
> cgroup_subsys->fork() is invoked way before the new task is added to
> the css_set.  Currently, cgroup_freezer is the only user of ->fork()
> and uses it to make new tasks conform to the current state of the
> freezer.  If FROZEN state is requested while fork is in progress
> between cgroup_fork_callbacks() and cgroup_post_fork(), the child
> could escape freezing - the cgroup isn't frozen when ->fork() is
> called and the freezer couldn't see the new task on the css_set.
> 
> This patch moves cgroup_subsys->fork() invocation to
> cgroup_post_fork() after the new task is added to the css_set.
> cgroup_fork_callbacks() is removed.
> 
> Because now a task may be migrated during cgroup_subsys->fork(),
> freezer_fork() is updated so that it adheres to the usual RCU locking
> and the rather pointless comment on why locking can be different there
> is removed (if it doesn't make anything simpler, why even bother?).
> 
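
The ordering rule in the quoted description can be sketched with a small
single-threaded simulation.  This is a toy model under stated assumptions:
the demo_* names are hypothetical, "register" stands for linking the child
on the css_set, "conform" stands for the ->fork() callback, and the freeze
request is injected between the two steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; every demo_* name is hypothetical, not kernel API. */
struct demo_child {
	bool frozen;
};

struct demo_group {
	bool freezing;			/* current state of the group */
	struct demo_child *member;	/* the "notification list", one slot */
};

/* Make the child match the group's current state (the ->fork() step). */
static void demo_conform(const struct demo_group *g, struct demo_child *c)
{
	if (g->freezing)
		c->frozen = true;
}

/* Make the child visible to later state changes (the css_set step). */
static void demo_register(struct demo_group *g, struct demo_child *c)
{
	g->member = c;
}

/* A freeze request only reaches tasks that are already registered. */
static void demo_freeze(struct demo_group *g)
{
	g->freezing = true;
	if (g->member)
		g->member->frozen = true;
}

/*
 * Run a fork with a freeze request arriving between the two steps and
 * report whether the child ended up frozen.
 */
static bool demo_fork(bool register_first)
{
	struct demo_group g = { false, NULL };
	struct demo_child c = { false };

	if (register_first) {
		demo_register(&g, &c);
		demo_freeze(&g);	/* event in between: child is visible */
		demo_conform(&g, &c);
	} else {
		demo_conform(&g, &c);
		demo_freeze(&g);	/* event in between: child is invisible */
		demo_register(&g, &c);
	}
	return c.frozen;
}
```

In this model demo_fork(true) leaves the child frozen, while demo_fork(false)
lets it escape, which is the window the patch closes.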

I don't think an RCU read-side critical section is sufficient. It guarantees
that the data you're accessing is valid, but that data can be new or old.

So the case below is possible:

in freezer_fork():
rcu_read_lock();
freezer = task_freezer(task);
                                  move task from freezer to freezer2
                                  which is in FREEZING/FROZEN state
freezer is in THAWED state,
nothing to do.
rcu_read_unlock();

> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: stable@vger.kernel.org
> ---
>  include/linux/cgroup.h  |    1 -
>  kernel/cgroup.c         |   62 ++++++++++++++++++++++------------------------
>  kernel/cgroup_freezer.c |   13 +++-------
>  kernel/fork.c           |    9 +------
>  4 files changed, 35 insertions(+), 50 deletions(-)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index f8a030c..4cd1d0f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -34,7 +34,6 @@ extern int cgroup_lock_is_held(void);
>  extern bool cgroup_lock_live_group(struct cgroup *cgrp);
>  extern void cgroup_unlock(void);
>  extern void cgroup_fork(struct task_struct *p);
> -extern void cgroup_fork_callbacks(struct task_struct *p);
>  extern void cgroup_post_fork(struct task_struct *p);
>  extern void cgroup_exit(struct task_struct *p, int run_callbacks);
>  extern int cgroupstats_build(struct cgroupstats *stats,
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 13774b3..b7a0171 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4844,44 +4844,19 @@ void cgroup_fork(struct task_struct *child)
>  }
>  
>  /**
> - * cgroup_fork_callbacks - run fork callbacks
> - * @child: the new task
> - *
> - * Called on a new task very soon before adding it to the
> - * tasklist. No need to take any locks since no-one can
> - * be operating on this task.
> - */
> -void cgroup_fork_callbacks(struct task_struct *child)
> -{
> -	if (need_forkexit_callback) {
> -		int i;
> -		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> -			struct cgroup_subsys *ss = subsys[i];
> -
> -			/*
> -			 * forkexit callbacks are only supported for
> -			 * builtin subsystems.
> -			 */
> -			if (!ss || ss->module)
> -				continue;
> -
> -			if (ss->fork)
> -				ss->fork(child);
> -		}
> -	}
> -}
> -
> -/**
>   * cgroup_post_fork - called on a new task after adding it to the task list
>   * @child: the task in question
>   *
> - * Adds the task to the list running through its css_set if necessary.
> - * Has to be after the task is visible on the task list in case we race
> - * with the first call to cgroup_iter_start() - to guarantee that the
> - * new task ends up on its list.
> + * Adds the task to the list running through its css_set if necessary and
> + * call the subsystem fork() callbacks.  Has to be after the task is
> + * visible on the task list in case we race with the first call to
> + * cgroup_iter_start() - to guarantee that the new task ends up on its
> + * list.
>   */
>  void cgroup_post_fork(struct task_struct *child)
>  {
> +	int i;
> +
>  	/*
>  	 * use_task_css_set_links is set to 1 before we walk the tasklist
>  	 * under the tasklist_lock and we read it here after we added the child
> @@ -4910,7 +4885,30 @@ void cgroup_post_fork(struct task_struct *child)
>  		}
>  		write_unlock(&css_set_lock);
>  	}
> +
> +	/*
> +	 * Call ss->fork().  This must happen after @child is linked on
> +	 * css_set; otherwise, @child might change state between ->fork()
> +	 * and addition to css_set.
> +	 */
> +	if (need_forkexit_callback) {
> +		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> +			struct cgroup_subsys *ss = subsys[i];
> +
> +			/*
> +			 * fork/exit callbacks are supported only for
> +			 * builtin subsystems and we don't need further
> +			 * synchronization as they never go away.
> +			 */
> +			if (!ss || ss->module)
> +				continue;
> +
> +			if (ss->fork)
> +				ss->fork(child);
> +		}
> +	}
>  }
> +
>  /**
>   * cgroup_exit - detach cgroup from exiting task
>   * @tsk: pointer to task_struct of exiting process
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index b1724ce..12bfedb 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -186,23 +186,15 @@ static void freezer_fork(struct task_struct *task)
>  {
>  	struct freezer *freezer;
>  
> -	/*
> -	 * No lock is needed, since the task isn't on tasklist yet,
> -	 * so it can't be moved to another cgroup, which means the
> -	 * freezer won't be removed and will be valid during this
> -	 * function call.  Nevertheless, apply RCU read-side critical
> -	 * section to suppress RCU lockdep false positives.
> -	 */
>  	rcu_read_lock();
>  	freezer = task_freezer(task);
> -	rcu_read_unlock();
>  
>  	/*
>  	 * The root cgroup is non-freezable, so we can skip the
>  	 * following check.
>  	 */
>  	if (!freezer->css.cgroup->parent)
> -		return;
> +		goto out;
>  
>  	spin_lock_irq(&freezer->lock);
>  	BUG_ON(freezer->state == CGROUP_FROZEN);
> @@ -210,7 +202,10 @@ static void freezer_fork(struct task_struct *task)
>  	/* Locking avoids race with FREEZING -> THAWED transitions. */
>  	if (freezer->state == CGROUP_FREEZING)
>  		freeze_task(task);
> +
>  	spin_unlock_irq(&freezer->lock);
> +out:
> +	rcu_read_unlock();
>  }
>  
>  /*
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 8b20ab7..acc4cb6 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1135,7 +1135,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>  {
>  	int retval;
>  	struct task_struct *p;
> -	int cgroup_callbacks_done = 0;
>  
>  	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
>  		return ERR_PTR(-EINVAL);
> @@ -1393,12 +1392,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>  	INIT_LIST_HEAD(&p->thread_group);
>  	p->task_works = NULL;
>  
> -	/* Now that the task is set up, run cgroup callbacks if
> -	 * necessary. We need to run them before the task is visible
> -	 * on the tasklist. */
> -	cgroup_fork_callbacks(p);
> -	cgroup_callbacks_done = 1;
> -
>  	/* Need tasklist lock for parent etc handling! */
>  	write_lock_irq(&tasklist_lock);
>  
> @@ -1503,7 +1496,7 @@ bad_fork_cleanup_cgroup:
>  #endif
>  	if (clone_flags & CLONE_THREAD)
>  		threadgroup_change_end(current);
> -	cgroup_exit(p, cgroup_callbacks_done);
> +	cgroup_exit(p, 0);
>  	delayacct_tsk_free(p);
>  	module_put(task_thread_info(p)->exec_domain->module);
>  bad_fork_cleanup_count:
> 

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
  2012-10-16 22:28 ` Tejun Heo
@ 2012-10-17 19:16     ` Matt Helsley
  -1 siblings, 0 replies; 149+ messages in thread
From: Matt Helsley @ 2012-10-17 19:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw, cgroups, containers, oleg, linux-kernel

On Tue, Oct 16, 2012 at 03:28:39PM -0700, Tejun Heo wrote:
> Hello,
> 
> This patchset updates cgroup_freezer so that
> 
> * Unfreezable kernel tasks don't prevent a cgroup from transitioning
>   into FROZEN from FREEZING.  There's nothing userland can do with or
>   about such tasks.

Seems like a non-problem. Do you have a testcase showing how kernel
threads prevent cgroups that should be freezable from being frozen?
It used to be that you couldn't move kernel threads out of the root
cgroup and the root cgroup was not freezable. So this was never a
problem before. Is there some change here that I'm unaware of?

> 
> * Tasks can be moved in and out of a frozen cgroup.  Tasks are made to
>   conform to the state of the new cgroup during migration.  This
>   behavior makes a lot more sense and removes the use of
>   ->can_attach() which makes co-mounting difficult.

One nice aspect of freezing the set of tasks in the cgroup as well as the
tasks themselves was that you had a fixed set of tasks to work with (from
userspace or otherwise). With this change that will no longer be true.
This is a userspace-visible behavior change and userspace code may
have relied on this feature.

Will this work for the CRIU folks? With this patch one of the tasks being
checkpointed could become thawed simply by some other process writing
the pid into a different cgroup's tasks file. In contrast, with the
current code they'd have to explicitly thaw its cgroup first.
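
For concreteness, the userspace operation in question is just a write of a
pid to a tasks file.  A minimal sketch follows; the helper name
write_pid_to_tasks() is hypothetical, and any path passed to it (for example
a freezer hierarchy mounted somewhere under /sys/fs/cgroup) is an assumption
about the local setup, not something this thread specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: write @pid, followed by a newline, to the file at
 * @tasks_path.  Returns 0 on success and -1 on any failure, including a
 * write error that is only reported when the stream is flushed on close.
 */
static int write_pid_to_tasks(const char *tasks_path, pid_t pid)
{
	FILE *fp;

	assert(tasks_path != NULL);

	fp = fopen(tasks_path, "w");
	if (!fp)
		return -1;

	if (fprintf(fp, "%ld\n", (long)pid) < 0) {
		fclose(fp);
		return -1;
	}

	return fclose(fp) == 0 ? 0 : -1;
}
```

With the proposed behavior, pointing such a write at the tasks file of a
THAWED cgroup would be enough to thaw the task, with no write to
freezer.state involved.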

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
       [not found]         ` <507E6C4B.6000704-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2012-10-18  1:25           ` Li Zefan
  0 siblings, 0 replies; 149+ messages in thread
From: Li Zefan @ 2012-10-18  1:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers, oleg, stable, linux-kernel, rjw, cgroups

On 2012/10/17 16:28, Li Zefan wrote:
> On 2012/10/17 6:28, Tejun Heo wrote:
>> cgroup core has a bug which violates a basic rule about event
>> notifications - when a new entity needs to be added, you add that to
>> the notification list first and then make the new entity conform to
>> the current state.  If done in the reverse order, an event happening
>> inbetween will be lost.
>>
>> cgroup_subsys->fork() is invoked way before the new task is added to
>> the css_set.  Currently, cgroup_freezer is the only user of ->fork()
>> and uses it to make new tasks conform to the current state of the
>> freezer.  If FROZEN state is requested while fork is in progress
>> between cgroup_fork_callbacks() and cgroup_post_fork(), the child
>> could escape freezing - the cgroup isn't frozen when ->fork() is
>> called and the freezer couldn't see the new task on the css_set.
>>
>> This patch moves cgroup_subsys->fork() invocation to
>> cgroup_post_fork() after the new task is added to the css_set.
>> cgroup_fork_callbacks() is removed.
>>
>> Because now a task may be migrated during cgroup_subsys->fork(),
>> freezer_fork() is updated so that it adheres to the usual RCU locking
>> and the rather pointless comment on why locking can be different there
>> is removed (if it doesn't make anything simpler, why even bother?).
>>
> 
> I don't think rcu read section is sufficient. It guarantees the data you're
> accessing is valid, but the data can be new or can be old.
> 
> So a case below is possible:
> 
> in freezer_fork():
> rcu_read_lock();
> freezer = task_freezer(task);
>                                   move task from freezer to freezer2
>                                   which is in FREEZING/FROZEN state
> freezer is in THAWED state,
> nothing to do.
> rcu_read_unlock();
> 

Forget about it. The task will be correctly frozen when it is moved to
another cgroup, so nothing unexpected will happen.
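
A toy replay of the interleaving from my earlier mail, only to show why the
outcome is the same either way.  The sim_* names are hypothetical and this
is a single-threaded simulation, not kernel code: the fork step acts on a
possibly stale freezer pointer, and the attach step makes the task conform
to the destination freezer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; every sim_* name is hypothetical, not kernel API. */
enum sim_state { SIM_THAWED, SIM_FREEZING, SIM_FROZEN };

struct sim_freezer {
	enum sim_state state;
};

struct sim_task {
	struct sim_freezer *freezer;
	bool frozen;
};

/* Fork step: acts on the freezer sampled earlier, which may be stale. */
static void sim_fork_conform(struct sim_task *t,
			     const struct sim_freezer *sampled)
{
	assert(t != NULL && sampled != NULL);
	if (sampled->state == SIM_FREEZING)
		t->frozen = true;
}

/* Migration step: the task is made to conform to its new freezer. */
static void sim_attach(struct sim_task *t, struct sim_freezer *dst)
{
	assert(t != NULL && dst != NULL);
	t->freezer = dst;
	t->frozen = (dst->state != SIM_THAWED);
}

/*
 * Task forks in THAWED freezer "a" and is migrated to FROZEN freezer "b"
 * either before or after the fork step runs.  Returns the final state.
 */
static bool sim_race(bool migrate_before_conform)
{
	struct sim_freezer a = { SIM_THAWED };
	struct sim_freezer b = { SIM_FROZEN };
	struct sim_task t = { &a, false };
	const struct sim_freezer *sampled = t.freezer;	/* task_freezer() */

	if (migrate_before_conform) {
		sim_attach(&t, &b);
		sim_fork_conform(&t, sampled);	/* stale: sees THAWED, no-op */
	} else {
		sim_fork_conform(&t, sampled);
		sim_attach(&t, &b);
	}
	return t.frozen;
}
```

Both orders end with the task frozen in this model.  It covers only the
THAWED to FREEZING/FROZEN case described above.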

For this patch:

Acked-by: Li Zefan <lizefan@huawei.com>

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
       [not found]     ` <20121017191606.GA6223-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2012-10-18 21:14       ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-18 21:14 UTC (permalink / raw)
  To: Matt Helsley; +Cc: rjw, oleg, cgroups, containers, linux-kernel

Hello, Matt.

On Wed, Oct 17, 2012 at 12:16:06PM -0700, Matt Helsley wrote:
> > * Unfreezable kernel tasks don't prevent a cgroup from transitioning
> >   into FROZEN from FREEZING.  There's nothing userland can do with or
> >   about such tasks.
> 
> Seems like a non-problem. Do you have a testcase showing how kernel
> threads prevent cgroups that should be freezable from being frozen?
> It used to be that you couldn't move kernel threads out of the root
> cgroup and the root cgroup was not freezable. So this was never a
> problem before. Is there some change here that I'm unaware of?

Hmmm?  Nothing prevents kthreads from being moved around.  We only
recently added the restriction to prevent migration of kthreadd
(the one which creates other kthreads).  You can reproduce it with
khungtaskd or any other !freezable kthread.

> > * Tasks can be moved in and out of a frozen cgroup.  Tasks are made to
> >   conform to the state of the new cgroup during migration.  This
> >   behavior makes a lot more sense and removes the use of
> >   ->can_attach() which makes co-mounting difficult.
> 
> One nice aspect of freezing the set of tasks in the cgroup as well as the
> tasks themselves was you had a fixed set of tasks to work with (from
> userspace or otherwise). With this change that will no longer be true.
> This is a userspace-visible behavior change and userspace code may
> have relied on this feature.
>
> Will this work for the CRIU folks? With this patch one of the tasks being
> checkpointed could become thawed simply by some other process writing
> the pid into a different cgroup's tasks file. In contrast, with the
> current code they'd have to explicitly thaw its cgroup first.

Yeap, this is a userland-visible behavior change.  I don't see any
other way around it though, and if you can't prevent someone else from
moving things in/out of your cgroup, what prevents someone else from
echoing THAWED to freezer.state?

As for CRIU, it isn't using cgroup freezer at the moment because
frozen tasks can't be ptraced currently.  That's something I'm hoping
to change, but I'm not sure when it can be done.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
  2012-10-18 21:14       ` Tejun Heo
@ 2012-10-18 22:21           ` Matt Helsley
  -1 siblings, 0 replies; 149+ messages in thread
From: Matt Helsley @ 2012-10-18 22:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Matt Helsley, rjw, oleg, cgroups, containers, linux-kernel

On Thu, Oct 18, 2012 at 02:14:34PM -0700, Tejun Heo wrote:
> Hello, Matt.
> 
> On Wed, Oct 17, 2012 at 12:16:06PM -0700, Matt Helsley wrote:
> > > * Unfreezable kernel tasks don't prevent a cgroup from transitioning
> > >   into FROZEN from FREEZING.  There's nothing userland can do with or
> > >   about such tasks.
> > 
> > Seems like a non-problem. Do you have a testcase showing how kernel
> > threads prevent cgroups that should be freezable from being frozen?
> > It used to be that you couldn't move kernel threads out of the root
> > cgroup and the root cgroup was not freezable. So this was never a
> > problem before. Is there some change here that I'm unaware of?
> 
> Hmmm?  Nothing prevents kthreads from being moved around.  We only
> recently added the restriction to prevent migration of the kthreadd
> (the one which creates other kthreads).  You can reproduce it with
> khungtask or any other !freezable kthreads.

Ok. I don't immediately see why that'd be a good idea but it's
possible.

> 
> > > * Tasks can be moved in and out of a frozen cgroup.  Tasks are made to
> > >   conform to the state of the new cgroup during migration.  This
> > >   behavior makes a lot more sense and removes the use of
> > >   ->can_attach() which makes co-mounting difficult.
> > 
> > One nice aspect of freezing the set of tasks in the cgroup as well as the
> > tasks themselves was you had a fixed set of tasks to work with (from
> > userspace or otherwise). With this change that will no longer be true.
> > This is a userspace-visible behavior change and userspace code may
> > have relied on this feature.
> >
> > Will this work for the CRIU folks? With this patch one of the tasks being
> > checkpointed could become thawed simply by some other process writing
> > the pid into a different cgroup's tasks file. In contrast, with the
> > current code they'd have to explicitly thaw its cgroup first.
> 
> Yeap, this is a userland visible behavior change.  Don't see any other
> way around it tho and if you can't prevent someone else moving things
> in/out of your cgroup, what prevents someone else echoing THAWED to
> freezer.state?

The userspace code would at least have to be aware that the group could
be frozen. That alone ought to encourage some tool writers to think
about how to share the cgroup's freezer.state and possibly other bits
with other tools and/or the user(s).

In general though, yes, there's a lot of potential for users and tools
to step all over each other's toes when it comes to manipulating cgroup
state. Even tools aware of the freezer could still write THAWED to the
freezer state when another tool does not expect it.

<off_topic>
"Whoever" did the freeze needs a way to "lock" access to the freezer state,
change the freezer state itself, do something (e.g. CRIU) which relies on
the state not changing, and then release the lock. Plus the lock has to be
released by the kernel if "Whoever" dies without the chance to release it.
So I was thinking who holds the lock and its lifetime could be represented
in userspace by file descriptor(s).
</off_topic>
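
The descriptor-scoped lock idea in the off-topic note above can be sketched in userspace with flock(2), which already has the "released by the kernel if the holder dies" property because the lock goes away when the last descriptor for the open file is closed. This is a sketch only: it locks an ordinary temporary file, not freezer.state, and `try_lock`, `release` and `demo` are hypothetical helper names. No such cgroup interface is implied.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an fd holding an exclusive flock on path, or None if taken."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another open file description already holds the lock.
        os.close(fd)
        return None
    return fd


def release(fd):
    """Closing the descriptor drops the lock; process exit does the same."""
    os.close(fd)


def demo():
    """Take the lock, show a second attempt failing, then retake it."""
    path = os.path.join(tempfile.mkdtemp(), "freezer.lock")
    holder = try_lock(path)
    contender = try_lock(path)        # fails while holder is open
    release(holder)
    retaken = try_lock(path)          # succeeds once holder is closed
    outcome = (holder is not None, contender is not None, retaken is not None)
    release(retaken)
    return outcome
```

A tool that freezes a group would hold the descriptor for as long as it relies on the state; cooperating tools would have to take the same lock before writing, which is exactly the arbitration question raised in this thread.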

> As for CRIU, it isn't using cgroup freezer at the moment because
> frozen tasks can't be ptraced currently.  Something I'm hoping to
> change but not sure when it can be done.

OK, but that doesn't mean the frozen nature of the task list itself
won't be useful in the future.

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
  2012-10-18 22:21           ` Matt Helsley
@ 2012-10-18 22:35               ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-18 22:35 UTC (permalink / raw)
  To: Matt Helsley; +Cc: rjw, oleg, cgroups, containers, linux-kernel

Hello, Matt.

On Thu, Oct 18, 2012 at 03:21:55PM -0700, Matt Helsley wrote:
> > Hmmm?  Nothing prevents kthreads from being moved around.  We only
> > recently added the restriction to prevent migration of the kthreadd
> > (the one which creates other kthreads).  You can reproduce it with
> > khungtask or any other !freezable kthreads.
> 
> Ok. I don't immediately see why that'd be a good idea but it's
> possible.

Beats me too.  There were talks about restricting all kthreads from
being moved out of the root cgroup.  Dunno.  Maybe.

> <off_topic>
> "Whoever" did the freeze needs a way to "lock" access to the freezer state,
> change the freezer state itself, do something (e.g. CRIU) which relies on
> the state not changing, and then release the lock. Plus the lock has to be
> released by the kernel if "Whoever" dies without the chance to release it.
> So I was thinking who holds the lock and its lifetime could be represented
> in userspace by file descriptor(s).
> </off_topic>

Long term, I don't think it's feasible to continue to use cgroup
kernel interface as the multiplexing layer among different users.
cgroup core simply doesn't have enough context or infrastructure to
support such usages.  It's somewhere between being a pure interface to
the provided kernel feature and fully multiplexed interface which
userland applications can depend on for arbitration.  It tries to have
the appearance of the latter but fails.

I think the only sane way would be having a userland arbitrator which
owns the kernel interface to itself, takes requests from userland
clients, makes the policy decisions and configures cgroup accordingly.

> > As for CRIU, it isn't using cgroup freezer at the moment because
> > frozen tasks can't be ptraced currently.  Something I'm hoping to
> > change but not sure when it can be done.
> 
> OK, but that doesn't mean the frozen nature of the task list itself
> won't be useful in the future.

I think that should be solved via userland policies rather than
depending on this accidental cgroup_freezer feature.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
  2012-10-18 22:35               ` Tejun Heo
@ 2012-10-18 23:47                   ` Matt Helsley
  -1 siblings, 0 replies; 149+ messages in thread
From: Matt Helsley @ 2012-10-18 23:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Matt Helsley, rjw, oleg, cgroups, containers, linux-kernel

On Thu, Oct 18, 2012 at 03:35:17PM -0700, Tejun Heo wrote:
> Hello, Matt.
> 
> On Thu, Oct 18, 2012 at 03:21:55PM -0700, Matt Helsley wrote:
> > > Hmmm?  Nothing prevents kthreads from being moved around.  We only
> > > recently added the restriction to prevent migration of the kthreadd
> > > (the one which creates other kthreads).  You can reproduce it with
> > > khungtask or any other !freezable kthreads.
> > 
> > Ok. I don't immediately see why that'd be a good idea but it's
> > possible.
> 
> Beats me too.  There were talks about restricting all kthreads from
> being moved out of the root cgroup.  Dunno.  Maybe.
> 
> > <off_topic>
> > "Whoever" did the freeze needs a way to "lock" access to the freezer state,
> > change the freezer state itself, do something (e.g. CRIU) which relies on
> > the state not changing, and then release the lock. Plus the lock has to be
> > released by the kernel if "Whoever" dies without the chance to release it.
> > So I was thinking who holds the lock and its lifetime could be represented
> > in userspace by file descriptor(s).
> > </off_topic>
> 
> Long term, I don't think it's feasible to continue to use cgroup
> kernel interface as the multiplexing layer among different users.
> cgroup core simply doesn't have enough context or infrastructure to
> support such usages.  It's somewhere between being a pure interface to
> the provided kernel feature and fully multiplexed interface which
> userland applications can depend on for arbitration.  It tries to have
> the appearance of the latter but fails.
> 
> I think the only sane way would be having a userland arbitrator which
> owns the kernel interface to itself and makes policy decisions from
> userland clients and configures cgroup accordingly.

OK -- yeah, solving the arbitration issue in userspace might be best.

> 
> > > As for CRIU, it isn't using cgroup freezer at the moment because
> > > frozen tasks can't be ptraced currently.  Something I'm hoping to
> > > change but not sure when it can be done.
> > 
> > OK, but that doesn't mean the frozen nature of the task list itself
> > won't be useful in the future.
> 
> I think that should be solved via userland policies rather than
> depending on this accidental cgroup_freezer feature.

It's not accidental -- it *was an intended feature*:

  # This bash script tests freezer code by starting a long sleep process.
  # The sleep process is frozen. We then move the sleep process to a THAWED
  # cgroup. We expect moving the sleep process to fail.

( This atrocious link is the easiest way to see the testcase:
http://ltp.git.sourceforge.net/git/gitweb.cgi?p=ltp/ltp.git;a=blob;f=testcases/kernel/controllers/freezer/freeze_move_thaw.sh;h=b2d5a83506a8425b117be9ff775d9f73d2d58393;hb=0436176dbfe6fdaaf97590d2356eb23d2739b2c2
)

It was intended for something very much like the CRIU case I mentioned
:).
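
The behavior difference being argued over can be put side by side in a toy model. This is a rough sketch under stated assumptions, not the kernel logic: `Cgroup`, `migrate_old` and `migrate_new` are hypothetical names, the old rule is simplified to "refuse if either side is frozen", and the new rule follows the cover letter's "tasks are made to conform to the state of the new cgroup during migration".

```python
class Cgroup:
    """Toy freezer cgroup: a name, a frozen flag, and a set of task ids."""

    def __init__(self, name, frozen=False):
        self.name = name
        self.frozen = frozen
        self.tasks = set()


def migrate_old(task, src, dst):
    """Simplified pre-patchset rule: frozen groups keep a fixed task set."""
    if src.frozen or dst.frozen:
        return False                  # what the LTP test expects
    src.tasks.discard(task)
    dst.tasks.add(task)
    return True


def migrate_new(task, src, dst, task_frozen):
    """Patchset rule: always move, then match the destination's state."""
    src.tasks.discard(task)
    dst.tasks.add(task)
    task_frozen[task] = dst.frozen    # moving to a THAWED group thaws it
    return True
```

Under the old rule the sleep process from the test stays put in its frozen group; under the new rule the same write succeeds and the task ends up thawed, which is the case raised for checkpointing.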

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
       [not found]                   ` <20121018234726.GC6223-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2012-10-19  0:01                     ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-19  0:01 UTC (permalink / raw)
  To: Matt Helsley
  Cc: rjw-KKrjLPT3xs0, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello, Matt.

On Thu, Oct 18, 2012 at 04:47:26PM -0700, Matt Helsley wrote:
> > I think the only sane way would be having a userland arbitrator which
> > owns the kernel interface to itself and makes policy decisions from
> > userland clients and configures cgroup accordingly.
> 
> OK -- yeah, solving the arbitration issue in userspace might be best.

Yeah, I think we need that but there currently isn't any concrete (or
even floppy) plan for it.  If anyone is interested, beer is on me. :)

> > I think that should be solved via userland policies rather than
> > depending on this accidental cgroup_freezer feature.
> 
> It's not accidental -- it *was an intended feature*:
> 
>   # This bash script tests freezer code by starting a long sleep process.
>   # The sleep process is frozen. We then move the sleep process to a THAWED
>   # cgroup. We expect moving the sleep process to fail.
> 
> ( This atrocious link is the easiest way to see the testcase:
> http://ltp.git.sourceforge.net/git/gitweb.cgi?p=ltp/ltp.git;a=blob;f=testcases/kernel/controllers/freezer/freeze_move_thaw.sh;h=b2d5a83506a8425b117be9ff775d9f73d2d58393;hb=0436176dbfe6fdaaf97590d2356eb23d2739b2c2
> )
> 
> It was intended for something very much like the CRIU case I mentioned
> :).

I probably chose the wrong word.  I mean that it's a hierarchy
management feature implemented at the wrong layer.  If we want to
provide cgroup migration locking, it should be implemented at the
cgroup core layer as a controller-independent feature.  It's kinda
interesting that the incorrect layering here almost directly caused a
messy locking problem too.  I hope we don't need it with (the
imaginary) proper userland arbitration, but even if we do, implementing
it in cgroup proper as a separate feature would be a lot less messy.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
       [not found]                     ` <20121019000153.GZ13370-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-19  1:29                       ` Matt Helsley
  0 siblings, 0 replies; 149+ messages in thread
From: Matt Helsley @ 2012-10-19  1:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	rjw-KKrjLPT3xs0, Matt Helsley, cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 18, 2012 at 05:01:53PM -0700, Tejun Heo wrote:

<snip>
> I probably have chosen the wrong word.  I mean that it's a hierarchy
> management feature implemented at the wrong layer.  If we want to
> provide cgroup migration locking, it should be implemented at the
> cgroup core layer as a controller independent feature.  It's kinda
> interesting the incorrect layering here almost directly caused messy
> locking problem too.  I hope we don't need it with (the imaginary)
> proper userland arbitration but even if we do implementing it in
> cgroup proper as a separate feature would be a lot less messy.

Yeah, that would be a nice cleanup too. I guess the ultra-careful way to
remove this feature would be something like:

	Add an internal migration restriction (which may or may not be
		exported as a userspace interface in a subsequent
		patch).

	Make the cgroup freezer use it.

	Make the cgroup freezer WARN_ONCE() when the subsystem is first
		mounted, indicating that the behavior is going to
		change.

	... time passes ...

	Remove the use of the migration "lock" from the cgroup freezer
		and the WARN_ONCE().

Which would also make the feature more obvious.
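
The staged removal above can be sketched as a small user-space model.
This is only an illustration under stated assumptions: struct
model_cgroup, model_can_attach(), model_freezer_set_frozen(),
model_freezer_mount() and model_warn_once() are hypothetical names, not
kernel interfaces, and model_warn_once() merely imitates the
print-on-first-call behavior of WARN_ONCE().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-cgroup flag standing in for the internal restriction. */
struct model_cgroup {
	bool migration_locked;
};

static int model_warn_count;	/* how many times the warning was printed */

/* Prints the message on the first call only, imitating WARN_ONCE(). */
static void model_warn_once(const char *msg)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	model_warn_count++;
	fprintf(stderr, "WARNING: %s\n", msg);
}

/* Step 2: the freezer uses the restriction instead of its own check. */
void model_freezer_set_frozen(struct model_cgroup *cg, bool frozen)
{
	assert(cg != NULL);
	cg->migration_locked = frozen;
}

/* Step 1: the generic restriction, refusing migration while it is set. */
int model_can_attach(const struct model_cgroup *cg)
{
	assert(cg != NULL);
	return cg->migration_locked ? -EBUSY : 0;
}

/* Step 3: warn on mount that the behavior is scheduled to change. */
void model_freezer_mount(void)
{
	model_warn_once("freezer: blocking migration on frozen cgroups is going away");
}
```

In this model, the final step of the plan would amount to dropping the
call in model_freezer_set_frozen() and the warning in
model_freezer_mount(), leaving model_can_attach() for any other user of
the restriction.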

Cheers,
	-Matt

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
       [not found] ` <1350426526-14254-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (7 preceding siblings ...)
  2012-10-17 19:16     ` Matt Helsley
@ 2012-10-19 16:54   ` Rafael J. Wysocki
  8 siblings, 0 replies; 149+ messages in thread
From: Rafael J. Wysocki @ 2012-10-19 16:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tuesday 16 of October 2012 15:28:39 Tejun Heo wrote:
> Hello,
> 
> This patchset updates cgroup_freezer so that
> 
> * Unfreezable kernel tasks don't prevent a cgroup from transitioning
>   into FROZEN from FREEZING.  There's nothing userland can do with or
>   about such tasks.
> 
> * Tasks can be moved in and out of a frozen cgroup.  Tasks are made to
>   conform to the state of the new cgroup during migration.  This
>   behavior makes a lot more sense and removes the use of
>   ->can_attach() which makes co-mounting difficult.
> 
> * Remove use of cgroup_lock_live_group().  Grabbing cgroup_lock from
>   outside cgroup proper creates a painful locking dependency and is
>   being phased out.  With the above behavior change, removing
>   dependency on cgroup_lock is pretty easy.  IMHO, it was simply the
>   wrong behavior to implement which forced the wrong implementation.
> 
> This patchset contains the following seven patches.
> 
>  0001-cgroup-cgroup_subsys-fork-should-be-called-after-the.patch
>  0002-freezer-add-missing-mb-s-to-freezer_count-and-freeze.patch
>  0003-cgroup_freezer-make-it-official-that-writes-to-freez.patch
>  0004-cgroup_freezer-don-t-stall-transition-to-FROZEN-for-.patch
>  0005-cgroup_freezer-allow-moving-tasks-in-and-out-of-a-fr.patch
>  0006-cgroup_freezer-prepare-update_if_frozen-for-locking-.patch
>  0007-cgroup_freezer-don-t-use-cgroup_lock_live_group.patch
> 
> 0001 is a fix for a rather embarrassing bug in cgroup core.  It does
> things in the wrong order leaving a window for racing during fork.
> 
> 0002 adds a missing mb() around freezing condition updates / checks.
> 
> 0003-0004 make cgroup_freezer ignore unfreezable kernel tasks and
> handle PF_FREEZER_SKIP correctly.
> 
> 0005 allows migrating tasks in and out of a frozen cgroup.
> 
> 0006-0007 remove the use of cgroup_lock_live_group().
> 
> This patchset is on top of v3.7-rc1 and available in the following git
> branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup_freezer-locking

It seems that no one has any comments. :-)

Are you going to prepare a branch for me to pull from?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
       [not found]                       ` <20121019012945.GD6223-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2012-10-19 20:02                         ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-19 20:02 UTC (permalink / raw)
  To: Matt Helsley
  Cc: rjw-KKrjLPT3xs0, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hey, Matt.

On Thu, Oct 18, 2012 at 06:29:45PM -0700, Matt Helsley wrote:
> Yeah, that would be a nice cleanup too. I guess the ultra-careful way to
> remove this feature would be something like:
> 
> 	Add an internal migration restriction (which may or may not be
> 		exported as a userspace interface in a subsequent
> 		patch).
> 
> 	Make the cgroup freezer use it.
> 
> 	Make the cgroup freezer WARN_ONCE() when the subsystem is first
> 		mounted. Indicates that the behavior is going to
> 		change.
> 
> 	... time passes ...
> 
> 	Remove the use of the migration "lock" from the cgroup freezer
> 		and the WARN_ONCE().
> 
> Which would also make the feature more obvious.

I don't think I'm gonna go that far for this.  I don't think we need
cgroup migration locking after all.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
       [not found]   ` <2424755.Pg0O5tTD3k-sKB8Sp2ER+y1GS7QM15AGw@public.gmane.org>
@ 2012-10-19 20:04     ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-19 20:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Oct 19, 2012 at 06:54:42PM +0200, Rafael J. Wysocki wrote:
> >  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup_freezer-locking
> 
> It seems that no one has any comments. :-)
> 
> Are you going to prepare a branch for me to pull from?

I'm waiting for Oleg to poke some holes in the synchronization
department but if that doesn't happen you can pull from the above
branch.  I'll pull it into cgroup/for-3.8 too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
  2012-10-16 22:28     ` Tejun Heo
@ 2012-10-21 19:11         ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-21 19:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 10/16, Tejun Heo wrote:
>
> cgroup_subsys->fork() is invoked way before the new task is added to
> the css_set.

Plus, it is called before this task (and even its task_struct) is fully
initialized.

All I can say is: personally I like this patch, it also simplifies
copy_process().

But I am in no position to ack it. I seem to have forgotten everything
(not too much ;) I ever knew about this code.



A couple of off-topic questions. With or without this patch I do not
understand cgroup_fork,

	/*
	 * We don't need to task_lock() current because current->cgroups
	 * can't be changed concurrently here. The parent obviously hasn't
	 * exited and called cgroup_exit(), and we are synchronized against
	 * cgroup migration through threadgroup_change_begin().
	 */
	child->cgroups = current->cgroups;
	get_css_set(child->cgroups);

How so? threadgroup_change_begin() is only called if CLONE_THREAD.
So in theory this copy + atomic_add looks racy...
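
To make the suspected window concrete, here is a minimal user-space
model of the copy followed by the reference grab. It is a sketch under
loud assumptions: the parent holds the only reference to its set,
nothing serializes fork against migration, and all model_* names are
hypothetical stand-ins rather than kernel code. The interleaving is
replayed deterministically in a single thread, and "freeing" is only
recorded in a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's css_set. */
struct model_css_set {
	int refcount;	/* number of holders */
	bool freed;	/* set when the last reference is dropped */
};

struct model_task {
	struct model_css_set *cgroups;
};

static void model_get(struct model_css_set *cg)
{
	cg->refcount++;
}

static void model_put(struct model_css_set *cg)
{
	assert(cg->refcount > 0);
	if (--cg->refcount == 0)
		cg->freed = true;	/* the kernel would free the object here */
}

/* Migration: point the task at a new set and drop the old reference. */
static void model_migrate(struct model_task *t, struct model_css_set *to)
{
	struct model_css_set *old = t->cgroups;

	model_get(to);
	t->cgroups = to;
	model_put(old);
}

/*
 * Replays the fork-side copy. When migrate_in_window is true, the parent
 * is migrated between the pointer load and the reference grab.
 * Returns true if the child took its reference on an already freed set.
 */
bool model_fork_hits_freed_set(bool migrate_in_window)
{
	struct model_css_set a = { .refcount = 1 }, b = { .refcount = 0 };
	struct model_task parent = { .cgroups = &a };
	struct model_task child = { .cgroups = NULL };
	bool stale;

	child.cgroups = parent.cgroups;	/* child->cgroups = current->cgroups */
	if (migrate_in_window)
		model_migrate(&parent, &b);
	stale = child.cgroups->freed;
	model_get(child.cgroups);	/* get_css_set(child->cgroups) */
	return stale;
}
```

Calling model_fork_hits_freed_set(true) reports the stale reference,
while model_fork_hits_freed_set(false) does not, which is the ordering
the comment in cgroup_fork() relies on.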




And it seems that fork() can race with the cgroup iterator. post_fork
will notice use_task_css_set_links, but until then the child belongs
to the parent's css and it is not "visible" to iterator (and right
after cgroup_fork() it is not visible to do_each_thread() if
use_task_css_set_links is not set).

For example, suppose that the child migrates to another cgroup after
copy_process() makes it visible to user-space. Then update_if_frozen
sets CGROUP_FROZEN (again, cgroup_iter_next does not see this child).

Now, post_fork calls freezer_fork() and hits BUG_ON(CGROUP_FROZEN).
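
The sequence above can be replayed in a small user-space model. This is
a simplified sketch, not the kernel code: the model_* names are
hypothetical, the iterator is reduced to two counters of linked tasks,
and the BUG_ON() is represented by a false return value. It assumes the
destination cgroup is FREEZING with one linked, already frozen task, and
that the migrated child is not yet linked.

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_THAWED, MODEL_FREEZING, MODEL_FROZEN };

struct model_freezer {
	enum model_state state;
	int linked_total;	/* tasks the iterator can see */
	int linked_frozen;	/* of those, how many are frozen */
};

/* Like update_if_frozen() in this model: only linked tasks are counted. */
static void model_update_if_frozen(struct model_freezer *f)
{
	assert(f->linked_frozen <= f->linked_total);
	if (f->state == MODEL_FREEZING && f->linked_total == f->linked_frozen)
		f->state = MODEL_FROZEN;
}

/* Stand-in for the fork callback; false is where BUG_ON() would fire. */
static bool model_freezer_fork(const struct model_freezer *f)
{
	return f->state != MODEL_FROZEN;
}

/* Stand-in for post_fork: link the child, then run the fork callback. */
static bool model_post_fork(struct model_freezer *f)
{
	f->linked_total++;
	return model_freezer_fork(f);
}

/*
 * The child already belongs to dst but is not linked yet. If the frozen
 * check runs before post_fork, dst is marked FROZEN without the child.
 */
bool model_fork_survives(bool update_before_post_fork)
{
	struct model_freezer dst = { MODEL_FREEZING, 1, 1 };

	if (update_before_post_fork)
		model_update_if_frozen(&dst);
	return model_post_fork(&dst);
}
```

With the check ordered first, model_fork_survives(true) returns false;
without it, the child is linked while the cgroup is still FREEZING and
the callback passes.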

But again, I do not blame this patch.


I am starting to think again about a big-rw-lock around copy_process.
Recently I tried to add one around dup_mmap for uprobes, but perhaps
cgroups can use it too...

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
  2012-10-19 20:04     ` Tejun Heo
@ 2012-10-21 19:18         ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-21 19:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Rafael J. Wysocki, cgroups, containers, linux-kernel

On 10/19, Tejun Heo wrote:
>
> On Fri, Oct 19, 2012 at 06:54:42PM +0200, Rafael J. Wysocki wrote:
> > >  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup_freezer-locking
> >
> > It seems that no one has any comments. :-)
> >
> > Are you going to prepare a branch for me to pull from?
>
> I'm waiting for Oleg to poke some holes in the synchronization
> department but if that doesn't happen you can pull from the above
> branch.  I'll pull it into cgroup/for-3.8 too.

Just in case, I see nothing bad ;)

But let me repeat: I never understood this code in detail, and I
have already forgotten everything.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
       [not found]         ` <20121021191141.GA26218-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-21 19:22           ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-21 19:22 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: containers, linux-kernel, stable, rjw, cgroups

Hello, Oleg.

On Sun, Oct 21, 2012 at 09:11:41PM +0200, Oleg Nesterov wrote:
> A couple of off-topic questions. With or without this patch I do not
> understand cgroup_fork,
> 
> 	/*
> 	 * We don't need to task_lock() current because current->cgroups
> 	 * can't be changed concurrently here. The parent obviously hasn't
> 	 * exited and called cgroup_exit(), and we are synchronized against
> 	 * cgroup migration through threadgroup_change_begin().
> 	 */
> 	child->cgroups = current->cgroups;
> 	get_css_set(child->cgroups);
> 
> How so? threadgroup_change_begin() is only called if CLONE_THREAD.
> So in theory this copy + atomic_add looks racy...

It's a bug.  Revert patches to restore task_lock() are already queued
in cgroup/for-3.7-fixes.
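
The restored locking can be pictured with a small userspace model. This
is only a sketch under assumptions: the struct and function names below
are hypothetical, a pthread mutex stands in for task_lock(), and an
atomic counter stands in for the css_set reference count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of a css_set: only the reference count matters here. */
struct css_set_model {
	atomic_int refcount;
};

/* Hypothetical model of a task: a lock (task_lock() stand-in) and its css_set. */
struct task_model {
	pthread_mutex_t lock;
	struct css_set_model *cgroups;
};

/* Fork side: copy the parent's pointer and take a reference under the parent's lock. */
void model_cgroup_fork(struct task_model *parent, struct task_model *child)
{
	pthread_mutex_lock(&parent->lock);
	assert(parent->cgroups != NULL);
	child->cgroups = parent->cgroups;
	atomic_fetch_add(&child->cgroups->refcount, 1);
	pthread_mutex_unlock(&parent->lock);
}

/* Migration side: switch the task to a new css_set under the same lock. */
void model_migrate(struct task_model *task, struct css_set_model *to)
{
	struct css_set_model *old;

	atomic_fetch_add(&to->refcount, 1);	/* reference for the new pointer */
	pthread_mutex_lock(&task->lock);
	old = task->cgroups;
	task->cgroups = to;
	pthread_mutex_unlock(&task->lock);
	atomic_fetch_sub(&old->refcount, 1);	/* drop the old reference last */
}
```

Because the pointer copy and the reference grab sit inside one critical
section, a concurrent model_migrate() on the parent either runs entirely
before them or entirely after; the child never ends up holding a pointer
it has no reference on.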

> And it seems that fork() can race with cgroup iterator. post_fork
> will notice use_task_css_set_links, but until then the child belongs
> to the parent's css and it is not "visible" to iterator (and right
> after cgroup_fork() it is not visible to do_each_thread() if
> use_task_css_set_links is not set).
> 
> For example. Suppose that the child migrates to another cgroup after
> copy_process() makes it visible to the user-space. Then update_if_frozen
> sets CGROUP_FROZEN (again, cgroup_iter_next does not see this child).
> 
> Now, post_fork calls freezer_fork() and hits BUG_ON(CGROUP_FROZEN).
> 
> But again, I do not blame this patch.

I'm planning to update it to,

* Clear ->cgroups to %NULL during copy_process().  A new task isn't
  considered to be cgroup-wise active at this point.  Userland is not
  allowed to migrate it and none of cgroup callbacks will be called.

* Do all the initialization in post_fork so that a task is only allowed
  to be migrated and operated on cgroup-wise after ->fork() is
  complete.
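
A rough userspace model of the two points above, purely as a sketch:
all names are hypothetical, the -EBUSY return for a not-yet-active task
is an assumption made for illustration, and a counter stands in for the
subsystem ->fork() callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cgroup model: task count plus a counter for ->fork() calls. */
struct cgroup_model {
	int nr_tasks;
	int fork_calls;
};

/* Hypothetical task model: NULL cgroups means "not cgroup-wise active yet". */
struct new_task_model {
	struct cgroup_model *cgroups;
};

/* copy_process() stage: the child starts out with no cgroup at all. */
void model_early_fork(struct new_task_model *child)
{
	child->cgroups = NULL;
}

/* Migration is refused until the child has been through post_fork. */
int model_try_migrate(struct new_task_model *task, struct cgroup_model *to)
{
	if (task->cgroups == NULL)
		return -EBUSY;		/* assumed error code, see lead-in */
	task->cgroups->nr_tasks--;
	to->nr_tasks++;
	task->cgroups = to;
	return 0;
}

/* post_fork stage: account the task, run the callback, then publish it. */
void model_post_fork(struct new_task_model *parent, struct new_task_model *child)
{
	struct cgroup_model *cg = parent->cgroups;

	assert(cg != NULL);
	cg->nr_tasks++;
	cg->fork_calls++;		/* stand-in for the ->fork() callback */
	child->cgroups = cg;		/* only now can the child be migrated */
}
```

In this model a migration attempt between model_early_fork() and
model_post_fork() simply fails, so the callback can never observe a
child that has already moved elsewhere.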

> I am starting to think again about a big-rw-lock around copy_process.
> Recently I tried to add one around dup_mmap for uprobes, but perhaps
> cgroups can use it too...

If some other subsystems need it, maybe just make threadgroup locking
coarser?  I *think* I can make cgroup work correctly without a giant
rwlock there but if someone else needs it we can definitely hitch.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking
  2012-10-21 19:18         ` Oleg Nesterov
@ 2012-10-21 19:24             ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-21 19:24 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rafael J. Wysocki, cgroups, containers, linux-kernel

Hello, Oleg.

On Sun, Oct 21, 2012 at 09:18:53PM +0200, Oleg Nesterov wrote:
> > I'm waiting for Oleg to poke some holes in the synchronization
> > department but if that doesn't happen you can pull from the above
> > branch.  I'll pull it into cgroup/for-3.8 too.
> 
> Just in case, I see nothing bad ;)
> 
> But let me repeat: I never understood this code in detail, and I
> have already forgotten everything.

Heh, if you can't spot something wrong in the freezer state memory
barrier dancing, I'm happy. :)

Thanks a lot for taking a look.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 2/7] freezer: add missing mb's to freezer_count() and freezer_should_skip()
  2012-10-16 22:28     ` Tejun Heo
@ 2012-10-22 17:44         ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-22 17:44 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-kernel, stable, rjw, cgroups

On 10/16, Tejun Heo wrote:
>
> +/**
> + * freezer_count - tell freezer to stop ignoring %current
> + *
> + * Undo freezer_do_not_count().  It tells freezers that %current should be
> + * considered again and tries to freeze if freezing condition is already in
> + * effect.
>   */
>  static inline void freezer_count(void)
>  {
>  	current->flags &= ~PF_FREEZER_SKIP;
> +	/*
> +	 * If freezing is in progress, the following paired with smp_mb()
> +	 * in freezer_should_skip() ensures that either we see %true
> +	 * freezing() or freezer_should_skip() sees !PF_FREEZER_SKIP.
> +	 */
> +	smp_mb();
>  	try_to_freeze();

I agree, this looks like a bug fix.

> -static inline int freezer_should_skip(struct task_struct *p)
> +static inline bool freezer_should_skip(struct task_struct *p)
>  {
> -	return !!(p->flags & PF_FREEZER_SKIP);
> +	/*
> +	 * The following smp_mb() paired with the one in freezer_count()
> +	 * ensures that either freezer_count() sees %true freezing() or we
> +	 * see cleared %PF_FREEZER_SKIP and return %false.  This makes it
> +	 * impossible for a task to slip frozen state testing after
> +	 * clearing %PF_FREEZER_SKIP.
> +	 */
> +	smp_mb();
> +	return p->flags & PF_FREEZER_SKIP;
>  }

I am not sure we really need smp_mb() here. Speaking of cgroup_freezer,
it seems that a single mb() after "->state = CGROUP_FREEZING" should be
enough.

But even if I am right, I agree that it looks better in freezer_should_skip()
and this is more robust.

So I think the patch is fine and fixes the bug.
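
The pairing described in the two quoted comments is the classic
store-buffering pattern. A minimal userspace sketch in C11, assuming
hypothetical stand-ins (skip_flag for PF_FREEZER_SKIP, freezing for the
freezing condition) and a seq_cst fence in place of smp_mb(); it is an
illustration of the ordering argument, not the kernel code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_bool skip_flag;	/* models current->flags & PF_FREEZER_SKIP */
static atomic_bool freezing;	/* models the freezing condition */

static bool task_saw_freezing;	/* what the task side observed */
static bool freezer_saw_skip;	/* what the freezer side observed */

/* Task side, shaped like freezer_count(): clear flag, barrier, check. */
static void *task_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&skip_flag, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() stand-in */
	task_saw_freezing = atomic_load_explicit(&freezing, memory_order_relaxed);
	return NULL;
}

/* Freezer side: start freezing, barrier, then test the skip flag. */
static void *freezer_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&freezing, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() stand-in */
	freezer_saw_skip = atomic_load_explicit(&skip_flag, memory_order_relaxed);
	return NULL;
}

/* Count rounds where the task missed freezing AND the freezer skipped it. */
int count_missed_freezes(int rounds)
{
	int missed = 0;

	for (int i = 0; i < rounds; i++) {
		pthread_t a, b;
		int rc;

		atomic_store(&skip_flag, true);
		atomic_store(&freezing, false);
		rc = pthread_create(&a, NULL, task_side, NULL);
		assert(rc == 0);
		rc = pthread_create(&b, NULL, freezer_side, NULL);
		assert(rc == 0);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (!task_saw_freezing && freezer_saw_skip)
			missed++;
	}
	return missed;
}
```

With both fences in place the "both sides miss" outcome is forbidden by
the C11 memory model, so count_missed_freezes() should return 0 for any
number of rounds. Dropping either fence makes that outcome legal, even
if a particular machine rarely or never shows it.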





We probably have another similar race. If ptrace_stop()->may_ptrace_stop()
returns false, the task does

	__set_current_state(TASK_RUNNING);
	// no mb in between
	try_to_freeze();

And this can race with task_is_stopped_or_traced() check in the same way.
(of course this is only theoretical).


do_signal_stop() is probably fine, we can rely on ->siglock.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
  2012-10-21 19:22           ` Tejun Heo
@ 2012-10-22 18:04               ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-22 18:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-kernel, stable, rjw, cgroups

Hi Tejun,

On 10/21, Tejun Heo wrote:
>
> On Sun, Oct 21, 2012 at 09:11:41PM +0200, Oleg Nesterov wrote:
>
> > And it seems that fork() can race with cgroup iterator. post_fork
> > will notice use_task_css_set_links, but until then the child belongs
> > to the parent's css and it is not "visible" to iterator (and right
> > after cgroup_fork() it is not visible to do_each_thread() if
> > use_task_css_set_links is not set).
> >
> > For example. Suppose that the child migrates to another cgroup after
> > copy_process() makes it visible to the user-space. Then update_if_frozen
> > sets CGROUP_FROZEN (again, cgroup_iter_next does not see this child).
> >
> > Now, post_fork calls freezer_fork() and hits BUG_ON(CGROUP_FROZEN).
> >
> > But again, I do not blame this patch.
>
> I'm planning to update it to,
>
> * Clear ->cgroups to %NULL during copy_process().

I completely agree. new_child->cgroups copied from parent looks simply
strange until post_fork. If nothing else, the new task is still under
construction by the time cgroup_fork() is called.

> > I am starting to think again about a big-rw-lock around copy_process.
> > Recently I tried to add one around dup_mmap for uprobes, but perhaps
> > cgroups can use it too...
>
> If some other subsystems need it, maybe just make threadgroup locking
> coarser?

What do you mean?

> I *think* I can make cgroup work correctly without a giant
> rwlock

Yes, probably cgroup doesn't really need it. We could probably kill
signal->group_rwsem with it, but this is minor and the "write-lock"
would be much slower.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 4/7] cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks
       [not found]     ` <1350426526-14254-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-10-22 18:34       ` Oleg Nesterov
  0 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-22 18:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: rjw, cgroups, containers, linux-kernel

On 10/16, Tejun Heo wrote:
>
> cgroup_freezer doesn't transition from FREEZING to FROZEN if the
> cgroup contains PF_NOFREEZE tasks or tasks sleeping with
> PF_FREEZER_SKIP set.

And thus the patch looks like another bugfix to me.


Just one question, and this question is off-topic again,

> Only kernel tasks can be non-freezable (PF_NOFREEZE)

Hmm. We seem to "leak" PF_NOFREEZE if a kernel thread execs?
Perhaps do_execve_common() should do set_freezable() before return.

Or, at least, simply clear this flag along with PF_KTHREAD in
flush_old_exec().

Oleg.
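
The second suggestion above amounts to one extra bit in the mask that is
cleared on exec. A minimal userspace sketch, assuming a hypothetical
exec_task_model struct and hypothetical helper names; the flag values
are illustrative and should not be read as authoritative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; treat them as placeholders. */
#define PF_NOFREEZE	0x00008000u
#define PF_KTHREAD	0x00200000u

/* Hypothetical stand-in for the task_struct flags word. */
struct exec_task_model {
	unsigned int flags;
};

/* Model of the exec path: drop kernel-thread-only bits together. */
void model_flush_old_exec(struct exec_task_model *t)
{
	assert(t != NULL);
	t->flags &= ~(PF_KTHREAD | PF_NOFREEZE);
}

/* A task is freezable in this model when PF_NOFREEZE is clear. */
bool model_is_freezable(const struct exec_task_model *t)
{
	assert(t != NULL);
	return !(t->flags & PF_NOFREEZE);
}
```

In this model a kernel thread that execs stops being unfreezable at the
same moment it stops being a kernel thread, which is the "leak" being
asked about.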

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 5/7] cgroup_freezer: allow moving tasks in and out of a frozen cgroup
       [not found]   ` <1350426526-14254-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-10-22 19:25     ` Oleg Nesterov
  0 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-22 19:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: rjw, cgroups, containers, linux-kernel

The last question ;)

On 10/16, Tejun Heo wrote:
>
> @@ -190,12 +201,12 @@ static void freezer_fork(struct task_struct *task)
>  		goto out;
>
>  	spin_lock_irq(&freezer->lock);
> -	BUG_ON(freezer->state == CGROUP_FROZEN);
> -
> -	/* Locking avoids race with FREEZING -> THAWED transitions. */
> -	if (freezer->state == CGROUP_FREEZING)
> +	/*
> +	 * @task might have been just migrated into a FROZEN cgroup.

Confused. If it was migrated, shouldn't freezer_attach() already take
care of freeze_task()?

IOW,

> Test
> +	 * equality with THAWED.  Read the comment in freezer_attach().
> +	 */
> +	if (freezer->state != CGROUP_THAWED)
>  		freeze_task(task);
> -

This can only happen in the "normal" case, when a CGROUP_FREEZING task
forks the new child. So we could even do BUG_ON(state == CGROUP_FROZEN).

Or there could be another reason?

Oleg.
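
For reference, the behavioural difference between the removed and the
added check in the quoted hunk can be written out as a small userspace
model. The enum and function names are hypothetical, and the sketch
takes no position on whether the FROZEN case can actually be reached
from freezer_fork():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three freezer states named in the thread. */
enum model_state { MODEL_THAWED, MODEL_FREEZING, MODEL_FROZEN };

/*
 * Old check: freeze only when FREEZING; FROZEN was treated as impossible.
 * Returns 1 to freeze, 0 to leave alone, -1 where the BUG_ON would fire.
 */
int model_old_fork_check(enum model_state state)
{
	assert(state <= MODEL_FROZEN);
	if (state == MODEL_FROZEN)
		return -1;
	return state == MODEL_FREEZING;
}

/* New check: anything other than THAWED freezes the child. */
bool model_new_fork_check(enum model_state state)
{
	assert(state <= MODEL_FROZEN);
	return state != MODEL_THAWED;
}
```

The two versions agree for THAWED and FREEZING and differ only for
FROZEN, which is exactly the case the question is about.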

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 2/7] freezer: add missing mb's to freezer_count() and freezer_should_skip()
  2012-10-22 17:44         ` Oleg Nesterov
@ 2012-10-22 21:13             ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-22 21:13 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: containers, linux-kernel, stable, rjw, cgroups

Hello, Oleg.

On Mon, Oct 22, 2012 at 07:44:04PM +0200, Oleg Nesterov wrote:
> >  static inline void freezer_count(void)
> >  {
> >  	current->flags &= ~PF_FREEZER_SKIP;
> > +	/*
> > +	 * If freezing is in progress, the following paired with smp_mb()
> > +	 * in freezer_should_skip() ensures that either we see %true
> > +	 * freezing() or freezer_should_skip() sees !PF_FREEZER_SKIP.
> > +	 */
> > +	smp_mb();
> >  	try_to_freeze();
> 
> I agree, this looks like a bug fix.

Yeah, and this isn't dangerous at all.  I'll ping -stable.

> > -static inline int freezer_should_skip(struct task_struct *p)
> > +static inline bool freezer_should_skip(struct task_struct *p)
> >  {
> > -	return !!(p->flags & PF_FREEZER_SKIP);
> > +	/*
> > +	 * The following smp_mb() paired with the one in freezer_count()
> > +	 * ensures that either freezer_count() sees %true freezing() or we
> > +	 * see cleared %PF_FREEZER_SKIP and return %false.  This makes it
> > +	 * impossible for a task to slip frozen state testing after
> > +	 * clearing %PF_FREEZER_SKIP.
> > +	 */
> > +	smp_mb();
> > +	return p->flags & PF_FREEZER_SKIP;
> >  }
> 
> I am not sure we really need smp_mb() here. Speaking of cgroup_freezer,
> it seems that a single mb() after "->state = CGROUP_FREEZING" should be
> enough.

Hmmm... I agree pairing there would work too.

> But even if I am right, I agree that it looks better in freezer_should_skip()
> and this is more robust.

But, yeah, performance implications at this level are almost
completely irrelevant here and I think pairing freezer_should_skip()
is easier to read.

> So I think the patch is fine and fixes the bug.

Awesome.

> We probably have another similar race. If ptrace_stop()->may_ptrace_stop()
> returns false, the task does
> 
> 	__set_current_state(TASK_RUNNING);
> 	// no mb in between
> 	try_to_freeze();
> 
> And this can race with task_is_stopped_or_traced() check in the same way.
> (of course this is only theoretical).
> 
> do_signal_stop() is probably fine, we can rely on ->siglock.

Hmm....  Guess we should drop __ from set_current_state.  I wonder
whether we should just add mb to freezing()?  What do you think?

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
  2012-10-22 18:04               ` Oleg Nesterov
@ 2012-10-22 21:16                   ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-22 21:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hey,

On Mon, Oct 22, 2012 at 08:04:45PM +0200, Oleg Nesterov wrote:
> > * Clear ->cgroup to %NULL during copy_process().
> 
> I completely agree. new_child->cgroups copied from parent looks simply
> strange until post_fork. If nothing else, the new task is still under
> construction by the time cgroup_fork() is called.

Yeah, and it's just nasty to have cgroup->fork() and ->attach() racing
each other.  As far as cgroup is concerned, the new task should be
completely idle till ->fork() is complete.

> > > I am starting to think again about a big-rw-lock around copy_process.
> > > Recently I tried to add one around dup_mmap for uprobes, but perhaps
> > > cgroups can use it too...
> >
> > If some other subsystems need it, maybe just make threadgroup locking
> > coarser?
> 
> What do you mean?

I probably have misunderstood you but if you're gonna add a big-rw-lock
around copy_process which is always gonna be grabbed, I was suggesting
maybe we could simply repurpose the existing threadgroup locking.  Or
are the requirements too different?

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 4/7] cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks
  2012-10-22 18:34       ` Oleg Nesterov
@ 2012-10-22 21:18           ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-22 21:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rjw-KKrjLPT3xs0, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hey,

On Mon, Oct 22, 2012 at 08:34:53PM +0200, Oleg Nesterov wrote:
> On 10/16, Tejun Heo wrote:
> >
> > cgroup_freezer doesn't transition from FREEZING to FROZEN if the
> > cgroup contains PF_NOFREEZE tasks or tasks sleeping with
> > PF_FREEZER_SKIP set.
> 
> And thus the patch looks like another bugfix to me.

It is but I'm not gonna send this one to -stable.  Nobody has complained
yet and this is a userland visible behavior change, so I'd rather not
risk it.

> Just one question, and this question is offtopic again,
> 
> > Only kernel tasks can be non-freezable (PF_NOFREEZE)
> 
> Hmm. We seem to "leak" PF_NOFREEZE if a kernel thread execs?
> Perhaps do_execve_common() should do set_freezable() before return.
> 
> Or, at least, simply clear this flag along with PF_KTHREAD in
> flush_old_exec().

Ooh, ouch, definitely.  We should clear that.  Can you please make a
patch?

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 5/7] cgroup_freezer: allow moving tasks in and out of a frozen cgroup
       [not found]     ` <20121022192506.GA27163-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-22 21:25       ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-22 21:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rjw-KKrjLPT3xs0, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello,

On Mon, Oct 22, 2012 at 09:25:06PM +0200, Oleg Nesterov wrote:
> > @@ -190,12 +201,12 @@ static void freezer_fork(struct task_struct *task)
> >  		goto out;
> >
> >  	spin_lock_irq(&freezer->lock);
> > -	BUG_ON(freezer->state == CGROUP_FROZEN);
> > -
> > -	/* Locking avoids race with FREEZING -> THAWED transitions. */
> > -	if (freezer->state == CGROUP_FREEZING)
> > +	/*
> > +	 * @task might have been just migrated into a FROZEN cgroup.
> 
> Confused. If it was migrated, then freezer_attach() should take care
> to do freeze_task?

Hmmm... there's a window where a task is migrated but ->attach()
hasn't completed yet, so freezer_attach() might not have kicked in
yet.  It might not be strictly necessary but I think it's more
consistent this way.  No?
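
To make the fork-time rule concrete, here is a small userspace model.
It is a sketch only: the enum and both helpers are hypothetical, and it
assumes the new check is "anything but THAWED", which the quoted hunk
does not show in full.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the three cgroup freezer states. */
enum model_freezer_state {
	MODEL_THAWED,
	MODEL_FREEZING,
	MODEL_FROZEN,
};

/* Old rule: FROZEN was treated as impossible, only FREEZING froze. */
static bool old_fork_should_freeze(enum model_freezer_state state)
{
	assert(state != MODEL_FROZEN);	/* mirrors the removed BUG_ON() */
	return state == MODEL_FREEZING;
}

/*
 * Assumed new rule: a task that lands in a FREEZING or FROZEN cgroup
 * before ->attach() has run is frozen at fork time as well.
 */
static bool new_fork_should_freeze(enum model_freezer_state state)
{
	return state != MODEL_THAWED;
}
```

The point of the second helper is that it has no state it must reject,
so a migration racing with fork cannot trip an assertion.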

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 2/7] freezer: add missing mb's to freezer_count() and freezer_should_skip()
  2012-10-22 21:13             ` Tejun Heo
@ 2012-10-23 15:39                 ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-23 15:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hi Tejun,

On 10/22, Tejun Heo wrote:
>
> On Mon, Oct 22, 2012 at 07:44:04PM +0200, Oleg Nesterov wrote:
>
> > We probably have another similar race. If ptrace_stop()->may_ptrace_stop()
> > returns false, the task does
> >
> > 	__set_current_state(TASK_RUNNING);
> > 	// no mb in between
> > 	try_to_freeze();
> >
> > And this can race with task_is_stopped_or_traced() check in the same way.
> > (of course this is only theoretical).
> >
> > do_signal_stop() is probably fine, we can rely on ->siglock.
>
> Hmm....  Guess we should drop __ from set_current_state.

Yes.

Or we can change ptrace_stop() and do_signal_stop() to use freezer_do_not_count/
freezer_count and remove task_is_stopped_or_traced() from update_if_frozen()
and try_to_freeze_tasks(). But this means that do_signal_stop() will call
try_to_freeze() twice, unless we add __freezer_count() which only clears
PF_FREEZER_SKIP.
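
A rough userspace model of that split, as a sketch only: struct
model_task and every model_* helper are hypothetical,
model_freezer_clear_skip() stands in for the proposed
__freezer_count(), the flag value is illustrative, and the memory
barriers are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define PF_FREEZER_SKIP 0x40000000u	/* illustrative value for this model */

/* Hypothetical userspace stand-in for the bits of task_struct used here. */
struct model_task {
	unsigned int flags;
	unsigned int freeze_checks;	/* how often try_to_freeze() ran */
};

/* Stand-in for try_to_freeze(): only records that it was called. */
static void model_try_to_freeze(struct model_task *t)
{
	t->freeze_checks++;
}

/* Tell the freezer to treat the task as frozen while it sleeps. */
static void model_freezer_do_not_count(struct model_task *t)
{
	t->flags |= PF_FREEZER_SKIP;
}

/* Proposed helper: only clear the flag, leave freezing to the caller. */
static void model_freezer_clear_skip(struct model_task *t)
{
	t->flags &= ~PF_FREEZER_SKIP;
}

/* Existing behaviour: clear the flag, then check for a freeze request. */
static void model_freezer_count(struct model_task *t)
{
	model_freezer_clear_skip(t);
	model_try_to_freeze(t);
}
```

A caller that already runs try_to_freeze() itself, as do_signal_stop()
does, would use the flag-only helper and so check once, not twice.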

> I wonder
> whether we should just add mb to freezing()?  What do you think?

Yes, I thought about this too. I just do not know what would be better.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
  2012-10-22 21:16                   ` Tejun Heo
@ 2012-10-23 15:51                       ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-23 15:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 10/22, Tejun Heo wrote:
>
> On Mon, Oct 22, 2012 at 08:04:45PM +0200, Oleg Nesterov wrote:
>
> > > > I am starting to think again about a big-rw-lock around copy_process.
> > > > Recently I tried to add one around dup_mmap for uprobes, but perhaps
> > > > cgroups can use it too...
> > >
> > > If some other subsystems need it, maybe just make threadgroup locking
> > > coarser?
> >
> > What do you mean?
>
> I probably have misunderstood you

Probably me ;)

> but if you're gonna add a big-rw-lock
> around copy_process which is always gonna be grabbed, I was suggesting
> maybe we could simply repurpose the existing threadgroup locking.

Yes, yes. But in this case (I mean, for uprobes) "threadgroup" in the name
is misleading. It should be called unconditionally without any argument.

Please see

	[PATCH 1/2] brw_mutex: big read-write mutex
	http://marc.info/?l=linux-kernel&m=135032816223715

	[PATCH 2/2] uprobes: Use brw_mutex to fix register/unregister vs dup_mmap() race
	http://marc.info/?l=linux-kernel&m=135032817823720

for details, but in short 2/2 needs this giant lock to block dup_mmap()
system-wide, while cgroup (currently) only needs threadgroup lock if
CLONE_THREAD (ignoring do_exit) and per-task.

So please forget, I no longer think it makes sense to use the same
thing for uprobes and cgroups.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 4/7] cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks
       [not found]           ` <20121022211822.GF5951-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
@ 2012-10-23 15:55             ` Oleg Nesterov
  0 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-23 15:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw-KKrjLPT3xs0, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 10/22, Tejun Heo wrote:
>
> On Mon, Oct 22, 2012 at 08:34:53PM +0200, Oleg Nesterov wrote:
> >
> > Hmm. We seem to "leak" PF_NOFREEZE if a kernel thread execs?
> > Perhaps do_execve_common() should do set_freezable() before return.
> >
> > Or, at least, simply clear this flag along with PF_KTHREAD in
> > flush_old_exec().
>
> Ooh, ouch, definitely.  We should clear that.  Can you please make a
> patch?

Sure... but what do you think is better?

I'd prefer to simply clear PF_NOFREEZE (without set_freezable), but
obviously this doesn't look exactly right from cgroup_freezer pov.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 5/7] cgroup_freezer: allow moving tasks in and out of a frozen cgroup
       [not found]       ` <20121022212505.GG5951-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
@ 2012-10-23 16:14         ` Oleg Nesterov
  0 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-23 16:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw-KKrjLPT3xs0, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 10/22, Tejun Heo wrote:
>
> Hello,
>
> On Mon, Oct 22, 2012 at 09:25:06PM +0200, Oleg Nesterov wrote:
> > > @@ -190,12 +201,12 @@ static void freezer_fork(struct task_struct *task)
> > >  		goto out;
> > >
> > >  	spin_lock_irq(&freezer->lock);
> > > -	BUG_ON(freezer->state == CGROUP_FROZEN);
> > > -
> > > -	/* Locking avoids race with FREEZING -> THAWED transitions. */
> > > -	if (freezer->state == CGROUP_FREEZING)
> > > +	/*
> > > +	 * @task might have been just migrated into a FROZEN cgroup.
> >
> > > Confused. If it was migrated, then freezer_attach() should take care
> > > to do freeze_task?
>
> Hmmm... there's a window where a task is migrated but ->attach()
> hasn't completed yet, so freezer_attach() might not have kicked in
> yet.

Yes, I see.

Indeed, cgroup_task_migrate() is called before ->attach().

Thanks!

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 2/7] freezer: add missing mb's to freezer_count() and freezer_should_skip()
       [not found]                 ` <20121023153919.GA16201-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-24 18:57                   ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-24 18:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hello, Oleg.

On Tue, Oct 23, 2012 at 05:39:19PM +0200, Oleg Nesterov wrote:
> > Hmm....  Guess we should drop __ from set_current_state.
> 
> Yes.
> 
> Or we can change ptrace_stop() and do_signal_stop() to use freezer_do_not_count/
> freezer_count and remove task_is_stopped_or_traced() from update_if_frozen()
> and try_to_freeze_tasks(). But this means that do_signal_stop() will call
> try_to_freeze() twice, unless we add __freezer_count() which only clears
> PF_FREEZER_SKIP.

Ooh, I like this idea.  If we have a mechanism to mark a task "frozen
enough", it makes sense to use it universally.  As long as
try_to_freeze() invocation stays outside fast path, I don't think
invoking it twice really matters.  Can you please cook up a patch for
it?

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
  2012-10-23 15:51                       ` Oleg Nesterov
@ 2012-10-24 19:04                           ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-24 19:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

On Tue, Oct 23, 2012 at 05:51:28PM +0200, Oleg Nesterov wrote:
> Yes, yes. But in this case (I mean, for uprobes) "threadgroup" in the name
> is misleading. It should be called unconditionally without any argument.
> 
> Please see
> 
> 	[PATCH 1/2] brw_mutex: big read-write mutex
> 	http://marc.info/?l=linux-kernel&m=135032816223715

Ooh... that's something completely different.

> 	[PATCH 2/2] uprobes: Use brw_mutex to fix register/unregister vs dup_mmap() race
> 	http://marc.info/?l=linux-kernel&m=135032817823720
> 
> for details, but in short 2/2 needs this giant lock to block dup_mmap()
> system-wide, while cgroup (currently) only needs threadgroup lock if
> CLONE_THREAD (ignoring do_exit) and per-task.
> 
> So please forget, I no longer think it makes sense to use the same
> thing for uprobes and cgroups.

It is quite tempting to reduce hot path overhead and penalize cgroup
migration ops more tho.  Write-locking brw_mutex on migration might
not be too bad.  Why did you change your mind?

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 4/7] cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks
  2012-10-23 15:55             ` Oleg Nesterov
@ 2012-10-24 19:06                 ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-24 19:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rjw, cgroups, containers, linux-kernel

Hello, Oleg.

On Tue, Oct 23, 2012 at 05:55:33PM +0200, Oleg Nesterov wrote:
> > Ooh, ouch, definitely.  We should clear that.  Can you please make a
> > patch?
> 
> Sure... but what do you think is better?
> 
> I'd prefer to simply clear PF_NOFREEZE (without set_freezable), but
> obviously this doesn't look exactly right from cgroup_freezer pov.

I don't think it matters all that much.  It's a pretty special case.
Maybe define PF_CLEAR_ON_EXEC_MASK or something and just include
PF_NOFREEZE in it?
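
A rough sketch of what such a mask could look like, written as a small
userspace model rather than a kernel patch.  The MODEL_* names and the
bit values are placeholders made up for illustration; they are not
copied from sched.h, and PF_CLEAR_ON_EXEC_MASK itself is only the name
floated above, not an existing macro.

```c
#include <assert.h>

/* Placeholder bit values chosen for the sketch, not taken from sched.h. */
#define MODEL_PF_FORKNOEXEC	0x01u
#define MODEL_PF_NOFREEZE	0x02u
#define MODEL_PF_KTHREAD	0x04u
#define MODEL_PF_RANDOMIZE	0x08u
#define MODEL_PF_OTHER		0x10u	/* some flag exec must leave alone */

/* Hypothetical mask collecting every flag that exec should drop. */
#define MODEL_PF_CLEAR_ON_EXEC_MASK \
	(MODEL_PF_RANDOMIZE | MODEL_PF_FORKNOEXEC | \
	 MODEL_PF_KTHREAD | MODEL_PF_NOFREEZE)

/* Returns the flags a task would keep across exec in this model. */
static unsigned int model_flags_after_exec(unsigned int flags)
{
	return flags & ~MODEL_PF_CLEAR_ON_EXEC_MASK;
}

/* Self-check: a former kthread loses KTHREAD and NOFREEZE, nothing else. */
static void model_self_check(void)
{
	unsigned int flags = MODEL_PF_KTHREAD | MODEL_PF_NOFREEZE |
			     MODEL_PF_OTHER;

	assert(model_flags_after_exec(flags) == MODEL_PF_OTHER);
}
```

The only point of the mask is that the exec path names one constant and
new "clear on exec" flags get added in a single place.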

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* [PATCH 0/1] (Was: freezer: add missing mb's to freezer_count() and freezer_should_skip())
  2012-10-24 18:57                   ` Tejun Heo
@ 2012-10-25 16:39                       ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-25 16:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers, linux-kernel, stable, rjw, cgroups

Hi Tejun,

On 10/24, Tejun Heo wrote:
> Hello, Oleg.
>
> On Tue, Oct 23, 2012 at 05:39:19PM +0200, Oleg Nesterov wrote:
> > > Hmm....  Guess we should drop __ from set_current_state.
> >
> > Yes.
> >
> > Or we can change ptrace_stop() and do_signal_stop() to use freezer_do_not_count/
> > freezer_count and remove task_is_stopped_or_traced() from update_if_frozen()
> > and try_to_freeze_tasks(). But this means that do_signal_stop() will call
> > try_to_freeze() twice, unless we add __freezer_count() which only clears
> > PF_FREEZER_SKIP.
>
> Ooh, I like this idea.  If we have a mechanism to mark a task "frozen
> enough", it makes sense to use it universally.

Yes, I agree.

Fortunately we already have freezable_schedule(), so this patch is
really simple.

It applies on top of this series.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* [PATCH 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
  2012-10-25 16:39                       ` Oleg Nesterov
@ 2012-10-25 16:39                           ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-25 16:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers, linux-kernel, stable, rjw, cgroups

Change ptrace_stop() and do_signal_stop() to use freezable_schedule()
rather than rely on subsequent try_to_freeze().

This allows us to remove the task_is_stopped_or_traced() checks from
try_to_freeze_tasks() and update_if_frozen(), and it fixes the
unlikely race with ptrace_stop(): if the tracee does not schedule(),
it can miss a freezing condition.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
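For readers following along, here is a rough userspace model of the
idea, not kernel code.  It assumes freezable_schedule() amounts to
freezer_do_not_count(); schedule(); freezer_count(), which is my reading
of include/linux/freezer.h.  The struct, the model_* names and the flag
value are invented for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit, not the value from sched.h. */
#define MODEL_PF_FREEZER_SKIP 0x1u

struct model_task {
	unsigned int flags;	/* stands in for task_struct->flags */
	bool freezing;		/* stands in for freezing(p) */
	int freeze_calls;	/* how often the task entered the refrigerator */
};

/* Models try_to_freeze(): enter the refrigerator if a freeze is pending. */
static void model_try_to_freeze(struct model_task *t)
{
	if (t->freezing) {
		t->freeze_calls++;
		t->freezing = false;	/* pretend we were thawed again */
	}
}

/* Models freezer_do_not_count(): tell the freezer to skip this sleeper. */
static void model_freezer_do_not_count(struct model_task *t)
{
	t->flags |= MODEL_PF_FREEZER_SKIP;
}

/* Models freezer_count(): become visible again, then honour the freeze. */
static void model_freezer_count(struct model_task *t)
{
	t->flags &= ~MODEL_PF_FREEZER_SKIP;
	model_try_to_freeze(t);
}

/*
 * Models freezable_schedule().  freeze_arrives says whether a freeze
 * request shows up while the task sleeps in the "schedule()" step.
 */
static void model_freezable_schedule(struct model_task *t, bool freeze_arrives)
{
	model_freezer_do_not_count(t);
	/* While asleep, the freezer does not wait for this task. */
	assert(t->flags & MODEL_PF_FREEZER_SKIP);
	if (freeze_arrives)
		t->freezing = true;
	model_freezer_count(t);
}
```

In this model a freeze request that arrives during the sleep is always
acted on when the task wakes up, without a separate try_to_freeze() at
the call site.
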
 include/linux/freezer.h |    7 +++----
 kernel/cgroup_freezer.c |    3 +--
 kernel/freezer.c        |   11 ++---------
 kernel/power/process.c  |   13 +------------
 kernel/signal.c         |   11 ++---------
 5 files changed, 9 insertions(+), 36 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index ee89932..8039893 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -134,10 +134,9 @@ static inline bool freezer_should_skip(struct task_struct *p)
 }
 
 /*
- * These macros are intended to be used whenever you want allow a task that's
- * sleeping in TASK_UNINTERRUPTIBLE or TASK_KILLABLE state to be frozen. Note
- * that neither return any clear indication of whether a freeze event happened
- * while in this function.
+ * These macros are intended to be used whenever you want allow a sleeping
+ * task to be frozen. Note that neither return any clear indication of
+ * whether a freeze event happened while in this function.
  */
 
 /* Like schedule(), but should not block the freezer. */
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 8a92b0e..bedefd9 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -198,8 +198,7 @@ static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer)
 			 * completion.  Consider it frozen in addition to
 			 * the usual frozen condition.
 			 */
-			if (!frozen(task) && !task_is_stopped_or_traced(task) &&
-			    !freezer_should_skip(task))
+			if (!frozen(task) && !freezer_should_skip(task))
 				goto notyet;
 		}
 	}
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 11f82a4..c38893b 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -116,17 +116,10 @@ bool freeze_task(struct task_struct *p)
 		return false;
 	}
 
-	if (!(p->flags & PF_KTHREAD)) {
+	if (!(p->flags & PF_KTHREAD))
 		fake_signal_wake_up(p);
-		/*
-		 * fake_signal_wake_up() goes through p's scheduler
-		 * lock and guarantees that TASK_STOPPED/TRACED ->
-		 * TASK_RUNNING transition can't race with task state
-		 * testing in try_to_freeze_tasks().
-		 */
-	} else {
+	else
 		wake_up_state(p, TASK_INTERRUPTIBLE);
-	}
 
 	spin_unlock_irqrestore(&freezer_lock, flags);
 	return true;
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 87da817..d5a258b 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -48,18 +48,7 @@ static int try_to_freeze_tasks(bool user_only)
 			if (p == current || !freeze_task(p))
 				continue;
 
-			/*
-			 * Now that we've done set_freeze_flag, don't
-			 * perturb a task in TASK_STOPPED or TASK_TRACED.
-			 * It is "frozen enough".  If the task does wake
-			 * up, it will immediately call try_to_freeze.
-			 *
-			 * Because freeze_task() goes through p's scheduler lock, it's
-			 * guaranteed that TASK_STOPPED/TRACED -> TASK_RUNNING
-			 * transition can't race with task state testing here.
-			 */
-			if (!task_is_stopped_or_traced(p) &&
-			    !freezer_should_skip(p))
+			if (!freezer_should_skip(p))
 				todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
diff --git a/kernel/signal.c b/kernel/signal.c
index 0af8868..1660d7d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1908,7 +1908,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 		preempt_disable();
 		read_unlock(&tasklist_lock);
 		preempt_enable_no_resched();
-		schedule();
+		freezable_schedule();
 	} else {
 		/*
 		 * By the time we got the lock, our tracer went away.
@@ -1930,13 +1930,6 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	}
 
 	/*
-	 * While in TASK_TRACED, we were considered "frozen enough".
-	 * Now that we woke up, it's crucial if we're supposed to be
-	 * frozen that we freeze now before running anything substantial.
-	 */
-	try_to_freeze();
-
-	/*
 	 * We are back.  Now reacquire the siglock before touching
 	 * last_siginfo, so that we are sure to have synchronized with
 	 * any signal-sending on another CPU that wants to examine it.
@@ -2092,7 +2085,7 @@ static bool do_signal_stop(int signr)
 		}
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
-		schedule();
+		freezable_schedule();
 		return true;
 	} else {
 		/*
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 0/1] (Was: cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks)
  2012-10-24 19:06                 ` Tejun Heo
@ 2012-10-25 17:12                     ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-25 17:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw, cgroups, containers, linux-kernel

On 10/24, Tejun Heo wrote:
>
> > I'd prefer to simply clear PF_NOFREEZE (without set_freezable), but
> > obviously this doesn't look exactly right from cgroup_freezer pov.
>
> I don't think it matters all that much.  It's a pretty special case.

OK.

> Maybe define PF_CLEAR_ON_EXEC_MASK or something and just include
> PF_NOFREEZE in it?

Maybe... but it would have a single user and sched.h is already huge,
so I didn't do this.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* [PATCH 1/1] freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD
  2012-10-25 17:12                     ` Oleg Nesterov
@ 2012-10-25 17:12                         ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-25 17:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw, cgroups, containers, linux-kernel

flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/exec.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 8b9011b..0039055 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1083,7 +1083,8 @@ int flush_old_exec(struct linux_binprm * bprm)
 	bprm->mm = NULL;		/* We're using it now */
 
 	set_fs(USER_DS);
-	current->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD);
+	current->flags &=
+		~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE);
 	flush_thread();
 	current->personality &= ~bprm->per_clear;
 
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
       [not found]                           ` <20121025163959.GB3801-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-25 17:18                             ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-25 17:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers, linux-kernel, stable, rjw, cgroups

Hello, Oleg.

On Thu, Oct 25, 2012 at 06:39:59PM +0200, Oleg Nesterov wrote:
> Change ptrace_stop() and do_signal_stop() to use freezable_schedule()
> rather than rely on subsequent try_to_freeze().
> 
> This allows us to remove the task_is_stopped_or_traced() checks from
> try_to_freeze_tasks() and update_if_frozen(), and it fixes the
> unlikely race with ptrace_stop(): if the tracee does not schedule(),
> it can miss a freezing condition.

I think it would be great if the description were more detailed.  This
code path always makes my head spin and I think we can definitely use
some more guidance in understanding this dang thing. :)

> @@ -48,18 +48,7 @@ static int try_to_freeze_tasks(bool user_only)
>  			if (p == current || !freeze_task(p))
>  				continue;
>  
> -			/*
> -			 * Now that we've done set_freeze_flag, don't
> -			 * perturb a task in TASK_STOPPED or TASK_TRACED.
> -			 * It is "frozen enough".  If the task does wake
> -			 * up, it will immediately call try_to_freeze.
> -			 *
> -			 * Because freeze_task() goes through p's scheduler lock, it's
> -			 * guaranteed that TASK_STOPPED/TRACED -> TASK_RUNNING
> -			 * transition can't race with task state testing here.
> -			 */
> -			if (!task_is_stopped_or_traced(p) &&
> -			    !freezer_should_skip(p))
> +			if (!freezer_should_skip(p))
>  				todo++;
>  		} while_each_thread(g, p);
>  		read_unlock(&tasklist_lock);

This looks really good.
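
To spell out why dropping the check is safe, here is a small userspace
model of the counting rule before and after the patch.  It assumes that
a task sleeping in ptrace_stop()/do_signal_stop() via
freezable_schedule() keeps its skip marker set for the whole sleep.  The
struct and the model_* names are hypothetical and exist only for this
sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two task properties the loop looks at. */
struct model_task {
	bool stopped_or_traced;	/* models task_is_stopped_or_traced(p) */
	bool freezer_skip;	/* models freezer_should_skip(p) */
};

/* Counting rule before the patch: stopped/traced tasks are special-cased. */
static bool model_todo_old(const struct model_task *t)
{
	return !t->stopped_or_traced && !t->freezer_skip;
}

/* Counting rule after the patch: only the skip marker matters. */
static bool model_todo_new(const struct model_task *t)
{
	return !t->freezer_skip;
}

/*
 * A task sleeping in a signal or ptrace stop after the patch.  Under the
 * assumption above, both fields are true at the same time.
 */
static struct model_task model_stopped_sleeper(void)
{
	struct model_task t = { .stopped_or_traced = true, .freezer_skip = true };

	return t;
}

/* Self-check: both rules leave such a sleeper out of the todo count. */
static void model_self_check(void)
{
	struct model_task t = model_stopped_sleeper();

	assert(!model_todo_old(&t));
	assert(!model_todo_new(&t));
}
```

The two rules only differ for a task that is stopped or traced without
the skip marker, which is the case the patch is meant to eliminate.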

> diff --git a/kernel/signal.c b/kernel/signal.c
> index 0af8868..1660d7d 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1908,7 +1908,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
>  		preempt_disable();
>  		read_unlock(&tasklist_lock);
>  		preempt_enable_no_resched();
> -		schedule();
> +		freezable_schedule();
>  	} else {
>  		/*
>  		 * By the time we got the lock, our tracer went away.
> @@ -1930,13 +1930,6 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
>  	}
>  
>  	/*
> -	 * While in TASK_TRACED, we were considered "frozen enough".
> -	 * Now that we woke up, it's crucial if we're supposed to be
> -	 * frozen that we freeze now before running anything substantial.
> -	 */
> -	try_to_freeze();
> -
> -	/*
>  	 * We are back.  Now reacquire the siglock before touching
>  	 * last_siginfo, so that we are sure to have synchronized with
>  	 * any signal-sending on another CPU that wants to examine it.
> @@ -2092,7 +2085,7 @@ static bool do_signal_stop(int signr)
>  		}
>  
>  		/* Now we don't run again until woken by SIGCONT or SIGKILL */
> -		schedule();
> +		freezable_schedule();

This makes me wonder whether we still need try_to_freeze() in
get_signal_to_deliver() right after the relock: label.  The freezer no
longer treats STOPPED/TRACED specially and both sleeping sites in the
signal delivery path now use freezable_schedule().  We shouldn't need
the explicit try_to_freeze(), right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/1] freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD
       [not found]                         ` <20121025171256.GB6776-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-25 17:20                           ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-25 17:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers, linux-kernel, rjw, cgroups, Andrew Morton

On Thu, Oct 25, 2012 at 07:12:56PM +0200, Oleg Nesterov wrote:
> flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Probably needs Cc: stable.

  Acked-by: Tejun Heo <tj@kernel.org>

How should this be routed?  -mm?  Andrew, can you please pick this up?

Thanks.

> ---
>  fs/exec.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 8b9011b..0039055 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1083,7 +1083,8 @@ int flush_old_exec(struct linux_binprm * bprm)
>  	bprm->mm = NULL;		/* We're using it now */
>  
>  	set_fs(USER_DS);
> -	current->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD);
> +	current->flags &=
> +		~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE);
>  	flush_thread();
>  	current->personality &= ~bprm->per_clear;
>  
> -- 
> 1.5.5.1
> 
> 

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
       [not found]                             ` <20121025171812.GE11442-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2012-10-25 17:34                               ` Oleg Nesterov
  0 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-25 17:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: rjw, linux-kernel, lizefan, containers, cgroups, stable

On 10/25, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Thu, Oct 25, 2012 at 06:39:59PM +0200, Oleg Nesterov wrote:
> > Change ptrace_stop() and do_signal_stop() to use freezable_schedule()
> > rather than rely on subsequent try_to_freeze().
> >
> > This allows removing the task_is_stopped_or_traced() checks from
> > try_to_freeze_tasks() and update_if_frozen(), and this fixes the
> > unlikely race with ptrace_stop(). If the tracee does not schedule()
> > it can miss a freezing condition.
>
> I think it would be great if the description is more detailed.  This
> code path always makes my head spin and I think we can definitely use
> some more guiding in understanding this dang thing. :)

Do you mean I should describe the race in more detail? OK, will do and
resend tomorrow.

> > @@ -2092,7 +2085,7 @@ static bool do_signal_stop(int signr)
> >  		}
> >
> >  		/* Now we don't run again until woken by SIGCONT or SIGKILL */
> > -		schedule();
> > +		freezable_schedule();
>
> This makes me wonder whether we still need try_to_freeze() in
> get_signal_to_deliver() right after the relock: label.  Freezer no
> longer treats STOPPED/TRACED specially and both sleeping sites in the
> signal delivery path now use freezable_schedule().  We shouldn't need the
> explicit try_to_freeze(), right?

OOPS.

I'd say this doesn't really matter but yes we can move it up,
get_signal_to_deliver() will be called again.

But! the comment above try_to_freeze() becomes misleading with
this patch, so this really needs v2.

Thanks.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
  2012-10-25 17:34                               ` Oleg Nesterov
@ 2012-10-25 17:36                                   ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-25 17:36 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: rjw, linux-kernel, lizefan, containers, cgroups, stable

Hello,

On Thu, Oct 25, 2012 at 07:34:33PM +0200, Oleg Nesterov wrote:
> > I think it would be great if the description is more detailed.  This
> > code path always makes my head spin and I think we can definitely use
> > some more guiding in understanding this dang thing. :)
> 
> Do you mean I should describe the race in more detail? OK, will do and
> resend tomorrow.

Yeah, and maybe briefly explain how freezable_schedule() gets us out of
trouble.

> > > @@ -2092,7 +2085,7 @@ static bool do_signal_stop(int signr)
> > >  		}
> > >
> > >  		/* Now we don't run again until woken by SIGCONT or SIGKILL */
> > > -		schedule();
> > > +		freezable_schedule();
> >
> > This makes me wonder whether we still need try_to_freeze() in
> > get_signal_to_deliver() right after the relock: label.  Freezer no
> > longer treats STOPPED/TRACED specially and both sleeping sites in the
> > signal delivery path now use freezable_schedule().  We shouldn't need the
> > explicit try_to_freeze(), right?
> 
> OOPS.
> 
> I'd say this doesn't really matter but yes we can move it up,
> get_signal_to_deliver() will be called again.

Right, we can't remove it.  That's our main freezing point for
userland tasks.

> But! the comment above try_to_freeze() becomes misleading with
> this patch, so this really needs v2.

But, yeah, I think we should move it above relock: and update the
comment to explain that that's the usual freezing site.

Thanks.

-- 
tejun
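
The placement discussed above can be illustrated with a toy control-flow
model.  This is only a sketch under stated assumptions: struct demo_stats
and the demo_* functions are hypothetical, the counter stands in for
try_to_freeze() calls, and `stops` stands in for the number of times the
task slept in a stop and jumped back to the relock: label.

```c
#include <assert.h>

/* Hypothetical counter standing in for try_to_freeze() calls. */
struct demo_stats {
    int freeze_checks;
};

/* Old layout: the check sits below relock:, so it reruns after every stop. */
static void demo_get_signal_old(struct demo_stats *s, int stops)
{
relock:
    s->freeze_checks++;          /* try_to_freeze() */
    if (stops > 0) {
        stops--;                 /* slept in a stop, then woke up */
        goto relock;
    }
}

/* Proposed layout: check once on entry; the stop itself is a freezable sleep. */
static void demo_get_signal_new(struct demo_stats *s, int stops)
{
    s->freeze_checks++;          /* try_to_freeze() */
relock:
    if (stops > 0) {
        stops--;                 /* freezable sleep, no extra check needed here */
        goto relock;
    }
}
```

In the second layout the explicit check runs once per call no matter how
many stops occur, which is why the comment needs to describe it as the
usual freezing site rather than as a post-stop recheck.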

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/1] freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD
  2012-10-25 17:37                               ` Oleg Nesterov
@ 2012-10-25 17:37                                   ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-25 17:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rjw, linux-kernel, lizefan, containers, cgroups, Andrew Morton

Hello,

On Thu, Oct 25, 2012 at 10:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> How should this be routed?  -mm?  Andrew, can you please pick this up?
>
> or perhaps Rafael can take it?

Yeah, that works too.  Rafael?

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/1] freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD
  2012-10-25 17:20                           ` Tejun Heo
@ 2012-10-25 17:37                               ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-25 17:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: rjw, linux-kernel, lizefan, containers, cgroups, Andrew Morton

On 10/25, Tejun Heo wrote:
>
> On Thu, Oct 25, 2012 at 07:12:56PM +0200, Oleg Nesterov wrote:
> > flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> Probably needs Cc: stable.
>
>   Acked-by: Tejun Heo <tj@kernel.org>

Thanks,

> How should this be routed?  -mm?  Andrew, can you please pick this up?

or perhaps Rafael can take it?

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
  2012-10-24 19:04                           ` Tejun Heo
@ 2012-10-25 17:42                               ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-25 17:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: rjw, linux-kernel, lizefan, containers, cgroups, stable

On 10/24, Tejun Heo wrote:
>
> Hello,
>
> On Tue, Oct 23, 2012 at 05:51:28PM +0200, Oleg Nesterov wrote:
> > Yes, yes. But in this case (I mean, for uprobes) "threadgroup" in the name
> > is misleading. It should be called unconditionally without any argument.
> >
> > Please see
> >
> > 	[PATCH 1/2] brw_mutex: big read-write mutex
> > 	http://marc.info/?l=linux-kernel&m=135032816223715
>
> Ooh... that's something completely different.
>
> > 	[PATCH 2/2] uprobes: Use brw_mutex to fix register/unregister vs dup_mmap() race
> > 	http://marc.info/?l=linux-kernel&m=135032817823720
> >
> > for details, but in short 2/2 needs this giant lock to block dup_mmap()
> > system-wide, while cgroup (currently) only needs threadgroup lock if
> > CLONE_THREAD (ignoring do_exit) and per-task.
> >
> > So please forget, I no longer think it makes sense to use the same
> > thing for uprobes and cgroups.
>
> It is quite tempting to reduce hot path overhead and penalize cgroup
> migration ops more tho.  Write-locking brw_mutex on migration might
> not be too bad.  Why did you change your mind?

Well, mostly because I do not think 1/2 will ever be applied ;)

Since we already have (to my surprise!) percpu_rw_semaphore, I do
not think I can add another similar lock.

Perhaps uprobes can use percpu_rw_semaphore, but I am not sure...

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 1/1] freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD
       [not found]                                   ` <CAOS58YPAVVr=itauGD9eTpfRLSBLuM8Bpyuq9AP73MDr8dPmiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-10-25 20:13                                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 149+ messages in thread
From: Rafael J. Wysocki @ 2012-10-25 20:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, linux-kernel, lizefan, containers, cgroups, Andrew Morton

On Thursday, October 25, 2012 10:37:31 AM Tejun Heo wrote:
> Hello,
> 
> On Thu, Oct 25, 2012 at 10:37 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >> How should this be routed?  -mm?  Andrew, can you please pick this up?
> >
> > or perhaps Rafael can take it?
> 
> Yeah, that works too.  Rafael?

Yup, I'm taking it.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* [PATCH v2 0/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
  2012-10-25 17:36                                   ` Tejun Heo
@ 2012-10-26 17:45                                       ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-26 17:45 UTC (permalink / raw)
  To: Tejun Heo; +Cc: rjw, linux-kernel, lizefan, containers, cgroups, stable

Hi Tejun,

On 10/25, Tejun Heo wrote:
>
> > But! the comment above try_to_freeze() becomes misleading with
> > this patch, so this really needs v2.
>
> But, yeah, I think we should move it above relock: and update the
> comment to explain that that's the usual freezing site.

Yeeeeeeees, I knew that you wouldn't allow me to simply remove the old
comment without adding a new one ;)

And you can't imagine how much time I spent trying to come up with
something meaningful. Please feel free to update/rewrite it, I am not
sure it is good enough. Or I can send v3 if you suggest something better.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
  2012-10-26 17:45                                       ` Oleg Nesterov
@ 2012-10-26 17:46                                           ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-26 17:46 UTC (permalink / raw)
  To: Tejun Heo; +Cc: rjw, linux-kernel, lizefan, containers, cgroups, stable

try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
to ensure that a task doing a STOPPED/TRACED -> RUNNING transition
can't escape freezing. This mostly works, but ptrace_stop() does
not necessarily call schedule(); it can change task->state back to
RUNNING and check freezing() without any lock/barrier in between.

We could add the necessary barrier, but this patch instead changes
ptrace_stop() and do_signal_stop() to use freezable_schedule().
This fixes the race: freezer_count() and freezer_should_skip()
carefully avoid it.

This also simplifies the code: try_to_freeze_tasks() and
update_if_frozen() no longer need task_is_stopped_or_traced() checks
and the non-trivial assumptions behind them. We can rely on the
mechanism that was specifically designed to mark a sleeping task as
"frozen enough".

v2: As Tejun pointed out, we can also change get_signal_to_deliver()
and move try_to_freeze() up before the 'relock' label.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/freezer.h |    7 +++----
 kernel/cgroup_freezer.c |    3 +--
 kernel/freezer.c        |   11 ++---------
 kernel/power/process.c  |   13 +------------
 kernel/signal.c         |   20 ++++++--------------
 5 files changed, 13 insertions(+), 41 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index ee89932..8039893 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -134,10 +134,9 @@ static inline bool freezer_should_skip(struct task_struct *p)
 }
 
 /*
- * These macros are intended to be used whenever you want allow a task that's
- * sleeping in TASK_UNINTERRUPTIBLE or TASK_KILLABLE state to be frozen. Note
- * that neither return any clear indication of whether a freeze event happened
- * while in this function.
+ * These macros are intended to be used whenever you want allow a sleeping
+ * task to be frozen. Note that neither return any clear indication of
+ * whether a freeze event happened while in this function.
  */
 
 /* Like schedule(), but should not block the freezer. */
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 8a92b0e..bedefd9 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -198,8 +198,7 @@ static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer)
 			 * completion.  Consider it frozen in addition to
 			 * the usual frozen condition.
 			 */
-			if (!frozen(task) && !task_is_stopped_or_traced(task) &&
-			    !freezer_should_skip(task))
+			if (!frozen(task) && !freezer_should_skip(task))
 				goto notyet;
 		}
 	}
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 11f82a4..c38893b 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -116,17 +116,10 @@ bool freeze_task(struct task_struct *p)
 		return false;
 	}
 
-	if (!(p->flags & PF_KTHREAD)) {
+	if (!(p->flags & PF_KTHREAD))
 		fake_signal_wake_up(p);
-		/*
-		 * fake_signal_wake_up() goes through p's scheduler
-		 * lock and guarantees that TASK_STOPPED/TRACED ->
-		 * TASK_RUNNING transition can't race with task state
-		 * testing in try_to_freeze_tasks().
-		 */
-	} else {
+	else
 		wake_up_state(p, TASK_INTERRUPTIBLE);
-	}
 
 	spin_unlock_irqrestore(&freezer_lock, flags);
 	return true;
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 87da817..d5a258b 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -48,18 +48,7 @@ static int try_to_freeze_tasks(bool user_only)
 			if (p == current || !freeze_task(p))
 				continue;
 
-			/*
-			 * Now that we've done set_freeze_flag, don't
-			 * perturb a task in TASK_STOPPED or TASK_TRACED.
-			 * It is "frozen enough".  If the task does wake
-			 * up, it will immediately call try_to_freeze.
-			 *
-			 * Because freeze_task() goes through p's scheduler lock, it's
-			 * guaranteed that TASK_STOPPED/TRACED -> TASK_RUNNING
-			 * transition can't race with task state testing here.
-			 */
-			if (!task_is_stopped_or_traced(p) &&
-			    !freezer_should_skip(p))
+			if (!freezer_should_skip(p))
 				todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
diff --git a/kernel/signal.c b/kernel/signal.c
index 0af8868..5ffb562 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1908,7 +1908,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 		preempt_disable();
 		read_unlock(&tasklist_lock);
 		preempt_enable_no_resched();
-		schedule();
+		freezable_schedule();
 	} else {
 		/*
 		 * By the time we got the lock, our tracer went away.
@@ -1930,13 +1930,6 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	}
 
 	/*
-	 * While in TASK_TRACED, we were considered "frozen enough".
-	 * Now that we woke up, it's crucial if we're supposed to be
-	 * frozen that we freeze now before running anything substantial.
-	 */
-	try_to_freeze();
-
-	/*
 	 * We are back.  Now reacquire the siglock before touching
 	 * last_siginfo, so that we are sure to have synchronized with
 	 * any signal-sending on another CPU that wants to examine it.
@@ -2092,7 +2085,7 @@ static bool do_signal_stop(int signr)
 		}
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
-		schedule();
+		freezable_schedule();
 		return true;
 	} else {
 		/*
@@ -2200,15 +2193,14 @@ int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
 	if (unlikely(uprobe_deny_signal()))
 		return 0;
 
-relock:
 	/*
-	 * We'll jump back here after any time we were stopped in TASK_STOPPED.
-	 * While in TASK_STOPPED, we were considered "frozen enough".
-	 * Now that we woke up, it's crucial if we're supposed to be
-	 * frozen that we freeze now before running anything substantial.
+	 * Do this once, we can't return to user-mode if freezing() == T.
+	 * do_signal_stop() and ptrace_stop() do freezable_schedule() and
+	 * thus do not need another check after return.
 	 */
 	try_to_freeze();
 
+relock:
 	spin_lock_irq(&sighand->siglock);
 	/*
 	 * Every stopped thread goes here after wakeup. Check to see if
-- 
1.5.5.1
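
As a reading aid for the patch above, here is a single-threaded userspace
model of the "mark, sleep, unmark, then re-check" pattern that the
changelog relies on.  It is a sketch under stated assumptions, not the
kernel implementation: DEMO_PF_FREEZER_SKIP has a placeholder value,
struct demo_task and every demo_* function are hypothetical, the sleep is
replaced by a callback, and the memory barriers the real code needs are
omitted because nothing runs concurrently here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PF_FREEZER_SKIP 0x1u   /* placeholder value, not the kernel's */

struct demo_task {
    unsigned int flags;
    bool freezing;       /* a freeze request is pending for this task */
    int  times_frozen;   /* how often the task actually froze */
};

/* Freezer side: a sleeper that set the skip flag counts as "frozen enough". */
static bool demo_freezer_should_skip(const struct demo_task *t)
{
    return (t->flags & DEMO_PF_FREEZER_SKIP) != 0;
}

/* Task side: freeze now if a request is pending. */
static void demo_try_to_freeze(struct demo_task *t)
{
    if (t->freezing) {
        t->times_frozen++;
        t->freezing = false;   /* model: thawed again right away */
    }
}

/* Modelled on the freezable sleep: mark, sleep, unmark, then re-check. */
static void demo_freezable_schedule(struct demo_task *t,
                                    void (*while_asleep)(struct demo_task *))
{
    t->flags |= DEMO_PF_FREEZER_SKIP;
    if (while_asleep != NULL)
        while_asleep(t);       /* stands in for schedule() */
    t->flags &= ~DEMO_PF_FREEZER_SKIP;
    /* the real code needs a memory barrier here; this model does not */
    demo_try_to_freeze(t);
}

/* Hypothetical freezer pass that runs while the task is asleep. */
static int demo_todo;          /* tasks the freezer still has to wait for */
static void demo_freeze_request(struct demo_task *t)
{
    t->freezing = true;
    if (!demo_freezer_should_skip(t))
        demo_todo++;
}
```

In this model the freezer does not have to wait for the sleeping task
(demo_todo stays 0), and the task still freezes on wake-up before it runs
anything else, which is the property the removed
task_is_stopped_or_traced() checks were trying to provide.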

^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
@ 2012-10-26 17:46                                           ` Oleg Nesterov
  0 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-26 17:46 UTC (permalink / raw)
  To: Tejun Heo; +Cc: rjw, linux-kernel, lizefan, containers, cgroups, stable

try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
to ensure that a task doing a STOPPED/TRACED -> RUNNING transition
can't escape freezing. This mostly works, but ptrace_stop() does
not necessarily call schedule(); it can change task->state back to
RUNNING and check freezing() without any lock/barrier in between.

We could add the necessary barrier, but this patch instead changes
ptrace_stop() and do_signal_stop() to use freezable_schedule().
This fixes the race because freezer_count() and freezer_should_skip()
carefully avoid it.

This also simplifies the code: try_to_freeze_tasks() and
update_if_frozen() no longer need the task_is_stopped_or_traced()
checks with their non-trivial assumptions. We can rely on the
mechanism which was specifically designed to mark a sleeping task
as "frozen enough".

v2: As Tejun pointed out, we can also change get_signal_to_deliver()
and move try_to_freeze() up before 'relock' label.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
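Illustration only, not part of the patch: a minimal userspace sketch of
the handshake the description relies on, assuming freezable_schedule()
brackets schedule() with freezer_do_not_count() and freezer_count().
The model_* names, the struct, and the C11 atomics standing in for the
task flags and smp_mb() are all assumptions made for this example; it
replays two interleavings on a single thread rather than racing real
threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state involved in the race. */
struct model_task {
	atomic_bool freezer_skip;	/* models PF_FREEZER_SKIP */
	atomic_bool freezing;		/* models freezing(task) == true */
	bool frozen;			/* task parked itself in the freezer */
};

/* Sleeper, before schedule(): ask the freezer not to wait for us. */
static void model_do_not_count(struct model_task *t)
{
	atomic_store_explicit(&t->freezer_skip, true, memory_order_relaxed);
}

/* Sleeper, after wakeup: clear the flag, full barrier, then check. */
static void model_count(struct model_task *t)
{
	atomic_store_explicit(&t->freezer_skip, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&t->freezing, memory_order_relaxed))
		t->frozen = true;	/* models try_to_freeze() */
}

/* Freezer, after it has set 'freezing': may this task be skipped? */
static bool model_should_skip(struct model_task *t)
{
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&t->freezer_skip, memory_order_relaxed);
}

/*
 * Replay one ordering and report whether the outcome is safe: either
 * the freezer still counts the task, or the task has parked itself.
 */
bool model_replay(bool wakeup_before_freeze)
{
	struct model_task t;
	bool skipped;

	atomic_init(&t.freezer_skip, false);
	atomic_init(&t.freezing, false);
	t.frozen = false;

	model_do_not_count(&t);		/* task sleeps in a stop */

	if (wakeup_before_freeze) {
		model_count(&t);	/* wakes first, nothing to do yet */
		atomic_store(&t.freezing, true);
		skipped = model_should_skip(&t);
	} else {
		atomic_store(&t.freezing, true);
		skipped = model_should_skip(&t);
		model_count(&t);	/* wakes later, must park itself */
	}
	return !skipped || t.frozen;
}

/* Both orderings must leave the task accounted for. */
void model_selfcheck(void)
{
	assert(model_replay(true));
	assert(model_replay(false));
}
```

In the first ordering the freezer sees the cleared flag and keeps
waiting for the task; in the second the task is skipped as "frozen
enough" and parks itself on wakeup. With real concurrency the paired
full fences are what rule out the remaining outcome, a skipped task
that also misses freezing.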
 include/linux/freezer.h |    7 +++----
 kernel/cgroup_freezer.c |    3 +--
 kernel/freezer.c        |   11 ++---------
 kernel/power/process.c  |   13 +------------
 kernel/signal.c         |   20 ++++++--------------
 5 files changed, 13 insertions(+), 41 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index ee89932..8039893 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -134,10 +134,9 @@ static inline bool freezer_should_skip(struct task_struct *p)
 }
 
 /*
- * These macros are intended to be used whenever you want allow a task that's
- * sleeping in TASK_UNINTERRUPTIBLE or TASK_KILLABLE state to be frozen. Note
- * that neither return any clear indication of whether a freeze event happened
- * while in this function.
+ * These macros are intended to be used whenever you want allow a sleeping
+ * task to be frozen. Note that neither return any clear indication of
+ * whether a freeze event happened while in this function.
  */
 
 /* Like schedule(), but should not block the freezer. */
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 8a92b0e..bedefd9 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -198,8 +198,7 @@ static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer)
 			 * completion.  Consider it frozen in addition to
 			 * the usual frozen condition.
 			 */
-			if (!frozen(task) && !task_is_stopped_or_traced(task) &&
-			    !freezer_should_skip(task))
+			if (!frozen(task) && !freezer_should_skip(task))
 				goto notyet;
 		}
 	}
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 11f82a4..c38893b 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -116,17 +116,10 @@ bool freeze_task(struct task_struct *p)
 		return false;
 	}
 
-	if (!(p->flags & PF_KTHREAD)) {
+	if (!(p->flags & PF_KTHREAD))
 		fake_signal_wake_up(p);
-		/*
-		 * fake_signal_wake_up() goes through p's scheduler
-		 * lock and guarantees that TASK_STOPPED/TRACED ->
-		 * TASK_RUNNING transition can't race with task state
-		 * testing in try_to_freeze_tasks().
-		 */
-	} else {
+	else
 		wake_up_state(p, TASK_INTERRUPTIBLE);
-	}
 
 	spin_unlock_irqrestore(&freezer_lock, flags);
 	return true;
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 87da817..d5a258b 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -48,18 +48,7 @@ static int try_to_freeze_tasks(bool user_only)
 			if (p == current || !freeze_task(p))
 				continue;
 
-			/*
-			 * Now that we've done set_freeze_flag, don't
-			 * perturb a task in TASK_STOPPED or TASK_TRACED.
-			 * It is "frozen enough".  If the task does wake
-			 * up, it will immediately call try_to_freeze.
-			 *
-			 * Because freeze_task() goes through p's scheduler lock, it's
-			 * guaranteed that TASK_STOPPED/TRACED -> TASK_RUNNING
-			 * transition can't race with task state testing here.
-			 */
-			if (!task_is_stopped_or_traced(p) &&
-			    !freezer_should_skip(p))
+			if (!freezer_should_skip(p))
 				todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
diff --git a/kernel/signal.c b/kernel/signal.c
index 0af8868..5ffb562 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1908,7 +1908,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 		preempt_disable();
 		read_unlock(&tasklist_lock);
 		preempt_enable_no_resched();
-		schedule();
+		freezable_schedule();
 	} else {
 		/*
 		 * By the time we got the lock, our tracer went away.
@@ -1930,13 +1930,6 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	}
 
 	/*
-	 * While in TASK_TRACED, we were considered "frozen enough".
-	 * Now that we woke up, it's crucial if we're supposed to be
-	 * frozen that we freeze now before running anything substantial.
-	 */
-	try_to_freeze();
-
-	/*
 	 * We are back.  Now reacquire the siglock before touching
 	 * last_siginfo, so that we are sure to have synchronized with
 	 * any signal-sending on another CPU that wants to examine it.
@@ -2092,7 +2085,7 @@ static bool do_signal_stop(int signr)
 		}
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
-		schedule();
+		freezable_schedule();
 		return true;
 	} else {
 		/*
@@ -2200,15 +2193,14 @@ int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
 	if (unlikely(uprobe_deny_signal()))
 		return 0;
 
-relock:
 	/*
-	 * We'll jump back here after any time we were stopped in TASK_STOPPED.
-	 * While in TASK_STOPPED, we were considered "frozen enough".
-	 * Now that we woke up, it's crucial if we're supposed to be
-	 * frozen that we freeze now before running anything substantial.
+	 * Do this once, we can't return to user-mode if freezing() == T.
+	 * do_signal_stop() and ptrace_stop() do freezable_schedule() and
+	 * thus do not need another check after return.
 	 */
 	try_to_freeze();
 
+relock:
 	spin_lock_irq(&sighand->siglock);
 	/*
 	 * Every stopped thread goes here after wakeup. Check to see if
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 149+ messages in thread

* Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
  2012-10-26 17:46                                           ` Oleg Nesterov
@ 2012-10-26 17:52                                               ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-26 17:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Fri, Oct 26, 2012 at 07:46:06PM +0200, Oleg Nesterov wrote:
> try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
> to ensure that a task doing STOPPED/TRACED -> RUNNING transition
> can't escape freezing. This mostly works, but ptrace_stop() does
> not necessarily call schedule(), it can change task->state back to
> RUNNING and check freezing() without any lock/barrier in between.
> 
> We could add the necessary barrier, but this patch changes
> ptrace_stop() and do_signal_stop() to use freezable_schedule().
> This fixes the race, freezer_count() and freezer_should_skip()
> carefully avoid the race.
> 
> And this simplifies the code, try_to_freeze_tasks/update_if_frozen
> no longer need to use task_is_stopped_or_traced() checks with the
> non trivial assumptions. We can rely on the mechanism which was
> specially designed to mark the sleeping task as "frozen enough".
> 
> v2: As Tejun pointed out, we can also change get_signal_to_deliver()
> and move try_to_freeze() up before 'relock' label.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Looks good to me. :)

 Acked-by: Tejun Heo <tj@kernel.org>

Rafael, sorry that this one doesn't have pm cc'd but can you please
pick up this one too?

Thanks a lot.

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
       [not found]                                               ` <20121026175258.GV11442-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2012-10-26 18:01                                                 ` Oleg Nesterov
  0 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-26 18:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 10/26, Tejun Heo wrote:
>
>  Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

> Rafael, sorry that this one doesn't have pm cc'd

Ah, sorry Rafael. Yes, I have read your email, and I was going to
add linux-pm but forgot.

> but can you please
> pick up this one too?

Please, and thanks.

Oleg.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
       [not found]                                                 ` <20121026180149.GA22421-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-26 21:14                                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 149+ messages in thread
From: Rafael J. Wysocki @ 2012-10-26 21:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Friday, October 26, 2012 08:01:49 PM Oleg Nesterov wrote:
> On 10/26, Tejun Heo wrote:
> >
> >  Acked-by: Tejun Heo <tj@kernel.org>
> 
> Thanks!
> 
> > Rafael, sorry that this one doesn't have pm cc'd
> 
> Ah, sorry Rafael. Yes, I have read your email, and I was going to
> add linux-pm but forgot.
> 
> > but can you please
> > pick up this one too?
> 
> Please, and thanks.

OK, but that will go to Linus in the next batch.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
       [not found]                                                       ` <2718983.vORnrfWdbE-sKB8Sp2ER+y1GS7QM15AGw@public.gmane.org>
@ 2012-10-26 21:29                                                         ` Tejun Heo
  0 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-10-26 21:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Oleg Nesterov, stable-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

On Fri, Oct 26, 2012 at 11:29:56PM +0200, Rafael J. Wysocki wrote:
> Actually, what tree is it supposed to apply to?
> 
> The change in kernel/cgroup_freezer.c doesn't look like anything in
> the current Linus' tree to me.

Ooh, right.  This depends on the earlier cgroup_freezer changes.
Sorry about the confusion.  I'll apply it to the following branch (the
same one used for the previous cgroup_freezer updates).

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git cgroup-freezer

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
  2012-10-26 21:14                                                   ` Rafael J. Wysocki
@ 2012-10-26 21:29                                                       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 149+ messages in thread
From: Rafael J. Wysocki @ 2012-10-26 21:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Friday, October 26, 2012 11:14:17 PM Rafael J. Wysocki wrote:
> On Friday, October 26, 2012 08:01:49 PM Oleg Nesterov wrote:
> > On 10/26, Tejun Heo wrote:
> > >
> > >  Acked-by: Tejun Heo <tj@kernel.org>
> > > 
> > > Thanks!
> > > 
> > > > Rafael, sorry that this one doesn't have pm cc'd
> > > 
> > > Ah, sorry Rafael. Yes, I have read your email, and I was going to
> > > add linux-pm but forgot.
> > 
> > > but can you please
> > > pick up this one too?
> > 
> > Please, and thanks.
> 
> OK, but that will go to Linus in the next batch.

Actually, what tree is it supposed to apply to?

The change in kernel/cgroup_freezer.c doesn't look like anything in
the current Linus' tree to me.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
       [not found]                                           ` <20121026174606.GB21639-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2012-10-26 17:52                                               ` Tejun Heo
@ 2012-10-27 22:22                                             ` Ben Hutchings
  1 sibling, 0 replies; 149+ messages in thread
From: Ben Hutchings @ 2012-10-27 22:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 1346 bytes --]

On Fri, 2012-10-26 at 19:46 +0200, Oleg Nesterov wrote:
> try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
> to ensure that a task doing STOPPED/TRACED -> RUNNING transition
> can't escape freezing. This mostly works, but ptrace_stop() does
> not necessarily call schedule(), it can change task->state back to
> RUNNING and check freezing() without any lock/barrier in between.
> 
> We could add the necessary barrier, but this patch changes
> ptrace_stop() and do_signal_stop() to use freezable_schedule().
> This fixes the race, freezer_count() and freezer_should_skip()
> carefully avoid the race.
> 
> And this simplifies the code, try_to_freeze_tasks/update_if_frozen
> no longer need to use task_is_stopped_or_traced() checks with the
> non trivial assumptions. We can rely on the mechanism which was
> specially designed to mark the sleeping task as "frozen enough".
> 
> v2: As Tejun pointed out, we can also change get_signal_to_deliver()
> and move try_to_freeze() up before 'relock' label.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[...]

This is not the correct way to submit a change to stable.  Please see
Documentation/stable_kernel_rules.txt

Ben.

-- 
Ben Hutchings
Never attribute to conspiracy what can adequately be explained by stupidity.

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
       [not found]                                                         ` <20121026212909.GW11442-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2012-10-28  0:16                                                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 149+ messages in thread
From: Rafael J. Wysocki @ 2012-10-28  0:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Oleg Nesterov, stable-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Friday, October 26, 2012 02:29:09 PM Tejun Heo wrote:
> Hello,
> 
> On Fri, Oct 26, 2012 at 11:29:56PM +0200, Rafael J. Wysocki wrote:
> > Actually, what tree is it supposed to apply to?
> > 
> > The change in kernel/cgroup_freezer.c doesn't look like anything in
> > the current Linus' tree to me.
> 
> Ooh, right.  This depends on the earlier cgroup_freezer changes.
> Sorry about the confusion.  I'll apply it to the following branch (the
> same one used for the previous cgroup_freezer updates).
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git cgroup-freezer

OK

I haven't merged it yet, so I'll get this fix along with the rest.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()
  2012-10-27 22:22                                             ` Ben Hutchings
@ 2012-10-28 13:45                                                 ` Oleg Nesterov
  -1 siblings, 0 replies; 149+ messages in thread
From: Oleg Nesterov @ 2012-10-28 13:45 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Tejun Heo, rjw, linux-kernel, lizefan, containers, cgroups, stable

On 10/27, Ben Hutchings wrote:
>
> On Fri, 2012-10-26 at 19:46 +0200, Oleg Nesterov wrote:
> > try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
> > to ensure that a task doing a STOPPED/TRACED -> RUNNING transition
> > can't escape freezing. This mostly works, but ptrace_stop() does
> > not necessarily call schedule(); it can change task->state back to
> > RUNNING and check freezing() without any lock/barrier in between.
> >
> > We could add the necessary barrier, but this patch instead changes
> > ptrace_stop() and do_signal_stop() to use freezable_schedule().
> > This fixes the race, because freezer_count() and
> > freezer_should_skip() carefully avoid it.
> >
> > And this simplifies the code: try_to_freeze_tasks/update_if_frozen
> > no longer need the task_is_stopped_or_traced() checks with their
> > non-trivial assumptions. We can rely on the mechanism which was
> > specifically designed to mark the sleeping task as "frozen enough".
> >
> > v2: As Tejun pointed out, we can also change get_signal_to_deliver()
> > and move try_to_freeze() up before 'relock' label.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> [...]
>
> This is not the correct way to submit a change to stable.  Please see
> Documentation/stable_kernel_rules.txt

Sorry for the confusion; this is not for stable@, it was cc'ed by mistake.

Oleg.

* Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
  2012-10-16 22:28     ` Tejun Heo
@ 2012-12-20  5:25         ` Herton Ronaldo Krzesinski
  -1 siblings, 0 replies; 149+ messages in thread
From: Herton Ronaldo Krzesinski @ 2012-12-20  5:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw, oleg, linux-kernel, lizefan, containers, cgroups, stable

On Tue, Oct 16, 2012 at 03:28:40PM -0700, Tejun Heo wrote:
[...]
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index f8a030c..4cd1d0f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -34,7 +34,6 @@ extern int cgroup_lock_is_held(void);
>  extern bool cgroup_lock_live_group(struct cgroup *cgrp);
>  extern void cgroup_unlock(void);
>  extern void cgroup_fork(struct task_struct *p);
> -extern void cgroup_fork_callbacks(struct task_struct *p);
>  extern void cgroup_post_fork(struct task_struct *p);
>  extern void cgroup_exit(struct task_struct *p, int run_callbacks);
>  extern int cgroupstats_build(struct cgroupstats *stats,
[...]

Minor issue: the patch missed the removal of

static inline void cgroup_fork_callbacks(struct task_struct *p) {}

for the !CONFIG_CGROUPS case.

-- 
[]'s
Herton

* [PATCH] cgroup: remove unused dummy cgroup_fork_callbacks()
  2012-12-20  5:25         ` Herton Ronaldo Krzesinski
@ 2012-12-28 21:22           ` Tejun Heo
  -1 siblings, 0 replies; 149+ messages in thread
From: Tejun Heo @ 2012-12-28 21:22 UTC (permalink / raw)
  To: Herton Ronaldo Krzesinski
  Cc: rjw, oleg, linux-kernel, lizefan, containers, cgroups, stable

From a0a4bddd2779a51b6529afa113c5671ebcc21b14 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Fri, 28 Dec 2012 13:18:28 -0800

5edee61ede ("cgroup: cgroup_subsys->fork() should be called after the
task is added to css_set") removed cgroup_fork_callbacks() but forgot
to remove its dummy version for !CONFIG_CGROUPS.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
---
Applied to cgroup/for-3.9.  Thanks.

 include/linux/cgroup.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 7d73905..942e687 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -706,7 +706,6 @@ struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id);
 static inline int cgroup_init_early(void) { return 0; }
 static inline int cgroup_init(void) { return 0; }
 static inline void cgroup_fork(struct task_struct *p) {}
-static inline void cgroup_fork_callbacks(struct task_struct *p) {}
 static inline void cgroup_post_fork(struct task_struct *p) {}
 static inline void cgroup_exit(struct task_struct *p, int callbacks) {}
 
-- 
1.8.0.2
