All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] cgroup freezer: Fixes and CHECKPOINTING support
@ 2009-06-03  9:31 Matt Helsley
       [not found] ` <cover.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-03  9:31 ` [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer Matt Helsley
  0 siblings, 2 replies; 33+ messages in thread
From: Matt Helsley @ 2009-06-03  9:31 UTC (permalink / raw)
  Cc: Containers, Rafael J. Wysocki, Pavel Machek, Paul Menage,
	linux-pm-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA


This series fixes a bug with the mainline cgroup freezer code and
cleans up the cgroup freezer before introducing the new
CHECKPOINTING state in the last patch. All but this last patch are
independent of checkpoint/restart and hence should be suitable for
mainline.

The bug was noticed by Oren Ladaan and allows tasks frozen via the
cgroup freezer to be thawed during resume when they should remain
frozen. In Oren's checkpoint/restart tree is means that
sys_checkpoint() sometimes returns EBUSY even when the cgroup is, 
in fact, fully frozen. (Workaround: cat /cgroup/foo/freezer.state
before doing checkpoint..)

The final patch, which adds the CHECKPOINTING state, incorporates
Oren's comments and should fix the lockdep bug he posted too. I 
haven't tested this latest version yet myself and I have a few things to 
attend to before I can get to it so I'm posting this for initial review
now. I believe this resolves all the pending comments I've seen about
the CHECKPOINTING state.

Cheers,
	-Matt Helsley

Matt Helsley (4):
  cgroup freezer: Fix buggy resume test for tasks frozen with cgroup
    freezer
  cgroup freezer: Avoid lazy state changes when convenient
  cgroup freezer: Update stale locking comments
  cgroup freezer: Add CHECKPOINTING state to safeguard container
    checkpoint

 Documentation/cgroups/freezer-subsystem.txt |   10 ++
 checkpoint/checkpoint.c                     |    8 +-
 include/linux/freezer.h                     |   15 ++-
 kernel/cgroup_freezer.c                     |  163 +++++++++++++++++++--------
 kernel/power/process.c                      |    2 +-
 5 files changed, 146 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer
       [not found] ` <cover.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-03  9:31   ` Matt Helsley
  2009-06-03 16:05     ` Serge E. Hallyn
       [not found]   ` <91260d5797bcf233ca54c7b41da8278e99b4ca66.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Matt Helsley @ 2009-06-03  9:31 UTC (permalink / raw)
  Cc: Containers, Rafael J. Wysocki, Pavel Machek, Paul Menage,
	linux-pm-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

When the cgroup freezer is used to freeze tasks we do not want to thaw
those tasks during resume. Currently we test the cgroup freezer
state of the resuming tasks to see if the cgroup is FROZEN.  If so
then we don't thaw the task. However, the FREEZING state also indicates
that the task should remain frozen.

This also avoids a problem pointed out by Oren Ladaan: the freezer state
transition from FREEZING to FROZEN is updated lazily when userspace reads
or writes the freezer.state file in the cgroup filesystem. This means that
resume will thaw tasks in cgroups which should be in the FROZEN state if
there is no read/write of the freezer.state file to trigger this
transition before suspend.

NOTE: Another "simple" solution would be to always update the cgroup
freezer state during resume. However it's a bad choice for several reasons:
Updating the cgroup freezer state is somewhat expensive because it requires
walking all the tasks in the cgroup and checking if they are each frozen.
Worse, this could easily make resume run in N^2 time where N is the number
of tasks in the cgroup. Finally, updating the freezer state from this code
path requires trickier locking because of the way locks must be ordered.

Instead of updating the freezer state we rely on the fact that lazy
updates only manage the transition from FREEZING to FROZEN. We know that
a cgroup with the FREEZING state may actually be FROZEN so test for that
state too. This makes sense in the resume path even for partially-frozen
cgroups -- those that really are FREEZING but not FROZEN.

Reported-by: Oren Ladaan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Cc: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
Cc: Pavel Machek <pavel-AlSwsSmVLrQ@public.gmane.org>
Cc: linux-pm-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org

Seems like a candidate for -stable.
---
 include/linux/freezer.h |    7 +++++--
 kernel/cgroup_freezer.c |    9 ++++++---
 kernel/power/process.c  |    2 +-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 5a361f8..da7e52b 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -64,9 +64,12 @@ extern bool freeze_task(struct task_struct *p, bool sig_only);
 extern void cancel_freezing(struct task_struct *p);
 
 #ifdef CONFIG_CGROUP_FREEZER
-extern int cgroup_frozen(struct task_struct *task);
+extern int cgroup_freezing_or_frozen(struct task_struct *task);
 #else /* !CONFIG_CGROUP_FREEZER */
-static inline int cgroup_frozen(struct task_struct *task) { return 0; }
+static inline int cgroup_freezing_or_frozen(struct task_struct *task)
+{
+	return 0;
+}
 #endif /* !CONFIG_CGROUP_FREEZER */
 
 /*
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index fb249e2..765e2c1 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -47,17 +47,20 @@ static inline struct freezer *task_freezer(struct task_struct *task)
 			    struct freezer, css);
 }
 
-int cgroup_frozen(struct task_struct *task)
+int cgroup_freezing_or_frozen(struct task_struct *task)
 {
 	struct freezer *freezer;
 	enum freezer_state state;
 
 	task_lock(task);
 	freezer = task_freezer(task);
-	state = freezer->state;
+	if (!freezer->css.cgroup->parent)
+		state = CGROUP_THAWED; /* root cgroup can't be frozen */
+	else
+		state = freezer->state;
 	task_unlock(task);
 
-	return state == CGROUP_FROZEN;
+	return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
 }
 
 /*
diff --git a/kernel/power/process.c b/kernel/power/process.c
index ca63401..2c64932 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -135,7 +135,7 @@ static void thaw_tasks(bool nosig_only)
 		if (nosig_only && should_send_signal(p))
 			continue;
 
-		if (cgroup_frozen(p))
+		if (cgroup_freezing_or_frozen(p))
 			continue;
 
 		thaw_process(p);
-- 
1.5.6.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer
  2009-06-03  9:31 [PATCH 0/4] cgroup freezer: Fixes and CHECKPOINTING support Matt Helsley
       [not found] ` <cover.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-03  9:31 ` Matt Helsley
  1 sibling, 0 replies; 33+ messages in thread
From: Matt Helsley @ 2009-06-03  9:31 UTC (permalink / raw)
  Cc: Cedric Le Goater, Oren Laadan, Containers, Li Zefan,
	Pavel Machek, Paul Menage, linux-pm

When the cgroup freezer is used to freeze tasks we do not want to thaw
those tasks during resume. Currently we test the cgroup freezer
state of the resuming tasks to see if the cgroup is FROZEN.  If so
then we don't thaw the task. However, the FREEZING state also indicates
that the task should remain frozen.

This also avoids a problem pointed out by Oren Ladaan: the freezer state
transition from FREEZING to FROZEN is updated lazily when userspace reads
or writes the freezer.state file in the cgroup filesystem. This means that
resume will thaw tasks in cgroups which should be in the FROZEN state if
there is no read/write of the freezer.state file to trigger this
transition before suspend.

NOTE: Another "simple" solution would be to always update the cgroup
freezer state during resume. However it's a bad choice for several reasons:
Updating the cgroup freezer state is somewhat expensive because it requires
walking all the tasks in the cgroup and checking if they are each frozen.
Worse, this could easily make resume run in N^2 time where N is the number
of tasks in the cgroup. Finally, updating the freezer state from this code
path requires trickier locking because of the way locks must be ordered.

Instead of updating the freezer state we rely on the fact that lazy
updates only manage the transition from FREEZING to FROZEN. We know that
a cgroup with the FREEZING state may actually be FROZEN so test for that
state too. This makes sense in the resume path even for partially-frozen
cgroups -- those that really are FREEZING but not FROZEN.

Reported-by: Oren Ladaan <orenl@cs.columbia.edu>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Cedric Le Goater <legoater@free.fr>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@suse.cz>
Cc: linux-pm@lists.linux-foundation.org

Seems like a candidate for -stable.
---
 include/linux/freezer.h |    7 +++++--
 kernel/cgroup_freezer.c |    9 ++++++---
 kernel/power/process.c  |    2 +-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 5a361f8..da7e52b 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -64,9 +64,12 @@ extern bool freeze_task(struct task_struct *p, bool sig_only);
 extern void cancel_freezing(struct task_struct *p);
 
 #ifdef CONFIG_CGROUP_FREEZER
-extern int cgroup_frozen(struct task_struct *task);
+extern int cgroup_freezing_or_frozen(struct task_struct *task);
 #else /* !CONFIG_CGROUP_FREEZER */
-static inline int cgroup_frozen(struct task_struct *task) { return 0; }
+static inline int cgroup_freezing_or_frozen(struct task_struct *task)
+{
+	return 0;
+}
 #endif /* !CONFIG_CGROUP_FREEZER */
 
 /*
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index fb249e2..765e2c1 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -47,17 +47,20 @@ static inline struct freezer *task_freezer(struct task_struct *task)
 			    struct freezer, css);
 }
 
-int cgroup_frozen(struct task_struct *task)
+int cgroup_freezing_or_frozen(struct task_struct *task)
 {
 	struct freezer *freezer;
 	enum freezer_state state;
 
 	task_lock(task);
 	freezer = task_freezer(task);
-	state = freezer->state;
+	if (!freezer->css.cgroup->parent)
+		state = CGROUP_THAWED; /* root cgroup can't be frozen */
+	else
+		state = freezer->state;
 	task_unlock(task);
 
-	return state == CGROUP_FROZEN;
+	return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
 }
 
 /*
diff --git a/kernel/power/process.c b/kernel/power/process.c
index ca63401..2c64932 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -135,7 +135,7 @@ static void thaw_tasks(bool nosig_only)
 		if (nosig_only && should_send_signal(p))
 			continue;
 
-		if (cgroup_frozen(p))
+		if (cgroup_freezing_or_frozen(p))
 			continue;
 
 		thaw_process(p);
-- 
1.5.6.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient
       [not found]   ` <91260d5797bcf233ca54c7b41da8278e99b4ca66.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-03  9:31     ` Matt Helsley
  2009-06-03 16:05     ` [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer Serge E. Hallyn
  1 sibling, 0 replies; 33+ messages in thread
From: Matt Helsley @ 2009-06-03  9:31 UTC (permalink / raw)
  Cc: Containers, Paul Menage

 When all the tasks of a cgroup were successfully frozen we can avoid
 the lazy FREEZING -> FROZEN transition and move into FROZEN during the
 write to freezer.state.

Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup_freezer.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 765e2c1..19702ac 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -286,7 +286,10 @@ static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
 	}
 	cgroup_iter_end(cgroup, &it);
 
-	return num_cant_freeze_now ? -EBUSY : 0;
+	if (num_cant_freeze_now)
+		return -EBUSY;
+	freezer->state = CGROUP_FROZEN;
+	return 0;
 }
 
 static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
-- 
1.5.6.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/4] cgroup freezer: Update stale locking comments
       [not found]   ` <03412f8681d89c99ac575330381ab49f6e5e61ba.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-03  9:31     ` Matt Helsley
  2009-06-03 16:10     ` [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient Serge E. Hallyn
  2009-06-03 19:52     ` Serge E. Hallyn
  2 siblings, 0 replies; 33+ messages in thread
From: Matt Helsley @ 2009-06-03  9:31 UTC (permalink / raw)
  Cc: Containers, Paul Menage

Update stale comments regarding locking order and add a little more detail
so it's easier to follow the locking between the cgroup freezer and the
power management freezer code.

Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
 kernel/cgroup_freezer.c |   21 +++++++++++++--------
 1 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 19702ac..05795b7 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -88,10 +88,10 @@ struct cgroup_subsys freezer_subsys;
 
 /* Locks taken and their ordering
  * ------------------------------
- * css_set_lock
  * cgroup_mutex (AKA cgroup_lock)
- * task->alloc_lock (AKA task_lock)
  * freezer->lock
+ * css_set_lock
+ * task->alloc_lock (AKA task_lock)
  * task->sighand->siglock
  *
  * cgroup code forces css_set_lock to be taken before task->alloc_lock
@@ -99,33 +99,38 @@ struct cgroup_subsys freezer_subsys;
  * freezer_create(), freezer_destroy():
  * cgroup_mutex [ by cgroup core ]
  *
- * can_attach():
- * cgroup_mutex
+ * freezer_can_attach():
+ * cgroup_mutex (held by caller of can_attach)
  *
- * cgroup_frozen():
+ * cgroup_freezing_or_frozen():
  * task->alloc_lock (to get task's cgroup)
  *
  * freezer_fork() (preserving fork() performance means can't take cgroup_mutex):
- * task->alloc_lock (to get task's cgroup)
  * freezer->lock
  *  sighand->siglock (if the cgroup is freezing)
  *
  * freezer_read():
  * cgroup_mutex
  *  freezer->lock
+ *   write_lock css_set_lock (cgroup iterator start)
+ *    task->alloc_lock
  *   read_lock css_set_lock (cgroup iterator start)
  *
  * freezer_write() (freeze):
  * cgroup_mutex
  *  freezer->lock
+ *   write_lock css_set_lock (cgroup iterator start)
+ *    task->alloc_lock
  *   read_lock css_set_lock (cgroup iterator start)
- *    sighand->siglock
+ *    sighand->siglock (fake signal delivery inside freeze_task())
  *
  * freezer_write() (unfreeze):
  * cgroup_mutex
  *  freezer->lock
+ *   write_lock css_set_lock (cgroup iterator start)
+ *    task->alloc_lock
  *   read_lock css_set_lock (cgroup iterator start)
- *    task->alloc_lock (to prevent races with freeze_task())
+ *    task->alloc_lock (inside thaw_process(), prevents race with refrigerator())
  *     sighand->siglock
  */
 static struct cgroup_subsys_state *freezer_create(struct cgroup_subsys *ss,
-- 
1.5.6.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/4] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
       [not found]   ` <4abbb542b220448d2e8d17e5074899568c584f23.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-03  9:31     ` Matt Helsley
       [not found]       ` <89c3726813accffb7c51cd30ff93b79a4391f382.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Matt Helsley @ 2009-06-03  9:31 UTC (permalink / raw)
  Cc: Containers, Paul Menage

The CHECKPOINTING state prevents userspace from unfreezing tasks until
sys_checkpoint() is finished. When doing container checkpoint userspace
will do:

	echo FROZEN > /cgroups/my_container/freezer.state
	...
	rc = sys_checkpoint( <pid of container root> );

To ensure a consistent checkpoint image userspace should not be allowed
to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
during checkpoint.

"CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
the checkpoint system call is finished and ready to return. Then the
freezer state returns to "FROZEN". Writing any new state to freezer.state while
checkpointing will return EBUSY. These semantics ensure that userspace cannot
unfreeze the cgroup midway through the checkpoint system call.

The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
make relatively few assumptions about the task that is passed in. However the
way they are called in do_checkpoint() assumes that the root of the container
is in the same freezer cgroup as all the other tasks that will be
checkpointed.

Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
Cc: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

Notes:
	Meant to work with Oren's checkpoint/restart v16-dev git tree.
        Still needs testing.
        As a side-effect this prevents the multiple tasks from entering the
                CHECKPOINTING state simultaneously. All but one will get -EBUSY.
---
 Documentation/cgroups/freezer-subsystem.txt |   10 ++
 checkpoint/checkpoint.c                     |    8 ++-
 include/linux/freezer.h                     |    8 ++
 kernel/cgroup_freezer.c                     |  128 +++++++++++++++++++-------
 4 files changed, 117 insertions(+), 37 deletions(-)

diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
index 41f37fe..92b68e6 100644
--- a/Documentation/cgroups/freezer-subsystem.txt
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -100,3 +100,13 @@ things happens:
 		and returns EINVAL)
 	3) The tasks that blocked the cgroup from entering the "FROZEN"
 		state disappear from the cgroup's set of tasks.
+
+When the cgroup freezer is used to guard container checkpoint operations the
+freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
+"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
+state, the cgroup may not leave until the checkpoint system call returns the
+freezer state to "FROZEN". Writing any new state to freezer.state while
+checkpointing will return EBUSY. These semantics ensure that userspace cannot
+unfreeze the cgroup midway through the checkpoint system call. Note that,
+unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
+state.
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index afc7300..d586a9b 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -569,7 +569,10 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
 
 	ret = init_checkpoint_ctx(ctx, pid);
 	if (ret < 0)
-		goto out;
+		return ret;
+	ret = cgroup_freezer_begin_checkpoint(ctx->root_task);
+	if (ret < 0)
+		return ret;
 	ret = build_tree(ctx);
 	if (ret < 0)
 		goto out;
@@ -597,6 +600,7 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
 	/* on success, return (unique) checkpoint identifier */
 	ctx->crid = atomic_inc_return(&ctx_count);
 	ret = ctx->crid;
- out:
+out:
+	cgroup_freezer_end_checkpoint(ctx->root_task);
 	return ret;
 }
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index da7e52b..cd31593 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -65,11 +65,19 @@ extern void cancel_freezing(struct task_struct *p);
 
 #ifdef CONFIG_CGROUP_FREEZER
 extern int cgroup_freezing_or_frozen(struct task_struct *task);
+extern int cgroup_freezer_begin_checkpoint(struct task_struct *task);
+extern void cgroup_freezer_end_checkpoint(struct task_struct *task);
 #else /* !CONFIG_CGROUP_FREEZER */
 static inline int cgroup_freezing_or_frozen(struct task_struct *task)
 {
 	return 0;
 }
+static inline int cgroup_freezer_begin_checkpoint(struct task_struct *task)
+{
+	return -ENOTSUP;
+}
+static inline void cgroup_freezer_end_checkpoint(struct task_struct *task)
+{}
 #endif /* !CONFIG_CGROUP_FREEZER */
 
 /*
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 05795b7..6519692 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -25,6 +25,7 @@ enum freezer_state {
 	CGROUP_THAWED = 0,
 	CGROUP_FREEZING,
 	CGROUP_FROZEN,
+	CGROUP_CHECKPOINTING,
 };
 
 struct freezer {
@@ -64,6 +65,90 @@ int cgroup_freezing_or_frozen(struct task_struct *task)
 }
 
 /*
+ * caller must hold freezer->lock
+ */
+static void update_freezer_state(struct cgroup *cgroup,
+				 struct freezer *freezer)
+{
+	struct cgroup_iter it;
+	struct task_struct *task;
+	unsigned int nfrozen = 0, ntotal = 0;
+
+	cgroup_iter_start(cgroup, &it);
+	while ((task = cgroup_iter_next(cgroup, &it))) {
+		ntotal++;
+		if (is_task_frozen_enough(task))
+			nfrozen++;
+	}
+
+	/*
+	 * Transition to FROZEN when no new tasks can be added ensures
+	 * that we never exist in the FROZEN state while there are unfrozen
+	 * tasks.
+	 */
+	if (nfrozen == ntotal)
+		freezer->state = CGROUP_FROZEN;
+	else if (nfrozen > 0)
+		freezer->state = CGROUP_FREEZING;
+	else
+		freezer->state = CGROUP_THAWED;
+	cgroup_iter_end(cgroup, &it);
+}
+
+/*
+ * cgroup freezer state changes made without the aid of the cgroup filesystem
+ * must go through this function to ensure proper locking is observed.
+ */
+static int freezer_checkpointing(struct task_struct *task,
+				 enum freezer_state next_state)
+{
+	struct freezer *freezer;
+	struct cgroup_subsys_state *css;
+	enum freezer_state state;
+
+	task_lock(task);
+	css = task_subsys_state(task, freezer_subsys_id);
+	css_get(css); /* make sure freezer doesn't go away */
+	freezer = containerof(css, struct freezer, css);
+	task_unlock(task);
+
+	if (freezer->state == CGROUP_FREEZING) {
+		/* May be in middle of a lazy FREEZING -> FROZEN transition */
+		if (cgroup_lock_live_group(css->cgroup)) {
+			spin_lock_irq(&freezer->lock);
+			update_freezer_state(css->cgroup, freezer);
+			spin_unlock_irq(&freezer->lock);
+			cgroup_unlock();
+		}
+	}
+
+	spin_lock_irq(&freezer->lock);
+	state = freezer->state;
+	if ((state == CGROUP_FROZEN && next_state == CGROUP_CHECKPOINTING) ||
+	    (state == CGROUP_CHECKPOINTING && next_state == CGROUP_FROZEN))
+		freezer->state = next_state;
+	spin_unlock_irq(&freezer->lock);
+	css_put(css);
+	return state;
+}
+
+int cgroup_freezer_begin_checkpoint(struct task_struct *task)
+{
+	if (freezer_checkpointing(task, CGROUP_CHECKPOINTING) != CGROUP_FROZEN)
+		return -EBUSY;
+	return 0;
+}
+
+void cgroup_freezer_end_checkpoint(struct task_struct *task)
+{
+	/*
+	 * If we weren't in CHECKPOINTING state then userspace could have
+	 * unfrozen a task and given us an inconsistent checkpoint image
+	 */
+	WARN_ON(freezer_checkpointing(task, CGROUP_FROZEN) != CGROUP_CHECKPOINTING);
+}
+
+/*
  * cgroups_write_string() limits the size of freezer state strings to
  * CGROUP_LOCAL_BUFFER_SIZE
  */
@@ -71,6 +156,7 @@ static const char *freezer_state_strs[] = {
 	"THAWED",
 	"FREEZING",
 	"FROZEN",
+	"CHECKPOINTING",
 };
 
 /*
@@ -78,9 +164,9 @@ static const char *freezer_state_strs[] = {
  * Transitions are caused by userspace writes to the freezer.state file.
  * The values in parenthesis are state labels. The rest are edge labels.
  *
- * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
- *    ^ ^                    |                     |
- *    | \_______THAWED_______/                     |
+ * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN) --> (CHECKPOINTING)
+ *    ^ ^                    |                     | ^             |
+ *    | \_______THAWED_______/                     | \_____________/
  *    \__________________________THAWED____________/
  */
 
@@ -216,37 +302,6 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	spin_unlock_irq(&freezer->lock);
 }
 
-/*
- * caller must hold freezer->lock
- */
-static void update_freezer_state(struct cgroup *cgroup,
-				 struct freezer *freezer)
-{
-	struct cgroup_iter it;
-	struct task_struct *task;
-	unsigned int nfrozen = 0, ntotal = 0;
-
-	cgroup_iter_start(cgroup, &it);
-	while ((task = cgroup_iter_next(cgroup, &it))) {
-		ntotal++;
-		if (is_task_frozen_enough(task))
-			nfrozen++;
-	}
-
-	/*
-	 * Transition to FROZEN when no new tasks can be added ensures
-	 * that we never exist in the FROZEN state while there are unfrozen
-	 * tasks.
-	 */
-	if (nfrozen == ntotal)
-		freezer->state = CGROUP_FROZEN;
-	else if (nfrozen > 0)
-		freezer->state = CGROUP_FREEZING;
-	else
-		freezer->state = CGROUP_THAWED;
-	cgroup_iter_end(cgroup, &it);
-}
-
 static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
 			struct seq_file *m)
 {
@@ -320,7 +375,10 @@ static int freezer_change_state(struct cgroup *cgroup,
 	freezer = cgroup_freezer(cgroup);
 
 	spin_lock_irq(&freezer->lock);
-
+	if (freezer->state == CGROUP_CHECKPOINTING) {
+		retval = -EBUSY;
+		goto out;
+	}
 	update_freezer_state(cgroup, freezer);
 	if (goal_state == freezer->state)
 		goto out;
-- 
1.5.6.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer
       [not found]   ` <91260d5797bcf233ca54c7b41da8278e99b4ca66.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-03  9:31     ` [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient Matt Helsley
@ 2009-06-03 16:05     ` Serge E. Hallyn
  2009-06-04  0:55       ` Li Zefan
       [not found]       ` <20090603160551.GB7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  1 sibling, 2 replies; 33+ messages in thread
From: Serge E. Hallyn @ 2009-06-03 16:05 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Rafael J. Wysocki, Paul Menage, Containers, Pavel Machek,
	linux-pm-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> -int cgroup_frozen(struct task_struct *task)
> +int cgroup_freezing_or_frozen(struct task_struct *task)
>  {
>  	struct freezer *freezer;
>  	enum freezer_state state;
> 
>  	task_lock(task);
>  	freezer = task_freezer(task);
> -	state = freezer->state;
> +	if (!freezer->css.cgroup->parent)
> +		state = CGROUP_THAWED; /* root cgroup can't be frozen */
> +	else
> +		state = freezer->state;

Why do you have to add this special-casing now?  I thought
we were already preventing freezing the root cgroup, so
freezer->state should be sane in that case anyway?

>  	task_unlock(task);
> 
> -	return state == CGROUP_FROZEN;
> +	return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
>  }

-serge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer
  2009-06-03  9:31   ` [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer Matt Helsley
@ 2009-06-03 16:05     ` Serge E. Hallyn
  0 siblings, 0 replies; 33+ messages in thread
From: Serge E. Hallyn @ 2009-06-03 16:05 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Paul Menage, Containers, Pavel Machek, linux-pm

Quoting Matt Helsley (matthltc@us.ibm.com):
> -int cgroup_frozen(struct task_struct *task)
> +int cgroup_freezing_or_frozen(struct task_struct *task)
>  {
>  	struct freezer *freezer;
>  	enum freezer_state state;
> 
>  	task_lock(task);
>  	freezer = task_freezer(task);
> -	state = freezer->state;
> +	if (!freezer->css.cgroup->parent)
> +		state = CGROUP_THAWED; /* root cgroup can't be frozen */
> +	else
> +		state = freezer->state;

Why do you have to add this special-casing now?  I thought
we were already preventing freezing the root cgroup, so
freezer->state should be sane in that case anyway?

>  	task_unlock(task);
> 
> -	return state == CGROUP_FROZEN;
> +	return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
>  }

-serge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient
       [not found]   ` <03412f8681d89c99ac575330381ab49f6e5e61ba.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-03  9:31     ` [PATCH 3/4] cgroup freezer: Update stale locking comments Matt Helsley
@ 2009-06-03 16:10     ` Serge E. Hallyn
       [not found]       ` <20090603161046.GC7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-03 19:52     ` Serge E. Hallyn
  2 siblings, 1 reply; 33+ messages in thread
From: Serge E. Hallyn @ 2009-06-03 16:10 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Containers, Paul Menage

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
>  When all the tasks of a cgroup were successfully frozen we can avoid
>  the lazy FREEZING -> FROZEN transition and move into FROZEN during the
>  write to freezer.state.

Can you remind us then what the point of the FREEZING state is?
It doesn't look to me like, after this patch, a cgroup will
ever be FREEZING?

> Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/cgroup_freezer.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 765e2c1..19702ac 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -286,7 +286,10 @@ static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
>  	}
>  	cgroup_iter_end(cgroup, &it);
> 
> -	return num_cant_freeze_now ? -EBUSY : 0;
> +	if (num_cant_freeze_now)
> +		return -EBUSY;
> +	freezer->state = CGROUP_FROZEN;
> +	return 0;
>  }
> 
>  static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
> -- 
> 1.5.6.3
> 
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
       [not found]       ` <89c3726813accffb7c51cd30ff93b79a4391f382.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-03 16:18         ` Serge E. Hallyn
       [not found]           ` <20090603161840.GD7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-03 16:53         ` Oren Laadan
  1 sibling, 1 reply; 33+ messages in thread
From: Serge E. Hallyn @ 2009-06-03 16:18 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Containers, Paul Menage

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> The CHECKPOINTING state prevents userspace from unfreezing tasks until
> sys_checkpoint() is finished. When doing container checkpoint userspace
> will do:
> 
> 	echo FROZEN > /cgroups/my_container/freezer.state
> 	...
> 	rc = sys_checkpoint( <pid of container root> );
> 
> To ensure a consistent checkpoint image userspace should not be allowed
> to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
> during checkpoint.
> 
> "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
> system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
> the checkpoint system call is finished and ready to return. Then the
> freezer state returns to "FROZEN". Writing any new state to freezer.state while
> checkpointing will return EBUSY. These semantics ensure that userspace cannot
> unfreeze the cgroup midway through the checkpoint system call.
> 
> The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
> make relatively few assumptions about the task that is passed in. However the
> way they are called in do_checkpoint() assumes that the root of the container
> is in the same freezer cgroup as all the other tasks that will be
> checkpointed.
> 
> Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
> Cc: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> 
> Notes:
> 	Meant to work with Oren's checkpoint/restart v16-dev git tree.
>         Still needs testing.
>         As a side-effect this prevents the multiple tasks from entering the
>                 CHECKPOINTING state simultaneously. All but one will get -EBUSY.
> ---
>  Documentation/cgroups/freezer-subsystem.txt |   10 ++
>  checkpoint/checkpoint.c                     |    8 ++-
>  include/linux/freezer.h                     |    8 ++
>  kernel/cgroup_freezer.c                     |  128 +++++++++++++++++++-------
>  4 files changed, 117 insertions(+), 37 deletions(-)
> 
> diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
> index 41f37fe..92b68e6 100644
> --- a/Documentation/cgroups/freezer-subsystem.txt
> +++ b/Documentation/cgroups/freezer-subsystem.txt
> @@ -100,3 +100,13 @@ things happens:
>  		and returns EINVAL)
>  	3) The tasks that blocked the cgroup from entering the "FROZEN"
>  		state disappear from the cgroup's set of tasks.
> +
> +When the cgroup freezer is used to guard container checkpoint operations the
> +freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
> +"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
> +state, the cgroup may not leave until the checkpoint system call returns the
> +freezer state to "FROZEN". Writing any new state to freezer.state while
> +checkpointing will return EBUSY. These semantics ensure that userspace cannot
> +unfreeze the cgroup midway through the checkpoint system call. Note that,
> +unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
> +state.
> diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> index afc7300..d586a9b 100644
> --- a/checkpoint/checkpoint.c
> +++ b/checkpoint/checkpoint.c
> @@ -569,7 +569,10 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
> 
>  	ret = init_checkpoint_ctx(ctx, pid);
>  	if (ret < 0)
> -		goto out;
> +		return ret;
> +	ret = cgroup_freezer_begin_checkpoint(ctx->root_task);
> +	if (ret < 0)
> +		return ret;
>  	ret = build_tree(ctx);
>  	if (ret < 0)
>  		goto out;
> @@ -597,6 +600,7 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
>  	/* on success, return (unique) checkpoint identifier */
>  	ctx->crid = atomic_inc_return(&ctx_count);
>  	ret = ctx->crid;
> - out:
> +out:
> +	cgroup_freezer_end_checkpoint(ctx->root_task);
>  	return ret;
>  }

We have no guarantees that all tasks under ctx->root_task are in
the same freezer cgroup.  This presents a problem if we want to
make this guarantee which you're trying to make.

First, you need to update the may_checkpoint() check to check that
the tasks are CHECKPOINTING, not FROZEN.

But in addition, note that (a) there may be two parallel checkpoints of
separate tasksets (C and D), and (b) userspace may have mucked up
and put some tasks from set C into set D's cgroup.  That means that it's
possible that all of the may_checkpoint() tasks pass, but if D's
checkpoint finishes before C's, then some tasks will be unfrozen
before their checkpoint is complete.

So I guess now you need to enforce at may_checkpoint() that all tasks
being checkpointed are in ctx->root_task's freezer cgroup?

thanks,
-serge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
       [not found]       ` <89c3726813accffb7c51cd30ff93b79a4391f382.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-03 16:18         ` Serge E. Hallyn
@ 2009-06-03 16:53         ` Oren Laadan
  1 sibling, 0 replies; 33+ messages in thread
From: Oren Laadan @ 2009-06-03 16:53 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Paul Menage, Containers


Patch doesn't compile ... apply this:

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index cd31593..4acc2a1 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -74,7 +74,7 @@ static inline int cgroup_freezing_or_frozen(struct task_struct *task)
 }
 static inline int cgroup_freezer_begin_checkpoint(struct task_struct *task)
 {
-	return -ENOTSUP;
+	return -ENOTSUPP;
 }
 static inline void cgroup_freezer_end_checkpoint(struct task_struct *task)
 {}
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 6519692..f81b333 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -64,6 +64,13 @@ int cgroup_freezing_or_frozen(struct task_struct *task)
 	return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
 }
 
+/* Task is frozen or will freeze immediately when next it gets woken */
+static bool is_task_frozen_enough(struct task_struct *task)
+{
+	return frozen(task) ||
+		(task_is_stopped_or_traced(task) && freezing(task));
+}
+
 /*
  * caller must hold freezer->lock
  */
@@ -109,7 +116,7 @@ static int freezer_checkpointing(struct task_struct *task,
 	task_lock(task);
 	css = task_subsys_state(task, freezer_subsys_id);
 	css_get(css); /* make sure freezer doesn't go away */
-	freezer = containerof(css, struct freezer, css);
+	freezer = container_of(css, struct freezer, css);
 	task_unlock(task);
 
 	if (freezer->state == CGROUP_FREEZING) {
@@ -239,13 +246,6 @@ static void freezer_destroy(struct cgroup_subsys *ss,
 	kfree(cgroup_freezer(cgroup));
 }
 
-/* Task is frozen or will freeze immediately when next it gets woken */
-static bool is_task_frozen_enough(struct task_struct *task)
-{
-	return frozen(task) ||
-		(task_is_stopped_or_traced(task) && freezing(task));
-}
-
 /*
  * The call to cgroup_lock() in the freezer.state write method prevents
  * a write to that file racing against an attach, and hence the




On Wed, 3 Jun 2009, Matt Helsley wrote:

> The CHECKPOINTING state prevents userspace from unfreezing tasks until
> sys_checkpoint() is finished. When doing container checkpoint userspace
> will do:
> 
> 	echo FROZEN > /cgroups/my_container/freezer.state
> 	...
> 	rc = sys_checkpoint( <pid of container root> );
> 
> To ensure a consistent checkpoint image userspace should not be allowed
> to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
> during checkpoint.
> 
> "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
> system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
> the checkpoint system call is finished and ready to return. Then the
> freezer state returns to "FROZEN". Writing any new state to freezer.state while
> checkpointing will return EBUSY. These semantics ensure that userspace cannot
> unfreeze the cgroup midway through the checkpoint system call.
> 
> The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
> make relatively few assumptions about the task that is passed in. However the
> way they are called in do_checkpoint() assumes that the root of the container
> is in the same freezer cgroup as all the other tasks that will be
> checkpointed.
> 
> Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
> Cc: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> 
> Notes:
> 	Meant to work with Oren's checkpoint/restart v16-dev git tree.
>         Still needs testing.
>         As a side-effect this prevents the multiple tasks from entering the
>                 CHECKPOINTING state simultaneously. All but one will get -EBUSY.
> ---
>  Documentation/cgroups/freezer-subsystem.txt |   10 ++
>  checkpoint/checkpoint.c                     |    8 ++-
>  include/linux/freezer.h                     |    8 ++
>  kernel/cgroup_freezer.c                     |  128 +++++++++++++++++++-------
>  4 files changed, 117 insertions(+), 37 deletions(-)
> 
> diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
> index 41f37fe..92b68e6 100644
> --- a/Documentation/cgroups/freezer-subsystem.txt
> +++ b/Documentation/cgroups/freezer-subsystem.txt
> @@ -100,3 +100,13 @@ things happens:
>  		and returns EINVAL)
>  	3) The tasks that blocked the cgroup from entering the "FROZEN"
>  		state disappear from the cgroup's set of tasks.
> +
> +When the cgroup freezer is used to guard container checkpoint operations the
> +freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
> +"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
> +state, the cgroup may not leave until the checkpoint system call returns the
> +freezer state to "FROZEN". Writing any new state to freezer.state while
> +checkpointing will return EBUSY. These semantics ensure that userspace cannot
> +unfreeze the cgroup midway through the checkpoint system call. Note that,
> +unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
> +state.
> diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> index afc7300..d586a9b 100644
> --- a/checkpoint/checkpoint.c
> +++ b/checkpoint/checkpoint.c
> @@ -569,7 +569,10 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
>  
>  	ret = init_checkpoint_ctx(ctx, pid);
>  	if (ret < 0)
> -		goto out;
> +		return ret;
> +	ret = cgroup_freezer_begin_checkpoint(ctx->root_task);
> +	if (ret < 0)
> +		return ret;
>  	ret = build_tree(ctx);
>  	if (ret < 0)
>  		goto out;
> @@ -597,6 +600,7 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
>  	/* on success, return (unique) checkpoint identifier */
>  	ctx->crid = atomic_inc_return(&ctx_count);
>  	ret = ctx->crid;
> - out:
> +out:
> +	cgroup_freezer_end_checkpoint(ctx->root_task);
>  	return ret;
>  }
> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> index da7e52b..cd31593 100644
> --- a/include/linux/freezer.h
> +++ b/include/linux/freezer.h
> @@ -65,11 +65,19 @@ extern void cancel_freezing(struct task_struct *p);
>  
>  #ifdef CONFIG_CGROUP_FREEZER
>  extern int cgroup_freezing_or_frozen(struct task_struct *task);
> +extern int cgroup_freezer_begin_checkpoint(struct task_struct *task);
> +extern void cgroup_freezer_end_checkpoint(struct task_struct *task);
>  #else /* !CONFIG_CGROUP_FREEZER */
>  static inline int cgroup_freezing_or_frozen(struct task_struct *task)
>  {
>  	return 0;
>  }
> +static inline int cgroup_freezer_begin_checkpoint(struct task_struct *task)
> +{
> +	return -ENOTSUP;
> +}
> +static inline void cgroup_freezer_end_checkpoint(struct task_struct *task)
> +{}
>  #endif /* !CONFIG_CGROUP_FREEZER */
>  
>  /*
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 05795b7..6519692 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -25,6 +25,7 @@ enum freezer_state {
>  	CGROUP_THAWED = 0,
>  	CGROUP_FREEZING,
>  	CGROUP_FROZEN,
> +	CGROUP_CHECKPOINTING,
>  };
>  
>  struct freezer {
> @@ -64,6 +65,90 @@ int cgroup_freezing_or_frozen(struct task_struct *task)
>  }
>  
>  /*
> + * caller must hold freezer->lock
> + */
> +static void update_freezer_state(struct cgroup *cgroup,
> +				 struct freezer *freezer)
> +{
> +	struct cgroup_iter it;
> +	struct task_struct *task;
> +	unsigned int nfrozen = 0, ntotal = 0;
> +
> +	cgroup_iter_start(cgroup, &it);
> +	while ((task = cgroup_iter_next(cgroup, &it))) {
> +		ntotal++;
> +		if (is_task_frozen_enough(task))
> +			nfrozen++;
> +	}
> +
> +	/*
> +	 * Transition to FROZEN when no new tasks can be added ensures
> +	 * that we never exist in the FROZEN state while there are unfrozen
> +	 * tasks.
> +	 */
> +	if (nfrozen == ntotal)
> +		freezer->state = CGROUP_FROZEN;
> +	else if (nfrozen > 0)
> +		freezer->state = CGROUP_FREEZING;
> +	else
> +		freezer->state = CGROUP_THAWED;
> +	cgroup_iter_end(cgroup, &it);
> +}
> +
> +/*
> + * cgroup freezer state changes made without the aid of the cgroup filesystem
> + * must go through this function to ensure proper locking is observed.
> + */
> +static int freezer_checkpointing(struct task_struct *task,
> +				 enum freezer_state next_state)
> +{
> +	struct freezer *freezer;
> +	struct cgroup_subsys_state *css;
> +	enum freezer_state state;
> +
> +	task_lock(task);
> +	css = task_subsys_state(task, freezer_subsys_id);
> +	css_get(css); /* make sure freezer doesn't go away */
> +	freezer = containerof(css, struct freezer, css);
> +	task_unlock(task);
> +
> +	if (freezer->state == CGROUP_FREEZING) {
> +		/* May be in middle of a lazy FREEZING -> FROZEN transition */
> +		if (cgroup_lock_live_group(css->cgroup)) {
> +			spin_lock_irq(&freezer->lock);
> +			update_freezer_state(css->cgroup, freezer);
> +			spin_unlock_irq(&freezer->lock);
> +			cgroup_unlock();
> +		}
> +	}
> +
> +	spin_lock_irq(&freezer->lock);
> +	state = freezer->state;
> +	if ((state == CGROUP_FROZEN && next_state == CGROUP_CHECKPOINTING) ||
> +	    (state == CGROUP_CHECKPOINTING && next_state == CGROUP_FROZEN))
> +		freezer->state = next_state;
> +	spin_unlock_irq(&freezer->lock);
> +	css_put(css);
> +	return state;
> +}
> +
> +int cgroup_freezer_begin_checkpoint(struct task_struct *task)
> +{
> +	if (freezer_checkpointing(task, CGROUP_CHECKPOINTING) != CGROUP_FROZEN)
> +		return -EBUSY;
> +	return 0;
> +}
> +
> +void cgroup_freezer_end_checkpoint(struct task_struct *task)
> +{
> +	/*
> +	 * If we weren't in CHECKPOINTING state then userspace could have
> +	 * unfrozen a task and given us an inconsistent checkpoint image
> +	 */
> +	WARN_ON(freezer_checkpointing(task, CGROUP_FROZEN) != CGROUP_CHECKPOINTING);
> +}
> +
> +/*
>   * cgroups_write_string() limits the size of freezer state strings to
>   * CGROUP_LOCAL_BUFFER_SIZE
>   */
> @@ -71,6 +156,7 @@ static const char *freezer_state_strs[] = {
>  	"THAWED",
>  	"FREEZING",
>  	"FROZEN",
> +	"CHECKPOINTING",
>  };
>  
>  /*
> @@ -78,9 +164,9 @@ static const char *freezer_state_strs[] = {
>   * Transitions are caused by userspace writes to the freezer.state file.
>   * The values in parenthesis are state labels. The rest are edge labels.
>   *
> - * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
> - *    ^ ^                    |                     |
> - *    | \_______THAWED_______/                     |
> + * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN) --> (CHECKPOINTING)
> + *    ^ ^                    |                     | ^             |
> + *    | \_______THAWED_______/                     | \_____________/
>   *    \__________________________THAWED____________/
>   */
>  
> @@ -216,37 +302,6 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
>  	spin_unlock_irq(&freezer->lock);
>  }
>  
> -/*
> - * caller must hold freezer->lock
> - */
> -static void update_freezer_state(struct cgroup *cgroup,
> -				 struct freezer *freezer)
> -{
> -	struct cgroup_iter it;
> -	struct task_struct *task;
> -	unsigned int nfrozen = 0, ntotal = 0;
> -
> -	cgroup_iter_start(cgroup, &it);
> -	while ((task = cgroup_iter_next(cgroup, &it))) {
> -		ntotal++;
> -		if (is_task_frozen_enough(task))
> -			nfrozen++;
> -	}
> -
> -	/*
> -	 * Transition to FROZEN when no new tasks can be added ensures
> -	 * that we never exist in the FROZEN state while there are unfrozen
> -	 * tasks.
> -	 */
> -	if (nfrozen == ntotal)
> -		freezer->state = CGROUP_FROZEN;
> -	else if (nfrozen > 0)
> -		freezer->state = CGROUP_FREEZING;
> -	else
> -		freezer->state = CGROUP_THAWED;
> -	cgroup_iter_end(cgroup, &it);
> -}
> -
>  static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
>  			struct seq_file *m)
>  {
> @@ -320,7 +375,10 @@ static int freezer_change_state(struct cgroup *cgroup,
>  	freezer = cgroup_freezer(cgroup);
>  
>  	spin_lock_irq(&freezer->lock);
> -
> +	if (freezer->state == CGROUP_CHECKPOINTING) {
> +		retval = -EBUSY;
> +		goto out;
> +	}
>  	update_freezer_state(cgroup, freezer);
>  	if (goal_state == freezer->state)
>  		goto out;
> 

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient
       [not found]       ` <20090603161046.GC7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-03 17:52         ` Matt Helsley
       [not found]           ` <20090603175242.GQ9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Matt Helsley @ 2009-06-03 17:52 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Paul Menage, Containers

On Wed, Jun 03, 2009 at 11:10:46AM -0500, Serge E. Hallyn wrote:
> Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> >  When all the tasks of a cgroup were successfully frozen we can avoid
> >  the lazy FREEZING -> FROZEN transition and move into FROZEN during the
> >  write to freezer.state.
> 
> Can you remind us then what the point of the FREEZING state is?
> It doesn't look to me like, after this patch, a cgroup will
> ever be FREEZING?

FREEZING is an intermediate state indicating that the cgroup is
partially frozen and, unless userspace retries, it will remain so. 

This can happen when a task in the cgroup is busy doing vfork. Only
when the vfork completes can the task be frozen. How long it
will take for the vfork to complete is defined by the vfork-ing task.
Rather than sleep, the cgroup freezer proceeds to try to freeze
each remaining task in the cgroup before returning while still in the 
FREEZING state. 

This gives userspace a chance to choose for itself whether to retry
(echo FROZEN > freezer.state) or to cancel (echo THAWED > freezer.state).
It also ensures that userspace can take advantage of forward progress
rather than having each task in the cgroup race to the next vfork and
thereby postpone freezing indefinitely.

Because the FREEZING to FROZEN transition only happens upon entrance
to the read or write paths one additional write or read of freezer.state
is necessary. This patch allows userspace to skip that step when all the
tasks in the cgroup have been successfully frozen (num_cant_freeze_now == 0).

Cheers,
	-Matt

> 
> > Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > ---
> >  kernel/cgroup_freezer.c |    5 ++++-
> >  1 files changed, 4 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> > index 765e2c1..19702ac 100644
> > --- a/kernel/cgroup_freezer.c
> > +++ b/kernel/cgroup_freezer.c
> > @@ -286,7 +286,10 @@ static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
> >  	}
> >  	cgroup_iter_end(cgroup, &it);
> > 
> > -	return num_cant_freeze_now ? -EBUSY : 0;
> > +	if (num_cant_freeze_now)
> > +		return -EBUSY;
> > +	freezer->state = CGROUP_FROZEN;
> > +	return 0;
> >  }
> > 
> >  static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
> > -- 
> > 1.5.6.3
> > 
> > 
> > _______________________________________________
> > Containers mailing list
> > Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> > https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient
       [not found]           ` <20090603175242.GQ9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-03 18:15             ` Serge E. Hallyn
       [not found]               ` <20090603181547.GA10141-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Serge E. Hallyn @ 2009-06-03 18:15 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Containers, Paul Menage

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> On Wed, Jun 03, 2009 at 11:10:46AM -0500, Serge E. Hallyn wrote:
> > Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> > >  When all the tasks of a cgroup were successfully frozen we can avoid
> > >  the lazy FREEZING -> FROZEN transition and move into FROZEN during the
> > >  write to freezer.state.
> > 
> > Can you remind us then what the point of the FREEZING state is?
> > It doesn't look to me like, after this patch, a cgroup will
> > ever be FREEZING?
> 
> FREEZING is an intermediate state indicating that the cgroup is
> partially frozen and, unless userspace retries, it will remain so. 

Oh, so basically a cgroup will be in CGROUP_FREEZING state only
while try_to_freeze_cgroup() is looping over the tasks now?

-serge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient
       [not found]   ` <03412f8681d89c99ac575330381ab49f6e5e61ba.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-03  9:31     ` [PATCH 3/4] cgroup freezer: Update stale locking comments Matt Helsley
  2009-06-03 16:10     ` [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient Serge E. Hallyn
@ 2009-06-03 19:52     ` Serge E. Hallyn
  2 siblings, 0 replies; 33+ messages in thread
From: Serge E. Hallyn @ 2009-06-03 19:52 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Containers, Paul Menage

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
>  When all the tasks of a cgroup were successfully frozen we can avoid
>  the lazy FREEZING -> FROZEN transition and move into FROZEN during the
>  write to freezer.state.
> 
> Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

> ---
>  kernel/cgroup_freezer.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 765e2c1..19702ac 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -286,7 +286,10 @@ static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
>  	}
>  	cgroup_iter_end(cgroup, &it);
> 
> -	return num_cant_freeze_now ? -EBUSY : 0;
> +	if (num_cant_freeze_now)
> +		return -EBUSY;
> +	freezer->state = CGROUP_FROZEN;
> +	return 0;
>  }
> 
>  static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
> -- 
> 1.5.6.3
> 
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
       [not found]           ` <20090603161840.GD7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-04  0:01             ` Oren Laadan
       [not found]               ` <Pine.LNX.4.64.0906031957110.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
  2009-06-04 10:44             ` [PATCH 4/4] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint Matt Helsley
  1 sibling, 1 reply; 33+ messages in thread
From: Oren Laadan @ 2009-06-04  0:01 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Paul Menage, Containers


On Wed, 3 Jun 2009, Serge E. Hallyn wrote:

> Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> > The CHECKPOINTING state prevents userspace from unfreezing tasks until
> > sys_checkpoint() is finished. When doing container checkpoint userspace
> > will do:
> > 
> > 	echo FROZEN > /cgroups/my_container/freezer.state
> > 	...
> > 	rc = sys_checkpoint( <pid of container root> );
> > 
> > To ensure a consistent checkpoint image userspace should not be allowed
> > to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
> > during checkpoint.
> > 
> > "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
> > system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
> > the checkpoint system call is finished and ready to return. Then the
> > freezer state returns to "FROZEN". Writing any new state to freezer.state while
> > checkpointing will return EBUSY. These semantics ensure that userspace cannot
> > unfreeze the cgroup midway through the checkpoint system call.
> > 
> > The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
> > make relatively few assumptions about the task that is passed in. However the
> > way they are called in do_checkpoint() assumes that the root of the container
> > is in the same freezer cgroup as all the other tasks that will be
> > checkpointed.
> > 
> > Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> > Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
> > Cc: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> > 
> > Notes:
> > 	Meant to work with Oren's checkpoint/restart v16-dev git tree.
> >         Still needs testing.
> >         As a side-effect this prevents the multiple tasks from entering the
> >                 CHECKPOINTING state simultaneously. All but one will get -EBUSY.
> > ---
> >  Documentation/cgroups/freezer-subsystem.txt |   10 ++
> >  checkpoint/checkpoint.c                     |    8 ++-
> >  include/linux/freezer.h                     |    8 ++
> >  kernel/cgroup_freezer.c                     |  128 +++++++++++++++++++-------
> >  4 files changed, 117 insertions(+), 37 deletions(-)
> > 
> > diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
> > index 41f37fe..92b68e6 100644
> > --- a/Documentation/cgroups/freezer-subsystem.txt
> > +++ b/Documentation/cgroups/freezer-subsystem.txt
> > @@ -100,3 +100,13 @@ things happens:
> >  		and returns EINVAL)
> >  	3) The tasks that blocked the cgroup from entering the "FROZEN"
> >  		state disappear from the cgroup's set of tasks.
> > +
> > +When the cgroup freezer is used to guard container checkpoint operations the
> > +freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
> > +"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
> > +state, the cgroup may not leave until the checkpoint system call returns the
> > +freezer state to "FROZEN". Writing any new state to freezer.state while
> > +checkpointing will return EBUSY. These semantics ensure that userspace cannot
> > +unfreeze the cgroup midway through the checkpoint system call. Note that,
> > +unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
> > +state.
> > diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> > index afc7300..d586a9b 100644
> > --- a/checkpoint/checkpoint.c
> > +++ b/checkpoint/checkpoint.c
> > @@ -569,7 +569,10 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
> > 
> >  	ret = init_checkpoint_ctx(ctx, pid);
> >  	if (ret < 0)
> > -		goto out;
> > +		return ret;
> > +	ret = cgroup_freezer_begin_checkpoint(ctx->root_task);
> > +	if (ret < 0)
> > +		return ret;
> >  	ret = build_tree(ctx);
> >  	if (ret < 0)
> >  		goto out;
> > @@ -597,6 +600,7 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
> >  	/* on success, return (unique) checkpoint identifier */
> >  	ctx->crid = atomic_inc_return(&ctx_count);
> >  	ret = ctx->crid;
> > - out:
> > +out:
> > +	cgroup_freezer_end_checkpoint(ctx->root_task);
> >  	return ret;
> >  }
> 
> We have no guarantees that all tasks under ctx->root_task are in
> the same freezer cgroup.  This presents a problem if we want to
> make this guarantee which you're trying to make.
> 
> First, you need to update the may_checkpoint() check to check that
> the tasks are CHECKPOINTING, not FROZEN.
> 
> But in addition, note that (a) there may be two parallel checkpoints of
> separate tasksets (C and D), and (b) userspace may have mucked up
> and put some tasks from set C into set D's cgroup.  That means that it's
> possible that all of the may_checkpoint() tasks pass, but if D's
> checkpoint finishes before C's, then some tasks will be unfrozen
> before their checkpoint is complete.
> 
> So I guess now you need to enforce at may_checkpoint() that all tasks
> being checkpointed are in ctx->root_task's freezer cgroup?

Yes, so it works like this:
1) get_container() now also takes a reference to the freezer css of
the root task
2) do_checkpoint() changes the state to CGROUP_CHECKPOINTING
3) may_checkpoint_task() verifies that all tasks (including root task -
in case it changes groups in the meanwhile) belong to that group, then
takes checkpoint
4) checkpoint done, change back to CGROUP_FROZEN
5) release reference to freezer cgroup

Note that we don't need to test tasks FROZEN state anymore, because if
it belongs to the same cgroup, it must be.

The next two patches are intended to replace Matt's patch 4/4.

Oren.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (a)
       [not found]               ` <Pine.LNX.4.64.0906031957110.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2009-06-04  0:10                 ` Oren Laadan
       [not found]                   ` <Pine.LNX.4.64.0906032008490.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
  2009-06-04  0:12                 ` [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (b) Oren Laadan
  1 sibling, 1 reply; 33+ messages in thread
From: Oren Laadan @ 2009-06-04  0:10 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Paul Menage, Containers


From 6cdb7d9504a19ad88e6da8ad85374267c7acb1b2 Mon Sep 17 00:00:00 2001
From: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Wed, 3 Jun 2009 02:31:21 -0700
Subject: [PATCH] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint

The CHECKPOINTING state prevents userspace from unfreezing tasks until
sys_checkpoint() is finished. When doing container checkpoint userspace
will do:

	echo FROZEN > /cgroups/my_container/freezer.state
	...
	rc = sys_checkpoint( <pid of container root> );

To ensure a consistent checkpoint image userspace should not be allowed
to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
during checkpoint.

"CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
the checkpoint system call is finished and ready to return. Then the
freezer state returns to "FROZEN". Writing any new state to freezer.state while
checkpointing will return EBUSY. These semantics ensure that userspace cannot
unfreeze the cgroup midway through the checkpoint system call.

The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
make relatively few assumptions about the task that is passed in. However the
way they are called in do_checkpoint() assumes that the root of the container
is in the same freezer cgroup as all the other tasks that will be
checkpointed.

Matt Helsley's wrote the original patch.

Changlog:
  [2009-Jun-03] change cgroup_freezer_{begin,end}_checkpoint() to take a
  		struct cgroup_subsys_state pointer
		add struct cgroup_subsys_state *get_task_cgroup_freezer()

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>

Notes:
        As a side-effect this prevents the multiple tasks from entering the
                CHECKPOINTING state simultaneously. All but one will get -EBUSY.
---
 Documentation/cgroups/freezer-subsystem.txt |   10 ++
 include/linux/freezer.h                     |   10 ++
 kernel/cgroup_freezer.c                     |  149 +++++++++++++++++++--------
 3 files changed, 127 insertions(+), 42 deletions(-)

diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
index 41f37fe..92b68e6 100644
--- a/Documentation/cgroups/freezer-subsystem.txt
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -100,3 +100,13 @@ things happens:
 		and returns EINVAL)
 	3) The tasks that blocked the cgroup from entering the "FROZEN"
 		state disappear from the cgroup's set of tasks.
+
+When the cgroup freezer is used to guard container checkpoint operations the
+freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
+"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
+state, the cgroup may not leave until the checkpoint system call returns the
+freezer state to "FROZEN". Writing any new state to freezer.state while
+checkpointing will return EBUSY. These semantics ensure that userspace cannot
+unfreeze the cgroup midway through the checkpoint system call. Note that,
+unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
+state.
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index da7e52b..1402911 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -64,12 +64,22 @@ extern bool freeze_task(struct task_struct *p, bool sig_only);
 extern void cancel_freezing(struct task_struct *p);
 
 #ifdef CONFIG_CGROUP_FREEZER
+struct cgroup_subsys_state;
 extern int cgroup_freezing_or_frozen(struct task_struct *task);
+extern struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task);
+extern int cgroup_freezer_begin_checkpoint(struct cgroup_subsys_state *css);
+extern void cgroup_freezer_end_checkpoint(struct cgroup_subsys_state *css);
 #else /* !CONFIG_CGROUP_FREEZER */
 static inline int cgroup_freezing_or_frozen(struct task_struct *task)
 {
 	return 0;
 }
+static inline int cgroup_freezer_begin_checkpoint(struct task_struct *task)
+{
+	return -ENOTSUPP;
+}
+static inline void cgroup_freezer_end_checkpoint(struct task_struct *task)
+{}
 #endif /* !CONFIG_CGROUP_FREEZER */
 
 /*
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 05795b7..4790fb9 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -25,6 +25,7 @@ enum freezer_state {
 	CGROUP_THAWED = 0,
 	CGROUP_FREEZING,
 	CGROUP_FROZEN,
+	CGROUP_CHECKPOINTING,
 };
 
 struct freezer {
@@ -47,6 +48,18 @@ static inline struct freezer *task_freezer(struct task_struct *task)
 			    struct freezer, css);
 }
 
+struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task)
+{
+	struct cgroup_subsys_state *css;
+
+	task_lock(task);
+	css = task_subsys_state(task, freezer_subsys_id);
+	css_get(css); /* make sure freezer doesn't go away */
+	task_unlock(task);
+
+	return css;
+}
+
 int cgroup_freezing_or_frozen(struct task_struct *task)
 {
 	struct freezer *freezer;
@@ -63,6 +76,92 @@ int cgroup_freezing_or_frozen(struct task_struct *task)
 	return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
 }
 
+/* Task is frozen or will freeze immediately when next it gets woken */
+static bool is_task_frozen_enough(struct task_struct *task)
+{
+	return frozen(task) ||
+		(task_is_stopped_or_traced(task) && freezing(task));
+}
+
+/*
+ * caller must hold freezer->lock
+ */
+static void update_freezer_state(struct cgroup *cgroup,
+				 struct freezer *freezer)
+{
+	struct cgroup_iter it;
+	struct task_struct *task;
+	unsigned int nfrozen = 0, ntotal = 0;
+
+	cgroup_iter_start(cgroup, &it);
+	while ((task = cgroup_iter_next(cgroup, &it))) {
+		ntotal++;
+		if (is_task_frozen_enough(task))
+			nfrozen++;
+	}
+
+	/*
+	 * Transition to FROZEN when no new tasks can be added ensures
+	 * that we never exist in the FROZEN state while there are unfrozen
+	 * tasks.
+	 */
+	if (nfrozen == ntotal)
+		freezer->state = CGROUP_FROZEN;
+	else if (nfrozen > 0)
+		freezer->state = CGROUP_FREEZING;
+	else
+		freezer->state = CGROUP_THAWED;
+	cgroup_iter_end(cgroup, &it);
+}
+
+/*
+ * cgroup freezer state changes made without the aid of the cgroup filesystem
+ * must go through this function to ensure proper locking is observed.
+ */
+static int freezer_checkpointing(struct cgroup_subsys_state *css,
+				 enum freezer_state next_state)
+{
+	struct freezer *freezer;
+	enum freezer_state state;
+
+	freezer = container_of(css, struct freezer, css);
+
+	if (freezer->state == CGROUP_FREEZING) {
+		/* May be in middle of a lazy FREEZING -> FROZEN transition */
+		if (cgroup_lock_live_group(css->cgroup)) {
+			spin_lock_irq(&freezer->lock);
+			update_freezer_state(css->cgroup, freezer);
+			spin_unlock_irq(&freezer->lock);
+			cgroup_unlock();
+		}
+	}
+
+	spin_lock_irq(&freezer->lock);
+	state = freezer->state;
+	if ((state == CGROUP_FROZEN && next_state == CGROUP_CHECKPOINTING) ||
+	    (state == CGROUP_CHECKPOINTING && next_state == CGROUP_FROZEN))
+		freezer->state = next_state;
+	spin_unlock_irq(&freezer->lock);
+	css_put(css);
+	return state;
+}
+
+int cgroup_freezer_begin_checkpoint(struct cgroup_subsys_state *css)
+{
+	if (freezer_checkpointing(css, CGROUP_CHECKPOINTING) != CGROUP_FROZEN)
+		return -EBUSY;
+	return 0;
+}
+
+void cgroup_freezer_end_checkpoint(struct cgroup_subsys_state *css)
+{
+	/*
+	 * If we weren't in CHECKPOINTING state then userspace could have
+	 * unfrozen a task and given us an inconsistent checkpoint image
+	 */
+	WARN_ON(freezer_checkpointing(css, CGROUP_FROZEN) != CGROUP_CHECKPOINTING);
+}
+
 /*
  * cgroups_write_string() limits the size of freezer state strings to
  * CGROUP_LOCAL_BUFFER_SIZE
@@ -71,6 +170,7 @@ static const char *freezer_state_strs[] = {
 	"THAWED",
 	"FREEZING",
 	"FROZEN",
+	"CHECKPOINTING",
 };
 
 /*
@@ -78,9 +178,9 @@ static const char *freezer_state_strs[] = {
  * Transitions are caused by userspace writes to the freezer.state file.
  * The values in parenthesis are state labels. The rest are edge labels.
  *
- * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
- *    ^ ^                    |                     |
- *    | \_______THAWED_______/                     |
+ * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN) --> (CHECKPOINTING)
+ *    ^ ^                    |                     | ^             |
+ *    | \_______THAWED_______/                     | \_____________/
  *    \__________________________THAWED____________/
  */
 
@@ -153,13 +253,6 @@ static void freezer_destroy(struct cgroup_subsys *ss,
 	kfree(cgroup_freezer(cgroup));
 }
 
-/* Task is frozen or will freeze immediately when next it gets woken */
-static bool is_task_frozen_enough(struct task_struct *task)
-{
-	return frozen(task) ||
-		(task_is_stopped_or_traced(task) && freezing(task));
-}
-
 /*
  * The call to cgroup_lock() in the freezer.state write method prevents
  * a write to that file racing against an attach, and hence the
@@ -216,37 +309,6 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	spin_unlock_irq(&freezer->lock);
 }
 
-/*
- * caller must hold freezer->lock
- */
-static void update_freezer_state(struct cgroup *cgroup,
-				 struct freezer *freezer)
-{
-	struct cgroup_iter it;
-	struct task_struct *task;
-	unsigned int nfrozen = 0, ntotal = 0;
-
-	cgroup_iter_start(cgroup, &it);
-	while ((task = cgroup_iter_next(cgroup, &it))) {
-		ntotal++;
-		if (is_task_frozen_enough(task))
-			nfrozen++;
-	}
-
-	/*
-	 * Transition to FROZEN when no new tasks can be added ensures
-	 * that we never exist in the FROZEN state while there are unfrozen
-	 * tasks.
-	 */
-	if (nfrozen == ntotal)
-		freezer->state = CGROUP_FROZEN;
-	else if (nfrozen > 0)
-		freezer->state = CGROUP_FREEZING;
-	else
-		freezer->state = CGROUP_THAWED;
-	cgroup_iter_end(cgroup, &it);
-}
-
 static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
 			struct seq_file *m)
 {
@@ -320,7 +382,10 @@ static int freezer_change_state(struct cgroup *cgroup,
 	freezer = cgroup_freezer(cgroup);
 
 	spin_lock_irq(&freezer->lock);
-
+	if (freezer->state == CGROUP_CHECKPOINTING) {
+		retval = -EBUSY;
+		goto out;
+	}
 	update_freezer_state(cgroup, freezer);
 	if (goal_state == freezer->state)
 		goto out;
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (b)
       [not found]               ` <Pine.LNX.4.64.0906031957110.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
  2009-06-04  0:10                 ` [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (a) Oren Laadan
@ 2009-06-04  0:12                 ` Oren Laadan
       [not found]                   ` <Pine.LNX.4.64.0906032011470.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Oren Laadan @ 2009-06-04  0:12 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Paul Menage, Containers

From 3c24531980764a71705492c2dfc2cc99366784f3 Mon Sep 17 00:00:00 2001
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Date: Wed, 3 Jun 2009 19:31:21 -0400
Subject: [PATCH] c/r: use CHECKPOINTING state for hierarchy's cgroup freezer

Set state of freezer cgroup of checkpointed task hierarchy to
"CHECKPOINTING" during a checkpoint, to ensure that tasks cannot
be thawed while at it.

get_container() grabs a reference to the root task's freezer cgroup.
Then in may_checkpoint_task() each it verifies that all tasks belong
to the same freezer group.

In particular, the root task is also tested, such that if the root
tasks changes its freezer cgroups before it moves to "CHECKPOINTING",
it will be notived and an error returned.

CONFIG_CHECKPOINT now depends on CONFIG_CGROUP_FREEZER

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/Kconfig               |    1 +
 checkpoint/checkpoint.c          |   33 ++++++++++++++++++++++++++-------
 checkpoint/sys.c                 |    3 +++
 include/linux/checkpoint_types.h |    1 +
 4 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/checkpoint/Kconfig b/checkpoint/Kconfig
index 53ed6fa..a884674 100644
--- a/checkpoint/Kconfig
+++ b/checkpoint/Kconfig
@@ -9,6 +9,7 @@ config DEFERQUEUE
 config CHECKPOINT
 	bool "Enable checkpoint/restart (EXPERIMENTAL)"
 	depends on CHECKPOINT_SUPPORT && EXPERIMENTAL
+	depends on CGROUP_FREEZER
 	select DEFERQUEUE
 	help
 	  Application checkpoint/restart is the ability to save the
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index 09af8b7..773657a 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -13,6 +13,7 @@
 
 #include <linux/version.h>
 #include <linux/sched.h>
+#include <linux/cgroup.h>
 #include <linux/freezer.h>
 #include <linux/ptrace.h>
 #include <linux/time.h>
@@ -267,6 +268,7 @@ static int checkpoint_all_tasks(struct ckpt_ctx *ctx)
 static int may_checkpoint_task(struct ckpt_ctx *ctx, struct task_struct *t)
 {
 	struct task_struct *root = ctx->root_task;
+	struct cgroup_subsys_state *css;
 	struct nsproxy *nsproxy;
 	int ret = 0;
 
@@ -283,11 +285,16 @@ static int may_checkpoint_task(struct ckpt_ctx *ctx, struct task_struct *t)
 		return -EPERM;
 	}
 
-	/* verify that the task is frozen (unless self) */
-	if (t != current && !frozen(t)) {
-		__ckpt_write_err(ctx, "task %d (%s) is not frozen",
-				 task_pid_vnr(t), t->comm);
-		return -EBUSY;
+	/* verify that task belongs to same freezer cgroup as root */
+	if (t != current) {
+		css = get_task_cgroup_freezer(t);
+		if (css)
+			css_put(css);
+		if (css != ctx->root_freezer) {
+			__ckpt_write_err(ctx, "task %d (%s) not frozen (or wrong cgroup)",
+					 task_pid_vnr(t), t->comm);
+			return -EBUSY;
+		}
 	}
 
 	/* FIX: add support for ptraced tasks */
@@ -543,6 +550,14 @@ static int get_container(struct ckpt_ctx *ctx, pid_t pid)
 	ctx->root_nsproxy = nsproxy;
 	ctx->root_init = is_container_init(task);
 
+	/*
+	 * If task changes cgroup later, we detect when comparing
+	 * against @ctx->root_freezer in may_checkpoint_task().
+	 */
+	ctx->root_freezer = get_task_cgroup_freezer(task);
+	if (!ctx->root_freezer)
+		return -EAGAIN;
+
 	if (!(ctx->uflags & CHECKPOINT_SUBTREE) && !ctx->root_init)
 		return -EINVAL;  /* cleanup by ckpt_ctx_free() */
 
@@ -583,7 +598,10 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
 
 	ret = init_checkpoint_ctx(ctx, pid);
 	if (ret < 0)
-		goto out;
+		return ret;
+	ret = cgroup_freezer_begin_checkpoint(ctx->root_freezer);
+	if (ret < 0)
+		return ret;
 	ret = build_tree(ctx);
 	if (ret < 0)
 		goto out;
@@ -626,6 +644,7 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
 	/* on success, return (unique) checkpoint identifier */
 	ctx->crid = atomic_inc_return(&ctx_count);
 	ret = ctx->crid;
- out:
+out:
+	cgroup_freezer_end_checkpoint(ctx->root_freezer);
 	return ret;
 }
diff --git a/checkpoint/sys.c b/checkpoint/sys.c
index 5dba2f9..5cc6e65 100644
--- a/checkpoint/sys.c
+++ b/checkpoint/sys.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h>
 #include <linux/nsproxy.h>
 #include <linux/kernel.h>
+#include <linux/cgroup.h>
 #include <linux/syscalls.h>
 #include <linux/fs.h>
 #include <linux/file.h>
@@ -214,6 +215,8 @@ static void ckpt_ctx_free(struct ckpt_ctx *ctx)
 		put_nsproxy(ctx->root_nsproxy);
 	if (ctx->root_task)
 		put_task_struct(ctx->root_task);
+	if (ctx->root_freezer)
+		css_put(ctx->root_freezer);
 
 	kfree(ctx->pids_arr);
 
diff --git a/include/linux/checkpoint_types.h b/include/linux/checkpoint_types.h
index 3c9c39d..01f6c6f 100644
--- a/include/linux/checkpoint_types.h
+++ b/include/linux/checkpoint_types.h
@@ -33,6 +33,7 @@ struct ckpt_ctx {
 	pid_t root_pid;		/* (container) root identifier */
 	struct task_struct *root_task;	/* (container) root task */
 	struct nsproxy *root_nsproxy;	/* (container) root nsproxy */
+	struct cgroup_subsys_state *root_freezer;  /* (container) freezer cgroup */
 
 	unsigned long kflags;	/* kerenl flags */
 	unsigned long uflags;	/* user flags */
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer
       [not found]       ` <20090603160551.GB7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-04  0:55         ` Li Zefan
  2009-06-11 21:56           ` Matt Helsley
       [not found]           ` <4A271B75.7090607-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  0 siblings, 2 replies; 33+ messages in thread
From: Li Zefan @ 2009-06-04  0:55 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Containers, Rafael J. Wysocki, Pavel Machek, Paul Menage,
	linux-pm-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Serge E. Hallyn wrote:
> Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
>> -int cgroup_frozen(struct task_struct *task)
>> +int cgroup_freezing_or_frozen(struct task_struct *task)
>>  {
>>  	struct freezer *freezer;
>>  	enum freezer_state state;
>>
>>  	task_lock(task);
>>  	freezer = task_freezer(task);
>> -	state = freezer->state;
>> +	if (!freezer->css.cgroup->parent)
>> +		state = CGROUP_THAWED; /* root cgroup can't be frozen */
>> +	else
>> +		state = freezer->state;
> 
> Why do you have to add this special-casing now?  I thought
> we were already preventing freezing the root cgroup, so
> freezer->state should be sane in that case anyway?
> 

Yeah, IIRC top_freezer->state is alwasy CGROUP_THAWED, so this
check is redundant.

>>  	task_unlock(task);
>>
>> -	return state == CGROUP_FROZEN;
>> +	return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
>>  }

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer
  2009-06-03 16:05     ` [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer Serge E. Hallyn
@ 2009-06-04  0:55       ` Li Zefan
       [not found]       ` <20090603160551.GB7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  1 sibling, 0 replies; 33+ messages in thread
From: Li Zefan @ 2009-06-04  0:55 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Containers, Pavel Machek, Paul Menage, linux-pm

Serge E. Hallyn wrote:
> Quoting Matt Helsley (matthltc@us.ibm.com):
>> -int cgroup_frozen(struct task_struct *task)
>> +int cgroup_freezing_or_frozen(struct task_struct *task)
>>  {
>>  	struct freezer *freezer;
>>  	enum freezer_state state;
>>
>>  	task_lock(task);
>>  	freezer = task_freezer(task);
>> -	state = freezer->state;
>> +	if (!freezer->css.cgroup->parent)
>> +		state = CGROUP_THAWED; /* root cgroup can't be frozen */
>> +	else
>> +		state = freezer->state;
> 
> Why do you have to add this special-casing now?  I thought
> we were already preventing freezing the root cgroup, so
> freezer->state should be sane in that case anyway?
> 

Yeah, IIRC top_freezer->state is alwasy CGROUP_THAWED, so this
check is redundant.

>>  	task_unlock(task);
>>
>> -	return state == CGROUP_FROZEN;
>> +	return (state == CGROUP_FREEZING) || (state == CGROUP_FROZEN);
>>  }

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient
       [not found]               ` <20090603181547.GA10141-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-04  3:18                 ` Matt Helsley
       [not found]                   ` <20090604031842.GR9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Matt Helsley @ 2009-06-04  3:18 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Paul Menage, Containers

On Wed, Jun 03, 2009 at 01:15:47PM -0500, Serge E. Hallyn wrote:
> Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> > On Wed, Jun 03, 2009 at 11:10:46AM -0500, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> > > >  When all the tasks of a cgroup were successfully frozen we can avoid
> > > >  the lazy FREEZING -> FROZEN transition and move into FROZEN during the
> > > >  write to freezer.state.
> > > 
> > > Can you remind us then what the point of the FREEZING state is?
> > > It doesn't look to me like, after this patch, a cgroup will
> > > ever be FREEZING?
> > 
> > FREEZING is an intermediate state indicating that the cgroup is
> > partially frozen and, unless userspace retries, it will remain so. 
> 
> Oh, so basically a cgroup will be in CGROUP_FREEZING state only
> while try_to_freeze_cgroup() is looping over the tasks now?

Not quite. freeze_task() can fail because of vfork. If it fails we do
some additional checks to make sure it'll eventually be freezable. If so
then we know we missed one. That's when it stays in the FREEZING
state until the next time userspace writes FROZEN or THAWED to freezer.state.

Cheers,
	-Matt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (a)
       [not found]                   ` <Pine.LNX.4.64.0906032008490.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2009-06-04 10:34                     ` Matt Helsley
       [not found]                       ` <20090604103418.GW9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-04 14:32                     ` Serge E. Hallyn
  1 sibling, 1 reply; 33+ messages in thread
From: Matt Helsley @ 2009-06-04 10:34 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Paul Menage, Containers

On Wed, Jun 03, 2009 at 08:10:24PM -0400, Oren Laadan wrote:
> 
> From 6cdb7d9504a19ad88e6da8ad85374267c7acb1b2 Mon Sep 17 00:00:00 2001
> From: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Date: Wed, 3 Jun 2009 02:31:21 -0700
> Subject: [PATCH] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
> 
> The CHECKPOINTING state prevents userspace from unfreezing tasks until
> sys_checkpoint() is finished. When doing container checkpoint userspace
> will do:
> 
> 	echo FROZEN > /cgroups/my_container/freezer.state
> 	...
> 	rc = sys_checkpoint( <pid of container root> );
> 
> To ensure a consistent checkpoint image userspace should not be allowed
> to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
> during checkpoint.
> 
> "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
> system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
> the checkpoint system call is finished and ready to return. Then the
> freezer state returns to "FROZEN". Writing any new state to freezer.state while
> checkpointing will return EBUSY. These semantics ensure that userspace cannot
> unfreeze the cgroup midway through the checkpoint system call.
> 
> The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
> make relatively few assumptions about the task that is passed in. However the
> way they are called in do_checkpoint() assumes that the root of the container
> is in the same freezer cgroup as all the other tasks that will be
> checkpointed.
> 
> Matt Helsley's wrote the original patch.
> 
> Changlog:
>   [2009-Jun-03] change cgroup_freezer_{begin,end}_checkpoint() to take a
>   		struct cgroup_subsys_state pointer
> 		add struct cgroup_subsys_state *get_task_cgroup_freezer()
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> Cc: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
> 
> Notes:
>         As a side-effect this prevents the multiple tasks from entering the
>                 CHECKPOINTING state simultaneously. All but one will get -EBUSY.
> ---
>  Documentation/cgroups/freezer-subsystem.txt |   10 ++
>  include/linux/freezer.h                     |   10 ++
>  kernel/cgroup_freezer.c                     |  149 +++++++++++++++++++--------
>  3 files changed, 127 insertions(+), 42 deletions(-)
> 
> diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
> index 41f37fe..92b68e6 100644
> --- a/Documentation/cgroups/freezer-subsystem.txt
> +++ b/Documentation/cgroups/freezer-subsystem.txt
> @@ -100,3 +100,13 @@ things happens:
>  		and returns EINVAL)
>  	3) The tasks that blocked the cgroup from entering the "FROZEN"
>  		state disappear from the cgroup's set of tasks.
> +
> +When the cgroup freezer is used to guard container checkpoint operations the
> +freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
> +"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
> +state, the cgroup may not leave until the checkpoint system call returns the
> +freezer state to "FROZEN". Writing any new state to freezer.state while
> +checkpointing will return EBUSY. These semantics ensure that userspace cannot
> +unfreeze the cgroup midway through the checkpoint system call. Note that,
> +unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
> +state.
> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> index da7e52b..1402911 100644
> --- a/include/linux/freezer.h
> +++ b/include/linux/freezer.h
> @@ -64,12 +64,22 @@ extern bool freeze_task(struct task_struct *p, bool sig_only);
>  extern void cancel_freezing(struct task_struct *p);
> 
>  #ifdef CONFIG_CGROUP_FREEZER
> +struct cgroup_subsys_state;
>  extern int cgroup_freezing_or_frozen(struct task_struct *task);
> +extern struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task);
> +extern int cgroup_freezer_begin_checkpoint(struct cgroup_subsys_state *css);
> +extern void cgroup_freezer_end_checkpoint(struct cgroup_subsys_state *css);
>  #else /* !CONFIG_CGROUP_FREEZER */
>  static inline int cgroup_freezing_or_frozen(struct task_struct *task)
>  {
>  	return 0;
>  }
> +static inline int cgroup_freezer_begin_checkpoint(struct task_struct *task)
> +{
> +	return -ENOTSUPP;
> +}
> +static inline void cgroup_freezer_end_checkpoint(struct task_struct *task)
> +{}

With CONFIG_CHECKPOINT depending on CONFIG_FREEZER I don't see why these
empty definitions are needed. Maybe it's just too late in the evening...

>  #endif /* !CONFIG_CGROUP_FREEZER */
> 
>  /*
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 05795b7..4790fb9 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -25,6 +25,7 @@ enum freezer_state {
>  	CGROUP_THAWED = 0,
>  	CGROUP_FREEZING,
>  	CGROUP_FROZEN,
> +	CGROUP_CHECKPOINTING,
>  };
> 
>  struct freezer {
> @@ -47,6 +48,18 @@ static inline struct freezer *task_freezer(struct task_struct *task)
>  			    struct freezer, css);
>  }
> 
> +struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task)
> +{
> +	struct cgroup_subsys_state *css;
> +
> +	task_lock(task);
> +	css = task_subsys_state(task, freezer_subsys_id);
> +	css_get(css); /* make sure freezer doesn't go away */
> +	task_unlock(task);
> +
> +	return css;
> +}

Seems like there should be a better way to do this than grabbing
this reference. I'd prefer to just introduce:

int in_same_cgroup_freezer(struct task_struct *task1, 
				struct task_struct *task2);

In the external checkpoint case the root task is frozen and hence cannot
change cgroups.

Regardless of that reference, I think the interaction between
sys_checkpoint() and the cgroup freezer as we have it right now is broken
in the self-checkpoint case. It doesn't work because the freezer allows a
task to freeze itself. So if it does simply:

echo FROZEN > /my/cgroup/freezer.state

then it will never reach sys_checkpoint. If it moves itself to a different
cgroup to avoid this then we need a new way to determine which cgroup to
move from FROZEN to CHECKPOINTING.

Alternatively, we can populate each group with a new freezer file besides
freezer.state. Depending on what seems best it could record which task(s)
to not freeze (ugh). Or it could be a flags file indicating that if the
task writing FROZEN to freezer.state belongs to this cgroup then it
shouldn't be frozen.

I think removing the ability to freeze oneself may not be a desirable
change to other users of the freezer interface. It also wouldn't be
backward-compatible.

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
       [not found]           ` <20090603161840.GD7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-04  0:01             ` Oren Laadan
@ 2009-06-04 10:44             ` Matt Helsley
  1 sibling, 0 replies; 33+ messages in thread
From: Matt Helsley @ 2009-06-04 10:44 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Paul Menage, Containers

On Wed, Jun 03, 2009 at 11:18:40AM -0500, Serge E. Hallyn wrote:
> Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> > The CHECKPOINTING state prevents userspace from unfreezing tasks until
> > sys_checkpoint() is finished. When doing container checkpoint userspace
> > will do:
> > 
> > 	echo FROZEN > /cgroups/my_container/freezer.state
> > 	...
> > 	rc = sys_checkpoint( <pid of container root> );
> > 
> > To ensure a consistent checkpoint image userspace should not be allowed
> > to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
> > during checkpoint.
> > 
> > "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
> > system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
> > the checkpoint system call is finished and ready to return. Then the
> > freezer state returns to "FROZEN". Writing any new state to freezer.state while
> > checkpointing will return EBUSY. These semantics ensure that userspace cannot
> > unfreeze the cgroup midway through the checkpoint system call.
> > 
> > The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
> > make relatively few assumptions about the task that is passed in. However the
> > way they are called in do_checkpoint() assumes that the root of the container
> > is in the same freezer cgroup as all the other tasks that will be
> > checkpointed.
> > 
> > Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> > Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
> > Cc: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> > 
> > Notes:
> > 	Meant to work with Oren's checkpoint/restart v16-dev git tree.
> >         Still needs testing.
> >         As a side-effect this prevents the multiple tasks from entering the
> >                 CHECKPOINTING state simultaneously. All but one will get -EBUSY.
> > ---
> >  Documentation/cgroups/freezer-subsystem.txt |   10 ++
> >  checkpoint/checkpoint.c                     |    8 ++-
> >  include/linux/freezer.h                     |    8 ++
> >  kernel/cgroup_freezer.c                     |  128 +++++++++++++++++++-------
> >  4 files changed, 117 insertions(+), 37 deletions(-)
> > 
> > diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
> > index 41f37fe..92b68e6 100644
> > --- a/Documentation/cgroups/freezer-subsystem.txt
> > +++ b/Documentation/cgroups/freezer-subsystem.txt
> > @@ -100,3 +100,13 @@ things happens:
> >  		and returns EINVAL)
> >  	3) The tasks that blocked the cgroup from entering the "FROZEN"
> >  		state disappear from the cgroup's set of tasks.
> > +
> > +When the cgroup freezer is used to guard container checkpoint operations the
> > +freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
> > +"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
> > +state, the cgroup may not leave until the checkpoint system call returns the
> > +freezer state to "FROZEN". Writing any new state to freezer.state while
> > +checkpointing will return EBUSY. These semantics ensure that userspace cannot
> > +unfreeze the cgroup midway through the checkpoint system call. Note that,
> > +unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
> > +state.
> > diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> > index afc7300..d586a9b 100644
> > --- a/checkpoint/checkpoint.c
> > +++ b/checkpoint/checkpoint.c
> > @@ -569,7 +569,10 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
> > 
> >  	ret = init_checkpoint_ctx(ctx, pid);
> >  	if (ret < 0)
> > -		goto out;
> > +		return ret;
> > +	ret = cgroup_freezer_begin_checkpoint(ctx->root_task);
> > +	if (ret < 0)
> > +		return ret;
> >  	ret = build_tree(ctx);
> >  	if (ret < 0)
> >  		goto out;
> > @@ -597,6 +600,7 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
> >  	/* on success, return (unique) checkpoint identifier */
> >  	ctx->crid = atomic_inc_return(&ctx_count);
> >  	ret = ctx->crid;
> > - out:
> > +out:
> > +	cgroup_freezer_end_checkpoint(ctx->root_task);
> >  	return ret;
> >  }
> 
> We have no guarantees that all tasks under ctx->root_task are in
> the same freezer cgroup.  This presents a problem if we want to
> make this guarantee which you're trying to make.
> 

Right. I know we don't have that guarantee -- that's why I mentioned it.

> First, you need to update the may_checkpoint() check to check that
> the tasks are CHECKPOINTING, not FROZEN.

Good point.

> 
> But in addition, note that (a) there may be two parallel checkpoints of
> separate tasksets (C and D), and (b) userspace may have mucked up
> and put some tasks from set C into set D's cgroup.  That means that it's
> possible that all of the may_checkpoint() tasks pass, but if D's
> checkpoint finishes before C's, then some tasks will be unfrozen
> before their checkpoint is complete.
> 
> So I guess now you need to enforce at may_checkpoint() that all tasks
> being checkpointed are in ctx->root_task's freezer cgroup?

One thing to note is the cgroup_freezer_begin|end_checkpoint() calls
could be applied to any task. Right now my patch only applies it to the
root. 

The use of it in the checkpoint code will look very different if it's applied
to all tasks. Since we haven't clearly defined "container", it's not quite
clear what should be done here. I'm thinking we either need to check that all
the tasks belong to the same cgroup (below) or we need to call
cgroup_freezer_begin|end_checkpoint() for each task we encounter :/.

So when the initial leak-detection pass is made you'd also try to cause
each task's cgroup to enter CHECKPOINTING. That would require a counter for
the number of times the cgroup has (re)entered CHECKPOINTING, and when the
counter reaches 0 -- on the last cgroup_freezer_end_checkpoint() call --
the freezer.state could be moved back to FROZEN.

Add to all that the self-checkpoint problem where a task freezes itself
before getting to sys_checkpoint :P...

Cheers,
	-Matt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (b)
       [not found]                   ` <Pine.LNX.4.64.0906032011470.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2009-06-04 10:45                     ` Matt Helsley
       [not found]                       ` <20090604104527.GY9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Matt Helsley @ 2009-06-04 10:45 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Paul Menage, Containers

On Wed, Jun 03, 2009 at 08:12:19PM -0400, Oren Laadan wrote:
> From 3c24531980764a71705492c2dfc2cc99366784f3 Mon Sep 17 00:00:00 2001
> From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> Date: Wed, 3 Jun 2009 19:31:21 -0400
> Subject: [PATCH] c/r: use CHECKPOINTING state for hierarchy's cgroup freezer
> 
> Set state of freezer cgroup of checkpointed task hierarchy to
> "CHECKPOINTING" during a checkpoint, to ensure that tasks cannot
> be thawed while at it.
>
> get_container() grabs a reference to the root task's freezer cgroup.
> Then in may_checkpoint_task() each it verifies that all tasks belong
> to the same freezer group.

Ugh, I really don't like grabbing this reference. It may be necessary
though -- I need to think about it more.

> 
> In particular, the root task is also tested, such that if the root
> tasks changes its freezer cgroups before it moves to "CHECKPOINTING",
> it will be notived and an error returned.

Tasks in a CHECKPOINTING cgroup are frozen and frozen tasks cannot move.
See freezer_can_attach(). This does rather complicate self-checkpoint
though! See my reply to your follow-on patches..

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (a)
       [not found]                       ` <20090604103418.GW9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-04 13:44                         ` Oren Laadan
       [not found]                           ` <Pine.LNX.4.64.0906040939100.25421-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
  2009-06-04 16:32                         ` Oren Laadan
  1 sibling, 1 reply; 33+ messages in thread
From: Oren Laadan @ 2009-06-04 13:44 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Paul Menage, Containers

On Thu, 4 Jun 2009, Matt Helsley wrote:

> On Wed, Jun 03, 2009 at 08:10:24PM -0400, Oren Laadan wrote:
> > 
> > From 6cdb7d9504a19ad88e6da8ad85374267c7acb1b2 Mon Sep 17 00:00:00 2001
> > From: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > Date: Wed, 3 Jun 2009 02:31:21 -0700
> > Subject: [PATCH] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
> > 
> > The CHECKPOINTING state prevents userspace from unfreezing tasks until
> > sys_checkpoint() is finished. When doing container checkpoint userspace
> > will do:
> > 
> > 	echo FROZEN > /cgroups/my_container/freezer.state
> > 	...
> > 	rc = sys_checkpoint( <pid of container root> );
> > 
> > To ensure a consistent checkpoint image userspace should not be allowed
> > to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
> > during checkpoint.
> > 
> > "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
> > system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
> > the checkpoint system call is finished and ready to return. Then the
> > freezer state returns to "FROZEN". Writing any new state to freezer.state while
> > checkpointing will return EBUSY. These semantics ensure that userspace cannot
> > unfreeze the cgroup midway through the checkpoint system call.
> > 
> > The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
> > make relatively few assumptions about the task that is passed in. However the
> > way they are called in do_checkpoint() assumes that the root of the container
> > is in the same freezer cgroup as all the other tasks that will be
> > checkpointed.
> > 
> > Matt Helsley's wrote the original patch.
> > 
> > Changlog:
> >   [2009-Jun-03] change cgroup_freezer_{begin,end}_checkpoint() to take a
> >   		struct cgroup_subsys_state pointer
> > 		add struct cgroup_subsys_state *get_task_cgroup_freezer()
> > 
> > Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> > Cc: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> > Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
> > 
> > Notes:
> >         As a side-effect this prevents the multiple tasks from entering the
> >                 CHECKPOINTING state simultaneously. All but one will get -EBUSY.
> > ---
> >  Documentation/cgroups/freezer-subsystem.txt |   10 ++
> >  include/linux/freezer.h                     |   10 ++
> >  kernel/cgroup_freezer.c                     |  149 +++++++++++++++++++--------
> >  3 files changed, 127 insertions(+), 42 deletions(-)
> > 
> > diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
> > index 41f37fe..92b68e6 100644
> > --- a/Documentation/cgroups/freezer-subsystem.txt
> > +++ b/Documentation/cgroups/freezer-subsystem.txt
> > @@ -100,3 +100,13 @@ things happens:
> >  		and returns EINVAL)
> >  	3) The tasks that blocked the cgroup from entering the "FROZEN"
> >  		state disappear from the cgroup's set of tasks.
> > +
> > +When the cgroup freezer is used to guard container checkpoint operations the
> > +freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
> > +"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
> > +state, the cgroup may not leave until the checkpoint system call returns the
> > +freezer state to "FROZEN". Writing any new state to freezer.state while
> > +checkpointing will return EBUSY. These semantics ensure that userspace cannot
> > +unfreeze the cgroup midway through the checkpoint system call. Note that,
> > +unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
> > +state.
> > diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> > index da7e52b..1402911 100644
> > --- a/include/linux/freezer.h
> > +++ b/include/linux/freezer.h
> > @@ -64,12 +64,22 @@ extern bool freeze_task(struct task_struct *p, bool sig_only);
> >  extern void cancel_freezing(struct task_struct *p);
> > 
> >  #ifdef CONFIG_CGROUP_FREEZER
> > +struct cgroup_subsys_state;
> >  extern int cgroup_freezing_or_frozen(struct task_struct *task);
> > +extern struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task);
> > +extern int cgroup_freezer_begin_checkpoint(struct cgroup_subsys_state *css);
> > +extern void cgroup_freezer_end_checkpoint(struct cgroup_subsys_state *css);
> >  #else /* !CONFIG_CGROUP_FREEZER */
> >  static inline int cgroup_freezing_or_frozen(struct task_struct *task)
> >  {
> >  	return 0;
> >  }
> > +static inline int cgroup_freezer_begin_checkpoint(struct task_struct *task)
> > +{
> > +	return -ENOTSUPP;
> > +}
> > +static inline void cgroup_freezer_end_checkpoint(struct task_struct *task)
> > +{}
> 
> With CONFIG_CHECKPOINT depending on CONFIG_FREEZER I don't see why these
> empty definitions are needed. Maybe it's just too late in the evening...

zzzzz....

> 
> >  #endif /* !CONFIG_CGROUP_FREEZER */
> > 
> >  /*
> > diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> > index 05795b7..4790fb9 100644
> > --- a/kernel/cgroup_freezer.c
> > +++ b/kernel/cgroup_freezer.c
> > @@ -25,6 +25,7 @@ enum freezer_state {
> >  	CGROUP_THAWED = 0,
> >  	CGROUP_FREEZING,
> >  	CGROUP_FROZEN,
> > +	CGROUP_CHECKPOINTING,
> >  };
> > 
> >  struct freezer {
> > @@ -47,6 +48,18 @@ static inline struct freezer *task_freezer(struct task_struct *task)
> >  			    struct freezer, css);
> >  }
> > 
> > +struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task)
> > +{
> > +	struct cgroup_subsys_state *css;
> > +
> > +	task_lock(task);
> > +	css = task_subsys_state(task, freezer_subsys_id);
> > +	css_get(css); /* make sure freezer doesn't go away */
> > +	task_unlock(task);
> > +
> > +	return css;
> > +}
> 
> Seems like there should be a better way to do this than grabbing
> this reference. I'd prefer to just introduce:
> 
> int in_same_cgroup_freezer(struct task_struct *task1, 
> 				struct task_struct *task2);
> 
> In the external checkpoint case the root task is frozen and hence cannot
> change cgroups.
> 
> Regardless of that reference, I think the interaction between
> sys_checkpoint() and the cgroup freezer as we have it right now is broken
> in the self-checkpoint case. It doesn't work because the freezer allows a
> task to freeze itself. So if it does simply:

There should be no interaction. 

While I do make sure to skip the freezer cgroup test for the self
checkpoint test, I forgot to do it when calling begin/end checkpoint.

Since we don't require that a self checkpointing task freeze itself,
this can be fixed by adding suitable 'if (t != current) before these
two calls.

This is true regardless of how we eventually address the same-cgroup
question that you rightfully point.

Oren.

> 
> echo FROZEN > /my/cgroup/freezer.state
> 
> then it will never reach sys_checkpoint. If it moves itself to a different
> cgroup to avoid this then we need a new way to determine which cgroup to
> move from FROZEN to CHECKPOINTING.
> 
> Alternatively, we can populate each group with a new freezer file besides
> freezer.state. Depending on what seems best it could record which task(s)
> to not freeze (ugh). Or it could be a flags file indicating that if the
> task writing FROZEN to freezer.state belongs to this cgroup then it
> shouldn't be frozen.
> 
> I think removing the ability to freeze oneself may not be a desirable
> change to other users of the freezer interface. It also wouldn't be
> backward-compatible.
> 
> Cheers,
> 	-Matt Helsley
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (b)
       [not found]                       ` <20090604104527.GY9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-04 13:52                         ` Oren Laadan
  2009-06-04 14:04                         ` Serge E. Hallyn
  1 sibling, 0 replies; 33+ messages in thread
From: Oren Laadan @ 2009-06-04 13:52 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Paul Menage, Containers

On Thu, 4 Jun 2009, Matt Helsley wrote:

> On Wed, Jun 03, 2009 at 08:12:19PM -0400, Oren Laadan wrote:
> > From 3c24531980764a71705492c2dfc2cc99366784f3 Mon Sep 17 00:00:00 2001
> > From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> > Date: Wed, 3 Jun 2009 19:31:21 -0400
> > Subject: [PATCH] c/r: use CHECKPOINTING state for hierarchy's cgroup freezer
> > 
> > Set state of freezer cgroup of checkpointed task hierarchy to
> > "CHECKPOINTING" during a checkpoint, to ensure that tasks cannot
> > be thawed while at it.
> >
> > get_container() grabs a reference to the root task's freezer cgroup.
> > Then in may_checkpoint_task() each it verifies that all tasks belong
> > to the same freezer group.
> 
> Ugh, I really don't like grabbing this reference. It may be necessary
> though -- I need to think about it more.
> 
> > 
> > In particular, the root task is also tested, such that if the root
> > tasks changes its freezer cgroups before it moves to "CHECKPOINTING",
> > it will be notived and an error returned.
> 
> Tasks in a CHECKPOINTING cgroup are frozen and frozen tasks cannot move.
> See freezer_can_attach(). This does rather complicate self-checkpoint
> though! See my reply to your follow-on patches..

We grab the reference of the css _before_ the group is CHECKPOINTING.
Between that point and calling cgroup_freezer_begin_checkpoint(), the
root task may still be thawed and/or change groups. In this comment I
argue that the code is safe, since the test in may_checkpoint_task()
that compares a task's css to the "container's" css will return an
error in this case.

As for the self-checkpoint you are right. Probably aneasiest way is
require isn't kept the self-checkpointer not in the freezer cgroup, 
and skip the relevant tests for it.

Oren.

> 
> Cheers,
> 	-Matt Helsley
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (a)
       [not found]                           ` <Pine.LNX.4.64.0906040939100.25421-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2009-06-04 13:54                             ` Oren Laadan
  0 siblings, 0 replies; 33+ messages in thread
From: Oren Laadan @ 2009-06-04 13:54 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Paul Menage, Containers

On Thu, 4 Jun 2009, Oren Laadan wrote:

> On Thu, 4 Jun 2009, Matt Helsley wrote:
> 
> > On Wed, Jun 03, 2009 at 08:10:24PM -0400, Oren Laadan wrote:
> > > 
> > > From 6cdb7d9504a19ad88e6da8ad85374267c7acb1b2 Mon Sep 17 00:00:00 2001
> > > From: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > > Date: Wed, 3 Jun 2009 02:31:21 -0700
> > > Subject: [PATCH] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
> > > 
> > > The CHECKPOINTING state prevents userspace from unfreezing tasks until
> > > sys_checkpoint() is finished. When doing container checkpoint userspace
> > > will do:
> > > 
> > > 	echo FROZEN > /cgroups/my_container/freezer.state
> > > 	...
> > > 	rc = sys_checkpoint( <pid of container root> );
> > > 
> > > To ensure a consistent checkpoint image userspace should not be allowed
> > > to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
> > > during checkpoint.
> > > 
> > > "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
> > > system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
> > > the checkpoint system call is finished and ready to return. Then the
> > > freezer state returns to "FROZEN". Writing any new state to freezer.state while
> > > checkpointing will return EBUSY. These semantics ensure that userspace cannot
> > > unfreeze the cgroup midway through the checkpoint system call.
> > > 
> > > The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
> > > make relatively few assumptions about the task that is passed in. However the
> > > way they are called in do_checkpoint() assumes that the root of the container
> > > is in the same freezer cgroup as all the other tasks that will be
> > > checkpointed.
> > > 
> > > Matt Helsley's wrote the original patch.
> > > 
> > > Changlog:
> > >   [2009-Jun-03] change cgroup_freezer_{begin,end}_checkpoint() to take a
> > >   		struct cgroup_subsys_state pointer
> > > 		add struct cgroup_subsys_state *get_task_cgroup_freezer()
> > > 
> > > Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> > > Cc: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > > Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > > Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> > > Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
> > > 
> > > Notes:
> > >         As a side-effect this prevents the multiple tasks from entering the
> > >                 CHECKPOINTING state simultaneously. All but one will get -EBUSY.
> > > ---
> > >  Documentation/cgroups/freezer-subsystem.txt |   10 ++
> > >  include/linux/freezer.h                     |   10 ++
> > >  kernel/cgroup_freezer.c                     |  149 +++++++++++++++++++--------
> > >  3 files changed, 127 insertions(+), 42 deletions(-)
> > > 
> > > diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
> > > index 41f37fe..92b68e6 100644
> > > --- a/Documentation/cgroups/freezer-subsystem.txt
> > > +++ b/Documentation/cgroups/freezer-subsystem.txt
> > > @@ -100,3 +100,13 @@ things happens:
> > >  		and returns EINVAL)
> > >  	3) The tasks that blocked the cgroup from entering the "FROZEN"
> > >  		state disappear from the cgroup's set of tasks.
> > > +
> > > +When the cgroup freezer is used to guard container checkpoint operations the
> > > +freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
> > > +"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
> > > +state, the cgroup may not leave until the checkpoint system call returns the
> > > +freezer state to "FROZEN". Writing any new state to freezer.state while
> > > +checkpointing will return EBUSY. These semantics ensure that userspace cannot
> > > +unfreeze the cgroup midway through the checkpoint system call. Note that,
> > > +unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
> > > +state.
> > > diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> > > index da7e52b..1402911 100644
> > > --- a/include/linux/freezer.h
> > > +++ b/include/linux/freezer.h
> > > @@ -64,12 +64,22 @@ extern bool freeze_task(struct task_struct *p, bool sig_only);
> > >  extern void cancel_freezing(struct task_struct *p);
> > > 
> > >  #ifdef CONFIG_CGROUP_FREEZER
> > > +struct cgroup_subsys_state;
> > >  extern int cgroup_freezing_or_frozen(struct task_struct *task);
> > > +extern struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task);
> > > +extern int cgroup_freezer_begin_checkpoint(struct cgroup_subsys_state *css);
> > > +extern void cgroup_freezer_end_checkpoint(struct cgroup_subsys_state *css);
> > >  #else /* !CONFIG_CGROUP_FREEZER */
> > >  static inline int cgroup_freezing_or_frozen(struct task_struct *task)
> > >  {
> > >  	return 0;
> > >  }
> > > +static inline int cgroup_freezer_begin_checkpoint(struct task_struct *task)
> > > +{
> > > +	return -ENOTSUPP;
> > > +}
> > > +static inline void cgroup_freezer_end_checkpoint(struct task_struct *task)
> > > +{}
> > 
> > With CONFIG_CHECKPOINT depending on CONFIG_FREEZER I don't see why these
> > empty definitions are needed. Maybe it's just too late in the evening...
> 
> zzzzz....
> 
> > 
> > >  #endif /* !CONFIG_CGROUP_FREEZER */
> > > 
> > >  /*
> > > diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> > > index 05795b7..4790fb9 100644
> > > --- a/kernel/cgroup_freezer.c
> > > +++ b/kernel/cgroup_freezer.c
> > > @@ -25,6 +25,7 @@ enum freezer_state {
> > >  	CGROUP_THAWED = 0,
> > >  	CGROUP_FREEZING,
> > >  	CGROUP_FROZEN,
> > > +	CGROUP_CHECKPOINTING,
> > >  };
> > > 
> > >  struct freezer {
> > > @@ -47,6 +48,18 @@ static inline struct freezer *task_freezer(struct task_struct *task)
> > >  			    struct freezer, css);
> > >  }
> > > 
> > > +struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task)
> > > +{
> > > +	struct cgroup_subsys_state *css;
> > > +
> > > +	task_lock(task);
> > > +	css = task_subsys_state(task, freezer_subsys_id);
> > > +	css_get(css); /* make sure freezer doesn't go away */
> > > +	task_unlock(task);
> > > +
> > > +	return css;
> > > +}
> > 
> > Seems like there should be a better way to do this than grabbing
> > this reference. I'd prefer to just introduce:
> > 
> > int in_same_cgroup_freezer(struct task_struct *task1, 
> > 				struct task_struct *task2);
> > 
> > In the external checkpoint case the root task is frozen and hence cannot
> > change cgroups.
> > 
> > Regardless of that reference, I think the interaction between
> > sys_checkpoint() and the cgroup freezer as we have it right now is broken
> > in the self-checkpoint case. It doesn't work because the freezer allows a
> > task to freeze itself. So if it does simply:
> 
> There should be no interaction. 
> 
> While I do make sure to skip the freezer cgroup test for the self
> checkpoint test, I forgot to do it when calling begin/end checkpoint.
> 
> Since we don't require that a self checkpointing task freeze itself,
> this can be fixed by adding suitable 'if (t != current) before these
> two calls.

Ahhh ... scratch that. Call should remain, of course.

Instead, user can remove the checkpointer task (in self-checkpoint) 
from the freezer cgroup. The tests is may_checkpoint() already allow
current task to not be in the cgroup.

Oren.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (b)
       [not found]                       ` <20090604104527.GY9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-04 13:52                         ` Oren Laadan
@ 2009-06-04 14:04                         ` Serge E. Hallyn
  1 sibling, 0 replies; 33+ messages in thread
From: Serge E. Hallyn @ 2009-06-04 14:04 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Paul Menage, Containers

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> On Wed, Jun 03, 2009 at 08:12:19PM -0400, Oren Laadan wrote:
> > From 3c24531980764a71705492c2dfc2cc99366784f3 Mon Sep 17 00:00:00 2001
> > From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> > Date: Wed, 3 Jun 2009 19:31:21 -0400
> > Subject: [PATCH] c/r: use CHECKPOINTING state for hierarchy's cgroup freezer
> > 
> > Set state of freezer cgroup of checkpointed task hierarchy to
> > "CHECKPOINTING" during a checkpoint, to ensure that tasks cannot
> > be thawed while at it.
> >
> > get_container() grabs a reference to the root task's freezer cgroup.
> > Then in may_checkpoint_task() each it verifies that all tasks belong
> > to the same freezer group.
> 
> Ugh, I really don't like grabbing this reference. It may be necessary
> though -- I need to think about it more.

This was my first thought too.  The rest of Oren's algorithm seems
sounds, but this just seems like an unneeded step.  If we prevent
tasks from being moved out of the freezer container, and from
being thawed before checkpoint is done, then why do we need to
get the container?

-serge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (a)
       [not found]                   ` <Pine.LNX.4.64.0906032008490.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
  2009-06-04 10:34                     ` Matt Helsley
@ 2009-06-04 14:32                     ` Serge E. Hallyn
  1 sibling, 0 replies; 33+ messages in thread
From: Serge E. Hallyn @ 2009-06-04 14:32 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Paul Menage, Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> >From 6cdb7d9504a19ad88e6da8ad85374267c7acb1b2 Mon Sep 17 00:00:00 2001
> From: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Date: Wed, 3 Jun 2009 02:31:21 -0700
> Subject: [PATCH] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
> 
> The CHECKPOINTING state prevents userspace from unfreezing tasks until
> sys_checkpoint() is finished. When doing container checkpoint userspace
> will do:
> 
> 	echo FROZEN > /cgroups/my_container/freezer.state
> 	...
> 	rc = sys_checkpoint( <pid of container root> );
> 
> To ensure a consistent checkpoint image userspace should not be allowed
> to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
> during checkpoint.
> 
> "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
> system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
> the checkpoint system call is finished and ready to return. Then the
> freezer state returns to "FROZEN". Writing any new state to freezer.state while
> checkpointing will return EBUSY. These semantics ensure that userspace cannot
> unfreeze the cgroup midway through the checkpoint system call.
> 
> The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
> make relatively few assumptions about the task that is passed in. However the
> way they are called in do_checkpoint() assumes that the root of the container
> is in the same freezer cgroup as all the other tasks that will be
> checkpointed.
> 
> Matt Helsley's wrote the original patch.
> 
> Changlog:
>   [2009-Jun-03] change cgroup_freezer_{begin,end}_checkpoint() to take a
>   		struct cgroup_subsys_state pointer
> 		add struct cgroup_subsys_state *get_task_cgroup_freezer()
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> Cc: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>

Looks good -

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

to both.  I still don't see why you need a ref to the root_freezer,
but it doesn't hurt at any rate so I won't harp on it.

thanks,
-serge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (a)
       [not found]                       ` <20090604103418.GW9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-04 13:44                         ` Oren Laadan
@ 2009-06-04 16:32                         ` Oren Laadan
  1 sibling, 0 replies; 33+ messages in thread
From: Oren Laadan @ 2009-06-04 16:32 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Paul Menage, Containers



Matt Helsley wrote:
> On Wed, Jun 03, 2009 at 08:10:24PM -0400, Oren Laadan wrote:
>> From 6cdb7d9504a19ad88e6da8ad85374267c7acb1b2 Mon Sep 17 00:00:00 2001
>> From: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>> Date: Wed, 3 Jun 2009 02:31:21 -0700
>> Subject: [PATCH] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint
>>
>> The CHECKPOINTING state prevents userspace from unfreezing tasks until
>> sys_checkpoint() is finished. When doing container checkpoint userspace
>> will do:
>>
>> 	echo FROZEN > /cgroups/my_container/freezer.state
>> 	...
>> 	rc = sys_checkpoint( <pid of container root> );
>>
>> To ensure a consistent checkpoint image userspace should not be allowed
>> to thaw the cgroup (echo THAWED > /cgroups/my_container/freezer.state)
>> during checkpoint.
>>
>> "CHECKPOINTING" can only be set on a "FROZEN" cgroup using the checkpoint
>> system call. Once in the "CHECKPOINTING" state, the cgroup may not leave until
>> the checkpoint system call is finished and ready to return. Then the
>> freezer state returns to "FROZEN". Writing any new state to freezer.state while
>> checkpointing will return EBUSY. These semantics ensure that userspace cannot
>> unfreeze the cgroup midway through the checkpoint system call.
>>
>> The cgroup_freezer_begin_checkpoint() and cgroup_freezer_end_checkpoint()
>> make relatively few assumptions about the task that is passed in. However the
>> way they are called in do_checkpoint() assumes that the root of the container
>> is in the same freezer cgroup as all the other tasks that will be
>> checkpointed.
>>
>> Matt Helsley's wrote the original patch.
>>
>> Changlog:
>>   [2009-Jun-03] change cgroup_freezer_{begin,end}_checkpoint() to take a
>>   		struct cgroup_subsys_state pointer
>> 		add struct cgroup_subsys_state *get_task_cgroup_freezer()
>>
>> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
>> Cc: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>> Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> Cc: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
>> Cc: Cedric Le Goater <legoater-GANU6spQydw@public.gmane.org>
>>
>> Notes:
>>         As a side-effect this prevents the multiple tasks from entering the
>>                 CHECKPOINTING state simultaneously. All but one will get -EBUSY.
>> ---
>>  Documentation/cgroups/freezer-subsystem.txt |   10 ++
>>  include/linux/freezer.h                     |   10 ++
>>  kernel/cgroup_freezer.c                     |  149 +++++++++++++++++++--------
>>  3 files changed, 127 insertions(+), 42 deletions(-)
>>
>> diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
>> index 41f37fe..92b68e6 100644
>> --- a/Documentation/cgroups/freezer-subsystem.txt
>> +++ b/Documentation/cgroups/freezer-subsystem.txt
>> @@ -100,3 +100,13 @@ things happens:
>>  		and returns EINVAL)
>>  	3) The tasks that blocked the cgroup from entering the "FROZEN"
>>  		state disappear from the cgroup's set of tasks.
>> +
>> +When the cgroup freezer is used to guard container checkpoint operations the
>> +freezer.state may be "CHECKPOINTING". "CHECKPOINTING" can only be set on a
>> +"FROZEN" cgroup using the checkpoint system call. Once in the "CHECKPOINTING"
>> +state, the cgroup may not leave until the checkpoint system call returns the
>> +freezer state to "FROZEN". Writing any new state to freezer.state while
>> +checkpointing will return EBUSY. These semantics ensure that userspace cannot
>> +unfreeze the cgroup midway through the checkpoint system call. Note that,
>> +unlike "FROZEN" and "FREEZING", there is no corresponding "CHECKPOINTED"
>> +state.
>> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
>> index da7e52b..1402911 100644
>> --- a/include/linux/freezer.h
>> +++ b/include/linux/freezer.h
>> @@ -64,12 +64,22 @@ extern bool freeze_task(struct task_struct *p, bool sig_only);
>>  extern void cancel_freezing(struct task_struct *p);
>>
>>  #ifdef CONFIG_CGROUP_FREEZER
>> +struct cgroup_subsys_state;
>>  extern int cgroup_freezing_or_frozen(struct task_struct *task);
>> +extern struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task);
>> +extern int cgroup_freezer_begin_checkpoint(struct cgroup_subsys_state *css);
>> +extern void cgroup_freezer_end_checkpoint(struct cgroup_subsys_state *css);
>>  #else /* !CONFIG_CGROUP_FREEZER */
>>  static inline int cgroup_freezing_or_frozen(struct task_struct *task)
>>  {
>>  	return 0;
>>  }
>> +static inline int cgroup_freezer_begin_checkpoint(struct task_struct *task)
>> +{
>> +	return -ENOTSUPP;
>> +}
>> +static inline void cgroup_freezer_end_checkpoint(struct task_struct *task)
>> +{}
> 
> With CONFIG_CHECKPOINT depending on CONFIG_FREEZER I don't see why these
> empty definitions are needed. Maybe it's just too late in the evening...
> 
>>  #endif /* !CONFIG_CGROUP_FREEZER */
>>
>>  /*
>> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
>> index 05795b7..4790fb9 100644
>> --- a/kernel/cgroup_freezer.c
>> +++ b/kernel/cgroup_freezer.c
>> @@ -25,6 +25,7 @@ enum freezer_state {
>>  	CGROUP_THAWED = 0,
>>  	CGROUP_FREEZING,
>>  	CGROUP_FROZEN,
>> +	CGROUP_CHECKPOINTING,
>>  };
>>
>>  struct freezer {
>> @@ -47,6 +48,18 @@ static inline struct freezer *task_freezer(struct task_struct *task)
>>  			    struct freezer, css);
>>  }
>>
>> +struct cgroup_subsys_state *get_task_cgroup_freezer(struct task_struct *task)
>> +{
>> +	struct cgroup_subsys_state *css;
>> +
>> +	task_lock(task);
>> +	css = task_subsys_state(task, freezer_subsys_id);
>> +	css_get(css); /* make sure freezer doesn't go away */
>> +	task_unlock(task);
>> +
>> +	return css;
>> +}
> 
> Seems like there should be a better way to do this than grabbing
> this reference. I'd prefer to just introduce:
> 
> int in_same_cgroup_freezer(struct task_struct *task1, 
> 				struct task_struct *task2);
> 
> In the external checkpoint case the root task is frozen and hence cannot
> change cgroups.

Yeah, you and Serge are right. This reference is not well justified.
So we can revert this change and only add in_same_cgroup_freezer().

I'll change it now in ckpt-v16-dev. Are you going to resubmit the
patchset ?  If you leave out the part that adds the calls to begin
and end a checkpoint, it would be relevant regardless of the c/r
patchset in question.

Thanks,

Oren.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient
       [not found]                   ` <20090604031842.GR9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-07 21:36                     ` Nathan Lynch
  0 siblings, 0 replies; 33+ messages in thread
From: Nathan Lynch @ 2009-06-07 21:36 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Paul Menage, Containers

Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
> On Wed, Jun 03, 2009 at 01:15:47PM -0500, Serge E. Hallyn wrote:
>> Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
>> > On Wed, Jun 03, 2009 at 11:10:46AM -0500, Serge E. Hallyn wrote:
>> > > Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
>> > > >  When all the tasks of a cgroup were successfully frozen we can avoid
>> > > >  the lazy FREEZING -> FROZEN transition and move into FROZEN during the
>> > > >  write to freezer.state.
>> > > 
>> > > Can you remind us then what the point of the FREEZING state is?
>> > > It doesn't look to me like, after this patch, a cgroup will
>> > > ever be FREEZING?
>> > 
>> > FREEZING is an intermediate state indicating that the cgroup is
>> > partially frozen and, unless userspace retries, it will remain so. 
>> 
>> Oh, so basically a cgroup will be in CGROUP_FREEZING state only
>> while try_to_freeze_cgroup() is looping over the tasks now?
>
> Not quite. freeze_task() can fail because of vfork. If it fails we do
> some additional checks to make sure it'll eventually be freezable. If so
> then we know we missed one. That's when it stays in the FREEZING
> state until the next time userspace writes FROZEN or THAWED to freezer.state.

Not really a comment on your patch, but some puzzlement regarding this
treatment of vfork.

I see how the do_fork code makes the freezer skip a task which is
waiting in vfork.  But is there anything preventing a vfork'd child from
being frozen (e.g. before it execs or exits)?  As far as I can tell this
isn't prevented, but perhaps it should be, since the parent cannot enter
frozen state until after the child has called mm_release.

Does the FREEZING state in cgroup_freezer.c exist solely for the sake of
vfork?  If so, it would help reviewers quite a bit to have that
documented in the code.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer
       [not found]           ` <4A271B75.7090607-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2009-06-11 21:56             ` Matt Helsley
  0 siblings, 0 replies; 33+ messages in thread
From: Matt Helsley @ 2009-06-11 21:56 UTC (permalink / raw)
  To: Li Zefan
  Cc: Containers, Rafael J. Wysocki, Pavel Machek, Paul Menage,
	linux-pm-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, Jun 04, 2009 at 08:55:17AM +0800, Li Zefan wrote:
> Serge E. Hallyn wrote:
> > Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> >> -int cgroup_frozen(struct task_struct *task)
> >> +int cgroup_freezing_or_frozen(struct task_struct *task)
> >>  {
> >>  	struct freezer *freezer;
> >>  	enum freezer_state state;
> >>
> >>  	task_lock(task);
> >>  	freezer = task_freezer(task);
> >> -	state = freezer->state;
> >> +	if (!freezer->css.cgroup->parent)
> >> +		state = CGROUP_THAWED; /* root cgroup can't be frozen */
> >> +	else
> >> +		state = freezer->state;
> > 
> > Why do you have to add this special-casing now?  I thought
> > we were already preventing freezing the root cgroup, so
> > freezer->state should be sane in that case anyway?
> > 
> 
> Yeah, IIRC top_freezer->state is alwasy CGROUP_THAWED, so this
> check is redundant.

Right, fixed.

Cheers,
	-Matt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer
  2009-06-04  0:55         ` Li Zefan
@ 2009-06-11 21:56           ` Matt Helsley
       [not found]           ` <4A271B75.7090607-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  1 sibling, 0 replies; 33+ messages in thread
From: Matt Helsley @ 2009-06-11 21:56 UTC (permalink / raw)
  To: Li Zefan; +Cc: Containers, Pavel Machek, Serge E. Hallyn, Paul Menage, linux-pm

On Thu, Jun 04, 2009 at 08:55:17AM +0800, Li Zefan wrote:
> Serge E. Hallyn wrote:
> > Quoting Matt Helsley (matthltc@us.ibm.com):
> >> -int cgroup_frozen(struct task_struct *task)
> >> +int cgroup_freezing_or_frozen(struct task_struct *task)
> >>  {
> >>  	struct freezer *freezer;
> >>  	enum freezer_state state;
> >>
> >>  	task_lock(task);
> >>  	freezer = task_freezer(task);
> >> -	state = freezer->state;
> >> +	if (!freezer->css.cgroup->parent)
> >> +		state = CGROUP_THAWED; /* root cgroup can't be frozen */
> >> +	else
> >> +		state = freezer->state;
> > 
> > Why do you have to add this special-casing now?  I thought
> > we were already preventing freezing the root cgroup, so
> > freezer->state should be sane in that case anyway?
> > 
> 
> Yeah, IIRC top_freezer->state is alwasy CGROUP_THAWED, so this
> check is redundant.

Right, fixed.

Cheers,
	-Matt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 0/4] cgroup freezer: Fixes and CHECKPOINTING support
@ 2009-06-03  9:31 Matt Helsley
  0 siblings, 0 replies; 33+ messages in thread
From: Matt Helsley @ 2009-06-03  9:31 UTC (permalink / raw)
  Cc: Cedric Le Goater, Oren Laadan, Containers, Li Zefan,
	Pavel Machek, Paul Menage, linux-pm


This series fixes a bug with the mainline cgroup freezer code and
cleans up the cgroup freezer before introducing the new
CHECKPOINTING state in the last patch. All but this last patch are
independent of checkpoint/restart and hence should be suitable for
mainline.

The bug was noticed by Oren Ladaan and allows tasks frozen via the
cgroup freezer to be thawed during resume when they should remain
frozen. In Oren's checkpoint/restart tree is means that
sys_checkpoint() sometimes returns EBUSY even when the cgroup is, 
in fact, fully frozen. (Workaround: cat /cgroup/foo/freezer.state
before doing checkpoint..)

The final patch, which adds the CHECKPOINTING state, incorporates
Oren's comments and should fix the lockdep bug he posted too. I 
haven't tested this latest version yet myself and I have a few things to 
attend to before I can get to it so I'm posting this for initial review
now. I believe this resolves all the pending comments I've seen about
the CHECKPOINTING state.

Cheers,
	-Matt Helsley

Matt Helsley (4):
  cgroup freezer: Fix buggy resume test for tasks frozen with cgroup
    freezer
  cgroup freezer: Avoid lazy state changes when convenient
  cgroup freezer: Update stale locking comments
  cgroup freezer: Add CHECKPOINTING state to safeguard container
    checkpoint

 Documentation/cgroups/freezer-subsystem.txt |   10 ++
 checkpoint/checkpoint.c                     |    8 +-
 include/linux/freezer.h                     |   15 ++-
 kernel/cgroup_freezer.c                     |  163 +++++++++++++++++++--------
 kernel/power/process.c                      |    2 +-
 5 files changed, 146 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2009-06-11 21:56 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-03  9:31 [PATCH 0/4] cgroup freezer: Fixes and CHECKPOINTING support Matt Helsley
     [not found] ` <cover.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-03  9:31   ` [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer Matt Helsley
2009-06-03 16:05     ` Serge E. Hallyn
     [not found]   ` <91260d5797bcf233ca54c7b41da8278e99b4ca66.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-03  9:31     ` [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient Matt Helsley
2009-06-03 16:05     ` [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer Serge E. Hallyn
2009-06-04  0:55       ` Li Zefan
     [not found]       ` <20090603160551.GB7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-04  0:55         ` Li Zefan
2009-06-11 21:56           ` Matt Helsley
     [not found]           ` <4A271B75.7090607-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2009-06-11 21:56             ` Matt Helsley
     [not found]   ` <4abbb542b220448d2e8d17e5074899568c584f23.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-03  9:31     ` [PATCH 4/4] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint Matt Helsley
     [not found]       ` <89c3726813accffb7c51cd30ff93b79a4391f382.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-03 16:18         ` Serge E. Hallyn
     [not found]           ` <20090603161840.GD7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-04  0:01             ` Oren Laadan
     [not found]               ` <Pine.LNX.4.64.0906031957110.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2009-06-04  0:10                 ` [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (a) Oren Laadan
     [not found]                   ` <Pine.LNX.4.64.0906032008490.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2009-06-04 10:34                     ` Matt Helsley
     [not found]                       ` <20090604103418.GW9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-04 13:44                         ` Oren Laadan
     [not found]                           ` <Pine.LNX.4.64.0906040939100.25421-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2009-06-04 13:54                             ` Oren Laadan
2009-06-04 16:32                         ` Oren Laadan
2009-06-04 14:32                     ` Serge E. Hallyn
2009-06-04  0:12                 ` [PATCH 4/4] cgroup freezer: --- replacement patch 4/4 (b) Oren Laadan
     [not found]                   ` <Pine.LNX.4.64.0906032011470.19096-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2009-06-04 10:45                     ` Matt Helsley
     [not found]                       ` <20090604104527.GY9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-04 13:52                         ` Oren Laadan
2009-06-04 14:04                         ` Serge E. Hallyn
2009-06-04 10:44             ` [PATCH 4/4] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint Matt Helsley
2009-06-03 16:53         ` Oren Laadan
     [not found]   ` <03412f8681d89c99ac575330381ab49f6e5e61ba.1244019829.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-03  9:31     ` [PATCH 3/4] cgroup freezer: Update stale locking comments Matt Helsley
2009-06-03 16:10     ` [PATCH 2/4] cgroup freezer: Avoid lazy state changes when convenient Serge E. Hallyn
     [not found]       ` <20090603161046.GC7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-03 17:52         ` Matt Helsley
     [not found]           ` <20090603175242.GQ9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-03 18:15             ` Serge E. Hallyn
     [not found]               ` <20090603181547.GA10141-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-04  3:18                 ` Matt Helsley
     [not found]                   ` <20090604031842.GR9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-07 21:36                     ` Nathan Lynch
2009-06-03 19:52     ` Serge E. Hallyn
2009-06-03  9:31 ` [PATCH 1/4] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer Matt Helsley
2009-06-03  9:31 [PATCH 0/4] cgroup freezer: Fixes and CHECKPOINTING support Matt Helsley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.