All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-03-28  2:40 ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello,

(this is too late for the upcoming merge window and is targeting the
next devel cycle)

cgroup currently allows creating arbitrary number of hierarchies and
any number of controllers may be associated with a given tree.  This
allows for huge amount of variance how tasks are associated with
various cgroups and controllers; unfortunately, the variance is
extreme to the extent that it unnecessarily complicates capabilities
which can otherwise be straight-forward and hinders implementation of
features which can benefit from coordination among different
controllers.

Here are some of the issues which we're facing with the current
multiple hierarchies.

* cgroup membership of a task can't be described in finite number of
  paths.  As there can be arbitrary number of hierarchies, the key
  describing a task's cgroup membership can be arbitrarily long.  This
  is painful when userland or other parts of the kernel needs to take
  cgroup membership into account and leads to proliferation of
  controllers which are just there to identify membership rather than
  actually control resources, which in turn exacerbates the problem.

* Different controllers may or may not reside on the same hierarchy.
  Features or optimizations which can benefit from sharing the
  hierarchical organization either can't be implemented or becomes
  overly complicated.

* Tasks of a process may belong to different cgroups, which doesn't
  make any sense for some controllers.  Those controllers end up
  ignoring such configurations in their own ways leading to
  inconsistent behavior.  In addition, in-process resource control
  fundamentally isn't something which belongs to cgroup.  As it has to
  be visible to the binary for the process, it must be part of the
  stable programming interface which is easily accessible to the
  process proper in an easy race-free way.

* The current cgroup allows cgroups which have child cgroups to have
  tasks in it.  This means that the child cgroups end up competing
  against the internal tasks.  This introduces inherent ambiguity as
  the two are separate types of entities and the latter doesn't have
  the same control knobs assigned to them.

  Different controllers are dealing with the issue in different ways.
  cpu treats internal tasks and child cgroups as equivalents, which
  makes giving a child cgroup a given ratio of the parent's cpu time
  difficult as the number of competing entities may fluctuate without
  any indication.  blkio, in my misguided attempt to deal with the
  issue, introduced a whole duplicate set of knobs for internal tasks
  and deal with them as if they belong to a separate child cgroup
  making the interface and implementation a mess.  memcg seems
  somewhat ambiguous on the issue but there are attempts to introduce
  ad-hoc modifications to tilt the way it's handled to suit specific
  use cases.

  This is an inherent problem.  All of the solutions that different
  controllers came up with are unsatisfactory, the different behaviors
  greatly increases the level of inconsistency and complicates the
  controller implementations.

This patchset finally implements the default unified hierarchy.  The
goal is providing enough flexibility while enforcing stricter common
structure where appropriate to address the above listed issues.

Controllers which aren't bound to other hierarchies are
automatically attached to the unified hierarchy, which is different in
that controllers are enabled explicitly for each subtree.
"cgroup.subtree_control" controls which controllers are enabled on the
child cgroups.  Let's assume a hierarchy like the following.

  root - A - B - C
               \ D

root's "cgroup.subtree_control" determines which controllers are
enabled on A.  A's on B.  B's on C and D.  This coincides with the
fact that controllers on the immediate sub-level are used to
distribute the resources of the parent.  In fact, it's natural to
assume that resource control knobs of a child belong to its parent.
Enabling a controller in "cgroup.subtree_control" declares that
distribution of the respective resources of the cgroup will be
controlled.  Note that this means that controller enable states are
shared among siblings.

The default hierarchy has an extra restriction - only cgroups which
don't contain any task may have controllers enabled in
"cgroup.subtree_control".  Combined with the other properties of the
default hierarchy, this guarantees that, from the view point of
controllers, tasks are only on the leaf cgroups.  In other words, only
leaf csses may contain tasks.  This rules out situations where child
cgroups compete against internal tasks of the parent.

This patchset contains the following twelve patches.

 0001-cgroup-update-cgroup-subsys_mask-to-child_subsys_mas.patch
 0002-cgroup-introduce-effective-cgroup_subsys_state.patch
 0003-cgroup-implement-cgroup-e_csets.patch
 0004-cgroup-make-css_next_child-skip-missing-csses.patch
 0005-cgroup-reorganize-css_task_iter.patch
 0006-cgroup-teach-css_task_iter-about-effective-csses.patch
 0007-cgroup-cgroup-subsys-should-be-cleared-after-the-css.patch
 0008-cgroup-allow-cgroup-creation-and-suppress-automatic-.patch
 0009-cgroup-add-css_set-dfl_cgrp.patch
 0010-cgroup-update-subsystem-rebind-restrictions.patch
 0011-cgroup-prepare-migration-path-for-unified-hierarchy.patch
 0012-cgroup-implement-dynamic-subtree-controller-enable-d.patch

0001 updates subsys_mask handling again to morph cgrp->subsys_mask to
cgrp->child_subsys_mask.

0002-0003 introduce effective cgroup.  The cgroup on the unified
hierarchy a task belongs to when viewed from a controller.

0004-0007 update iterators to handle effective cgroup correctly.

0008-0011 prepare various paths for explicit controller
enable/disable.

0012 implements explicit controller enable/disable.

The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use
RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified

diffstat follows.

 include/linux/cgroup.h |   44 ++-
 kernel/cgroup.c        |  651 +++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 585 insertions(+), 110 deletions(-)

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-03-28  2:40 ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel

Hello,

(this is too late for the upcoming merge window and is targeting the
next devel cycle)

cgroup currently allows creating arbitrary number of hierarchies and
any number of controllers may be associated with a given tree.  This
allows for huge amount of variance how tasks are associated with
various cgroups and controllers; unfortunately, the variance is
extreme to the extent that it unnecessarily complicates capabilities
which can otherwise be straight-forward and hinders implementation of
features which can benefit from coordination among different
controllers.

Here are some of the issues which we're facing with the current
multiple hierarchies.

* cgroup membership of a task can't be described in finite number of
  paths.  As there can be arbitrary number of hierarchies, the key
  describing a task's cgroup membership can be arbitrarily long.  This
  is painful when userland or other parts of the kernel needs to take
  cgroup membership into account and leads to proliferation of
  controllers which are just there to identify membership rather than
  actually control resources, which in turn exacerbates the problem.

* Different controllers may or may not reside on the same hierarchy.
  Features or optimizations which can benefit from sharing the
  hierarchical organization either can't be implemented or becomes
  overly complicated.

* Tasks of a process may belong to different cgroups, which doesn't
  make any sense for some controllers.  Those controllers end up
  ignoring such configurations in their own ways leading to
  inconsistent behavior.  In addition, in-process resource control
  fundamentally isn't something which belongs to cgroup.  As it has to
  be visible to the binary for the process, it must be part of the
  stable programming interface which is easily accessible to the
  process proper in an easy race-free way.

* The current cgroup allows cgroups which have child cgroups to have
  tasks in it.  This means that the child cgroups end up competing
  against the internal tasks.  This introduces inherent ambiguity as
  the two are separate types of entities and the latter doesn't have
  the same control knobs assigned to them.

  Different controllers are dealing with the issue in different ways.
  cpu treats internal tasks and child cgroups as equivalents, which
  makes giving a child cgroup a given ratio of the parent's cpu time
  difficult as the number of competing entities may fluctuate without
  any indication.  blkio, in my misguided attempt to deal with the
  issue, introduced a whole duplicate set of knobs for internal tasks
  and deal with them as if they belong to a separate child cgroup
  making the interface and implementation a mess.  memcg seems
  somewhat ambiguous on the issue but there are attempts to introduce
  ad-hoc modifications to tilt the way it's handled to suit specific
  use cases.

  This is an inherent problem.  All of the solutions that different
  controllers came up with are unsatisfactory, the different behaviors
  greatly increases the level of inconsistency and complicates the
  controller implementations.

This patchset finally implements the default unified hierarchy.  The
goal is providing enough flexibility while enforcing stricter common
structure where appropriate to address the above listed issues.

Controllers which aren't bound to other hierarchies are
automatically attached to the unified hierarchy, which is different in
that controllers are enabled explicitly for each subtree.
"cgroup.subtree_control" controls which controllers are enabled on the
child cgroups.  Let's assume a hierarchy like the following.

  root - A - B - C
               \ D

root's "cgroup.subtree_control" determines which controllers are
enabled on A.  A's on B.  B's on C and D.  This coincides with the
fact that controllers on the immediate sub-level are used to
distribute the resources of the parent.  In fact, it's natural to
assume that resource control knobs of a child belong to its parent.
Enabling a controller in "cgroup.subtree_control" declares that
distribution of the respective resources of the cgroup will be
controlled.  Note that this means that controller enable states are
shared among siblings.

The default hierarchy has an extra restriction - only cgroups which
don't contain any task may have controllers enabled in
"cgroup.subtree_control".  Combined with the other properties of the
default hierarchy, this guarantees that, from the view point of
controllers, tasks are only on the leaf cgroups.  In other words, only
leaf csses may contain tasks.  This rules out situations where child
cgroups compete against internal tasks of the parent.

This patchset contains the following twelve patches.

 0001-cgroup-update-cgroup-subsys_mask-to-child_subsys_mas.patch
 0002-cgroup-introduce-effective-cgroup_subsys_state.patch
 0003-cgroup-implement-cgroup-e_csets.patch
 0004-cgroup-make-css_next_child-skip-missing-csses.patch
 0005-cgroup-reorganize-css_task_iter.patch
 0006-cgroup-teach-css_task_iter-about-effective-csses.patch
 0007-cgroup-cgroup-subsys-should-be-cleared-after-the-css.patch
 0008-cgroup-allow-cgroup-creation-and-suppress-automatic-.patch
 0009-cgroup-add-css_set-dfl_cgrp.patch
 0010-cgroup-update-subsystem-rebind-restrictions.patch
 0011-cgroup-prepare-migration-path-for-unified-hierarchy.patch
 0012-cgroup-implement-dynamic-subtree-controller-enable-d.patch

0001 updates subsys_mask handling again to morph cgrp->subsys_mask to
cgrp->child_subsys_mask.

0002-0003 introduce effective cgroup.  The cgroup on the unified
hierarchy a task belongs to when viewed from a controller.

0004-0007 update iterators to handle effective cgroup correctly.

0008-0011 prepare various paths for explicit controller
enable/disable.

0012 implements explicit controller enable/disable.

The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use
RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified

diffstat follows.

 include/linux/cgroup.h |   44 ++-
 kernel/cgroup.c        |  651 +++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 585 insertions(+), 110 deletions(-)

Thanks.

--
tejun


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 01/12] cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask
  2014-03-28  2:40 ` Tejun Heo
@ 2014-03-28  2:40     ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

944196278d3d ("cgroup: move ->subsys_mask from cgroupfs_root to
cgroup") moved ->subsys_mask from cgroup_root to cgroup to prepare for
the unified hierarhcy; however, it turns out that carrying the
subsys_mask of the children in the parent, instead of itself, is a lot
more natural.  This patch restores cgroup_root->subsys_mask and morphs
cgroup->subsys_mask into cgroup->child_subsys_mask.

* Uses of root->cgrp.subsys_mask are restored to root->subsys_mask.

* Remove automatic setting and clearing of cgrp->subsys_mask and
  instead just inherit ->child_subsys_mask from the parent during
  cgroup creation.  Note that this doesn't affect any current
  behaviors.

* Undo __kill_css() separation.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h |  7 ++++--
 kernel/cgroup.c        | 64 +++++++++++++++++++++-----------------------------
 2 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 43d1ed3..e351a18 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -173,8 +173,8 @@ struct cgroup {
 	 */
 	u64 serial_nr;
 
-	/* The bitmask of subsystems attached to this cgroup */
-	unsigned long subsys_mask;
+	/* the bitmask of subsystems enabled on the child cgroups */
+	unsigned long child_subsys_mask;
 
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
@@ -282,6 +282,9 @@ enum {
 struct cgroup_root {
 	struct kernfs_root *kf_root;
 
+	/* The bitmask of subsystems attached to this hierarchy */
+	unsigned long subsys_mask;
+
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e378cb2..3c9c623 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -529,7 +529,7 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 	 * won't change, so no need for locking.
 	 */
 	for_each_subsys(ss, i) {
-		if (root->cgrp.subsys_mask & (1UL << i)) {
+		if (root->subsys_mask & (1UL << i)) {
 			/* Subsystem is in this hierarchy. So we want
 			 * the subsystem state from the new
 			 * cgroup */
@@ -742,7 +742,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	BUG_ON(!list_empty(&cgrp->children));
 
 	/* Rebind all subsystems back to the default hierarchy */
-	rebind_subsystems(&cgrp_dfl_root, cgrp->subsys_mask);
+	rebind_subsystems(&cgrp_dfl_root, root->subsys_mask);
 
 	/*
 	 * Release all the links from cset_links to this hierarchy's
@@ -1050,8 +1050,11 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		ss->root = dst_root;
 		css->cgroup = &dst_root->cgrp;
 
-		src_root->cgrp.subsys_mask &= ~(1 << ssid);
-		dst_root->cgrp.subsys_mask |= 1 << ssid;
+		src_root->subsys_mask &= ~(1 << ssid);
+		src_root->cgrp.child_subsys_mask &= ~(1 << ssid);
+
+		dst_root->subsys_mask |= 1 << ssid;
+		dst_root->cgrp.child_subsys_mask |= 1 << ssid;
 
 		if (ss->bind)
 			ss->bind(css);
@@ -1069,7 +1072,7 @@ static int cgroup_show_options(struct seq_file *seq,
 	int ssid;
 
 	for_each_subsys(ss, ssid)
-		if (root->cgrp.subsys_mask & (1 << ssid))
+		if (root->subsys_mask & (1 << ssid))
 			seq_printf(seq, ",%s", ss->name);
 	if (root->flags & CGRP_ROOT_SANE_BEHAVIOR)
 		seq_puts(seq, ",sane_behavior");
@@ -1273,12 +1276,12 @@ static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data)
 	if (ret)
 		goto out_unlock;
 
-	if (opts.subsys_mask != root->cgrp.subsys_mask || opts.release_agent)
+	if (opts.subsys_mask != root->subsys_mask || opts.release_agent)
 		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
 			   task_tgid_nr(current), current->comm);
 
-	added_mask = opts.subsys_mask & ~root->cgrp.subsys_mask;
-	removed_mask = root->cgrp.subsys_mask & ~opts.subsys_mask;
+	added_mask = opts.subsys_mask & ~root->subsys_mask;
+	removed_mask = root->subsys_mask & ~opts.subsys_mask;
 
 	/* Don't allow flags or name to change at remount */
 	if (((opts.flags ^ root->flags) & CGRP_ROOT_OPTION_MASK) ||
@@ -1534,7 +1537,7 @@ retry:
 		 * subsystems) then they must match.
 		 */
 		if ((opts.subsys_mask || opts.none) &&
-		    (opts.subsys_mask != root->cgrp.subsys_mask)) {
+		    (opts.subsys_mask != root->subsys_mask)) {
 			if (!name_match)
 				continue;
 			ret = -EBUSY;
@@ -3640,8 +3643,6 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss)
 	cgroup_get(cgrp);
 	css_get(css->parent);
 
-	cgrp->subsys_mask |= 1 << ss->id;
-
 	if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
 	    parent->parent) {
 		pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
@@ -3755,13 +3756,15 @@ static long cgroup_create(struct cgroup *parent, const char *name,
 
 	/* let's create and online css's */
 	for_each_subsys(ss, ssid) {
-		if (root->cgrp.subsys_mask & (1 << ssid)) {
+		if (parent->child_subsys_mask & (1 << ssid)) {
 			err = create_css(cgrp, ss);
 			if (err)
 				goto err_destroy;
 		}
 	}
 
+	cgrp->child_subsys_mask = parent->child_subsys_mask;
+
 	kernfs_activate(kn);
 
 	mutex_unlock(&cgroup_mutex);
@@ -3857,7 +3860,16 @@ static void css_killed_ref_fn(struct percpu_ref *ref)
 	queue_work(cgroup_destroy_wq, &css->destroy_work);
 }
 
-static void __kill_css(struct cgroup_subsys_state *css)
+/**
+ * kill_css - destroy a css
+ * @css: css to destroy
+ *
+ * This function initiates destruction of @css by removing cgroup interface
+ * files and putting its base reference.  ->css_offline() will be invoked
+ * asynchronously once css_tryget() is guaranteed to fail and when the
+ * reference count reaches zero, @css will be released.
+ */
+static void kill_css(struct cgroup_subsys_state *css)
 {
 	lockdep_assert_held(&cgroup_tree_mutex);
 
@@ -3887,28 +3899,6 @@ static void __kill_css(struct cgroup_subsys_state *css)
 }
 
 /**
- * kill_css - destroy a css
- * @css: css to destroy
- *
- * This function initiates destruction of @css by removing cgroup interface
- * files and putting its base reference.  ->css_offline() will be invoked
- * asynchronously once css_tryget() is guaranteed to fail and when the
- * reference count reaches zero, @css will be released.
- */
-static void kill_css(struct cgroup_subsys_state *css)
-{
-	struct cgroup *cgrp = css->cgroup;
-
-	lockdep_assert_held(&cgroup_tree_mutex);
-
-	/* if already killed, noop */
-	if (cgrp->subsys_mask & (1 << css->ss->id)) {
-		cgrp->subsys_mask &= ~(1 << css->ss->id);
-		__kill_css(css);
-	}
-}
-
-/**
  * cgroup_destroy_locked - the first stage of cgroup destruction
  * @cgrp: cgroup to be destroyed
  *
@@ -4120,7 +4110,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
 
 	BUG_ON(online_css(css));
 
-	cgrp_dfl_root.cgrp.subsys_mask |= 1 << ss->id;
+	cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
 	mutex_unlock(&cgroup_mutex);
 	mutex_unlock(&cgroup_tree_mutex);
@@ -4277,7 +4267,7 @@ int proc_cgroup_show(struct seq_file *m, void *v)
 
 		seq_printf(m, "%d:", root->hierarchy_id);
 		for_each_subsys(ss, ssid)
-			if (root->cgrp.subsys_mask & (1 << ssid))
+			if (root->subsys_mask & (1 << ssid))
 				seq_printf(m, "%s%s", count++ ? "," : "", ss->name);
 		if (strlen(root->name))
 			seq_printf(m, "%sname=%s", count ? "," : "",
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 01/12] cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask
@ 2014-03-28  2:40     ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

944196278d3d ("cgroup: move ->subsys_mask from cgroupfs_root to
cgroup") moved ->subsys_mask from cgroup_root to cgroup to prepare for
the unified hierarhcy; however, it turns out that carrying the
subsys_mask of the children in the parent, instead of itself, is a lot
more natural.  This patch restores cgroup_root->subsys_mask and morphs
cgroup->subsys_mask into cgroup->child_subsys_mask.

* Uses of root->cgrp.subsys_mask are restored to root->subsys_mask.

* Remove automatic setting and clearing of cgrp->subsys_mask and
  instead just inherit ->child_subsys_mask from the parent during
  cgroup creation.  Note that this doesn't affect any current
  behaviors.

* Undo __kill_css() separation.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h |  7 ++++--
 kernel/cgroup.c        | 64 +++++++++++++++++++++-----------------------------
 2 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 43d1ed3..e351a18 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -173,8 +173,8 @@ struct cgroup {
 	 */
 	u64 serial_nr;
 
-	/* The bitmask of subsystems attached to this cgroup */
-	unsigned long subsys_mask;
+	/* the bitmask of subsystems enabled on the child cgroups */
+	unsigned long child_subsys_mask;
 
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
@@ -282,6 +282,9 @@ enum {
 struct cgroup_root {
 	struct kernfs_root *kf_root;
 
+	/* The bitmask of subsystems attached to this hierarchy */
+	unsigned long subsys_mask;
+
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e378cb2..3c9c623 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -529,7 +529,7 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 	 * won't change, so no need for locking.
 	 */
 	for_each_subsys(ss, i) {
-		if (root->cgrp.subsys_mask & (1UL << i)) {
+		if (root->subsys_mask & (1UL << i)) {
 			/* Subsystem is in this hierarchy. So we want
 			 * the subsystem state from the new
 			 * cgroup */
@@ -742,7 +742,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	BUG_ON(!list_empty(&cgrp->children));
 
 	/* Rebind all subsystems back to the default hierarchy */
-	rebind_subsystems(&cgrp_dfl_root, cgrp->subsys_mask);
+	rebind_subsystems(&cgrp_dfl_root, root->subsys_mask);
 
 	/*
 	 * Release all the links from cset_links to this hierarchy's
@@ -1050,8 +1050,11 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		ss->root = dst_root;
 		css->cgroup = &dst_root->cgrp;
 
-		src_root->cgrp.subsys_mask &= ~(1 << ssid);
-		dst_root->cgrp.subsys_mask |= 1 << ssid;
+		src_root->subsys_mask &= ~(1 << ssid);
+		src_root->cgrp.child_subsys_mask &= ~(1 << ssid);
+
+		dst_root->subsys_mask |= 1 << ssid;
+		dst_root->cgrp.child_subsys_mask |= 1 << ssid;
 
 		if (ss->bind)
 			ss->bind(css);
@@ -1069,7 +1072,7 @@ static int cgroup_show_options(struct seq_file *seq,
 	int ssid;
 
 	for_each_subsys(ss, ssid)
-		if (root->cgrp.subsys_mask & (1 << ssid))
+		if (root->subsys_mask & (1 << ssid))
 			seq_printf(seq, ",%s", ss->name);
 	if (root->flags & CGRP_ROOT_SANE_BEHAVIOR)
 		seq_puts(seq, ",sane_behavior");
@@ -1273,12 +1276,12 @@ static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data)
 	if (ret)
 		goto out_unlock;
 
-	if (opts.subsys_mask != root->cgrp.subsys_mask || opts.release_agent)
+	if (opts.subsys_mask != root->subsys_mask || opts.release_agent)
 		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
 			   task_tgid_nr(current), current->comm);
 
-	added_mask = opts.subsys_mask & ~root->cgrp.subsys_mask;
-	removed_mask = root->cgrp.subsys_mask & ~opts.subsys_mask;
+	added_mask = opts.subsys_mask & ~root->subsys_mask;
+	removed_mask = root->subsys_mask & ~opts.subsys_mask;
 
 	/* Don't allow flags or name to change at remount */
 	if (((opts.flags ^ root->flags) & CGRP_ROOT_OPTION_MASK) ||
@@ -1534,7 +1537,7 @@ retry:
 		 * subsystems) then they must match.
 		 */
 		if ((opts.subsys_mask || opts.none) &&
-		    (opts.subsys_mask != root->cgrp.subsys_mask)) {
+		    (opts.subsys_mask != root->subsys_mask)) {
 			if (!name_match)
 				continue;
 			ret = -EBUSY;
@@ -3640,8 +3643,6 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss)
 	cgroup_get(cgrp);
 	css_get(css->parent);
 
-	cgrp->subsys_mask |= 1 << ss->id;
-
 	if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
 	    parent->parent) {
 		pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
@@ -3755,13 +3756,15 @@ static long cgroup_create(struct cgroup *parent, const char *name,
 
 	/* let's create and online css's */
 	for_each_subsys(ss, ssid) {
-		if (root->cgrp.subsys_mask & (1 << ssid)) {
+		if (parent->child_subsys_mask & (1 << ssid)) {
 			err = create_css(cgrp, ss);
 			if (err)
 				goto err_destroy;
 		}
 	}
 
+	cgrp->child_subsys_mask = parent->child_subsys_mask;
+
 	kernfs_activate(kn);
 
 	mutex_unlock(&cgroup_mutex);
@@ -3857,7 +3860,16 @@ static void css_killed_ref_fn(struct percpu_ref *ref)
 	queue_work(cgroup_destroy_wq, &css->destroy_work);
 }
 
-static void __kill_css(struct cgroup_subsys_state *css)
+/**
+ * kill_css - destroy a css
+ * @css: css to destroy
+ *
+ * This function initiates destruction of @css by removing cgroup interface
+ * files and putting its base reference.  ->css_offline() will be invoked
+ * asynchronously once css_tryget() is guaranteed to fail and when the
+ * reference count reaches zero, @css will be released.
+ */
+static void kill_css(struct cgroup_subsys_state *css)
 {
 	lockdep_assert_held(&cgroup_tree_mutex);
 
@@ -3887,28 +3899,6 @@ static void __kill_css(struct cgroup_subsys_state *css)
 }
 
 /**
- * kill_css - destroy a css
- * @css: css to destroy
- *
- * This function initiates destruction of @css by removing cgroup interface
- * files and putting its base reference.  ->css_offline() will be invoked
- * asynchronously once css_tryget() is guaranteed to fail and when the
- * reference count reaches zero, @css will be released.
- */
-static void kill_css(struct cgroup_subsys_state *css)
-{
-	struct cgroup *cgrp = css->cgroup;
-
-	lockdep_assert_held(&cgroup_tree_mutex);
-
-	/* if already killed, noop */
-	if (cgrp->subsys_mask & (1 << css->ss->id)) {
-		cgrp->subsys_mask &= ~(1 << css->ss->id);
-		__kill_css(css);
-	}
-}
-
-/**
  * cgroup_destroy_locked - the first stage of cgroup destruction
  * @cgrp: cgroup to be destroyed
  *
@@ -4120,7 +4110,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
 
 	BUG_ON(online_css(css));
 
-	cgrp_dfl_root.cgrp.subsys_mask |= 1 << ss->id;
+	cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
 	mutex_unlock(&cgroup_mutex);
 	mutex_unlock(&cgroup_tree_mutex);
@@ -4277,7 +4267,7 @@ int proc_cgroup_show(struct seq_file *m, void *v)
 
 		seq_printf(m, "%d:", root->hierarchy_id);
 		for_each_subsys(ss, ssid)
-			if (root->cgrp.subsys_mask & (1 << ssid))
+			if (root->subsys_mask & (1 << ssid))
 				seq_printf(m, "%s%s", count++ ? "," : "", ss->name);
 		if (strlen(root->name))
 			seq_printf(m, "%sname=%s", count ? "," : "",
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 02/12] cgroup: introduce effective cgroup_subsys_state
  2014-03-28  2:40 ` Tejun Heo
@ 2014-03-28  2:40     ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

In the planned default unified hierarchy, controllers may get
dynamically attached to and detached from a cgroup and a cgroup may
not have csses for all the controllers associated with the hierarchy.

When a cgroup doesn't have its own css for a given controller, the css
of the nearest ancestor with the controller enabled will be used,
which is called the effective css.  This patch introduces
cgroup_e_css() and for_each_e_css() to access the effective csses and
convert compare_css_sets(), find_existing_css_set() and
cgroup_migrate() to use the effective csses so that they can handle
cgroups with partial csses correctly.

This means that for two css_sets to be considered identical, they
should have both matching csses and cgroups.  compare_css_sets()
already compares both, not for correctness but for optimization.  As
this now becomes a matter of correctness, update the comments
accordingly.

For all !default hierarchies, cgroup_e_css() always equals
cgroup_css(), so this patch doesn't change behavior.

While at it, fix incorrect locking comment for for_each_css().

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 83 ++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 64 insertions(+), 19 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3c9c623..ae1fe5a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -208,6 +208,34 @@ static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
 		return &cgrp->dummy_css;
 }
 
+/**
+ * cgroup_e_css - obtain a cgroup's effective css for the specified subsystem
+ * @cgrp: the cgroup of interest
+ * @ss: the subsystem of interest (%NULL returns the dummy_css)
+ *
+ * Similar to cgroup_css() but returns the effctive css, which is defined
+ * as the matching css of the nearest ancestor including self which has @ss
+ * enabled.  If @ss is associated with the hierarchy @cgrp is on, this
+ * function is guaranteed to return non-NULL css.
+ */
+static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
+						struct cgroup_subsys *ss)
+{
+	lockdep_assert_held(&cgroup_mutex);
+
+	if (!ss)
+		return &cgrp->dummy_css;
+
+	if (!(cgrp->root->subsys_mask & (1 << ss->id)))
+		return NULL;
+
+	while (cgrp->parent &&
+	       !(cgrp->parent->child_subsys_mask & (1 << ss->id)))
+		cgrp = cgrp->parent;
+
+	return cgroup_css(cgrp, ss);
+}
+
 /* convenient tests for these bits */
 static inline bool cgroup_is_dead(const struct cgroup *cgrp)
 {
@@ -273,7 +301,7 @@ static int notify_on_release(const struct cgroup *cgrp)
  * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end
  * @cgrp: the target cgroup to iterate css's of
  *
- * Should be called under cgroup_mutex.
+ * Should be called under cgroup_[tree_]mutex.
  */
 #define for_each_css(css, ssid, cgrp)					\
 	for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++)	\
@@ -284,6 +312,20 @@ static int notify_on_release(const struct cgroup *cgrp)
 		else
 
 /**
+ * for_each_e_css - iterate all effective css's of a cgroup
+ * @css: the iteration cursor
+ * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end
+ * @cgrp: the target cgroup to iterate css's of
+ *
+ * Should be called under cgroup_[tree_]mutex.
+ */
+#define for_each_e_css(css, ssid, cgrp)					\
+	for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++)	\
+		if (!((css) = cgroup_e_css(cgrp, cgroup_subsys[(ssid)]))) \
+			;						\
+		else
+
+/**
  * for_each_subsys - iterate all enabled cgroup subsystems
  * @ss: the iteration cursor
  * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end
@@ -452,20 +494,20 @@ static bool compare_css_sets(struct css_set *cset,
 {
 	struct list_head *l1, *l2;
 
-	if (memcmp(template, cset->subsys, sizeof(cset->subsys))) {
-		/* Not all subsystems matched */
+	/*
+	 * On the default hierarchy, there can be csets which are
+	 * associated with the same set of cgroups but different csses.
+	 * Let's first ensure that csses match.
+	 */
+	if (memcmp(template, cset->subsys, sizeof(cset->subsys)))
 		return false;
-	}
 
 	/*
 	 * Compare cgroup pointers in order to distinguish between
-	 * different cgroups in heirarchies with no subsystems. We
-	 * could get by with just this check alone (and skip the
-	 * memcmp above) but on most setups the memcmp check will
-	 * avoid the need for this more expensive check on almost all
-	 * candidates.
+	 * different cgroups in hierarchies.  As different cgroups may
+	 * share the same effective css, this comparison is always
+	 * necessary.
 	 */
-
 	l1 = &cset->cgrp_links;
 	l2 = &old_cset->cgrp_links;
 	while (1) {
@@ -530,13 +572,16 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 	 */
 	for_each_subsys(ss, i) {
 		if (root->subsys_mask & (1UL << i)) {
-			/* Subsystem is in this hierarchy. So we want
-			 * the subsystem state from the new
-			 * cgroup */
-			template[i] = cgroup_css(cgrp, ss);
+			/*
+			 * @ss is in this hierarchy, so we want the
+			 * effective css from @cgrp.
+			 */
+			template[i] = cgroup_e_css(cgrp, ss);
 		} else {
-			/* Subsystem is not in this hierarchy, so we
-			 * don't want to change the subsystem state */
+			/*
+			 * @ss is not in this hierarchy, so we don't want
+			 * to change the css.
+			 */
 			template[i] = old_cset->subsys[i];
 		}
 	}
@@ -1968,7 +2013,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader,
 		return 0;
 
 	/* check that we can legitimately attach to the cgroup */
-	for_each_css(css, i, cgrp) {
+	for_each_e_css(css, i, cgrp) {
 		if (css->ss->can_attach) {
 			ret = css->ss->can_attach(css, &tset);
 			if (ret) {
@@ -1998,7 +2043,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader,
 	 */
 	tset.csets = &tset.dst_csets;
 
-	for_each_css(css, i, cgrp)
+	for_each_e_css(css, i, cgrp)
 		if (css->ss->attach)
 			css->ss->attach(css, &tset);
 
@@ -2006,7 +2051,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader,
 	goto out_release_tset;
 
 out_cancel_attach:
-	for_each_css(css, i, cgrp) {
+	for_each_e_css(css, i, cgrp) {
 		if (css == failed_css)
 			break;
 		if (css->ss->cancel_attach)
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 02/12] cgroup: introduce effective cgroup_subsys_state
@ 2014-03-28  2:40     ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

In the planned default unified hierarchy, controllers may get
dynamically attached to and detached from a cgroup and a cgroup may
not have csses for all the controllers associated with the hierarchy.

When a cgroup doesn't have its own css for a given controller, the css
of the nearest ancestor with the controller enabled will be used,
which is called the effective css.  This patch introduces
cgroup_e_css() and for_each_e_css() to access the effective csses and
convert compare_css_sets(), find_existing_css_set() and
cgroup_migrate() to use the effective csses so that they can handle
cgroups with partial csses correctly.

This means that for two css_sets to be considered identical, they
should have both matching csses and cgroups.  compare_css_sets()
already compares both, not for correctness but for optimization.  As
this now becomes a matter of correctness, update the comments
accordingly.

For all !default hierarchies, cgroup_e_css() always equals
cgroup_css(), so this patch doesn't change behavior.

While at it, fix incorrect locking comment for for_each_css().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 83 ++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 64 insertions(+), 19 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3c9c623..ae1fe5a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -208,6 +208,34 @@ static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
 		return &cgrp->dummy_css;
 }
 
+/**
+ * cgroup_e_css - obtain a cgroup's effective css for the specified subsystem
+ * @cgrp: the cgroup of interest
+ * @ss: the subsystem of interest (%NULL returns the dummy_css)
+ *
+ * Similar to cgroup_css() but returns the effctive css, which is defined
+ * as the matching css of the nearest ancestor including self which has @ss
+ * enabled.  If @ss is associated with the hierarchy @cgrp is on, this
+ * function is guaranteed to return non-NULL css.
+ */
+static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
+						struct cgroup_subsys *ss)
+{
+	lockdep_assert_held(&cgroup_mutex);
+
+	if (!ss)
+		return &cgrp->dummy_css;
+
+	if (!(cgrp->root->subsys_mask & (1 << ss->id)))
+		return NULL;
+
+	while (cgrp->parent &&
+	       !(cgrp->parent->child_subsys_mask & (1 << ss->id)))
+		cgrp = cgrp->parent;
+
+	return cgroup_css(cgrp, ss);
+}
+
 /* convenient tests for these bits */
 static inline bool cgroup_is_dead(const struct cgroup *cgrp)
 {
@@ -273,7 +301,7 @@ static int notify_on_release(const struct cgroup *cgrp)
  * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end
  * @cgrp: the target cgroup to iterate css's of
  *
- * Should be called under cgroup_mutex.
+ * Should be called under cgroup_[tree_]mutex.
  */
 #define for_each_css(css, ssid, cgrp)					\
 	for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++)	\
@@ -284,6 +312,20 @@ static int notify_on_release(const struct cgroup *cgrp)
 		else
 
 /**
+ * for_each_e_css - iterate all effective css's of a cgroup
+ * @css: the iteration cursor
+ * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end
+ * @cgrp: the target cgroup to iterate css's of
+ *
+ * Should be called under cgroup_[tree_]mutex.
+ */
+#define for_each_e_css(css, ssid, cgrp)					\
+	for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++)	\
+		if (!((css) = cgroup_e_css(cgrp, cgroup_subsys[(ssid)]))) \
+			;						\
+		else
+
+/**
  * for_each_subsys - iterate all enabled cgroup subsystems
  * @ss: the iteration cursor
  * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end
@@ -452,20 +494,20 @@ static bool compare_css_sets(struct css_set *cset,
 {
 	struct list_head *l1, *l2;
 
-	if (memcmp(template, cset->subsys, sizeof(cset->subsys))) {
-		/* Not all subsystems matched */
+	/*
+	 * On the default hierarchy, there can be csets which are
+	 * associated with the same set of cgroups but different csses.
+	 * Let's first ensure that csses match.
+	 */
+	if (memcmp(template, cset->subsys, sizeof(cset->subsys)))
 		return false;
-	}
 
 	/*
 	 * Compare cgroup pointers in order to distinguish between
-	 * different cgroups in heirarchies with no subsystems. We
-	 * could get by with just this check alone (and skip the
-	 * memcmp above) but on most setups the memcmp check will
-	 * avoid the need for this more expensive check on almost all
-	 * candidates.
+	 * different cgroups in hierarchies.  As different cgroups may
+	 * share the same effective css, this comparison is always
+	 * necessary.
 	 */
-
 	l1 = &cset->cgrp_links;
 	l2 = &old_cset->cgrp_links;
 	while (1) {
@@ -530,13 +572,16 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 	 */
 	for_each_subsys(ss, i) {
 		if (root->subsys_mask & (1UL << i)) {
-			/* Subsystem is in this hierarchy. So we want
-			 * the subsystem state from the new
-			 * cgroup */
-			template[i] = cgroup_css(cgrp, ss);
+			/*
+			 * @ss is in this hierarchy, so we want the
+			 * effective css from @cgrp.
+			 */
+			template[i] = cgroup_e_css(cgrp, ss);
 		} else {
-			/* Subsystem is not in this hierarchy, so we
-			 * don't want to change the subsystem state */
+			/*
+			 * @ss is not in this hierarchy, so we don't want
+			 * to change the css.
+			 */
 			template[i] = old_cset->subsys[i];
 		}
 	}
@@ -1968,7 +2013,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader,
 		return 0;
 
 	/* check that we can legitimately attach to the cgroup */
-	for_each_css(css, i, cgrp) {
+	for_each_e_css(css, i, cgrp) {
 		if (css->ss->can_attach) {
 			ret = css->ss->can_attach(css, &tset);
 			if (ret) {
@@ -1998,7 +2043,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader,
 	 */
 	tset.csets = &tset.dst_csets;
 
-	for_each_css(css, i, cgrp)
+	for_each_e_css(css, i, cgrp)
 		if (css->ss->attach)
 			css->ss->attach(css, &tset);
 
@@ -2006,7 +2051,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader,
 	goto out_release_tset;
 
 out_cancel_attach:
-	for_each_css(css, i, cgrp) {
+	for_each_e_css(css, i, cgrp) {
 		if (css == failed_css)
 			break;
 		if (css->ss->cancel_attach)
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 03/12] cgroup: implement cgroup->e_csets[]
  2014-03-28  2:40 ` Tejun Heo
@ 2014-03-28  2:40     ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On the default unified hierarchy, a cgroup may be associated with
csses of its ancestors, which means that a css of a given cgroup may
be associated with css_sets of descendant cgroups.  This means that we
can't walk all tasks associated with a css by iterating the css_sets
associated with the cgroup as there are css_sets which are pointing to
the css but linked on the descendants.

This patch adds per-subsystem list heads cgroup->e_csets[].  Any
css_set which is pointing to a css is linked to
css->cgroup->e_csets[$SUBSYS_ID] through
css_set->e_cset_node[$SUBSYS_ID].  The lists are protected by
css_set_rwsem and will allow us to walk all css_sets associated with a
given css so that we can find out all associated tasks.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h | 18 ++++++++++++++++++
 kernel/cgroup.c        | 30 ++++++++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e351a18..e902eee 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -188,6 +188,15 @@ struct cgroup {
 	struct list_head cset_links;
 
 	/*
+	 * On the default hierarchy, a css_set for a cgroup with some
+	 * susbsys disabled will point to css's which are associated with
+	 * the closest ancestor which has the subsys enabled.  The
+	 * following lists all css_sets which point to this cgroup's css
+	 * for the given subsystem.
+	 */
+	struct list_head e_csets[CGROUP_SUBSYS_COUNT];
+
+	/*
 	 * Linked list running through all cgroups that can
 	 * potentially be reaped by the release agent. Protected by
 	 * release_list_lock
@@ -369,6 +378,15 @@ struct css_set {
 	struct cgroup *mg_src_cgrp;
 	struct css_set *mg_dst_cset;
 
+	/*
+	 * On the default hierarhcy, ->subsys[ssid] may point to a css
+	 * attached to an ancestor instead of the cgroup this css_set is
+	 * associated with.  The following node is anchored at
+	 * ->subsys[ssid]->cgroup->e_csets[ssid] and provides a way to
+	 * iterate through all css's attached to a given cgroup.
+	 */
+	struct list_head e_cset_node[CGROUP_SUBSYS_COUNT];
+
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
 };
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ae1fe5a..2899a6f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -425,6 +425,8 @@ static unsigned long css_set_hash(struct cgroup_subsys_state *css[])
 static void put_css_set_locked(struct css_set *cset, bool taskexit)
 {
 	struct cgrp_cset_link *link, *tmp_link;
+	struct cgroup_subsys *ss;
+	int ssid;
 
 	lockdep_assert_held(&css_set_rwsem);
 
@@ -432,6 +434,8 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit)
 		return;
 
 	/* This css_set is dead. unlink it and release cgroup refcounts */
+	for_each_subsys(ss, ssid)
+		list_del(&cset->e_cset_node[ssid]);
 	hash_del(&cset->hlist);
 	css_set_count--;
 
@@ -673,7 +677,9 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 	struct css_set *cset;
 	struct list_head tmp_links;
 	struct cgrp_cset_link *link;
+	struct cgroup_subsys *ss;
 	unsigned long key;
+	int ssid;
 
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -724,10 +730,14 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 
 	css_set_count++;
 
-	/* Add this cgroup group to the hash table */
+	/* Add @cset to the hash table */
 	key = css_set_hash(cset->subsys);
 	hash_add(css_set_table, &cset->hlist, key);
 
+	for_each_subsys(ss, ssid)
+		list_add_tail(&cset->e_cset_node[ssid],
+			      &cset->subsys[ssid]->cgroup->e_csets[ssid]);
+
 	up_write(&css_set_rwsem);
 
 	return cset;
@@ -1028,7 +1038,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 			     unsigned long ss_mask)
 {
 	struct cgroup_subsys *ss;
-	int ssid, ret;
+	int ssid, i, ret;
 
 	lockdep_assert_held(&cgroup_tree_mutex);
 	lockdep_assert_held(&cgroup_mutex);
@@ -1081,6 +1091,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 	for_each_subsys(ss, ssid) {
 		struct cgroup_root *src_root;
 		struct cgroup_subsys_state *css;
+		struct css_set *cset;
 
 		if (!(ss_mask & (1 << ssid)))
 			continue;
@@ -1095,6 +1106,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		ss->root = dst_root;
 		css->cgroup = &dst_root->cgrp;
 
+		down_write(&css_set_rwsem);
+		hash_for_each(css_set_table, i, cset, hlist)
+			list_move_tail(&cset->e_cset_node[ss->id],
+				       &dst_root->cgrp.e_csets[ss->id]);
+		up_write(&css_set_rwsem);
+
 		src_root->subsys_mask &= ~(1 << ssid);
 		src_root->cgrp.child_subsys_mask &= ~(1 << ssid);
 
@@ -1417,6 +1434,9 @@ out_unlock:
 
 static void init_cgroup_housekeeping(struct cgroup *cgrp)
 {
+	struct cgroup_subsys *ss;
+	int ssid;
+
 	atomic_set(&cgrp->refcnt, 1);
 	INIT_LIST_HEAD(&cgrp->sibling);
 	INIT_LIST_HEAD(&cgrp->children);
@@ -1425,6 +1445,9 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	INIT_LIST_HEAD(&cgrp->pidlists);
 	mutex_init(&cgrp->pidlist_mutex);
 	cgrp->dummy_css.cgroup = cgrp;
+
+	for_each_subsys(ss, ssid)
+		INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
 }
 
 static void init_cgroup_root(struct cgroup_root *root,
@@ -4224,6 +4247,9 @@ int __init cgroup_init(void)
 		if (!ss->early_init)
 			cgroup_init_subsys(ss);
 
+		list_add_tail(&init_css_set.e_cset_node[ssid],
+			      &cgrp_dfl_root.cgrp.e_csets[ssid]);
+
 		/*
 		 * cftype registration needs kmalloc and can't be done
 		 * during early_init.  Register base cftypes separately.
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 03/12] cgroup: implement cgroup->e_csets[]
@ 2014-03-28  2:40     ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

On the default unified hierarchy, a cgroup may be associated with
csses of its ancestors, which means that a css of a given cgroup may
be associated with css_sets of descendant cgroups.  This means that we
can't walk all tasks associated with a css by iterating the css_sets
associated with the cgroup as there are css_sets which are pointing to
the css but linked on the descendants.

This patch adds per-subsystem list heads cgroup->e_csets[].  Any
css_set which is pointing to a css is linked to
css->cgroup->e_csets[$SUBSYS_ID] through
css_set->e_cset_node[$SUBSYS_ID].  The lists are protected by
css_set_rwsem and will allow us to walk all css_sets associated with a
given css so that we can find out all associated tasks.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h | 18 ++++++++++++++++++
 kernel/cgroup.c        | 30 ++++++++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e351a18..e902eee 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -188,6 +188,15 @@ struct cgroup {
 	struct list_head cset_links;
 
 	/*
+	 * On the default hierarchy, a css_set for a cgroup with some
+	 * susbsys disabled will point to css's which are associated with
+	 * the closest ancestor which has the subsys enabled.  The
+	 * following lists all css_sets which point to this cgroup's css
+	 * for the given subsystem.
+	 */
+	struct list_head e_csets[CGROUP_SUBSYS_COUNT];
+
+	/*
 	 * Linked list running through all cgroups that can
 	 * potentially be reaped by the release agent. Protected by
 	 * release_list_lock
@@ -369,6 +378,15 @@ struct css_set {
 	struct cgroup *mg_src_cgrp;
 	struct css_set *mg_dst_cset;
 
+	/*
+	 * On the default hierarhcy, ->subsys[ssid] may point to a css
+	 * attached to an ancestor instead of the cgroup this css_set is
+	 * associated with.  The following node is anchored at
+	 * ->subsys[ssid]->cgroup->e_csets[ssid] and provides a way to
+	 * iterate through all css's attached to a given cgroup.
+	 */
+	struct list_head e_cset_node[CGROUP_SUBSYS_COUNT];
+
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
 };
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ae1fe5a..2899a6f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -425,6 +425,8 @@ static unsigned long css_set_hash(struct cgroup_subsys_state *css[])
 static void put_css_set_locked(struct css_set *cset, bool taskexit)
 {
 	struct cgrp_cset_link *link, *tmp_link;
+	struct cgroup_subsys *ss;
+	int ssid;
 
 	lockdep_assert_held(&css_set_rwsem);
 
@@ -432,6 +434,8 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit)
 		return;
 
 	/* This css_set is dead. unlink it and release cgroup refcounts */
+	for_each_subsys(ss, ssid)
+		list_del(&cset->e_cset_node[ssid]);
 	hash_del(&cset->hlist);
 	css_set_count--;
 
@@ -673,7 +677,9 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 	struct css_set *cset;
 	struct list_head tmp_links;
 	struct cgrp_cset_link *link;
+	struct cgroup_subsys *ss;
 	unsigned long key;
+	int ssid;
 
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -724,10 +730,14 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 
 	css_set_count++;
 
-	/* Add this cgroup group to the hash table */
+	/* Add @cset to the hash table */
 	key = css_set_hash(cset->subsys);
 	hash_add(css_set_table, &cset->hlist, key);
 
+	for_each_subsys(ss, ssid)
+		list_add_tail(&cset->e_cset_node[ssid],
+			      &cset->subsys[ssid]->cgroup->e_csets[ssid]);
+
 	up_write(&css_set_rwsem);
 
 	return cset;
@@ -1028,7 +1038,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 			     unsigned long ss_mask)
 {
 	struct cgroup_subsys *ss;
-	int ssid, ret;
+	int ssid, i, ret;
 
 	lockdep_assert_held(&cgroup_tree_mutex);
 	lockdep_assert_held(&cgroup_mutex);
@@ -1081,6 +1091,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 	for_each_subsys(ss, ssid) {
 		struct cgroup_root *src_root;
 		struct cgroup_subsys_state *css;
+		struct css_set *cset;
 
 		if (!(ss_mask & (1 << ssid)))
 			continue;
@@ -1095,6 +1106,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		ss->root = dst_root;
 		css->cgroup = &dst_root->cgrp;
 
+		down_write(&css_set_rwsem);
+		hash_for_each(css_set_table, i, cset, hlist)
+			list_move_tail(&cset->e_cset_node[ss->id],
+				       &dst_root->cgrp.e_csets[ss->id]);
+		up_write(&css_set_rwsem);
+
 		src_root->subsys_mask &= ~(1 << ssid);
 		src_root->cgrp.child_subsys_mask &= ~(1 << ssid);
 
@@ -1417,6 +1434,9 @@ out_unlock:
 
 static void init_cgroup_housekeeping(struct cgroup *cgrp)
 {
+	struct cgroup_subsys *ss;
+	int ssid;
+
 	atomic_set(&cgrp->refcnt, 1);
 	INIT_LIST_HEAD(&cgrp->sibling);
 	INIT_LIST_HEAD(&cgrp->children);
@@ -1425,6 +1445,9 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	INIT_LIST_HEAD(&cgrp->pidlists);
 	mutex_init(&cgrp->pidlist_mutex);
 	cgrp->dummy_css.cgroup = cgrp;
+
+	for_each_subsys(ss, ssid)
+		INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
 }
 
 static void init_cgroup_root(struct cgroup_root *root,
@@ -4224,6 +4247,9 @@ int __init cgroup_init(void)
 		if (!ss->early_init)
 			cgroup_init_subsys(ss);
 
+		list_add_tail(&init_css_set.e_cset_node[ssid],
+			      &cgrp_dfl_root.cgrp.e_csets[ssid]);
+
 		/*
 		 * cftype registration needs kmalloc and can't be done
 		 * during early_init.  Register base cftypes separately.
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 04/12] cgroup: make css_next_child() skip missing csses
       [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (2 preceding siblings ...)
  2014-03-28  2:40     ` Tejun Heo
@ 2014-03-28  2:40   ` Tejun Heo
  2014-03-28  2:40     ` Tejun Heo
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

css_next_child() walks the children of the specified css.  It does
this by finding the next cgroup and then returning the requested css.
On the default unified hierarchy, a cgroup may not have a css
associated with it even if the hierarchy has the subsystem enabled.
This patch updates css_next_child() so that it skips children without
the requested css associated.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2899a6f..9d2d6ef 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2690,10 +2690,19 @@ css_next_child(struct cgroup_subsys_state *pos_css,
 				break;
 	}
 
-	if (&next->sibling == &cgrp->children)
-		return NULL;
+	/*
+	 * @next, if not pointing to the head, can be dereferenced and is
+	 * the next sibling; however, it might have @ss disabled.  If so,
+	 * fast-forward to the next enabled one.
+	 */
+	while (&next->sibling != &cgrp->children) {
+		struct cgroup_subsys_state *next_css = cgroup_css(next, parent_css->ss);
 
-	return cgroup_css(next, parent_css->ss);
+		if (next_css)
+			return next_css;
+		next = list_entry_rcu(next->sibling.next, struct cgroup, sibling);
+	}
+	return NULL;
 }
 
 /**
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 04/12] cgroup: make css_next_child() skip missing csses
  2014-03-28  2:40 ` Tejun Heo
  (?)
@ 2014-03-28  2:40 ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

css_next_child() walks the children of the specified css.  It does
this by finding the next cgroup and then returning the requested css.
On the default unified hierarchy, a cgroup may not have a css
associated with it even if the hierarchy has the subsystem enabled.
This patch updates css_next_child() so that it skips children without
the requested css associated.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2899a6f..9d2d6ef 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2690,10 +2690,19 @@ css_next_child(struct cgroup_subsys_state *pos_css,
 				break;
 	}
 
-	if (&next->sibling == &cgrp->children)
-		return NULL;
+	/*
+	 * @next, if not pointing to the head, can be dereferenced and is
+	 * the next sibling; however, it might have @ss disabled.  If so,
+	 * fast-forward to the next enabled one.
+	 */
+	while (&next->sibling != &cgrp->children) {
+		struct cgroup_subsys_state *next_css = cgroup_css(next, parent_css->ss);
 
-	return cgroup_css(next, parent_css->ss);
+		if (next_css)
+			return next_css;
+		next = list_entry_rcu(next->sibling.next, struct cgroup, sibling);
+	}
+	return NULL;
 }
 
 /**
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 05/12] cgroup: reorganize css_task_iter
  2014-03-28  2:40 ` Tejun Heo
@ 2014-03-28  2:40     ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

This patch reorganizes css_task_iter so that adding effective css
support is easier.

* s/->cset_link/->cset_pos/ and s/->task/->task_pos/ for consistency

* ->origin_css is used to determine whether the iteration reached the
  last css_set.  Replace it with explicit ->cset_head so that
  css_advance_task_iter() doesn't have to know the termination
  condition directly.

* css_task_iter_next() currently assumes that it's walking list of
  cgrp_cset_link and reaches into the current cset through the current
  link to determine the termination conditions for task walking.  As
  this won't always be true for effective css walking, add
  ->tasks_head and ->mg_tasks_head and use them to control task
  walking so that css_task_iter_next() doesn't have to know how
  css_sets are being walked.

This patch doesn't make any behavior changes.  The iteration logic
stays unchanged after the patch.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h |  9 ++++++---
 kernel/cgroup.c        | 33 +++++++++++++++++----------------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e902eee..6bbe143 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -842,9 +842,12 @@ css_next_descendant_post(struct cgroup_subsys_state *pos,
 
 /* A css_task_iter should be treated as an opaque object */
 struct css_task_iter {
-	struct cgroup_subsys_state	*origin_css;
-	struct list_head		*cset_link;
-	struct list_head		*task;
+	struct list_head		*cset_pos;
+	struct list_head		*cset_head;
+
+	struct list_head		*task_pos;
+	struct list_head		*tasks_head;
+	struct list_head		*mg_tasks_head;
 };
 
 void css_task_iter_start(struct cgroup_subsys_state *css,
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9d2d6ef..318b6a2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2839,27 +2839,30 @@ css_next_descendant_post(struct cgroup_subsys_state *pos,
  */
 static void css_advance_task_iter(struct css_task_iter *it)
 {
-	struct list_head *l = it->cset_link;
+	struct list_head *l = it->cset_pos;
 	struct cgrp_cset_link *link;
 	struct css_set *cset;
 
 	/* Advance to the next non-empty css_set */
 	do {
 		l = l->next;
-		if (l == &it->origin_css->cgroup->cset_links) {
-			it->cset_link = NULL;
+		if (l == it->cset_head) {
+			it->cset_pos = NULL;
 			return;
 		}
 		link = list_entry(l, struct cgrp_cset_link, cset_link);
 		cset = link->cset;
 	} while (list_empty(&cset->tasks) && list_empty(&cset->mg_tasks));
 
-	it->cset_link = l;
+	it->cset_pos = l;
 
 	if (!list_empty(&cset->tasks))
-		it->task = cset->tasks.next;
+		it->task_pos = cset->tasks.next;
 	else
-		it->task = cset->mg_tasks.next;
+		it->task_pos = cset->mg_tasks.next;
+
+	it->tasks_head = &cset->tasks;
+	it->mg_tasks_head = &cset->mg_tasks;
 }
 
 /**
@@ -2885,8 +2888,8 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 
 	down_read(&css_set_rwsem);
 
-	it->origin_css = css;
-	it->cset_link = &css->cgroup->cset_links;
+	it->cset_pos = &css->cgroup->cset_links;
+	it->cset_head = it->cset_pos;
 
 	css_advance_task_iter(it);
 }
@@ -2902,12 +2905,10 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 struct task_struct *css_task_iter_next(struct css_task_iter *it)
 {
 	struct task_struct *res;
-	struct list_head *l = it->task;
-	struct cgrp_cset_link *link = list_entry(it->cset_link,
-					struct cgrp_cset_link, cset_link);
+	struct list_head *l = it->task_pos;
 
 	/* If the iterator cg is NULL, we have no tasks */
-	if (!it->cset_link)
+	if (!it->cset_pos)
 		return NULL;
 	res = list_entry(l, struct task_struct, cg_list);
 
@@ -2918,13 +2919,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
 	 */
 	l = l->next;
 
-	if (l == &link->cset->tasks)
-		l = link->cset->mg_tasks.next;
+	if (l == it->tasks_head)
+		l = it->mg_tasks_head->next;
 
-	if (l == &link->cset->mg_tasks)
+	if (l == it->mg_tasks_head)
 		css_advance_task_iter(it);
 	else
-		it->task = l;
+		it->task_pos = l;
 
 	return res;
 }
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 05/12] cgroup: reorganize css_task_iter
@ 2014-03-28  2:40     ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

This patch reorganizes css_task_iter so that adding effective css
support is easier.

* s/->cset_link/->cset_pos/ and s/->task/->task_pos/ for consistency

* ->origin_css is used to determine whether the iteration reached the
  last css_set.  Replace it with explicit ->cset_head so that
  css_advance_task_iter() doesn't have to know the termination
  condition directly.

* css_task_iter_next() currently assumes that it's walking list of
  cgrp_cset_link and reaches into the current cset through the current
  link to determine the termination conditions for task walking.  As
  this won't always be true for effective css walking, add
  ->tasks_head and ->mg_tasks_head and use them to control task
  walking so that css_task_iter_next() doesn't have to know how
  css_sets are being walked.

This patch doesn't make any behavior changes.  The iteration logic
stays unchanged after the patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h |  9 ++++++---
 kernel/cgroup.c        | 33 +++++++++++++++++----------------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e902eee..6bbe143 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -842,9 +842,12 @@ css_next_descendant_post(struct cgroup_subsys_state *pos,
 
 /* A css_task_iter should be treated as an opaque object */
 struct css_task_iter {
-	struct cgroup_subsys_state	*origin_css;
-	struct list_head		*cset_link;
-	struct list_head		*task;
+	struct list_head		*cset_pos;
+	struct list_head		*cset_head;
+
+	struct list_head		*task_pos;
+	struct list_head		*tasks_head;
+	struct list_head		*mg_tasks_head;
 };
 
 void css_task_iter_start(struct cgroup_subsys_state *css,
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9d2d6ef..318b6a2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2839,27 +2839,30 @@ css_next_descendant_post(struct cgroup_subsys_state *pos,
  */
 static void css_advance_task_iter(struct css_task_iter *it)
 {
-	struct list_head *l = it->cset_link;
+	struct list_head *l = it->cset_pos;
 	struct cgrp_cset_link *link;
 	struct css_set *cset;
 
 	/* Advance to the next non-empty css_set */
 	do {
 		l = l->next;
-		if (l == &it->origin_css->cgroup->cset_links) {
-			it->cset_link = NULL;
+		if (l == it->cset_head) {
+			it->cset_pos = NULL;
 			return;
 		}
 		link = list_entry(l, struct cgrp_cset_link, cset_link);
 		cset = link->cset;
 	} while (list_empty(&cset->tasks) && list_empty(&cset->mg_tasks));
 
-	it->cset_link = l;
+	it->cset_pos = l;
 
 	if (!list_empty(&cset->tasks))
-		it->task = cset->tasks.next;
+		it->task_pos = cset->tasks.next;
 	else
-		it->task = cset->mg_tasks.next;
+		it->task_pos = cset->mg_tasks.next;
+
+	it->tasks_head = &cset->tasks;
+	it->mg_tasks_head = &cset->mg_tasks;
 }
 
 /**
@@ -2885,8 +2888,8 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 
 	down_read(&css_set_rwsem);
 
-	it->origin_css = css;
-	it->cset_link = &css->cgroup->cset_links;
+	it->cset_pos = &css->cgroup->cset_links;
+	it->cset_head = it->cset_pos;
 
 	css_advance_task_iter(it);
 }
@@ -2902,12 +2905,10 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 struct task_struct *css_task_iter_next(struct css_task_iter *it)
 {
 	struct task_struct *res;
-	struct list_head *l = it->task;
-	struct cgrp_cset_link *link = list_entry(it->cset_link,
-					struct cgrp_cset_link, cset_link);
+	struct list_head *l = it->task_pos;
 
 	/* If the iterator cg is NULL, we have no tasks */
-	if (!it->cset_link)
+	if (!it->cset_pos)
 		return NULL;
 	res = list_entry(l, struct task_struct, cg_list);
 
@@ -2918,13 +2919,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
 	 */
 	l = l->next;
 
-	if (l == &link->cset->tasks)
-		l = link->cset->mg_tasks.next;
+	if (l == it->tasks_head)
+		l = it->mg_tasks_head->next;
 
-	if (l == &link->cset->mg_tasks)
+	if (l == it->mg_tasks_head)
 		css_advance_task_iter(it);
 	else
-		it->task = l;
+		it->task_pos = l;
 
 	return res;
 }
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 06/12] cgroup: teach css_task_iter about effective csses
       [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2014-03-28  2:40     ` Tejun Heo
@ 2014-03-28  2:40   ` Tejun Heo
  2014-03-28  2:40     ` Tejun Heo
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Currently, css_task_iter iterates tasks associated with a css by
visiting each css_set associated with the owning cgroup and walking
tasks of each of them.  This works fine for !unified hierarchies as
each cgroup has its own css for each associated subsystem on the
hierarchy; however, on the planned unified hierarchy, a cgroup may not
have csses associated and its tasks would be considered associated
with the matching css of the nearest ancestor which has the subsystem
enabled.

This means that on the default unified hierarchy, just walking all
tasks associated with a cgroup isn't enough to walk all tasks which
are associated with the specified css.  If any of its children doesn't
have the matching css enabled, task iteration should also include all
tasks from the subtree.  We already added cgroup->e_csets[] to list
all css_sets effectively associated with a given css and walk css_sets
on that list instead to achieve such iteration.

This patch updates css_task_iter iteration such that it walks css_sets
on cgroup->e_csets[] instead of cgroup->cset_links if iteration is
requested on an non-dummy css.  Thanks to the previous iteration
update, this change can be achieved with the addition of
css_task_iter->ss and minimal updates to css_advance_task_iter() and
css_task_iter_start().

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h |  2 ++
 kernel/cgroup.c        | 18 +++++++++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 6bbe143..cbfd8be 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -842,6 +842,8 @@ css_next_descendant_post(struct cgroup_subsys_state *pos,
 
 /* A css_task_iter should be treated as an opaque object */
 struct css_task_iter {
+	struct cgroup_subsys		*ss;
+
 	struct list_head		*cset_pos;
 	struct list_head		*cset_head;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 318b6a2..e3aa2aa 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2850,8 +2850,14 @@ static void css_advance_task_iter(struct css_task_iter *it)
 			it->cset_pos = NULL;
 			return;
 		}
-		link = list_entry(l, struct cgrp_cset_link, cset_link);
-		cset = link->cset;
+
+		if (it->ss) {
+			cset = container_of(l, struct css_set,
+					    e_cset_node[it->ss->id]);
+		} else {
+			link = list_entry(l, struct cgrp_cset_link, cset_link);
+			cset = link->cset;
+		}
 	} while (list_empty(&cset->tasks) && list_empty(&cset->mg_tasks));
 
 	it->cset_pos = l;
@@ -2888,7 +2894,13 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 
 	down_read(&css_set_rwsem);
 
-	it->cset_pos = &css->cgroup->cset_links;
+	it->ss = css->ss;
+
+	if (it->ss)
+		it->cset_pos = &css->cgroup->e_csets[css->ss->id];
+	else
+		it->cset_pos = &css->cgroup->cset_links;
+
 	it->cset_head = it->cset_pos;
 
 	css_advance_task_iter(it);
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 06/12] cgroup: teach css_task_iter about effective csses
  2014-03-28  2:40 ` Tejun Heo
  (?)
  (?)
@ 2014-03-28  2:40 ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

Currently, css_task_iter iterates tasks associated with a css by
visiting each css_set associated with the owning cgroup and walking
tasks of each of them.  This works fine for !unified hierarchies as
each cgroup has its own css for each associated subsystem on the
hierarchy; however, on the planned unified hierarchy, a cgroup may not
have csses associated and its tasks would be considered associated
with the matching css of the nearest ancestor which has the subsystem
enabled.

This means that on the default unified hierarchy, just walking all
tasks associated with a cgroup isn't enough to walk all tasks which
are associated with the specified css.  If any of its children doesn't
have the matching css enabled, task iteration should also include all
tasks from the subtree.  We already added cgroup->e_csets[] to list
all css_sets effectively associated with a given css and walk css_sets
on that list instead to achieve such iteration.

This patch updates css_task_iter iteration such that it walks css_sets
on cgroup->e_csets[] instead of cgroup->cset_links if iteration is
requested on an non-dummy css.  Thanks to the previous iteration
update, this change can be achieved with the addition of
css_task_iter->ss and minimal updates to css_advance_task_iter() and
css_task_iter_start().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h |  2 ++
 kernel/cgroup.c        | 18 +++++++++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 6bbe143..cbfd8be 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -842,6 +842,8 @@ css_next_descendant_post(struct cgroup_subsys_state *pos,
 
 /* A css_task_iter should be treated as an opaque object */
 struct css_task_iter {
+	struct cgroup_subsys		*ss;
+
 	struct list_head		*cset_pos;
 	struct list_head		*cset_head;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 318b6a2..e3aa2aa 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2850,8 +2850,14 @@ static void css_advance_task_iter(struct css_task_iter *it)
 			it->cset_pos = NULL;
 			return;
 		}
-		link = list_entry(l, struct cgrp_cset_link, cset_link);
-		cset = link->cset;
+
+		if (it->ss) {
+			cset = container_of(l, struct css_set,
+					    e_cset_node[it->ss->id]);
+		} else {
+			link = list_entry(l, struct cgrp_cset_link, cset_link);
+			cset = link->cset;
+		}
 	} while (list_empty(&cset->tasks) && list_empty(&cset->mg_tasks));
 
 	it->cset_pos = l;
@@ -2888,7 +2894,13 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 
 	down_read(&css_set_rwsem);
 
-	it->cset_pos = &css->cgroup->cset_links;
+	it->ss = css->ss;
+
+	if (it->ss)
+		it->cset_pos = &css->cgroup->e_csets[css->ss->id];
+	else
+		it->cset_pos = &css->cgroup->cset_links;
+
 	it->cset_head = it->cset_pos;
 
 	css_advance_task_iter(it);
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 07/12] cgroup: cgroup->subsys[] should be cleared after the css is offlined
  2014-03-28  2:40 ` Tejun Heo
@ 2014-03-28  2:40     ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

After a css finishes offlining, offline_css() mistakenly performs
RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css) which just sets the
cgroup->subsys[] pointer to the current value.  The intention was to
clear it after offline is complete, not reassign the same value.

Update it to assign NULL instead of the current value.  This makes
cgroup_css() to return NULL once offline is complete.  All the
existing users of the function either can handle NULL return already
or guarantee that the css doesn't get offlined.

While this is a bugfix, as css lifetime is currently tied to the
cgroup it belongs to, this bug doesn't cause any actual problems.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e3aa2aa..c59946c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3692,7 +3692,7 @@ static void offline_css(struct cgroup_subsys_state *css)
 
 	css->flags &= ~CSS_ONLINE;
 	css->cgroup->nr_css--;
-	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css);
+	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
 }
 
 /**
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 07/12] cgroup: cgroup->subsys[] should be cleared after the css is offlined
@ 2014-03-28  2:40     ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

After a css finishes offlining, offline_css() mistakenly performs
RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css) which just sets the
cgroup->subsys[] pointer to the current value.  The intention was to
clear it after offline is complete, not reassign the same value.

Update it to assign NULL instead of the current value.  This makes
cgroup_css() to return NULL once offline is complete.  All the
existing users of the function either can handle NULL return already
or guarantee that the css doesn't get offlined.

While this is a bugfix, as css lifetime is currently tied to the
cgroup it belongs to, this bug doesn't cause any actual problems.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e3aa2aa..c59946c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3692,7 +3692,7 @@ static void offline_css(struct cgroup_subsys_state *css)
 
 	css->flags &= ~CSS_ONLINE;
 	css->cgroup->nr_css--;
-	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css);
+	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
 }
 
 /**
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 08/12] cgroup: allow cgroup creation and suppress automatic css creation in the unified hierarchy
  2014-03-28  2:40 ` Tejun Heo
@ 2014-03-28  2:40     ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Now that effective css handling has been added and iterators updated
accordingly, it's safe to allow cgroup creation in the default
hierarchy.  Unblock cgroup creation in the default hierarchy.

As the default hierarchy will implement explicit enabling and
disabling of controllers on each cgroup, suppress automatic css
enabling on cgroup creation.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c59946c..30f48b0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1115,8 +1115,10 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		src_root->subsys_mask &= ~(1 << ssid);
 		src_root->cgrp.child_subsys_mask &= ~(1 << ssid);
 
+		/* default hierarchy doesn't enable controllers by default */
 		dst_root->subsys_mask |= 1 << ssid;
-		dst_root->cgrp.child_subsys_mask |= 1 << ssid;
+		if (dst_root != &cgrp_dfl_root)
+			dst_root->cgrp.child_subsys_mask |= 1 << ssid;
 
 		if (ss->bind)
 			ss->bind(css);
@@ -3765,13 +3767,6 @@ static long cgroup_create(struct cgroup *parent, const char *name,
 	struct cgroup_subsys *ss;
 	struct kernfs_node *kn;
 
-	/*
-	 * XXX: The default hierarchy isn't fully implemented yet.  Block
-	 * !root cgroup creation on it for now.
-	 */
-	if (root == &cgrp_dfl_root)
-		return -EINVAL;
-
 	/* allocate the cgroup and its ID, 0 is reserved for the root */
 	cgrp = kzalloc(sizeof(*cgrp), GFP_KERNEL);
 	if (!cgrp)
@@ -3853,7 +3848,12 @@ static long cgroup_create(struct cgroup *parent, const char *name,
 		}
 	}
 
-	cgrp->child_subsys_mask = parent->child_subsys_mask;
+	/*
+	 * On the default hierarchy, a child doesn't automatically inherit
+	 * child_subsys_mask from the parent.  Each is configured manually.
+	 */
+	if (!cgroup_on_dfl(cgrp))
+		cgrp->child_subsys_mask = parent->child_subsys_mask;
 
 	kernfs_activate(kn);
 
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 08/12] cgroup: allow cgroup creation and suppress automatic css creation in the unified hierarchy
@ 2014-03-28  2:40     ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

Now that effective css handling has been added and iterators updated
accordingly, it's safe to allow cgroup creation in the default
hierarchy.  Unblock cgroup creation in the default hierarchy.

As the default hierarchy will implement explicit enabling and
disabling of controllers on each cgroup, suppress automatic css
enabling on cgroup creation.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c59946c..30f48b0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1115,8 +1115,10 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		src_root->subsys_mask &= ~(1 << ssid);
 		src_root->cgrp.child_subsys_mask &= ~(1 << ssid);
 
+		/* default hierarchy doesn't enable controllers by default */
 		dst_root->subsys_mask |= 1 << ssid;
-		dst_root->cgrp.child_subsys_mask |= 1 << ssid;
+		if (dst_root != &cgrp_dfl_root)
+			dst_root->cgrp.child_subsys_mask |= 1 << ssid;
 
 		if (ss->bind)
 			ss->bind(css);
@@ -3765,13 +3767,6 @@ static long cgroup_create(struct cgroup *parent, const char *name,
 	struct cgroup_subsys *ss;
 	struct kernfs_node *kn;
 
-	/*
-	 * XXX: The default hierarchy isn't fully implemented yet.  Block
-	 * !root cgroup creation on it for now.
-	 */
-	if (root == &cgrp_dfl_root)
-		return -EINVAL;
-
 	/* allocate the cgroup and its ID, 0 is reserved for the root */
 	cgrp = kzalloc(sizeof(*cgrp), GFP_KERNEL);
 	if (!cgrp)
@@ -3853,7 +3848,12 @@ static long cgroup_create(struct cgroup *parent, const char *name,
 		}
 	}
 
-	cgrp->child_subsys_mask = parent->child_subsys_mask;
+	/*
+	 * On the default hierarchy, a child doesn't automatically inherit
+	 * child_subsys_mask from the parent.  Each is configured manually.
+	 */
+	if (!cgroup_on_dfl(cgrp))
+		cgrp->child_subsys_mask = parent->child_subsys_mask;
 
 	kernfs_activate(kn);
 
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 09/12] cgroup: add css_set->dfl_cgrp
  2014-03-28  2:40 ` Tejun Heo
@ 2014-03-28  2:40     ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

To implement the unified hierarchy behavior, we'll need to be able to
determine the associated cgroup on the default hierarchy from css_set.
Let's add css_set->dfl_cgrp so that it can be accessed conveniently
and efficiently.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h | 3 +++
 kernel/cgroup.c        | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index cbfd8be..b6f5e40 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -354,6 +354,9 @@ struct css_set {
 	 */
 	struct list_head cgrp_links;
 
+	/* the default cgroup associated with this css_set */
+	struct cgroup *dfl_cgrp;
+
 	/*
 	 * Set of subsystem states, one for each subsystem. This array is
 	 * immutable after creation apart from the init_css_set during
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 30f48b0..e714ccf 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -651,6 +651,10 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
 	struct cgrp_cset_link *link;
 
 	BUG_ON(list_empty(tmp_links));
+
+	if (cgroup_on_dfl(cgrp))
+		cset->dfl_cgrp = cgrp;
+
 	link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link);
 	link->cset = cset;
 	link->cgrp = cgrp;
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 09/12] cgroup: add css_set->dfl_cgrp
@ 2014-03-28  2:40     ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

To implement the unified hierarchy behavior, we'll need to be able to
determine the associated cgroup on the default hierarchy from css_set.
Let's add css_set->dfl_cgrp so that it can be accessed conveniently
and efficiently.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h | 3 +++
 kernel/cgroup.c        | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index cbfd8be..b6f5e40 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -354,6 +354,9 @@ struct css_set {
 	 */
 	struct list_head cgrp_links;
 
+	/* the default cgroup associated with this css_set */
+	struct cgroup *dfl_cgrp;
+
 	/*
 	 * Set of subsystem states, one for each subsystem. This array is
 	 * immutable after creation apart from the init_css_set during
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 30f48b0..e714ccf 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -651,6 +651,10 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
 	struct cgrp_cset_link *link;
 
 	BUG_ON(list_empty(tmp_links));
+
+	if (cgroup_on_dfl(cgrp))
+		cset->dfl_cgrp = cgrp;
+
 	link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link);
 	link->cset = cset;
 	link->cgrp = cgrp;
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 10/12] cgroup: update subsystem rebind restrictions
       [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (8 preceding siblings ...)
  2014-03-28  2:40     ` Tejun Heo
@ 2014-03-28  2:40   ` Tejun Heo
  2014-03-28  2:41   ` [PATCH 11/12] cgroup: prepare migration path for unified hierarchy Tejun Heo
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Because the default root couldn't have any non-root csses attached to
it, rebinding away from it was always allowed; however, the default
hierarchy will soon host the unified hierarchy and have non-root csses
so the rebind restrictions need to be updated accordingly.

Instead of special casing rebinding from the default hierarchy and
then checking whether the source hierarchy has children cgroups, which
implies non-root csses for !dfl hierarchies, simply check whether the
source hierarchy has non-root csses for the subsystem using
css_next_child().

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e714ccf..0c67141 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1051,16 +1051,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		if (!(ss_mask & (1 << ssid)))
 			continue;
 
-		/* if @ss is on the dummy_root, we can always move it */
-		if (ss->root == &cgrp_dfl_root)
-			continue;
-
-		/* if @ss has non-root cgroups attached to it, can't move */
-		if (!list_empty(&ss->root->cgrp.children))
+		/* if @ss has non-root csses attached to it, can't move */
+		if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss)))
 			return -EBUSY;
 
 		/* can't move between two non-dummy roots either */
-		if (dst_root != &cgrp_dfl_root)
+		if (ss->root != &cgrp_dfl_root && dst_root != &cgrp_dfl_root)
 			return -EBUSY;
 	}
 
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 10/12] cgroup: update subsystem rebind restrictions
  2014-03-28  2:40 ` Tejun Heo
                   ` (2 preceding siblings ...)
  (?)
@ 2014-03-28  2:40 ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:40 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

Because the default root couldn't have any non-root csses attached to
it, rebinding away from it was always allowed; however, the default
hierarchy will soon host the unified hierarchy and have non-root csses
so the rebind restrictions need to be updated accordingly.

Instead of special casing rebinding from the default hierarchy and
then checking whether the source hierarchy has children cgroups, which
implies non-root csses for !dfl hierarchies, simply check whether the
source hierarchy has non-root csses for the subsystem using
css_next_child().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e714ccf..0c67141 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1051,16 +1051,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		if (!(ss_mask & (1 << ssid)))
 			continue;
 
-		/* if @ss is on the dummy_root, we can always move it */
-		if (ss->root == &cgrp_dfl_root)
-			continue;
-
-		/* if @ss has non-root cgroups attached to it, can't move */
-		if (!list_empty(&ss->root->cgrp.children))
+		/* if @ss has non-root csses attached to it, can't move */
+		if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss)))
 			return -EBUSY;
 
 		/* can't move between two non-dummy roots either */
-		if (dst_root != &cgrp_dfl_root)
+		if (ss->root != &cgrp_dfl_root && dst_root != &cgrp_dfl_root)
 			return -EBUSY;
 	}
 
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 11/12] cgroup: prepare migration path for unified hierarchy
       [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (9 preceding siblings ...)
  2014-03-28  2:40   ` [PATCH 10/12] cgroup: update subsystem rebind restrictions Tejun Heo
@ 2014-03-28  2:41   ` Tejun Heo
  2014-03-28  2:41   ` [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy Tejun Heo
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:41 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Unified hierarchy implementation would require re-migrating tasks onto
the same cgroup on the default hierarchy to reflect updated effective
csses.  Update cgroup_migrate_prepare_dst() so that it accepts NULL as
the destination cgrp.  When NULL is specified, the destination is
considered to be the cgroup on the default hierarchy associated with
each css_set.

After this change, the identity check in cgroup_migrate_add_src()
isn't sufficient for noop detection as the associated csses may change
without any cgroup association changing.  The only way to tell whether
a migration is noop or not is testing whether the source and
destination csets are identical.  The noop check in
cgroup_migrate_add_src() is removed and cset identity test is added to
cgroup_migreate_prepare_dst().  If it's detected that source and
destination csets are identical, the cset is removed removed from
@preloaded_csets and all the migration nodes are cleared which makes
cgroup_migrate() ignore the cset.

Also, make the function append the destination css_sets to
@preloaded_list so that destination css_sets always come after source
css_sets.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0c67141..5faa115 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1901,10 +1901,6 @@ static void cgroup_migrate_add_src(struct css_set *src_cset,
 
 	src_cgrp = cset_cgroup_from_root(src_cset, dst_cgrp->root);
 
-	/* nothing to do if this cset already belongs to the cgroup */
-	if (src_cgrp == dst_cgrp)
-		return;
-
 	if (!list_empty(&src_cset->mg_preload_node))
 		return;
 
@@ -1919,13 +1915,14 @@ static void cgroup_migrate_add_src(struct css_set *src_cset,
 
 /**
  * cgroup_migrate_prepare_dst - prepare destination css_sets for migration
- * @dst_cgrp: the destination cgroup
+ * @dst_cgrp: the destination cgroup (may be %NULL)
  * @preloaded_csets: list of preloaded source css_sets
  *
  * Tasks are about to be moved to @dst_cgrp and all the source css_sets
  * have been preloaded to @preloaded_csets.  This function looks up and
- * pins all destination css_sets, links each to its source, and put them on
- * @preloaded_csets.
+ * pins all destination css_sets, links each to its source, and append them
+ * to @preloaded_csets.  If @dst_cgrp is %NULL, the destination of each
+ * source css_set is assumed to be its cgroup on the default hierarchy.
  *
  * This function must be called after cgroup_migrate_add_src() has been
  * called on each migration source css_set.  After migration is performed
@@ -1936,19 +1933,34 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp,
 				      struct list_head *preloaded_csets)
 {
 	LIST_HEAD(csets);
-	struct css_set *src_cset;
+	struct css_set *src_cset, *tmp_cset;
 
 	lockdep_assert_held(&cgroup_mutex);
 
 	/* look up the dst cset for each src cset and link it to src */
-	list_for_each_entry(src_cset, preloaded_csets, mg_preload_node) {
+	list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) {
 		struct css_set *dst_cset;
 
-		dst_cset = find_css_set(src_cset, dst_cgrp);
+		dst_cset = find_css_set(src_cset,
+					dst_cgrp ?: src_cset->dfl_cgrp);
 		if (!dst_cset)
 			goto err;
 
 		WARN_ON_ONCE(src_cset->mg_dst_cset || dst_cset->mg_dst_cset);
+
+		/*
+		 * If src cset equals dst, it's noop.  Drop the src.
+		 * cgroup_migrate() will skip the cset too.  Note that we
+		 * can't handle src == dst as some nodes are used by both.
+		 */
+		if (src_cset == dst_cset) {
+			src_cset->mg_src_cgrp = NULL;
+			list_del_init(&src_cset->mg_preload_node);
+			put_css_set(src_cset, false);
+			put_css_set(dst_cset, false);
+			continue;
+		}
+
 		src_cset->mg_dst_cset = dst_cset;
 
 		if (list_empty(&dst_cset->mg_preload_node))
@@ -1957,7 +1969,7 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp,
 			put_css_set(dst_cset, false);
 	}
 
-	list_splice(&csets, preloaded_csets);
+	list_splice_tail(&csets, preloaded_csets);
 	return 0;
 err:
 	cgroup_migrate_finish(&csets);
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 11/12] cgroup: prepare migration path for unified hierarchy
  2014-03-28  2:40 ` Tejun Heo
                   ` (3 preceding siblings ...)
  (?)
@ 2014-03-28  2:41 ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:41 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

Unified hierarchy implementation would require re-migrating tasks onto
the same cgroup on the default hierarchy to reflect updated effective
csses.  Update cgroup_migrate_prepare_dst() so that it accepts NULL as
the destination cgrp.  When NULL is specified, the destination is
considered to be the cgroup on the default hierarchy associated with
each css_set.

After this change, the identity check in cgroup_migrate_add_src()
isn't sufficient for noop detection as the associated csses may change
without any cgroup association changing.  The only way to tell whether
a migration is noop or not is testing whether the source and
destination csets are identical.  The noop check in
cgroup_migrate_add_src() is removed and cset identity test is added to
cgroup_migreate_prepare_dst().  If it's detected that source and
destination csets are identical, the cset is removed removed from
@preloaded_csets and all the migration nodes are cleared which makes
cgroup_migrate() ignore the cset.

Also, make the function append the destination css_sets to
@preloaded_list so that destination css_sets always come after source
css_sets.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0c67141..5faa115 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1901,10 +1901,6 @@ static void cgroup_migrate_add_src(struct css_set *src_cset,
 
 	src_cgrp = cset_cgroup_from_root(src_cset, dst_cgrp->root);
 
-	/* nothing to do if this cset already belongs to the cgroup */
-	if (src_cgrp == dst_cgrp)
-		return;
-
 	if (!list_empty(&src_cset->mg_preload_node))
 		return;
 
@@ -1919,13 +1915,14 @@ static void cgroup_migrate_add_src(struct css_set *src_cset,
 
 /**
  * cgroup_migrate_prepare_dst - prepare destination css_sets for migration
- * @dst_cgrp: the destination cgroup
+ * @dst_cgrp: the destination cgroup (may be %NULL)
  * @preloaded_csets: list of preloaded source css_sets
  *
  * Tasks are about to be moved to @dst_cgrp and all the source css_sets
  * have been preloaded to @preloaded_csets.  This function looks up and
- * pins all destination css_sets, links each to its source, and put them on
- * @preloaded_csets.
+ * pins all destination css_sets, links each to its source, and append them
+ * to @preloaded_csets.  If @dst_cgrp is %NULL, the destination of each
+ * source css_set is assumed to be its cgroup on the default hierarchy.
  *
  * This function must be called after cgroup_migrate_add_src() has been
  * called on each migration source css_set.  After migration is performed
@@ -1936,19 +1933,34 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp,
 				      struct list_head *preloaded_csets)
 {
 	LIST_HEAD(csets);
-	struct css_set *src_cset;
+	struct css_set *src_cset, *tmp_cset;
 
 	lockdep_assert_held(&cgroup_mutex);
 
 	/* look up the dst cset for each src cset and link it to src */
-	list_for_each_entry(src_cset, preloaded_csets, mg_preload_node) {
+	list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) {
 		struct css_set *dst_cset;
 
-		dst_cset = find_css_set(src_cset, dst_cgrp);
+		dst_cset = find_css_set(src_cset,
+					dst_cgrp ?: src_cset->dfl_cgrp);
 		if (!dst_cset)
 			goto err;
 
 		WARN_ON_ONCE(src_cset->mg_dst_cset || dst_cset->mg_dst_cset);
+
+		/*
+		 * If src cset equals dst, it's noop.  Drop the src.
+		 * cgroup_migrate() will skip the cset too.  Note that we
+		 * can't handle src == dst as some nodes are used by both.
+		 */
+		if (src_cset == dst_cset) {
+			src_cset->mg_src_cgrp = NULL;
+			list_del_init(&src_cset->mg_preload_node);
+			put_css_set(src_cset, false);
+			put_css_set(dst_cset, false);
+			continue;
+		}
+
 		src_cset->mg_dst_cset = dst_cset;
 
 		if (list_empty(&dst_cset->mg_preload_node))
@@ -1957,7 +1969,7 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp,
 			put_css_set(dst_cset, false);
 	}
 
-	list_splice(&csets, preloaded_csets);
+	list_splice_tail(&csets, preloaded_csets);
 	return 0;
 err:
 	cgroup_migrate_finish(&csets);
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy
       [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (10 preceding siblings ...)
  2014-03-28  2:41   ` [PATCH 11/12] cgroup: prepare migration path for unified hierarchy Tejun Heo
@ 2014-03-28  2:41   ` Tejun Heo
  2014-04-14 15:45     ` Vivek Goyal
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:41 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

cgroup is switching away from multiple hierarchies and will use one
unified default hierarchy where controllers can be dynamically enabled
and disabled per subtree.  The default hierarchy will serve as the
unified hierarchy to which all controllers are attached and a css on
the default hierarchy would need to also serve the tasks of descendant
cgroups which don't have the controller enabled - ie. the tree may be
collapsed from leaf towards root when viewed from specific
controllers.  This has been implemented through effective css in the
previous patches.

This patch finally implements dynamic subtree controller
enable/disable on the default hierarchy via a new knob -
"cgroup.subtree_control" which controls which controllers are enabled
on the child cgroups.  Let's assume a hierarchy like the following.

  root - A - B - C
               \ D

root's "cgroup.subtree_control" determines which controllers are
enabled on A.  A's on B.  B's on C and D.  This coincides with the
fact that controllers on the immediate sub-level are used to
distribute the resources of the parent.  In fact, it's natural to
assume that resource control knobs of a child belong to its parent.
Enabling a controller in "cgroup.subtree_control" declares that
distribution of the respective resources of the cgroup will be
controlled.  Note that this means that controller enable states are
shared among siblings.

The default hierarchy has an extra restriction - only cgroups which
don't contain any task may have controllers enabled in
"cgroup.subtree_control".  Combined with the other properties of the
default hierarchy, this guarantees that, from the view point of
controllers, tasks are only on the leaf cgroups.  In other words, only
leaf csses may contain tasks.  This rules out situations where child
cgroups compete against internal tasks of the parent, which is a
competition between two different types of entities without any clear
way to determine resource distribution between the two.  Different
controllers handle it differently and all the implemented behaviors
are ambiguous, ad-hoc, cumbersome and/or just wrong.  Having this
structural constraints imposed from cgroup core removes the burden
from controller implementations and enables showing one consistent
behavior across all controllers.

When a controller is enabled or disabled, css associations for the
controller in the subtrees of each child should be updated.  After
enabling, the whole subtree of a child should point to the new css of
the child.  After disabling, the whole subtree of a child should point
to the cgroup's css.  This is implemented by first updating cgroup
states such that cgroup_e_css() result points to the appropriate css
and then invoking cgroup_update_dfl_csses() which migrates all tasks
in the affected subtrees to the self cgroup on the default hierarchy.

* When read, "cgroup.subtree_control" lists all the currently enabled
  controllers on the children of the cgroup.

* White-space separated list of controller names prefixed with either
  '+' or '-' can be written to "cgroup.subtree_control".  The ones
  prefixed with '+' are enabled on the controller and '-' disabled.

* A controller can be enabled iff the parent's
  "cgroup.subtree_control" enables it and disabled iff no child's
  "cgroup.subtree_control" has it enabled.

* If a cgroup has tasks, no controller can be enabled via
  "cgroup.subtree_control".  Likewise, if "cgroup.subtree_control" has
  some controllers enabled, tasks can't be migrated into the cgroup.

* All controllers which aren't bound on other hierarchies are
  automatically associated with the root cgroup of the default
  hierarchy.  All the controllers which are bound to the default
  hierarchy are listed in the read-only file "cgroup.controllers" in
  the root directory.

This is still experimental and there are some holes, one of which is
that ->can_attach() failure during cgroup_update_dfl_csses() may leave
the cgroups in an undefined state.  The issues will be addressed by
future patches.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h |   5 +
 kernel/cgroup.c        | 346 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 351 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b6f5e40..dee6f3c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -21,6 +21,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/seq_file.h>
 #include <linux/kernfs.h>
+#include <linux/wait.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -164,6 +165,7 @@ struct cgroup {
 
 	struct cgroup *parent;		/* my parent */
 	struct kernfs_node *kn;		/* cgroup kernfs entry */
+	struct kernfs_node *control_kn;	/* kn for "cgroup.subtree_control" */
 
 	/*
 	 * Monotonically increasing unique serial number which defines a
@@ -216,6 +218,9 @@ struct cgroup {
 	/* For css percpu_ref killing and RCU-protected deletion */
 	struct rcu_head rcu_head;
 	struct work_struct destroy_work;
+
+	/* used to wait for offlining of csses */
+	wait_queue_head_t offline_waitq;
 };
 
 #define MAX_CGROUP_ROOT_NAMELEN 64
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5faa115..4e958c7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 			     unsigned long ss_mask);
 static void cgroup_destroy_css_killed(struct cgroup *cgrp);
 static int cgroup_destroy_locked(struct cgroup *cgrp);
+static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss);
+static void kill_css(struct cgroup_subsys_state *css);
 static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
 			      bool is_add);
 static void cgroup_pidlist_destroy_all(struct cgroup *cgrp);
@@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp)
 #define for_each_root(root)						\
 	list_for_each_entry((root), &cgroup_roots, root_list)
 
+/* iterate over child cgrps, lock should be held throughout iteration */
+#define cgroup_for_each_live_child(child, cgrp)				\
+	list_for_each_entry((child), &(cgrp)->children, sibling)	\
+		if (({ lockdep_assert_held(&cgroup_tree_mutex);		\
+		       cgroup_is_dead(child); }))			\
+			;						\
+		else
+
 /**
  * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive.
  * @cgrp: the cgroup to be checked for liveness
@@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 
 	for_each_subsys(ss, ssid)
 		INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
+
+	init_waitqueue_head(&cgrp->offline_waitq);
 }
 
 static void init_cgroup_root(struct cgroup_root *root,
@@ -1937,6 +1949,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp,
 
 	lockdep_assert_held(&cgroup_mutex);
 
+	/*
+	 * Except for the root, child_subsys_mask must be zero for a cgroup
+	 * with tasks so that child cgroups don't compete against tasks.
+	 */
+	if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent &&
+	    dst_cgrp->child_subsys_mask)
+		return -EBUSY;
+
 	/* look up the dst cset for each src cset and link it to src */
 	list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) {
 		struct css_set *dst_cset;
@@ -2302,6 +2322,317 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v)
 	return 0;
 }
 
+static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask)
+{
+	struct cgroup_subsys *ss;
+	bool printed = false;
+	int ssid;
+
+	for_each_subsys(ss, ssid) {
+		if (ss_mask & (1 << ssid)) {
+			if (printed)
+				seq_putc(seq, ' ');
+			seq_printf(seq, "%s", ss->name);
+			printed = true;
+		}
+	}
+	if (printed)
+		seq_putc(seq, '\n');
+}
+
+/* show controllers which are currently attached to the default hierarchy */
+static int cgroup_controllers_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+
+	cgroup_print_ss_mask(seq, cgrp->root->subsys_mask);
+	return 0;
+}
+
+/* show controllers which are enabled for a given cgroup's children */
+static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+
+	cgroup_print_ss_mask(seq, cgrp->child_subsys_mask);
+	return 0;
+}
+
+/**
+ * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy
+ * @cgrp: root of the subtree to update csses for
+ *
+ * @cgrp's child_subsys_mask has changed and its subtree's (self excluded)
+ * css associations need to be updated accordingly.  This function looks up
+ * all css_sets which are attached to the subtree, creates the matching
+ * updated css_sets and migrates the tasks to the new ones.
+ */
+static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+{
+	LIST_HEAD(preloaded_csets);
+	struct cgroup_subsys_state *css;
+	struct css_set *src_cset;
+	int ret;
+
+	lockdep_assert_held(&cgroup_tree_mutex);
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* look up all csses currently attached to @cgrp's subtree */
+	down_read(&css_set_rwsem);
+	css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) {
+		struct cgrp_cset_link *link;
+
+		/* self is not affected by child_subsys_mask change */
+		if (css->cgroup == cgrp)
+			continue;
+
+		list_for_each_entry(link, &css->cgroup->cset_links, cset_link)
+			cgroup_migrate_add_src(link->cset, cgrp,
+					       &preloaded_csets);
+	}
+	up_read(&css_set_rwsem);
+
+	/* NULL dst indicates self on default hierarchy */
+	ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets);
+	if (ret)
+		goto out_finish;
+
+	list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) {
+		struct task_struct *last_task = NULL, *task;
+
+		/* src_csets precede dst_csets, break on the first dst_cset */
+		if (!src_cset->mg_src_cgrp)
+			break;
+
+		/*
+		 * All tasks in src_cset need to be migrated to the
+		 * matching dst_cset.  Empty it process by process.  We
+		 * walk tasks but migrate processes.  The leader might even
+		 * belong to a different cset but such src_cset would also
+		 * be among the target src_csets because the default
+		 * hierarchy enforces per-process membership.
+		 */
+		while (true) {
+			down_read(&css_set_rwsem);
+			task = list_first_entry_or_null(&src_cset->tasks,
+						struct task_struct, cg_list);
+			if (task) {
+				task = task->group_leader;
+				WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp);
+				get_task_struct(task);
+			}
+			up_read(&css_set_rwsem);
+
+			if (!task)
+				break;
+
+			/* guard against possible infinite loop */
+			if (WARN(last_task == task,
+				 "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n"))
+				goto out_finish;
+			last_task = task;
+
+			threadgroup_lock(task);
+			/* raced against de_thread() from another thread? */
+			if (!thread_group_leader(task)) {
+				threadgroup_unlock(task);
+				put_task_struct(task);
+				continue;
+			}
+
+			ret = cgroup_migrate(src_cset->dfl_cgrp, task, true);
+
+			threadgroup_unlock(task);
+			put_task_struct(task);
+
+			if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret))
+				goto out_finish;
+		}
+	}
+
+out_finish:
+	cgroup_migrate_finish(&preloaded_csets);
+	return ret;
+}
+
+/* change the enabled child controllers for a cgroup in the default hierarchy */
+static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css,
+					struct cftype *cft, char *buffer)
+{
+	unsigned long enable_req = 0, disable_req = 0, enable, disable;
+	struct cgroup *cgrp = dummy_css->cgroup, *child;
+	struct cgroup_subsys *ss;
+	char *tok, *p;
+	int ssid, ret;
+
+	/*
+	 * Parse input - white space separated list of subsystem names
+	 * prefixed with either + or -.
+	 */
+	p = buffer;
+	while ((tok = strsep(&p, " \t\n"))) {
+		for_each_subsys(ss, ssid) {
+			if (ss->disabled || strcmp(tok + 1, ss->name))
+				continue;
+
+			if (*tok == '+') {
+				enable_req |= 1 << ssid;
+				disable_req &= ~(1 << ssid);
+			} else if (*tok == '-') {
+				disable_req |= 1 << ssid;
+				enable_req &= ~(1 << ssid);
+			} else {
+				return -EINVAL;
+			}
+			break;
+		}
+		if (ssid == CGROUP_SUBSYS_COUNT)
+			return -EINVAL;
+	}
+
+	/*
+	 * We're gonna grab cgroup_tree_mutex which nests outside kernfs
+	 * active_ref.  cgroup_lock_live_group() already provides enough
+	 * protection.  Ensure @cgrp stays accessible and break the
+	 * active_ref protection.
+	 */
+	cgroup_get(cgrp);
+	kernfs_break_active_protection(cgrp->control_kn);
+retry:
+	enable = enable_req;
+	disable = disable_req;
+
+	mutex_lock(&cgroup_tree_mutex);
+
+	for_each_subsys(ss, ssid) {
+		if (enable & (1 << ssid)) {
+			if (cgrp->child_subsys_mask & (1 << ssid)) {
+				enable &= ~(1 << ssid);
+				continue;
+			}
+
+			/*
+			 * Because css offlining is asynchronous, userland
+			 * might try to re-enable the same controller while
+			 * the previous instance is still around.  In such
+			 * cases, wait till it's gone using offline_waitq.
+			 */
+			cgroup_for_each_live_child(child, cgrp) {
+				wait_queue_t wait;
+
+				if (!cgroup_css(child, ss))
+					continue;
+
+				prepare_to_wait(&child->offline_waitq, &wait,
+						TASK_UNINTERRUPTIBLE);
+				mutex_unlock(&cgroup_tree_mutex);
+				schedule();
+				finish_wait(&child->offline_waitq, &wait);
+				goto retry;
+			}
+
+			/* unavailable or not enabled on the parent? */
+			if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) ||
+			    (cgrp->parent &&
+			     !(cgrp->parent->child_subsys_mask & (1 << ssid)))) {
+				ret = -ENOENT;
+				goto out_unlock_tree;
+			}
+		} else if (disable & (1 << ssid)) {
+			if (!(cgrp->child_subsys_mask & (1 << ssid))) {
+				disable &= ~(1 << ssid);
+				continue;
+			}
+
+			/* a child has it enabled? */
+			cgroup_for_each_live_child(child, cgrp) {
+				if (child->child_subsys_mask & (1 << ssid)) {
+					ret = -EBUSY;
+					goto out_unlock_tree;
+				}
+			}
+		}
+	}
+
+	if (!enable && !disable) {
+		ret = 0;
+		goto out_unlock_tree;
+	}
+
+	if (!cgroup_lock_live_group(cgrp)) {
+		ret = -ENODEV;
+		goto out_unlock_tree;
+	}
+
+	/*
+	 * Except for the root, child_subsys_mask must be zero for a cgroup
+	 * with tasks so that child cgroups don't compete against tasks.
+	 */
+	if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/*
+	 * Create csses for enables and update child_subsys_mask.  This
+	 * changes cgroup_e_css() results which in turn makes the
+	 * subsequent cgroup_update_dfl_csses() associate all tasks in the
+	 * subtree to the updated csses.
+	 */
+	for_each_subsys(ss, ssid) {
+		if (!(enable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp) {
+			ret = create_css(child, ss);
+			if (ret)
+				goto err_undo_css;
+		}
+	}
+
+	cgrp->child_subsys_mask |= enable;
+	cgrp->child_subsys_mask &= ~disable;
+
+	ret = cgroup_update_dfl_csses(cgrp);
+	if (ret)
+		goto err_undo_css;
+
+	/* all tasks are now migrated away from the old csses, kill them */
+	for_each_subsys(ss, ssid) {
+		if (!(disable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp)
+			kill_css(cgroup_css(child, ss));
+	}
+
+	kernfs_activate(cgrp->kn);
+	ret = 0;
+out_unlock:
+	mutex_unlock(&cgroup_mutex);
+out_unlock_tree:
+	mutex_unlock(&cgroup_tree_mutex);
+	kernfs_unbreak_active_protection(cgrp->control_kn);
+	cgroup_put(cgrp);
+	return ret;
+
+err_undo_css:
+	cgrp->child_subsys_mask &= ~enable;
+	cgrp->child_subsys_mask |= disable;
+
+	for_each_subsys(ss, ssid) {
+		if (!(enable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp) {
+			struct cgroup_subsys_state *css = cgroup_css(child, ss);
+			if (css)
+				kill_css(css);
+		}
+	}
+	goto out_unlock;
+}
+
 static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
 				 size_t nbytes, loff_t off)
 {
@@ -2442,6 +2773,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
 	kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name),
 				  cgroup_file_mode(cft), 0, cft->kf_ops, cft,
 				  NULL, false, key);
+	if (cft->seq_show == cgroup_subtree_control_show)
+		cgrp->control_kn = kn;
 	return PTR_ERR_OR_ZERO(kn);
 }
 
@@ -3539,6 +3872,17 @@ static struct cftype cgroup_base_files[] = {
 		.flags = CFTYPE_ONLY_ON_ROOT,
 		.seq_show = cgroup_sane_behavior_show,
 	},
+	{
+		.name = "cgroup.controllers",
+		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT,
+		.seq_show = cgroup_controllers_show,
+	},
+	{
+		.name = "cgroup.subtree_control",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.seq_show = cgroup_subtree_control_show,
+		.write_string = cgroup_subtree_control_write,
+	},
 
 	/*
 	 * Historical crazy stuff.  These don't have "cgroup."  prefix and
@@ -3707,6 +4051,8 @@ static void offline_css(struct cgroup_subsys_state *css)
 	css->flags &= ~CSS_ONLINE;
 	css->cgroup->nr_css--;
 	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
+
+	wake_up_all(&css->cgroup->offline_waitq);
 }
 
 /**
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy
       [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2014-03-28  2:41   ` Tejun Heo
  2014-03-28  2:40     ` Tejun Heo
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:41 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo

cgroup is switching away from multiple hierarchies and will use one
unified default hierarchy where controllers can be dynamically enabled
and disabled per subtree.  The default hierarchy will serve as the
unified hierarchy to which all controllers are attached and a css on
the default hierarchy would need to also serve the tasks of descendant
cgroups which don't have the controller enabled - ie. the tree may be
collapsed from leaf towards root when viewed from specific
controllers.  This has been implemented through effective css in the
previous patches.

This patch finally implements dynamic subtree controller
enable/disable on the default hierarchy via a new knob -
"cgroup.subtree_control" which controls which controllers are enabled
on the child cgroups.  Let's assume a hierarchy like the following.

  root - A - B - C
               \ D

root's "cgroup.subtree_control" determines which controllers are
enabled on A.  A's on B.  B's on C and D.  This coincides with the
fact that controllers on the immediate sub-level are used to
distribute the resources of the parent.  In fact, it's natural to
assume that resource control knobs of a child belong to its parent.
Enabling a controller in "cgroup.subtree_control" declares that
distribution of the respective resources of the cgroup will be
controlled.  Note that this means that controller enable states are
shared among siblings.

The default hierarchy has an extra restriction - only cgroups which
don't contain any task may have controllers enabled in
"cgroup.subtree_control".  Combined with the other properties of the
default hierarchy, this guarantees that, from the view point of
controllers, tasks are only on the leaf cgroups.  In other words, only
leaf csses may contain tasks.  This rules out situations where child
cgroups compete against internal tasks of the parent, which is a
competition between two different types of entities without any clear
way to determine resource distribution between the two.  Different
controllers handle it differently and all the implemented behaviors
are ambiguous, ad-hoc, cumbersome and/or just wrong.  Having this
structural constraints imposed from cgroup core removes the burden
from controller implementations and enables showing one consistent
behavior across all controllers.

When a controller is enabled or disabled, css associations for the
controller in the subtrees of each child should be updated.  After
enabling, the whole subtree of a child should point to the new css of
the child.  After disabling, the whole subtree of a child should point
to the cgroup's css.  This is implemented by first updating cgroup
states such that cgroup_e_css() result points to the appropriate css
and then invoking cgroup_update_dfl_csses() which migrates all tasks
in the affected subtrees to the self cgroup on the default hierarchy.

* When read, "cgroup.subtree_control" lists all the currently enabled
  controllers on the children of the cgroup.

* White-space separated list of controller names prefixed with either
  '+' or '-' can be written to "cgroup.subtree_control".  The ones
  prefixed with '+' are enabled on the controller and '-' disabled.

* A controller can be enabled iff the parent's
  "cgroup.subtree_control" enables it and disabled iff no child's
  "cgroup.subtree_control" has it enabled.

* If a cgroup has tasks, no controller can be enabled via
  "cgroup.subtree_control".  Likewise, if "cgroup.subtree_control" has
  some controllers enabled, tasks can't be migrated into the cgroup.

* All controllers which aren't bound on other hierarchies are
  automatically associated with the root cgroup of the default
  hierarchy.  All the controllers which are bound to the default
  hierarchy are listed in the read-only file "cgroup.controllers" in
  the root directory.

This is still experimental and there are some holes, one of which is
that ->can_attach() failure during cgroup_update_dfl_csses() may leave
the cgroups in an undefined state.  The issues will be addressed by
future patches.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h |   5 +
 kernel/cgroup.c        | 346 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 351 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b6f5e40..dee6f3c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -21,6 +21,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/seq_file.h>
 #include <linux/kernfs.h>
+#include <linux/wait.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -164,6 +165,7 @@ struct cgroup {
 
 	struct cgroup *parent;		/* my parent */
 	struct kernfs_node *kn;		/* cgroup kernfs entry */
+	struct kernfs_node *control_kn;	/* kn for "cgroup.subtree_control" */
 
 	/*
 	 * Monotonically increasing unique serial number which defines a
@@ -216,6 +218,9 @@ struct cgroup {
 	/* For css percpu_ref killing and RCU-protected deletion */
 	struct rcu_head rcu_head;
 	struct work_struct destroy_work;
+
+	/* used to wait for offlining of csses */
+	wait_queue_head_t offline_waitq;
 };
 
 #define MAX_CGROUP_ROOT_NAMELEN 64
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5faa115..4e958c7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 			     unsigned long ss_mask);
 static void cgroup_destroy_css_killed(struct cgroup *cgrp);
 static int cgroup_destroy_locked(struct cgroup *cgrp);
+static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss);
+static void kill_css(struct cgroup_subsys_state *css);
 static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
 			      bool is_add);
 static void cgroup_pidlist_destroy_all(struct cgroup *cgrp);
@@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp)
 #define for_each_root(root)						\
 	list_for_each_entry((root), &cgroup_roots, root_list)
 
+/* iterate over child cgrps, lock should be held throughout iteration */
+#define cgroup_for_each_live_child(child, cgrp)				\
+	list_for_each_entry((child), &(cgrp)->children, sibling)	\
+		if (({ lockdep_assert_held(&cgroup_tree_mutex);		\
+		       cgroup_is_dead(child); }))			\
+			;						\
+		else
+
 /**
  * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive.
  * @cgrp: the cgroup to be checked for liveness
@@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 
 	for_each_subsys(ss, ssid)
 		INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
+
+	init_waitqueue_head(&cgrp->offline_waitq);
 }
 
 static void init_cgroup_root(struct cgroup_root *root,
@@ -1937,6 +1949,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp,
 
 	lockdep_assert_held(&cgroup_mutex);
 
+	/*
+	 * Except for the root, child_subsys_mask must be zero for a cgroup
+	 * with tasks so that child cgroups don't compete against tasks.
+	 */
+	if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent &&
+	    dst_cgrp->child_subsys_mask)
+		return -EBUSY;
+
 	/* look up the dst cset for each src cset and link it to src */
 	list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) {
 		struct css_set *dst_cset;
@@ -2302,6 +2322,317 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v)
 	return 0;
 }
 
+static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask)
+{
+	struct cgroup_subsys *ss;
+	bool printed = false;
+	int ssid;
+
+	for_each_subsys(ss, ssid) {
+		if (ss_mask & (1 << ssid)) {
+			if (printed)
+				seq_putc(seq, ' ');
+			seq_printf(seq, "%s", ss->name);
+			printed = true;
+		}
+	}
+	if (printed)
+		seq_putc(seq, '\n');
+}
+
+/* show controllers which are currently attached to the default hierarchy */
+static int cgroup_controllers_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+
+	cgroup_print_ss_mask(seq, cgrp->root->subsys_mask);
+	return 0;
+}
+
+/* show controllers which are enabled for a given cgroup's children */
+static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+
+	cgroup_print_ss_mask(seq, cgrp->child_subsys_mask);
+	return 0;
+}
+
+/**
+ * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy
+ * @cgrp: root of the subtree to update csses for
+ *
+ * @cgrp's child_subsys_mask has changed and its subtree's (self excluded)
+ * css associations need to be updated accordingly.  This function looks up
+ * all css_sets which are attached to the subtree, creates the matching
+ * updated css_sets and migrates the tasks to the new ones.
+ */
+static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+{
+	LIST_HEAD(preloaded_csets);
+	struct cgroup_subsys_state *css;
+	struct css_set *src_cset;
+	int ret;
+
+	lockdep_assert_held(&cgroup_tree_mutex);
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* look up all csses currently attached to @cgrp's subtree */
+	down_read(&css_set_rwsem);
+	css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) {
+		struct cgrp_cset_link *link;
+
+		/* self is not affected by child_subsys_mask change */
+		if (css->cgroup == cgrp)
+			continue;
+
+		list_for_each_entry(link, &css->cgroup->cset_links, cset_link)
+			cgroup_migrate_add_src(link->cset, cgrp,
+					       &preloaded_csets);
+	}
+	up_read(&css_set_rwsem);
+
+	/* NULL dst indicates self on default hierarchy */
+	ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets);
+	if (ret)
+		goto out_finish;
+
+	list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) {
+		struct task_struct *last_task = NULL, *task;
+
+		/* src_csets precede dst_csets, break on the first dst_cset */
+		if (!src_cset->mg_src_cgrp)
+			break;
+
+		/*
+		 * All tasks in src_cset need to be migrated to the
+		 * matching dst_cset.  Empty it process by process.  We
+		 * walk tasks but migrate processes.  The leader might even
+		 * belong to a different cset but such src_cset would also
+		 * be among the target src_csets because the default
+		 * hierarchy enforces per-process membership.
+		 */
+		while (true) {
+			down_read(&css_set_rwsem);
+			task = list_first_entry_or_null(&src_cset->tasks,
+						struct task_struct, cg_list);
+			if (task) {
+				task = task->group_leader;
+				WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp);
+				get_task_struct(task);
+			}
+			up_read(&css_set_rwsem);
+
+			if (!task)
+				break;
+
+			/* guard against possible infinite loop */
+			if (WARN(last_task == task,
+				 "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n"))
+				goto out_finish;
+			last_task = task;
+
+			threadgroup_lock(task);
+			/* raced against de_thread() from another thread? */
+			if (!thread_group_leader(task)) {
+				threadgroup_unlock(task);
+				put_task_struct(task);
+				continue;
+			}
+
+			ret = cgroup_migrate(src_cset->dfl_cgrp, task, true);
+
+			threadgroup_unlock(task);
+			put_task_struct(task);
+
+			if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret))
+				goto out_finish;
+		}
+	}
+
+out_finish:
+	cgroup_migrate_finish(&preloaded_csets);
+	return ret;
+}
+
+/* change the enabled child controllers for a cgroup in the default hierarchy */
+static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css,
+					struct cftype *cft, char *buffer)
+{
+	unsigned long enable_req = 0, disable_req = 0, enable, disable;
+	struct cgroup *cgrp = dummy_css->cgroup, *child;
+	struct cgroup_subsys *ss;
+	char *tok, *p;
+	int ssid, ret;
+
+	/*
+	 * Parse input - white space separated list of subsystem names
+	 * prefixed with either + or -.
+	 */
+	p = buffer;
+	while ((tok = strsep(&p, " \t\n"))) {
+		for_each_subsys(ss, ssid) {
+			if (ss->disabled || strcmp(tok + 1, ss->name))
+				continue;
+
+			if (*tok == '+') {
+				enable_req |= 1 << ssid;
+				disable_req &= ~(1 << ssid);
+			} else if (*tok == '-') {
+				disable_req |= 1 << ssid;
+				enable_req &= ~(1 << ssid);
+			} else {
+				return -EINVAL;
+			}
+			break;
+		}
+		if (ssid == CGROUP_SUBSYS_COUNT)
+			return -EINVAL;
+	}
+
+	/*
+	 * We're gonna grab cgroup_tree_mutex which nests outside kernfs
+	 * active_ref.  cgroup_lock_live_group() already provides enough
+	 * protection.  Ensure @cgrp stays accessible and break the
+	 * active_ref protection.
+	 */
+	cgroup_get(cgrp);
+	kernfs_break_active_protection(cgrp->control_kn);
+retry:
+	enable = enable_req;
+	disable = disable_req;
+
+	mutex_lock(&cgroup_tree_mutex);
+
+	for_each_subsys(ss, ssid) {
+		if (enable & (1 << ssid)) {
+			if (cgrp->child_subsys_mask & (1 << ssid)) {
+				enable &= ~(1 << ssid);
+				continue;
+			}
+
+			/*
+			 * Because css offlining is asynchronous, userland
+			 * might try to re-enable the same controller while
+			 * the previous instance is still around.  In such
+			 * cases, wait till it's gone using offline_waitq.
+			 */
+			cgroup_for_each_live_child(child, cgrp) {
+				wait_queue_t wait;
+
+				if (!cgroup_css(child, ss))
+					continue;
+
+				prepare_to_wait(&child->offline_waitq, &wait,
+						TASK_UNINTERRUPTIBLE);
+				mutex_unlock(&cgroup_tree_mutex);
+				schedule();
+				finish_wait(&child->offline_waitq, &wait);
+				goto retry;
+			}
+
+			/* unavailable or not enabled on the parent? */
+			if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) ||
+			    (cgrp->parent &&
+			     !(cgrp->parent->child_subsys_mask & (1 << ssid)))) {
+				ret = -ENOENT;
+				goto out_unlock_tree;
+			}
+		} else if (disable & (1 << ssid)) {
+			if (!(cgrp->child_subsys_mask & (1 << ssid))) {
+				disable &= ~(1 << ssid);
+				continue;
+			}
+
+			/* a child has it enabled? */
+			cgroup_for_each_live_child(child, cgrp) {
+				if (child->child_subsys_mask & (1 << ssid)) {
+					ret = -EBUSY;
+					goto out_unlock_tree;
+				}
+			}
+		}
+	}
+
+	if (!enable && !disable) {
+		ret = 0;
+		goto out_unlock_tree;
+	}
+
+	if (!cgroup_lock_live_group(cgrp)) {
+		ret = -ENODEV;
+		goto out_unlock_tree;
+	}
+
+	/*
+	 * Except for the root, child_subsys_mask must be zero for a cgroup
+	 * with tasks so that child cgroups don't compete against tasks.
+	 */
+	if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/*
+	 * Create csses for enables and update child_subsys_mask.  This
+	 * changes cgroup_e_css() results which in turn makes the
+	 * subsequent cgroup_update_dfl_csses() associate all tasks in the
+	 * subtree to the updated csses.
+	 */
+	for_each_subsys(ss, ssid) {
+		if (!(enable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp) {
+			ret = create_css(child, ss);
+			if (ret)
+				goto err_undo_css;
+		}
+	}
+
+	cgrp->child_subsys_mask |= enable;
+	cgrp->child_subsys_mask &= ~disable;
+
+	ret = cgroup_update_dfl_csses(cgrp);
+	if (ret)
+		goto err_undo_css;
+
+	/* all tasks are now migrated away from the old csses, kill them */
+	for_each_subsys(ss, ssid) {
+		if (!(disable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp)
+			kill_css(cgroup_css(child, ss));
+	}
+
+	kernfs_activate(cgrp->kn);
+	ret = 0;
+out_unlock:
+	mutex_unlock(&cgroup_mutex);
+out_unlock_tree:
+	mutex_unlock(&cgroup_tree_mutex);
+	kernfs_unbreak_active_protection(cgrp->control_kn);
+	cgroup_put(cgrp);
+	return ret;
+
+err_undo_css:
+	cgrp->child_subsys_mask &= ~enable;
+	cgrp->child_subsys_mask |= disable;
+
+	for_each_subsys(ss, ssid) {
+		if (!(enable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp) {
+			struct cgroup_subsys_state *css = cgroup_css(child, ss);
+			if (css)
+				kill_css(css);
+		}
+	}
+	goto out_unlock;
+}
+
 static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
 				 size_t nbytes, loff_t off)
 {
@@ -2442,6 +2773,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
 	kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name),
 				  cgroup_file_mode(cft), 0, cft->kf_ops, cft,
 				  NULL, false, key);
+	if (cft->seq_show == cgroup_subtree_control_show)
+		cgrp->control_kn = kn;
 	return PTR_ERR_OR_ZERO(kn);
 }
 
@@ -3539,6 +3872,17 @@ static struct cftype cgroup_base_files[] = {
 		.flags = CFTYPE_ONLY_ON_ROOT,
 		.seq_show = cgroup_sane_behavior_show,
 	},
+	{
+		.name = "cgroup.controllers",
+		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT,
+		.seq_show = cgroup_controllers_show,
+	},
+	{
+		.name = "cgroup.subtree_control",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.seq_show = cgroup_subtree_control_show,
+		.write_string = cgroup_subtree_control_write,
+	},
 
 	/*
 	 * Historical crazy stuff.  These don't have "cgroup."  prefix and
@@ -3707,6 +4051,8 @@ static void offline_css(struct cgroup_subsys_state *css)
 	css->flags &= ~CSS_ONLINE;
 	css->cgroup->nr_css--;
 	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
+
+	wake_up_all(&css->cgroup->offline_waitq);
 }
 
 /**
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy
@ 2014-03-28  2:41   ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-03-28  2:41 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo

cgroup is switching away from multiple hierarchies and will use one
unified default hierarchy where controllers can be dynamically enabled
and disabled per subtree.  The default hierarchy will serve as the
unified hierarchy to which all controllers are attached and a css on
the default hierarchy would need to also serve the tasks of descendant
cgroups which don't have the controller enabled - ie. the tree may be
collapsed from leaf towards root when viewed from specific
controllers.  This has been implemented through effective css in the
previous patches.

This patch finally implements dynamic subtree controller
enable/disable on the default hierarchy via a new knob -
"cgroup.subtree_control" which controls which controllers are enabled
on the child cgroups.  Let's assume a hierarchy like the following.

  root - A - B - C
               \ D

root's "cgroup.subtree_control" determines which controllers are
enabled on A.  A's on B.  B's on C and D.  This coincides with the
fact that controllers on the immediate sub-level are used to
distribute the resources of the parent.  In fact, it's natural to
assume that resource control knobs of a child belong to its parent.
Enabling a controller in "cgroup.subtree_control" declares that
distribution of the respective resources of the cgroup will be
controlled.  Note that this means that controller enable states are
shared among siblings.

The default hierarchy has an extra restriction - only cgroups which
don't contain any task may have controllers enabled in
"cgroup.subtree_control".  Combined with the other properties of the
default hierarchy, this guarantees that, from the view point of
controllers, tasks are only on the leaf cgroups.  In other words, only
leaf csses may contain tasks.  This rules out situations where child
cgroups compete against internal tasks of the parent, which is a
competition between two different types of entities without any clear
way to determine resource distribution between the two.  Different
controllers handle it differently and all the implemented behaviors
are ambiguous, ad-hoc, cumbersome and/or just wrong.  Having this
structural constraints imposed from cgroup core removes the burden
from controller implementations and enables showing one consistent
behavior across all controllers.

When a controller is enabled or disabled, css associations for the
controller in the subtrees of each child should be updated.  After
enabling, the whole subtree of a child should point to the new css of
the child.  After disabling, the whole subtree of a child should point
to the cgroup's css.  This is implemented by first updating cgroup
states such that cgroup_e_css() result points to the appropriate css
and then invoking cgroup_update_dfl_csses() which migrates all tasks
in the affected subtrees to the self cgroup on the default hierarchy.

* When read, "cgroup.subtree_control" lists all the currently enabled
  controllers on the children of the cgroup.

* White-space separated list of controller names prefixed with either
  '+' or '-' can be written to "cgroup.subtree_control".  The ones
  prefixed with '+' are enabled on the controller and '-' disabled.

* A controller can be enabled iff the parent's
  "cgroup.subtree_control" enables it and disabled iff no child's
  "cgroup.subtree_control" has it enabled.

* If a cgroup has tasks, no controller can be enabled via
  "cgroup.subtree_control".  Likewise, if "cgroup.subtree_control" has
  some controllers enabled, tasks can't be migrated into the cgroup.

* All controllers which aren't bound on other hierarchies are
  automatically associated with the root cgroup of the default
  hierarchy.  All the controllers which are bound to the default
  hierarchy are listed in the read-only file "cgroup.controllers" in
  the root directory.

This is still experimental and there are some holes, one of which is
that ->can_attach() failure during cgroup_update_dfl_csses() may leave
the cgroups in an undefined state.  The issues will be addressed by
future patches.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h |   5 +
 kernel/cgroup.c        | 346 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 351 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b6f5e40..dee6f3c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -21,6 +21,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/seq_file.h>
 #include <linux/kernfs.h>
+#include <linux/wait.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -164,6 +165,7 @@ struct cgroup {
 
 	struct cgroup *parent;		/* my parent */
 	struct kernfs_node *kn;		/* cgroup kernfs entry */
+	struct kernfs_node *control_kn;	/* kn for "cgroup.subtree_control" */
 
 	/*
 	 * Monotonically increasing unique serial number which defines a
@@ -216,6 +218,9 @@ struct cgroup {
 	/* For css percpu_ref killing and RCU-protected deletion */
 	struct rcu_head rcu_head;
 	struct work_struct destroy_work;
+
+	/* used to wait for offlining of csses */
+	wait_queue_head_t offline_waitq;
 };
 
 #define MAX_CGROUP_ROOT_NAMELEN 64
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5faa115..4e958c7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 			     unsigned long ss_mask);
 static void cgroup_destroy_css_killed(struct cgroup *cgrp);
 static int cgroup_destroy_locked(struct cgroup *cgrp);
+static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss);
+static void kill_css(struct cgroup_subsys_state *css);
 static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
 			      bool is_add);
 static void cgroup_pidlist_destroy_all(struct cgroup *cgrp);
@@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp)
 #define for_each_root(root)						\
 	list_for_each_entry((root), &cgroup_roots, root_list)
 
+/* iterate over child cgrps, lock should be held throughout iteration */
+#define cgroup_for_each_live_child(child, cgrp)				\
+	list_for_each_entry((child), &(cgrp)->children, sibling)	\
+		if (({ lockdep_assert_held(&cgroup_tree_mutex);		\
+		       cgroup_is_dead(child); }))			\
+			;						\
+		else
+
 /**
  * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive.
  * @cgrp: the cgroup to be checked for liveness
@@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 
 	for_each_subsys(ss, ssid)
 		INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
+
+	init_waitqueue_head(&cgrp->offline_waitq);
 }
 
 static void init_cgroup_root(struct cgroup_root *root,
@@ -1937,6 +1949,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp,
 
 	lockdep_assert_held(&cgroup_mutex);
 
+	/*
+	 * Except for the root, child_subsys_mask must be zero for a cgroup
+	 * with tasks so that child cgroups don't compete against tasks.
+	 */
+	if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent &&
+	    dst_cgrp->child_subsys_mask)
+		return -EBUSY;
+
 	/* look up the dst cset for each src cset and link it to src */
 	list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) {
 		struct css_set *dst_cset;
@@ -2302,6 +2322,317 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v)
 	return 0;
 }
 
+static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask)
+{
+	struct cgroup_subsys *ss;
+	bool printed = false;
+	int ssid;
+
+	for_each_subsys(ss, ssid) {
+		if (ss_mask & (1 << ssid)) {
+			if (printed)
+				seq_putc(seq, ' ');
+			seq_printf(seq, "%s", ss->name);
+			printed = true;
+		}
+	}
+	if (printed)
+		seq_putc(seq, '\n');
+}
+
+/* show controllers which are currently attached to the default hierarchy */
+static int cgroup_controllers_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+
+	cgroup_print_ss_mask(seq, cgrp->root->subsys_mask);
+	return 0;
+}
+
+/* show controllers which are enabled for a given cgroup's children */
+static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+
+	cgroup_print_ss_mask(seq, cgrp->child_subsys_mask);
+	return 0;
+}
+
+/**
+ * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy
+ * @cgrp: root of the subtree to update csses for
+ *
+ * @cgrp's child_subsys_mask has changed and its subtree's (self excluded)
+ * css associations need to be updated accordingly.  This function looks up
+ * all css_sets which are attached to the subtree, creates the matching
+ * updated css_sets and migrates the tasks to the new ones.
+ */
+static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+{
+	LIST_HEAD(preloaded_csets);
+	struct cgroup_subsys_state *css;
+	struct css_set *src_cset;
+	int ret;
+
+	lockdep_assert_held(&cgroup_tree_mutex);
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* look up all csses currently attached to @cgrp's subtree */
+	down_read(&css_set_rwsem);
+	css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) {
+		struct cgrp_cset_link *link;
+
+		/* self is not affected by child_subsys_mask change */
+		if (css->cgroup == cgrp)
+			continue;
+
+		list_for_each_entry(link, &css->cgroup->cset_links, cset_link)
+			cgroup_migrate_add_src(link->cset, cgrp,
+					       &preloaded_csets);
+	}
+	up_read(&css_set_rwsem);
+
+	/* NULL dst indicates self on default hierarchy */
+	ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets);
+	if (ret)
+		goto out_finish;
+
+	list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) {
+		struct task_struct *last_task = NULL, *task;
+
+		/* src_csets precede dst_csets, break on the first dst_cset */
+		if (!src_cset->mg_src_cgrp)
+			break;
+
+		/*
+		 * All tasks in src_cset need to be migrated to the
+		 * matching dst_cset.  Empty it process by process.  We
+		 * walk tasks but migrate processes.  The leader might even
+		 * belong to a different cset but such src_cset would also
+		 * be among the target src_csets because the default
+		 * hierarchy enforces per-process membership.
+		 */
+		while (true) {
+			down_read(&css_set_rwsem);
+			task = list_first_entry_or_null(&src_cset->tasks,
+						struct task_struct, cg_list);
+			if (task) {
+				task = task->group_leader;
+				WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp);
+				get_task_struct(task);
+			}
+			up_read(&css_set_rwsem);
+
+			if (!task)
+				break;
+
+			/* guard against possible infinite loop */
+			if (WARN(last_task == task,
+				 "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n"))
+				goto out_finish;
+			last_task = task;
+
+			threadgroup_lock(task);
+			/* raced against de_thread() from another thread? */
+			if (!thread_group_leader(task)) {
+				threadgroup_unlock(task);
+				put_task_struct(task);
+				continue;
+			}
+
+			ret = cgroup_migrate(src_cset->dfl_cgrp, task, true);
+
+			threadgroup_unlock(task);
+			put_task_struct(task);
+
+			if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret))
+				goto out_finish;
+		}
+	}
+
+out_finish:
+	cgroup_migrate_finish(&preloaded_csets);
+	return ret;
+}
+
+/* change the enabled child controllers for a cgroup in the default hierarchy */
+static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css,
+					struct cftype *cft, char *buffer)
+{
+	unsigned long enable_req = 0, disable_req = 0, enable, disable;
+	struct cgroup *cgrp = dummy_css->cgroup, *child;
+	struct cgroup_subsys *ss;
+	char *tok, *p;
+	int ssid, ret;
+
+	/*
+	 * Parse input - white space separated list of subsystem names
+	 * prefixed with either + or -.
+	 */
+	p = buffer;
+	while ((tok = strsep(&p, " \t\n"))) {
+		for_each_subsys(ss, ssid) {
+			if (ss->disabled || strcmp(tok + 1, ss->name))
+				continue;
+
+			if (*tok == '+') {
+				enable_req |= 1 << ssid;
+				disable_req &= ~(1 << ssid);
+			} else if (*tok == '-') {
+				disable_req |= 1 << ssid;
+				enable_req &= ~(1 << ssid);
+			} else {
+				return -EINVAL;
+			}
+			break;
+		}
+		if (ssid == CGROUP_SUBSYS_COUNT)
+			return -EINVAL;
+	}
+
+	/*
+	 * We're gonna grab cgroup_tree_mutex which nests outside kernfs
+	 * active_ref.  cgroup_lock_live_group() already provides enough
+	 * protection.  Ensure @cgrp stays accessible and break the
+	 * active_ref protection.
+	 */
+	cgroup_get(cgrp);
+	kernfs_break_active_protection(cgrp->control_kn);
+retry:
+	enable = enable_req;
+	disable = disable_req;
+
+	mutex_lock(&cgroup_tree_mutex);
+
+	for_each_subsys(ss, ssid) {
+		if (enable & (1 << ssid)) {
+			if (cgrp->child_subsys_mask & (1 << ssid)) {
+				enable &= ~(1 << ssid);
+				continue;
+			}
+
+			/*
+			 * Because css offlining is asynchronous, userland
+			 * might try to re-enable the same controller while
+			 * the previous instance is still around.  In such
+			 * cases, wait till it's gone using offline_waitq.
+			 */
+			cgroup_for_each_live_child(child, cgrp) {
+				wait_queue_t wait;
+
+				if (!cgroup_css(child, ss))
+					continue;
+
+				prepare_to_wait(&child->offline_waitq, &wait,
+						TASK_UNINTERRUPTIBLE);
+				mutex_unlock(&cgroup_tree_mutex);
+				schedule();
+				finish_wait(&child->offline_waitq, &wait);
+				goto retry;
+			}
+
+			/* unavailable or not enabled on the parent? */
+			if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) ||
+			    (cgrp->parent &&
+			     !(cgrp->parent->child_subsys_mask & (1 << ssid)))) {
+				ret = -ENOENT;
+				goto out_unlock_tree;
+			}
+		} else if (disable & (1 << ssid)) {
+			if (!(cgrp->child_subsys_mask & (1 << ssid))) {
+				disable &= ~(1 << ssid);
+				continue;
+			}
+
+			/* a child has it enabled? */
+			cgroup_for_each_live_child(child, cgrp) {
+				if (child->child_subsys_mask & (1 << ssid)) {
+					ret = -EBUSY;
+					goto out_unlock_tree;
+				}
+			}
+		}
+	}
+
+	if (!enable && !disable) {
+		ret = 0;
+		goto out_unlock_tree;
+	}
+
+	if (!cgroup_lock_live_group(cgrp)) {
+		ret = -ENODEV;
+		goto out_unlock_tree;
+	}
+
+	/*
+	 * Except for the root, child_subsys_mask must be zero for a cgroup
+	 * with tasks so that child cgroups don't compete against tasks.
+	 */
+	if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/*
+	 * Create csses for enables and update child_subsys_mask.  This
+	 * changes cgroup_e_css() results which in turn makes the
+	 * subsequent cgroup_update_dfl_csses() associate all tasks in the
+	 * subtree to the updated csses.
+	 */
+	for_each_subsys(ss, ssid) {
+		if (!(enable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp) {
+			ret = create_css(child, ss);
+			if (ret)
+				goto err_undo_css;
+		}
+	}
+
+	cgrp->child_subsys_mask |= enable;
+	cgrp->child_subsys_mask &= ~disable;
+
+	ret = cgroup_update_dfl_csses(cgrp);
+	if (ret)
+		goto err_undo_css;
+
+	/* all tasks are now migrated away from the old csses, kill them */
+	for_each_subsys(ss, ssid) {
+		if (!(disable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp)
+			kill_css(cgroup_css(child, ss));
+	}
+
+	kernfs_activate(cgrp->kn);
+	ret = 0;
+out_unlock:
+	mutex_unlock(&cgroup_mutex);
+out_unlock_tree:
+	mutex_unlock(&cgroup_tree_mutex);
+	kernfs_unbreak_active_protection(cgrp->control_kn);
+	cgroup_put(cgrp);
+	return ret;
+
+err_undo_css:
+	cgrp->child_subsys_mask &= ~enable;
+	cgrp->child_subsys_mask |= disable;
+
+	for_each_subsys(ss, ssid) {
+		if (!(enable & (1 << ssid)))
+			continue;
+
+		cgroup_for_each_live_child(child, cgrp) {
+			struct cgroup_subsys_state *css = cgroup_css(child, ss);
+			if (css)
+				kill_css(css);
+		}
+	}
+	goto out_unlock;
+}
+
 static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
 				 size_t nbytes, loff_t off)
 {
@@ -2442,6 +2773,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
 	kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name),
 				  cgroup_file_mode(cft), 0, cft->kf_ops, cft,
 				  NULL, false, key);
+	if (cft->seq_show == cgroup_subtree_control_show)
+		cgrp->control_kn = kn;
 	return PTR_ERR_OR_ZERO(kn);
 }
 
@@ -3539,6 +3872,17 @@ static struct cftype cgroup_base_files[] = {
 		.flags = CFTYPE_ONLY_ON_ROOT,
 		.seq_show = cgroup_sane_behavior_show,
 	},
+	{
+		.name = "cgroup.controllers",
+		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT,
+		.seq_show = cgroup_controllers_show,
+	},
+	{
+		.name = "cgroup.subtree_control",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.seq_show = cgroup_subtree_control_show,
+		.write_string = cgroup_subtree_control_write,
+	},
 
 	/*
 	 * Historical crazy stuff.  These don't have "cgroup."  prefix and
@@ -3707,6 +4051,8 @@ static void offline_css(struct cgroup_subsys_state *css)
 	css->flags &= ~CSS_ONLINE;
 	css->cgroup->nr_css--;
 	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
+
+	wake_up_all(&css->cgroup->offline_waitq);
 }
 
 /**
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-03-28  2:40 ` Tejun Heo
@ 2014-04-14 15:45     ` Vivek Goyal
  -1 siblings, 0 replies; 52+ messages in thread
From: Vivek Goyal @ 2014-04-14 15:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Thu, Mar 27, 2014 at 10:40:49PM -0400, Tejun Heo wrote:

[..]
> This patchset finally implements the default unified hierarchy.  The
> goal is providing enough flexibility while enforcing stricter common
> structure where appropriate to address the above listed issues.
> 
> Controllers which aren't bound to other hierarchies are
> automatically attached to the unified hierarchy,

Hi Tejun,

So this patchset does not *enforce* a single hierarchy? So user space
can still mount other hierarchies.

> which is different in
> that controllers are enabled explicitly for each subtree.
> "cgroup.subtree_control" controls which controllers are enabled on the
> child cgroups.  Let's assume a hierarchy like the following.
> 
>   root - A - B - C
>                \ D
> 
> root's "cgroup.subtree_control" determines which controllers are
> enabled on A.  A's on B.  B's on C and D.  This coincides with the
> fact that controllers on the immediate sub-level are used to
> distribute the resources of the parent.  In fact, it's natural to
> assume that resource control knobs of a child belong to its parent.
> Enabling a controller in "cgroup.subtree_control" declares that
> distribution of the respective resources of the cgroup will be
> controlled.  Note that this means that controller enable states are
> shared among siblings.
> 
> The default hierarchy has an extra restriction - only cgroups which
> don't contain any task may have controllers enabled in
> "cgroup.subtree_control".  Combined with the other properties of the
> default hierarchy, this guarantees that, from the view point of
> controllers, tasks are only on the leaf cgroups.  In other words, only
> leaf csses may contain tasks.  This rules out situations where child
> cgroups compete against internal tasks of the parent.

How does this work for root's tasks now? Given that task can only be
in leaf cgroups, that means tasks can't be in / cgroup (If one wants
to create some cgroups). Does that mean / will be empty and init system
need to setup things right.

Before one can create child cgroups, there will be some processes
already running (init itlsef, kernel threads etc). I am assuming they
will be running in /. So how would that work.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-04-14 15:45     ` Vivek Goyal
  0 siblings, 0 replies; 52+ messages in thread
From: Vivek Goyal @ 2014-04-14 15:45 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lizefan, containers, cgroups, linux-kernel

On Thu, Mar 27, 2014 at 10:40:49PM -0400, Tejun Heo wrote:

[..]
> This patchset finally implements the default unified hierarchy.  The
> goal is providing enough flexibility while enforcing stricter common
> structure where appropriate to address the above listed issues.
> 
> Controllers which aren't bound to other hierarchies are
> automatically attached to the unified hierarchy,

Hi Tejun,

So this patchset does not *enforce* a single hierarchy? So user space
can still mount other hierarchies.

> which is different in
> that controllers are enabled explicitly for each subtree.
> "cgroup.subtree_control" controls which controllers are enabled on the
> child cgroups.  Let's assume a hierarchy like the following.
> 
>   root - A - B - C
>                \ D
> 
> root's "cgroup.subtree_control" determines which controllers are
> enabled on A.  A's on B.  B's on C and D.  This coincides with the
> fact that controllers on the immediate sub-level are used to
> distribute the resources of the parent.  In fact, it's natural to
> assume that resource control knobs of a child belong to its parent.
> Enabling a controller in "cgroup.subtree_control" declares that
> distribution of the respective resources of the cgroup will be
> controlled.  Note that this means that controller enable states are
> shared among siblings.
> 
> The default hierarchy has an extra restriction - only cgroups which
> don't contain any task may have controllers enabled in
> "cgroup.subtree_control".  Combined with the other properties of the
> default hierarchy, this guarantees that, from the view point of
> controllers, tasks are only on the leaf cgroups.  In other words, only
> leaf csses may contain tasks.  This rules out situations where child
> cgroups compete against internal tasks of the parent.

How does this work for root's tasks now? Given that task can only be
in leaf cgroups, that means tasks can't be in / cgroup (If one wants
to create some cgroups). Does that mean / will be empty and init system
need to setup things right.

Before one can create child cgroups, there will be some processes
already running (init itlsef, kernel threads etc). I am assuming they
will be running in /. So how would that work.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-04-14 15:45     ` Vivek Goyal
@ 2014-04-14 17:52         ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-14 17:52 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello,

On Mon, Apr 14, 2014 at 11:45:56AM -0400, Vivek Goyal wrote:
> So this patchset does not *enforce* a single hierarchy? So user space
> can still mount other hierarchies.

Nope, nothing is forced.  Controllers may be moved between unified and
traditional multiple hierarchies.

> How does this work for root's tasks now? Given that task can only be
> in leaf cgroups, that means tasks can't be in / cgroup (If one wants
> to create some cgroups). Does that mean / will be empty and init system
> need to setup things right.

Root is exempt from the restriction.  Root has always been special
anyway.

I'll soon post a document describing the design and restrictions of
unified hierarchy along with an updated version of this patchset.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-04-14 17:52         ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-14 17:52 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: lizefan, containers, cgroups, linux-kernel

Hello,

On Mon, Apr 14, 2014 at 11:45:56AM -0400, Vivek Goyal wrote:
> So this patchset does not *enforce* a single hierarchy? So user space
> can still mount other hierarchies.

Nope, nothing is forced.  Controllers may be moved between unified and
traditional multiple hierarchies.

> How does this work for root's tasks now? Given that task can only be
> in leaf cgroups, that means tasks can't be in / cgroup (If one wants
> to create some cgroups). Does that mean / will be empty and init system
> need to setup things right.

Root is exempt from the restriction.  Root has always been special
anyway.

I'll soon post a document describing the design and restrictions of
unified hierarchy along with an updated version of this patchset.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-04-14 17:52         ` Tejun Heo
@ 2014-04-14 18:14             ` Vivek Goyal
  -1 siblings, 0 replies; 52+ messages in thread
From: Vivek Goyal @ 2014-04-14 18:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, Apr 14, 2014 at 01:52:36PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 14, 2014 at 11:45:56AM -0400, Vivek Goyal wrote:
> > So this patchset does not *enforce* a single hierarchy? So user space
> > can still mount other hierarchies.
> 
> Nope, nothing is forced.  Controllers may be moved between unified and
> traditional multiple hierarchies.

So is this an intermediate mode before we move to single hiearchy *only*
mode. AFAIK, you had mentioned that we will support legacy multiple
hiearchy mode but single hiearchy is the new default mode.

How does one figure out which one is unified hiearchy (out of multiple
hierarchies). Is there a flag somewhere? Or sane flag will be used to
mark unified hiearchy.

> 
> > How does this work for root's tasks now? Given that task can only be
> > in leaf cgroups, that means tasks can't be in / cgroup (If one wants
> > to create some cgroups). Does that mean / will be empty and init system
> > need to setup things right.
> 
> Root is exempt from the restriction.  Root has always been special
> anyway.

So task and child groups can co-exist in root *only*?

> 
> I'll soon post a document describing the design and restrictions of
> unified hierarchy along with an updated version of this patchset.

That would help a lot in understanding various modes and that in
turn will help with patch reviews too.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-04-14 18:14             ` Vivek Goyal
  0 siblings, 0 replies; 52+ messages in thread
From: Vivek Goyal @ 2014-04-14 18:14 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lizefan, containers, cgroups, linux-kernel

On Mon, Apr 14, 2014 at 01:52:36PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 14, 2014 at 11:45:56AM -0400, Vivek Goyal wrote:
> > So this patchset does not *enforce* a single hierarchy? So user space
> > can still mount other hierarchies.
> 
> Nope, nothing is forced.  Controllers may be moved between unified and
> traditional multiple hierarchies.

So is this an intermediate mode before we move to single hiearchy *only*
mode. AFAIK, you had mentioned that we will support legacy multiple
hiearchy mode but single hiearchy is the new default mode.

How does one figure out which one is unified hiearchy (out of multiple
hierarchies). Is there a flag somewhere? Or sane flag will be used to
mark unified hiearchy.

> 
> > How does this work for root's tasks now? Given that task can only be
> > in leaf cgroups, that means tasks can't be in / cgroup (If one wants
> > to create some cgroups). Does that mean / will be empty and init system
> > need to setup things right.
> 
> Root is exempt from the restriction.  Root has always been special
> anyway.

So task and child groups can co-exist in root *only*?

> 
> I'll soon post a document describing the design and restrictions of
> unified hierarchy along with an updated version of this patchset.

That would help a lot in understanding various modes and that in
turn will help with patch reviews too.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-04-14 18:14             ` Vivek Goyal
@ 2014-04-14 19:21                 ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-14 19:21 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello,

On Mon, Apr 14, 2014 at 02:14:21PM -0400, Vivek Goyal wrote:
> So is this an intermediate mode before we move to single hiearchy *only*
> mode. AFAIK, you had mentioned that we will support legacy multiple
> hiearchy mode but single hiearchy is the new default mode.

Hmmm?  It's a separate special hierarchy which can co-exist with other
hierarchies.  There's no inherent intermediacy to it.  As long as
traditional hierarchies exist, it's gonna be like this.

> How does one figure out which one is unified hiearchy (out of multiple
> hierarchies). Is there a flag somewhere? Or sane flag will be used to
> mark unified hiearchy.

You can tell it multiple ways - from its mount options, whether
cgroup.controllers exists, but it'll most likely be mounted on a
distinctive mount point, wouldn't it?

> So task and child groups can co-exist in root *only*?

Note that the restriction doesn't apply if no controller is enabled
but yes root is the only cgroup where tasks can exist and controllers
are enabled at the same time.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-04-14 19:21                 ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-14 19:21 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: lizefan, containers, cgroups, linux-kernel

Hello,

On Mon, Apr 14, 2014 at 02:14:21PM -0400, Vivek Goyal wrote:
> So is this an intermediate mode before we move to single hiearchy *only*
> mode. AFAIK, you had mentioned that we will support legacy multiple
> hiearchy mode but single hiearchy is the new default mode.

Hmmm?  It's a separate special hierarchy which can co-exist with other
hierarchies.  There's no inherent intermediacy to it.  As long as
traditional hierarchies exist, it's gonna be like this.

> How does one figure out which one is unified hiearchy (out of multiple
> hierarchies). Is there a flag somewhere? Or sane flag will be used to
> mark unified hiearchy.

You can tell it multiple ways - from its mount options, whether
cgroup.controllers exists, but it'll most likely be mounted on a
distinctive mount point, wouldn't it?

> So task and child groups can co-exist in root *only*?

Note that the restriction doesn't apply if no controller is enabled
but yes root is the only cgroup where tasks can exist and controllers
are enabled at the same time.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-03-28  2:40 ` Tejun Heo
@ 2014-04-14 21:31     ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-14 21:31 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Thu, Mar 27, 2014 at 10:40:49PM -0400, Tejun Heo wrote:
> The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use
> RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the
> following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified

Branch renamed to review-unified-v1.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-04-14 21:31     ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-14 21:31 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel

On Thu, Mar 27, 2014 at 10:40:49PM -0400, Tejun Heo wrote:
> The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use
> RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the
> following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified

Branch renamed to review-unified-v1.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-04-14 19:21                 ` Tejun Heo
@ 2014-04-15  2:58                     ` Li Zefan
  -1 siblings, 0 replies; 52+ messages in thread
From: Li Zefan @ 2014-04-15  2:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

on 2014/4/15 3:21, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 14, 2014 at 02:14:21PM -0400, Vivek Goyal wrote:
>> So is this an intermediate mode before we move to single hiearchy *only*
>> mode. AFAIK, you had mentioned that we will support legacy multiple
>> hiearchy mode but single hiearchy is the new default mode.
> 
> Hmmm?  It's a separate special hierarchy which can co-exist with other
> hierarchies.  There's no inherent intermediacy to it.  As long as
> traditional hierarchies exist, it's gonna be like this.
> 

I know we should allow multiple hierarchies, but why we allow this even
if sane_behavior is specified?

Like this:

# mount -t cgroup -o cpuset,__DEVEL__sane_behavior xxx /cgroup
# mount -t cgroup -o __DEVEL__sane_behavior xxx /cgroup2/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-04-15  2:58                     ` Li Zefan
  0 siblings, 0 replies; 52+ messages in thread
From: Li Zefan @ 2014-04-15  2:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Vivek Goyal, containers, cgroups, linux-kernel

on 2014/4/15 3:21, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 14, 2014 at 02:14:21PM -0400, Vivek Goyal wrote:
>> So is this an intermediate mode before we move to single hiearchy *only*
>> mode. AFAIK, you had mentioned that we will support legacy multiple
>> hiearchy mode but single hiearchy is the new default mode.
> 
> Hmmm?  It's a separate special hierarchy which can co-exist with other
> hierarchies.  There's no inherent intermediacy to it.  As long as
> traditional hierarchies exist, it's gonna be like this.
> 

I know we should allow multiple hierarchies, but why we allow this even
if sane_behavior is specified?

Like this:

# mount -t cgroup -o cpuset,__DEVEL__sane_behavior xxx /cgroup
# mount -t cgroup -o __DEVEL__sane_behavior xxx /cgroup2/



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-04-15  2:58                     ` Li Zefan
@ 2014-04-15  4:57                         ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-15  4:57 UTC (permalink / raw)
  To: Li Zefan
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

Hello,

On Tue, Apr 15, 2014 at 10:58:24AM +0800, Li Zefan wrote:
> I know we should allow multiple hierarchies, but why we allow this even
> if sane_behavior is specified?

Oh, because we didn't have unified hierarchy before.  The whole
sane_behavior thing is gonna go away.  It'll be absorbed into unified
hierarchy and probably be handled as a separate filesystem type, so
the current situation is transitional.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-04-15  4:57                         ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-15  4:57 UTC (permalink / raw)
  To: Li Zefan; +Cc: Vivek Goyal, containers, cgroups, linux-kernel

Hello,

On Tue, Apr 15, 2014 at 10:58:24AM +0800, Li Zefan wrote:
> I know we should allow multiple hierarchies, but why we allow this even
> if sane_behavior is specified?

Oh, because we didn't have unified hierarchy before.  The whole
sane_behavior thing is gonna go away.  It'll be absorbed into unified
hierarchy and probably be handled as a separate filesystem type, so
the current situation is transitional.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy
       [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (13 preceding siblings ...)
  2014-04-14 21:31     ` Tejun Heo
@ 2014-04-15 22:05   ` Tejun Heo
  14 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Tue, 15 Apr 2014 18:02:12 -0400

cgroup_apply_cftypes() skip creating or removing files if the
subsystem is attached to the default hierarchy, which led to missing
files in the root of the default hierarchy.

Skipping made sense when the default hierarchy was dummy; however, now
that the default hierarchy is full functional and planned to be used
as the unified hierarchy, it shouldn't be skipped over.

Reported-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9fcdaa7..fbcc710 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2436,10 +2436,6 @@ static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
 
 	lockdep_assert_held(&cgroup_tree_mutex);
 
-	/* don't bother if @ss isn't attached */
-	if (ss->root == &cgrp_dfl_root)
-		return 0;
-
 	/* add/rm files for all cgroups created before */
 	css_for_each_descendant_pre(css, cgroup_css(root, ss)) {
 		struct cgroup *cgrp = css->cgroup;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy
       [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2014-04-15 22:05   ` Tejun Heo
  2014-03-28  2:40     ` Tejun Heo
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel

>From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 15 Apr 2014 18:02:12 -0400

cgroup_apply_cftypes() skip creating or removing files if the
subsystem is attached to the default hierarchy, which led to missing
files in the root of the default hierarchy.

Skipping made sense when the default hierarchy was dummy; however, now
that the default hierarchy is full functional and planned to be used
as the unified hierarchy, it shouldn't be skipped over.

Reported-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9fcdaa7..fbcc710 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2436,10 +2436,6 @@ static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
 
 	lockdep_assert_held(&cgroup_tree_mutex);
 
-	/* don't bother if @ss isn't attached */
-	if (ss->root == &cgrp_dfl_root)
-		return 0;
-
 	/* add/rm files for all cgroups created before */
 	css_for_each_descendant_pre(css, cgroup_css(root, ss)) {
 		struct cgroup *cgrp = css->cgroup;
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy
@ 2014-04-15 22:05   ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Tue, 15 Apr 2014 18:02:12 -0400

cgroup_apply_cftypes() skip creating or removing files if the
subsystem is attached to the default hierarchy, which led to missing
files in the root of the default hierarchy.

Skipping made sense when the default hierarchy was dummy; however, now
that the default hierarchy is full functional and planned to be used
as the unified hierarchy, it shouldn't be skipped over.

Reported-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9fcdaa7..fbcc710 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2436,10 +2436,6 @@ static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
 
 	lockdep_assert_held(&cgroup_tree_mutex);
 
-	/* don't bother if @ss isn't attached */
-	if (ss->root == &cgrp_dfl_root)
-		return 0;
-
 	/* add/rm files for all cgroups created before */
 	css_for_each_descendant_pre(css, cgroup_css(root, ss)) {
 		struct cgroup *cgrp = css->cgroup;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy
  2014-04-15 22:05   ` Tejun Heo
@ 2014-04-15 22:05       ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, Apr 15, 2014 at 06:05:04PM -0400, Tejun Heo wrote:
> From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Date: Tue, 15 Apr 2014 18:02:12 -0400
> 
> cgroup_apply_cftypes() skip creating or removing files if the
> subsystem is attached to the default hierarchy, which led to missing
> files in the root of the default hierarchy.
> 
> Skipping made sense when the default hierarchy was dummy; however, now
> that the default hierarchy is full functional and planned to be used
> as the unified hierarchy, it shouldn't be skipped over.
> 
> Reported-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Ugh... wrong thread.  Will repost.

thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy
@ 2014-04-15 22:05       ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw)
  To: lizefan; +Cc: containers, cgroups, linux-kernel

On Tue, Apr 15, 2014 at 06:05:04PM -0400, Tejun Heo wrote:
> From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Tue, 15 Apr 2014 18:02:12 -0400
> 
> cgroup_apply_cftypes() skip creating or removing files if the
> subsystem is attached to the default hierarchy, which led to missing
> files in the root of the default hierarchy.
> 
> Skipping made sense when the default hierarchy was dummy; however, now
> that the default hierarchy is full functional and planned to be used
> as the unified hierarchy, it shouldn't be skipped over.
> 
> Reported-by: Li Zefan <lizefan@huawei.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Ugh... wrong thread.  Will repost.

thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-04-14 17:52         ` Tejun Heo
@ 2014-04-30 10:57             ` Raghavendra KT
  -1 siblings, 0 replies; 52+ messages in thread
From: Raghavendra KT @ 2014-04-30 10:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Linux Kernel Mailing List, Vivek Goyal

On Mon, Apr 14, 2014 at 11:22 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> How does this work for root's tasks now? Given that task can only be
>> in leaf cgroups, that means tasks can't be in / cgroup (If one wants
>> to create some cgroups). Does that mean / will be empty and init system
>> need to setup things right.
>
> Root is exempt from the restriction.  Root has always been special
> anyway.
>

(I do not wish to Hijack this thread,  but found a relevant context
 here to initiate discussion.  But would discuss in a separate thread
once I get a positive go on the topic)

Hi Tejun,

For some controllers like cpuset, when we have exclusive_cpu is set,
what about having a knob in the cpuset controller
to move the task to non-root (when option is set).

Because all the way along, though we have freedom to make the cpusets
exclusive and move tasks (say VMs) into them,
making sure they do not interfere with each other, we can not prevent
the other tasks spawned in a system eating into cpus of
exclusive cpuset since they go to root automatically.

Do you think having a knob, to make sure new tasks spawned go to say a
default  directory under root makes sense?

 I understand that we could easily have a userspace script which could
achieve intended goal, but kernel solution
 would really make the exclusive cpusets have exclusive access to cpus
it should have.

(I also have a POC implemented for pre-unified hierarchy tree and
understand some bit of complications involved in that and
understand that we should not have complex policies in kernel for e.g.
filtering tasks based on patterns etc..).

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-04-30 10:57             ` Raghavendra KT
  0 siblings, 0 replies; 52+ messages in thread
From: Raghavendra KT @ 2014-04-30 10:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vivek Goyal, lizefan, containers, cgroups, Linux Kernel Mailing List

On Mon, Apr 14, 2014 at 11:22 PM, Tejun Heo <tj@kernel.org> wrote:
>> How does this work for root's tasks now? Given that task can only be
>> in leaf cgroups, that means tasks can't be in / cgroup (If one wants
>> to create some cgroups). Does that mean / will be empty and init system
>> need to setup things right.
>
> Root is exempt from the restriction.  Root has always been special
> anyway.
>

(I do not wish to Hijack this thread,  but found a relevant context
 here to initiate discussion.  But would discuss in a separate thread
once I get a positive go on the topic)

Hi Tejun,

For some controllers like cpuset, when we have exclusive_cpu is set,
what about having a knob in the cpuset controller
to move the task to non-root (when option is set).

Because all the way along, though we have freedom to make the cpusets
exclusive and move tasks (say VMs) into them,
making sure they do not interfere with each other, we can not prevent
the other tasks spawned in a system eating into cpus of
exclusive cpuset since they go to root automatically.

Do you think having a knob, to make sure new tasks spawned go to say a
default  directory under root makes sense?

 I understand that we could easily have a userspace script which could
achieve intended goal, but kernel solution
 would really make the exclusive cpusets have exclusive access to cpus
it should have.

(I also have a POC implemented for pre-unified hierarchy tree and
understand some bit of complications involved in that and
understand that we should not have complex policies in kernel for e.g.
filtering tasks based on patterns etc..).

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
  2014-04-30 10:57             ` Raghavendra KT
@ 2014-05-02 15:12                 ` Tejun Heo
  -1 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-05-02 15:12 UTC (permalink / raw)
  To: Raghavendra KT
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Linux Kernel Mailing List, Vivek Goyal

Hello,

On Wed, Apr 30, 2014 at 04:27:51PM +0530, Raghavendra KT wrote:
> For some controllers like cpuset, when we have exclusive_cpu is set,
> what about having a knob in the cpuset controller
> to move the task to non-root (when option is set).

I'm doubtful.

> Because all the way along, though we have freedom to make the cpusets
> exclusive and move tasks (say VMs) into them,
> making sure they do not interfere with each other, we can not prevent
> the other tasks spawned in a system eating into cpus of
> exclusive cpuset since they go to root automatically.

I believe the right thing to do would be starting / confining other
tasks in the appropriate non-root cgroups.  cgroup already provides
mechanisms to achieve that.  The rest is upto userland.

> Do you think having a knob, to make sure new tasks spawned go to say a
> default  directory under root makes sense?
> 
>  I understand that we could easily have a userspace script which could
> achieve intended goal, but kernel solution
>  would really make the exclusive cpusets have exclusive access to cpus
> it should have.

This would just be a more reliable implementation of an ad-hoc
mechanism when it can already be properly achieved by managing cgroups
of all processes in the system.

> (I also have a POC implemented for pre-unified hierarchy tree and
> understand some bit of complications involved in that and
> understand that we should not have complex policies in kernel for e.g.
> filtering tasks based on patterns etc..).

As I wrote above, I think this is something to be solved from
userland.  I'm very positively against adding this sort of hacky
ad-hoc policies which serves one or a few use cases well while
introducing possible long-term maintenance issues.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-05-02 15:12                 ` Tejun Heo
  0 siblings, 0 replies; 52+ messages in thread
From: Tejun Heo @ 2014-05-02 15:12 UTC (permalink / raw)
  To: Raghavendra KT
  Cc: Vivek Goyal, lizefan, containers, cgroups, Linux Kernel Mailing List

Hello,

On Wed, Apr 30, 2014 at 04:27:51PM +0530, Raghavendra KT wrote:
> For some controllers like cpuset, when we have exclusive_cpu is set,
> what about having a knob in the cpuset controller
> to move the task to non-root (when option is set).

I'm doubtful.

> Because all the way along, though we have freedom to make the cpusets
> exclusive and move tasks (say VMs) into them,
> making sure they do not interfere with each other, we can not prevent
> the other tasks spawned in a system eating into cpus of
> exclusive cpuset since they go to root automatically.

I believe the right thing to do would be starting / confining other
tasks in the appropriate non-root cgroups.  cgroup already provides
mechanisms to achieve that.  The rest is upto userland.

> Do you think having a knob, to make sure new tasks spawned go to say a
> default  directory under root makes sense?
> 
>  I understand that we could easily have a userspace script which could
> achieve intended goal, but kernel solution
>  would really make the exclusive cpusets have exclusive access to cpus
> it should have.

This would just be a more reliable implementation of an ad-hoc
mechanism when it can already be properly achieved by managing cgroups
of all processes in the system.

> (I also have a POC implemented for pre-unified hierarchy tree and
> understand some bit of complications involved in that and
> understand that we should not have complex policies in kernel for e.g.
> filtering tasks based on patterns etc..).

As I wrote above, I think this is something to be solved from
userland.  I'm very positively against adding this sort of hacky
ad-hoc policies which serves one or a few use cases well while
introducing possible long-term maintenance issues.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-05-05 16:37                   ` Raghavendra KT
  0 siblings, 0 replies; 52+ messages in thread
From: Raghavendra KT @ 2014-05-05 16:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vivek Goyal, lizefan, containers, cgroups, Linux Kernel Mailing List

On 5/2/14, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Wed, Apr 30, 2014 at 04:27:51PM +0530, Raghavendra KT wrote:
[...]
>> Because all the way along, though we have freedom to make the cpusets
>> exclusive and move tasks (say VMs) into them,
>> making sure they do not interfere with each other, we can not prevent
>> the other tasks spawned in a system eating into cpus of
>> exclusive cpuset since they go to root automatically.
>
> I believe the right thing to do would be starting / confining other
> tasks in the appropriate non-root cgroups.  cgroup already provides
> mechanisms to achieve that.  The rest is upto userland.
>

Thanks Tejun for the reply. I agree.

>> Do you think having a knob, to make sure new tasks spawned go to say a
>> default  directory under root makes sense?
>>
>>  I understand that we could easily have a userspace script which could
>> achieve intended goal, but kernel solution
>>  would really make the exclusive cpusets have exclusive access to cpus
>> it should have.
>
> This would just be a more reliable implementation of an ad-hoc
> mechanism when it can already be properly achieved by managing cgroups
> of all processes in the system.

Correct. Perhaps your answer clears my dilemma to go for a userspace
since I donot have any strong reason except convenience to go for a
kernel solution.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy
@ 2014-05-05 16:37                   ` Raghavendra KT
  0 siblings, 0 replies; 52+ messages in thread
From: Raghavendra KT @ 2014-05-05 16:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vivek Goyal, lizefan-hv44wF8Li93QT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Kernel Mailing List

On 5/2/14, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Hello,
>
> On Wed, Apr 30, 2014 at 04:27:51PM +0530, Raghavendra KT wrote:
[...]
>> Because all the way along, though we have freedom to make the cpusets
>> exclusive and move tasks (say VMs) into them,
>> making sure they do not interfere with each other, we can not prevent
>> the other tasks spawned in a system eating into cpus of
>> exclusive cpuset since they go to root automatically.
>
> I believe the right thing to do would be starting / confining other
> tasks in the appropriate non-root cgroups.  cgroup already provides
> mechanisms to achieve that.  The rest is upto userland.
>

Thanks Tejun for the reply. I agree.

>> Do you think having a knob, to make sure new tasks spawned go to say a
>> default  directory under root makes sense?
>>
>>  I understand that we could easily have a userspace script which could
>> achieve intended goal, but kernel solution
>>  would really make the exclusive cpusets have exclusive access to cpus
>> it should have.
>
> This would just be a more reliable implementation of an ad-hoc
> mechanism when it can already be properly achieved by managing cgroups
> of all processes in the system.

Correct. Perhaps your answer clears my dilemma to go for a userspace
since I donot have any strong reason except convenience to go for a
kernel solution.

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2014-05-05 16:37 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-28  2:40 [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy Tejun Heo
2014-03-28  2:40 ` Tejun Heo
2014-03-28  2:40 ` [PATCH 04/12] cgroup: make css_next_child() skip missing csses Tejun Heo
2014-03-28  2:40 ` [PATCH 06/12] cgroup: teach css_task_iter about effective csses Tejun Heo
2014-03-28  2:40 ` [PATCH 10/12] cgroup: update subsystem rebind restrictions Tejun Heo
2014-03-28  2:41 ` [PATCH 11/12] cgroup: prepare migration path for unified hierarchy Tejun Heo
2014-03-28  2:41 ` [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy Tejun Heo
2014-03-28  2:41   ` Tejun Heo
     [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2014-03-28  2:40   ` [PATCH 01/12] cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask Tejun Heo
2014-03-28  2:40     ` Tejun Heo
2014-03-28  2:40   ` [PATCH 02/12] cgroup: introduce effective cgroup_subsys_state Tejun Heo
2014-03-28  2:40     ` Tejun Heo
2014-03-28  2:40   ` [PATCH 03/12] cgroup: implement cgroup->e_csets[] Tejun Heo
2014-03-28  2:40     ` Tejun Heo
2014-03-28  2:40   ` [PATCH 04/12] cgroup: make css_next_child() skip missing csses Tejun Heo
2014-03-28  2:40   ` [PATCH 05/12] cgroup: reorganize css_task_iter Tejun Heo
2014-03-28  2:40     ` Tejun Heo
2014-03-28  2:40   ` [PATCH 06/12] cgroup: teach css_task_iter about effective csses Tejun Heo
2014-03-28  2:40   ` [PATCH 07/12] cgroup: cgroup->subsys[] should be cleared after the css is offlined Tejun Heo
2014-03-28  2:40     ` Tejun Heo
2014-03-28  2:40   ` [PATCH 08/12] cgroup: allow cgroup creation and suppress automatic css creation in the unified hierarchy Tejun Heo
2014-03-28  2:40     ` Tejun Heo
2014-03-28  2:40   ` [PATCH 09/12] cgroup: add css_set->dfl_cgrp Tejun Heo
2014-03-28  2:40     ` Tejun Heo
2014-03-28  2:40   ` [PATCH 10/12] cgroup: update subsystem rebind restrictions Tejun Heo
2014-03-28  2:41   ` [PATCH 11/12] cgroup: prepare migration path for unified hierarchy Tejun Heo
2014-03-28  2:41   ` [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy Tejun Heo
2014-04-14 15:45   ` [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy Vivek Goyal
2014-04-14 15:45     ` Vivek Goyal
     [not found]     ` <20140414154556.GA9552-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-04-14 17:52       ` Tejun Heo
2014-04-14 17:52         ` Tejun Heo
     [not found]         ` <20140414175236.GB15249-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-14 18:14           ` Vivek Goyal
2014-04-14 18:14             ` Vivek Goyal
     [not found]             ` <20140414181421.GC9552-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-04-14 19:21               ` Tejun Heo
2014-04-14 19:21                 ` Tejun Heo
     [not found]                 ` <20140414192139.GB16835-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-15  2:58                   ` Li Zefan
2014-04-15  2:58                     ` Li Zefan
     [not found]                     ` <534CA050.8060709-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-04-15  4:57                       ` Tejun Heo
2014-04-15  4:57                         ` Tejun Heo
2014-04-30 10:57           ` Raghavendra KT
2014-04-30 10:57             ` Raghavendra KT
     [not found]             ` <CAC4Lta0qd2vW4ENSOpzK+0i06MorzfmsZZZuW9xUtpZ_gWA9gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-05-02 15:12               ` Tejun Heo
2014-05-02 15:12                 ` Tejun Heo
2014-05-05 16:37                 ` Raghavendra KT
2014-05-05 16:37                   ` Raghavendra KT
2014-04-14 21:31   ` Tejun Heo
2014-04-14 21:31     ` Tejun Heo
2014-04-15 22:05   ` [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy Tejun Heo
2014-04-15 22:05 ` Tejun Heo
2014-04-15 22:05   ` Tejun Heo
     [not found]   ` <20140415220504.GA13099-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2014-04-15 22:05     ` Tejun Heo
2014-04-15 22:05       ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.