* [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Hello, (this is too late for the upcoming merge window and is targeting the next devel cycle) cgroup currently allows creating arbitrary number of hierarchies and any number of controllers may be associated with a given tree. This allows for huge amount of variance how tasks are associated with various cgroups and controllers; unfortunately, the variance is extreme to the extent that it unnecessarily complicates capabilities which can otherwise be straight-forward and hinders implementation of features which can benefit from coordination among different controllers. Here are some of the issues which we're facing with the current multiple hierarchies. * cgroup membership of a task can't be described in finite number of paths. As there can be arbitrary number of hierarchies, the key describing a task's cgroup membership can be arbitrarily long. This is painful when userland or other parts of the kernel needs to take cgroup membership into account and leads to proliferation of controllers which are just there to identify membership rather than actually control resources, which in turn exacerbates the problem. * Different controllers may or may not reside on the same hierarchy. Features or optimizations which can benefit from sharing the hierarchical organization either can't be implemented or becomes overly complicated. * Tasks of a process may belong to different cgroups, which doesn't make any sense for some controllers. Those controllers end up ignoring such configurations in their own ways leading to inconsistent behavior. In addition, in-process resource control fundamentally isn't something which belongs to cgroup. As it has to be visible to the binary for the process, it must be part of the stable programming interface which is easily accessible to the process proper in an easy race-free way. * The current cgroup allows cgroups which have child cgroups to have tasks in it. This means that the child cgroups end up competing against the internal tasks. This introduces inherent ambiguity as the two are separate types of entities and the latter doesn't have the same control knobs assigned to them. Different controllers are dealing with the issue in different ways. cpu treats internal tasks and child cgroups as equivalents, which makes giving a child cgroup a given ratio of the parent's cpu time difficult as the number of competing entities may fluctuate without any indication. blkio, in my misguided attempt to deal with the issue, introduced a whole duplicate set of knobs for internal tasks and deal with them as if they belong to a separate child cgroup making the interface and implementation a mess. memcg seems somewhat ambiguous on the issue but there are attempts to introduce ad-hoc modifications to tilt the way it's handled to suit specific use cases. This is an inherent problem. All of the solutions that different controllers came up with are unsatisfactory, the different behaviors greatly increases the level of inconsistency and complicates the controller implementations. This patchset finally implements the default unified hierarchy. The goal is providing enough flexibility while enforcing stricter common structure where appropriate to address the above listed issues. Controllers which aren't bound to other hierarchies are automatically attached to the unified hierarchy, which is different in that controllers are enabled explicitly for each subtree. "cgroup.subtree_control" controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent. This patchset contains the following twelve patches. 0001-cgroup-update-cgroup-subsys_mask-to-child_subsys_mas.patch 0002-cgroup-introduce-effective-cgroup_subsys_state.patch 0003-cgroup-implement-cgroup-e_csets.patch 0004-cgroup-make-css_next_child-skip-missing-csses.patch 0005-cgroup-reorganize-css_task_iter.patch 0006-cgroup-teach-css_task_iter-about-effective-csses.patch 0007-cgroup-cgroup-subsys-should-be-cleared-after-the-css.patch 0008-cgroup-allow-cgroup-creation-and-suppress-automatic-.patch 0009-cgroup-add-css_set-dfl_cgrp.patch 0010-cgroup-update-subsystem-rebind-restrictions.patch 0011-cgroup-prepare-migration-path-for-unified-hierarchy.patch 0012-cgroup-implement-dynamic-subtree-controller-enable-d.patch 0001 updates subsys_mask handling again to morph cgrp->subsys_mask to cgrp->child_subsys_mask. 0002-0003 introduce effective cgroup. The cgroup on the unified hierarchy a task belongs to when viewed from a controller. 0004-0007 update iterators to handle effective cgroup correctly. 0008-0011 prepare various paths for explicit controller enable/disable. 0012 implements explicit controller enable/disable. The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified diffstat follows. include/linux/cgroup.h | 44 ++- kernel/cgroup.c | 651 +++++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 585 insertions(+), 110 deletions(-) Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel Hello, (this is too late for the upcoming merge window and is targeting the next devel cycle) cgroup currently allows creating arbitrary number of hierarchies and any number of controllers may be associated with a given tree. This allows for huge amount of variance how tasks are associated with various cgroups and controllers; unfortunately, the variance is extreme to the extent that it unnecessarily complicates capabilities which can otherwise be straight-forward and hinders implementation of features which can benefit from coordination among different controllers. Here are some of the issues which we're facing with the current multiple hierarchies. * cgroup membership of a task can't be described in finite number of paths. As there can be arbitrary number of hierarchies, the key describing a task's cgroup membership can be arbitrarily long. This is painful when userland or other parts of the kernel needs to take cgroup membership into account and leads to proliferation of controllers which are just there to identify membership rather than actually control resources, which in turn exacerbates the problem. * Different controllers may or may not reside on the same hierarchy. Features or optimizations which can benefit from sharing the hierarchical organization either can't be implemented or becomes overly complicated. * Tasks of a process may belong to different cgroups, which doesn't make any sense for some controllers. Those controllers end up ignoring such configurations in their own ways leading to inconsistent behavior. In addition, in-process resource control fundamentally isn't something which belongs to cgroup. As it has to be visible to the binary for the process, it must be part of the stable programming interface which is easily accessible to the process proper in an easy race-free way. * The current cgroup allows cgroups which have child cgroups to have tasks in it. This means that the child cgroups end up competing against the internal tasks. This introduces inherent ambiguity as the two are separate types of entities and the latter doesn't have the same control knobs assigned to them. Different controllers are dealing with the issue in different ways. cpu treats internal tasks and child cgroups as equivalents, which makes giving a child cgroup a given ratio of the parent's cpu time difficult as the number of competing entities may fluctuate without any indication. blkio, in my misguided attempt to deal with the issue, introduced a whole duplicate set of knobs for internal tasks and deal with them as if they belong to a separate child cgroup making the interface and implementation a mess. memcg seems somewhat ambiguous on the issue but there are attempts to introduce ad-hoc modifications to tilt the way it's handled to suit specific use cases. This is an inherent problem. All of the solutions that different controllers came up with are unsatisfactory, the different behaviors greatly increases the level of inconsistency and complicates the controller implementations. This patchset finally implements the default unified hierarchy. The goal is providing enough flexibility while enforcing stricter common structure where appropriate to address the above listed issues. Controllers which aren't bound to other hierarchies are automatically attached to the unified hierarchy, which is different in that controllers are enabled explicitly for each subtree. "cgroup.subtree_control" controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent. This patchset contains the following twelve patches. 0001-cgroup-update-cgroup-subsys_mask-to-child_subsys_mas.patch 0002-cgroup-introduce-effective-cgroup_subsys_state.patch 0003-cgroup-implement-cgroup-e_csets.patch 0004-cgroup-make-css_next_child-skip-missing-csses.patch 0005-cgroup-reorganize-css_task_iter.patch 0006-cgroup-teach-css_task_iter-about-effective-csses.patch 0007-cgroup-cgroup-subsys-should-be-cleared-after-the-css.patch 0008-cgroup-allow-cgroup-creation-and-suppress-automatic-.patch 0009-cgroup-add-css_set-dfl_cgrp.patch 0010-cgroup-update-subsystem-rebind-restrictions.patch 0011-cgroup-prepare-migration-path-for-unified-hierarchy.patch 0012-cgroup-implement-dynamic-subtree-controller-enable-d.patch 0001 updates subsys_mask handling again to morph cgrp->subsys_mask to cgrp->child_subsys_mask. 0002-0003 introduce effective cgroup. The cgroup on the unified hierarchy a task belongs to when viewed from a controller. 0004-0007 update iterators to handle effective cgroup correctly. 0008-0011 prepare various paths for explicit controller enable/disable. 0012 implements explicit controller enable/disable. The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified diffstat follows. include/linux/cgroup.h | 44 ++- kernel/cgroup.c | 651 +++++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 585 insertions(+), 110 deletions(-) Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 04/12] cgroup: make css_next_child() skip missing csses 2014-03-28 2:40 ` Tejun Heo (?) @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo css_next_child() walks the children of the specified css. It does this by finding the next cgroup and then returning the requested css. On the default unified hierarchy, a cgroup may not have a css associated with it even if the hierarchy has the subsystem enabled. This patch updates css_next_child() so that it skips children without the requested css associated. Signed-off-by: Tejun Heo <tj@kernel.org> --- kernel/cgroup.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 2899a6f..9d2d6ef 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2690,10 +2690,19 @@ css_next_child(struct cgroup_subsys_state *pos_css, break; } - if (&next->sibling == &cgrp->children) - return NULL; + /* + * @next, if not pointing to the head, can be dereferenced and is + * the next sibling; however, it might have @ss disabled. If so, + * fast-forward to the next enabled one. + */ + while (&next->sibling != &cgrp->children) { + struct cgroup_subsys_state *next_css = cgroup_css(next, parent_css->ss); - return cgroup_css(next, parent_css->ss); + if (next_css) + return next_css; + next = list_entry_rcu(next->sibling.next, struct cgroup, sibling); + } + return NULL; } /** -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 06/12] cgroup: teach css_task_iter about effective csses 2014-03-28 2:40 ` Tejun Heo (?) (?) @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo Currently, css_task_iter iterates tasks associated with a css by visiting each css_set associated with the owning cgroup and walking tasks of each of them. This works fine for !unified hierarchies as each cgroup has its own css for each associated subsystem on the hierarchy; however, on the planned unified hierarchy, a cgroup may not have csses associated and its tasks would be considered associated with the matching css of the nearest ancestor which has the subsystem enabled. This means that on the default unified hierarchy, just walking all tasks associated with a cgroup isn't enough to walk all tasks which are associated with the specified css. If any of its children doesn't have the matching css enabled, task iteration should also include all tasks from the subtree. We already added cgroup->e_csets[] to list all css_sets effectively associated with a given css and walk css_sets on that list instead to achieve such iteration. This patch updates css_task_iter iteration such that it walks css_sets on cgroup->e_csets[] instead of cgroup->cset_links if iteration is requested on an non-dummy css. Thanks to the previous iteration update, this change can be achieved with the addition of css_task_iter->ss and minimal updates to css_advance_task_iter() and css_task_iter_start(). Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/cgroup.h | 2 ++ kernel/cgroup.c | 18 +++++++++++++++--- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 6bbe143..cbfd8be 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -842,6 +842,8 @@ css_next_descendant_post(struct cgroup_subsys_state *pos, /* A css_task_iter should be treated as an opaque object */ struct css_task_iter { + struct cgroup_subsys *ss; + struct list_head *cset_pos; struct list_head *cset_head; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 318b6a2..e3aa2aa 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2850,8 +2850,14 @@ static void css_advance_task_iter(struct css_task_iter *it) it->cset_pos = NULL; return; } - link = list_entry(l, struct cgrp_cset_link, cset_link); - cset = link->cset; + + if (it->ss) { + cset = container_of(l, struct css_set, + e_cset_node[it->ss->id]); + } else { + link = list_entry(l, struct cgrp_cset_link, cset_link); + cset = link->cset; + } } while (list_empty(&cset->tasks) && list_empty(&cset->mg_tasks)); it->cset_pos = l; @@ -2888,7 +2894,13 @@ void css_task_iter_start(struct cgroup_subsys_state *css, down_read(&css_set_rwsem); - it->cset_pos = &css->cgroup->cset_links; + it->ss = css->ss; + + if (it->ss) + it->cset_pos = &css->cgroup->e_csets[css->ss->id]; + else + it->cset_pos = &css->cgroup->cset_links; + it->cset_head = it->cset_pos; css_advance_task_iter(it); -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 10/12] cgroup: update subsystem rebind restrictions 2014-03-28 2:40 ` Tejun Heo ` (2 preceding siblings ...) (?) @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo Because the default root couldn't have any non-root csses attached to it, rebinding away from it was always allowed; however, the default hierarchy will soon host the unified hierarchy and have non-root csses so the rebind restrictions need to be updated accordingly. Instead of special casing rebinding from the default hierarchy and then checking whether the source hierarchy has children cgroups, which implies non-root csses for !dfl hierarchies, simply check whether the source hierarchy has non-root csses for the subsystem using css_next_child(). Signed-off-by: Tejun Heo <tj@kernel.org> --- kernel/cgroup.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index e714ccf..0c67141 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1051,16 +1051,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root, if (!(ss_mask & (1 << ssid))) continue; - /* if @ss is on the dummy_root, we can always move it */ - if (ss->root == &cgrp_dfl_root) - continue; - - /* if @ss has non-root cgroups attached to it, can't move */ - if (!list_empty(&ss->root->cgrp.children)) + /* if @ss has non-root csses attached to it, can't move */ + if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss))) return -EBUSY; /* can't move between two non-dummy roots either */ - if (dst_root != &cgrp_dfl_root) + if (ss->root != &cgrp_dfl_root && dst_root != &cgrp_dfl_root) return -EBUSY; } -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 11/12] cgroup: prepare migration path for unified hierarchy 2014-03-28 2:40 ` Tejun Heo ` (3 preceding siblings ...) (?) @ 2014-03-28 2:41 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:41 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo Unified hierarchy implementation would require re-migrating tasks onto the same cgroup on the default hierarchy to reflect updated effective csses. Update cgroup_migrate_prepare_dst() so that it accepts NULL as the destination cgrp. When NULL is specified, the destination is considered to be the cgroup on the default hierarchy associated with each css_set. After this change, the identity check in cgroup_migrate_add_src() isn't sufficient for noop detection as the associated csses may change without any cgroup association changing. The only way to tell whether a migration is noop or not is testing whether the source and destination csets are identical. The noop check in cgroup_migrate_add_src() is removed and cset identity test is added to cgroup_migreate_prepare_dst(). If it's detected that source and destination csets are identical, the cset is removed removed from @preloaded_csets and all the migration nodes are cleared which makes cgroup_migrate() ignore the cset. Also, make the function append the destination css_sets to @preloaded_list so that destination css_sets always come after source css_sets. Signed-off-by: Tejun Heo <tj@kernel.org> --- kernel/cgroup.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 0c67141..5faa115 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1901,10 +1901,6 @@ static void cgroup_migrate_add_src(struct css_set *src_cset, src_cgrp = cset_cgroup_from_root(src_cset, dst_cgrp->root); - /* nothing to do if this cset already belongs to the cgroup */ - if (src_cgrp == dst_cgrp) - return; - if (!list_empty(&src_cset->mg_preload_node)) return; @@ -1919,13 +1915,14 @@ static void cgroup_migrate_add_src(struct css_set *src_cset, /** * cgroup_migrate_prepare_dst - prepare destination css_sets for migration - * @dst_cgrp: the destination cgroup + * @dst_cgrp: the destination cgroup (may be %NULL) * @preloaded_csets: list of preloaded source css_sets * * Tasks are about to be moved to @dst_cgrp and all the source css_sets * have been preloaded to @preloaded_csets. This function looks up and - * pins all destination css_sets, links each to its source, and put them on - * @preloaded_csets. + * pins all destination css_sets, links each to its source, and append them + * to @preloaded_csets. If @dst_cgrp is %NULL, the destination of each + * source css_set is assumed to be its cgroup on the default hierarchy. * * This function must be called after cgroup_migrate_add_src() has been * called on each migration source css_set. After migration is performed @@ -1936,19 +1933,34 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, struct list_head *preloaded_csets) { LIST_HEAD(csets); - struct css_set *src_cset; + struct css_set *src_cset, *tmp_cset; lockdep_assert_held(&cgroup_mutex); /* look up the dst cset for each src cset and link it to src */ - list_for_each_entry(src_cset, preloaded_csets, mg_preload_node) { + list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) { struct css_set *dst_cset; - dst_cset = find_css_set(src_cset, dst_cgrp); + dst_cset = find_css_set(src_cset, + dst_cgrp ?: src_cset->dfl_cgrp); if (!dst_cset) goto err; WARN_ON_ONCE(src_cset->mg_dst_cset || dst_cset->mg_dst_cset); + + /* + * If src cset equals dst, it's noop. Drop the src. + * cgroup_migrate() will skip the cset too. Note that we + * can't handle src == dst as some nodes are used by both. + */ + if (src_cset == dst_cset) { + src_cset->mg_src_cgrp = NULL; + list_del_init(&src_cset->mg_preload_node); + put_css_set(src_cset, false); + put_css_set(dst_cset, false); + continue; + } + src_cset->mg_dst_cset = dst_cset; if (list_empty(&dst_cset->mg_preload_node)) @@ -1957,7 +1969,7 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, put_css_set(dst_cset, false); } - list_splice(&csets, preloaded_csets); + list_splice_tail(&csets, preloaded_csets); return 0; err: cgroup_migrate_finish(&csets); -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> @ 2014-03-28 2:41 ` Tejun Heo 2014-03-28 2:40 ` Tejun Heo ` (13 subsequent siblings) 14 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:41 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo cgroup is switching away from multiple hierarchies and will use one unified default hierarchy where controllers can be dynamically enabled and disabled per subtree. The default hierarchy will serve as the unified hierarchy to which all controllers are attached and a css on the default hierarchy would need to also serve the tasks of descendant cgroups which don't have the controller enabled - ie. the tree may be collapsed from leaf towards root when viewed from specific controllers. This has been implemented through effective css in the previous patches. This patch finally implements dynamic subtree controller enable/disable on the default hierarchy via a new knob - "cgroup.subtree_control" which controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent, which is a competition between two different types of entities without any clear way to determine resource distribution between the two. Different controllers handle it differently and all the implemented behaviors are ambiguous, ad-hoc, cumbersome and/or just wrong. Having this structural constraints imposed from cgroup core removes the burden from controller implementations and enables showing one consistent behavior across all controllers. When a controller is enabled or disabled, css associations for the controller in the subtrees of each child should be updated. After enabling, the whole subtree of a child should point to the new css of the child. After disabling, the whole subtree of a child should point to the cgroup's css. This is implemented by first updating cgroup states such that cgroup_e_css() result points to the appropriate css and then invoking cgroup_update_dfl_csses() which migrates all tasks in the affected subtrees to the self cgroup on the default hierarchy. * When read, "cgroup.subtree_control" lists all the currently enabled controllers on the children of the cgroup. * White-space separated list of controller names prefixed with either '+' or '-' can be written to "cgroup.subtree_control". The ones prefixed with '+' are enabled on the controller and '-' disabled. * A controller can be enabled iff the parent's "cgroup.subtree_control" enables it and disabled iff no child's "cgroup.subtree_control" has it enabled. * If a cgroup has tasks, no controller can be enabled via "cgroup.subtree_control". Likewise, if "cgroup.subtree_control" has some controllers enabled, tasks can't be migrated into the cgroup. * All controllers which aren't bound on other hierarchies are automatically associated with the root cgroup of the default hierarchy. All the controllers which are bound to the default hierarchy are listed in the read-only file "cgroup.controllers" in the root directory. This is still experimental and there are some holes, one of which is that ->can_attach() failure during cgroup_update_dfl_csses() may leave the cgroups in an undefined state. The issues will be addressed by future patches. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/cgroup.h | 5 + kernel/cgroup.c | 346 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 351 insertions(+) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index b6f5e40..dee6f3c 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -21,6 +21,7 @@ #include <linux/percpu-refcount.h> #include <linux/seq_file.h> #include <linux/kernfs.h> +#include <linux/wait.h> #ifdef CONFIG_CGROUPS @@ -164,6 +165,7 @@ struct cgroup { struct cgroup *parent; /* my parent */ struct kernfs_node *kn; /* cgroup kernfs entry */ + struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */ /* * Monotonically increasing unique serial number which defines a @@ -216,6 +218,9 @@ struct cgroup { /* For css percpu_ref killing and RCU-protected deletion */ struct rcu_head rcu_head; struct work_struct destroy_work; + + /* used to wait for offlining of csses */ + wait_queue_head_t offline_waitq; }; #define MAX_CGROUP_ROOT_NAMELEN 64 diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 5faa115..4e958c7 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root, unsigned long ss_mask); static void cgroup_destroy_css_killed(struct cgroup *cgrp); static int cgroup_destroy_locked(struct cgroup *cgrp); +static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss); +static void kill_css(struct cgroup_subsys_state *css); static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[], bool is_add); static void cgroup_pidlist_destroy_all(struct cgroup *cgrp); @@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp) #define for_each_root(root) \ list_for_each_entry((root), &cgroup_roots, root_list) +/* iterate over child cgrps, lock should be held throughout iteration */ +#define cgroup_for_each_live_child(child, cgrp) \ + list_for_each_entry((child), &(cgrp)->children, sibling) \ + if (({ lockdep_assert_held(&cgroup_tree_mutex); \ + cgroup_is_dead(child); })) \ + ; \ + else + /** * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive. * @cgrp: the cgroup to be checked for liveness @@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) for_each_subsys(ss, ssid) INIT_LIST_HEAD(&cgrp->e_csets[ssid]); + + init_waitqueue_head(&cgrp->offline_waitq); } static void init_cgroup_root(struct cgroup_root *root, @@ -1937,6 +1949,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, lockdep_assert_held(&cgroup_mutex); + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent && + dst_cgrp->child_subsys_mask) + return -EBUSY; + /* look up the dst cset for each src cset and link it to src */ list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) { struct css_set *dst_cset; @@ -2302,6 +2322,317 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v) return 0; } +static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask) +{ + struct cgroup_subsys *ss; + bool printed = false; + int ssid; + + for_each_subsys(ss, ssid) { + if (ss_mask & (1 << ssid)) { + if (printed) + seq_putc(seq, ' '); + seq_printf(seq, "%s", ss->name); + printed = true; + } + } + if (printed) + seq_putc(seq, '\n'); +} + +/* show controllers which are currently attached to the default hierarchy */ +static int cgroup_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->root->subsys_mask); + return 0; +} + +/* show controllers which are enabled for a given cgroup's children */ +static int cgroup_subtree_control_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->child_subsys_mask); + return 0; +} + +/** + * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy + * @cgrp: root of the subtree to update csses for + * + * @cgrp's child_subsys_mask has changed and its subtree's (self excluded) + * css associations need to be updated accordingly. This function looks up + * all css_sets which are attached to the subtree, creates the matching + * updated css_sets and migrates the tasks to the new ones. + */ +static int cgroup_update_dfl_csses(struct cgroup *cgrp) +{ + LIST_HEAD(preloaded_csets); + struct cgroup_subsys_state *css; + struct css_set *src_cset; + int ret; + + lockdep_assert_held(&cgroup_tree_mutex); + lockdep_assert_held(&cgroup_mutex); + + /* look up all csses currently attached to @cgrp's subtree */ + down_read(&css_set_rwsem); + css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) { + struct cgrp_cset_link *link; + + /* self is not affected by child_subsys_mask change */ + if (css->cgroup == cgrp) + continue; + + list_for_each_entry(link, &css->cgroup->cset_links, cset_link) + cgroup_migrate_add_src(link->cset, cgrp, + &preloaded_csets); + } + up_read(&css_set_rwsem); + + /* NULL dst indicates self on default hierarchy */ + ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets); + if (ret) + goto out_finish; + + list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) { + struct task_struct *last_task = NULL, *task; + + /* src_csets precede dst_csets, break on the first dst_cset */ + if (!src_cset->mg_src_cgrp) + break; + + /* + * All tasks in src_cset need to be migrated to the + * matching dst_cset. Empty it process by process. We + * walk tasks but migrate processes. The leader might even + * belong to a different cset but such src_cset would also + * be among the target src_csets because the default + * hierarchy enforces per-process membership. + */ + while (true) { + down_read(&css_set_rwsem); + task = list_first_entry_or_null(&src_cset->tasks, + struct task_struct, cg_list); + if (task) { + task = task->group_leader; + WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp); + get_task_struct(task); + } + up_read(&css_set_rwsem); + + if (!task) + break; + + /* guard against possible infinite loop */ + if (WARN(last_task == task, + "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n")) + goto out_finish; + last_task = task; + + threadgroup_lock(task); + /* raced against de_thread() from another thread? */ + if (!thread_group_leader(task)) { + threadgroup_unlock(task); + put_task_struct(task); + continue; + } + + ret = cgroup_migrate(src_cset->dfl_cgrp, task, true); + + threadgroup_unlock(task); + put_task_struct(task); + + if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret)) + goto out_finish; + } + } + +out_finish: + cgroup_migrate_finish(&preloaded_csets); + return ret; +} + +/* change the enabled child controllers for a cgroup in the default hierarchy */ +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, + struct cftype *cft, char *buffer) +{ + unsigned long enable_req = 0, disable_req = 0, enable, disable; + struct cgroup *cgrp = dummy_css->cgroup, *child; + struct cgroup_subsys *ss; + char *tok, *p; + int ssid, ret; + + /* + * Parse input - white space separated list of subsystem names + * prefixed with either + or -. + */ + p = buffer; + while ((tok = strsep(&p, " \t\n"))) { + for_each_subsys(ss, ssid) { + if (ss->disabled || strcmp(tok + 1, ss->name)) + continue; + + if (*tok == '+') { + enable_req |= 1 << ssid; + disable_req &= ~(1 << ssid); + } else if (*tok == '-') { + disable_req |= 1 << ssid; + enable_req &= ~(1 << ssid); + } else { + return -EINVAL; + } + break; + } + if (ssid == CGROUP_SUBSYS_COUNT) + return -EINVAL; + } + + /* + * We're gonna grab cgroup_tree_mutex which nests outside kernfs + * active_ref. cgroup_lock_live_group() already provides enough + * protection. Ensure @cgrp stays accessible and break the + * active_ref protection. + */ + cgroup_get(cgrp); + kernfs_break_active_protection(cgrp->control_kn); +retry: + enable = enable_req; + disable = disable_req; + + mutex_lock(&cgroup_tree_mutex); + + for_each_subsys(ss, ssid) { + if (enable & (1 << ssid)) { + if (cgrp->child_subsys_mask & (1 << ssid)) { + enable &= ~(1 << ssid); + continue; + } + + /* + * Because css offlining is asynchronous, userland + * might try to re-enable the same controller while + * the previous instance is still around. In such + * cases, wait till it's gone using offline_waitq. + */ + cgroup_for_each_live_child(child, cgrp) { + wait_queue_t wait; + + if (!cgroup_css(child, ss)) + continue; + + prepare_to_wait(&child->offline_waitq, &wait, + TASK_UNINTERRUPTIBLE); + mutex_unlock(&cgroup_tree_mutex); + schedule(); + finish_wait(&child->offline_waitq, &wait); + goto retry; + } + + /* unavailable or not enabled on the parent? */ + if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) || + (cgrp->parent && + !(cgrp->parent->child_subsys_mask & (1 << ssid)))) { + ret = -ENOENT; + goto out_unlock_tree; + } + } else if (disable & (1 << ssid)) { + if (!(cgrp->child_subsys_mask & (1 << ssid))) { + disable &= ~(1 << ssid); + continue; + } + + /* a child has it enabled? */ + cgroup_for_each_live_child(child, cgrp) { + if (child->child_subsys_mask & (1 << ssid)) { + ret = -EBUSY; + goto out_unlock_tree; + } + } + } + } + + if (!enable && !disable) { + ret = 0; + goto out_unlock_tree; + } + + if (!cgroup_lock_live_group(cgrp)) { + ret = -ENODEV; + goto out_unlock_tree; + } + + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) { + ret = -EBUSY; + goto out_unlock; + } + + /* + * Create csses for enables and update child_subsys_mask. This + * changes cgroup_e_css() results which in turn makes the + * subsequent cgroup_update_dfl_csses() associate all tasks in the + * subtree to the updated csses. + */ + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + ret = create_css(child, ss); + if (ret) + goto err_undo_css; + } + } + + cgrp->child_subsys_mask |= enable; + cgrp->child_subsys_mask &= ~disable; + + ret = cgroup_update_dfl_csses(cgrp); + if (ret) + goto err_undo_css; + + /* all tasks are now migrated away from the old csses, kill them */ + for_each_subsys(ss, ssid) { + if (!(disable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) + kill_css(cgroup_css(child, ss)); + } + + kernfs_activate(cgrp->kn); + ret = 0; +out_unlock: + mutex_unlock(&cgroup_mutex); +out_unlock_tree: + mutex_unlock(&cgroup_tree_mutex); + kernfs_unbreak_active_protection(cgrp->control_kn); + cgroup_put(cgrp); + return ret; + +err_undo_css: + cgrp->child_subsys_mask &= ~enable; + cgrp->child_subsys_mask |= disable; + + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + struct cgroup_subsys_state *css = cgroup_css(child, ss); + if (css) + kill_css(css); + } + } + goto out_unlock; +} + static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -2442,6 +2773,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft) kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name), cgroup_file_mode(cft), 0, cft->kf_ops, cft, NULL, false, key); + if (cft->seq_show == cgroup_subtree_control_show) + cgrp->control_kn = kn; return PTR_ERR_OR_ZERO(kn); } @@ -3539,6 +3872,17 @@ static struct cftype cgroup_base_files[] = { .flags = CFTYPE_ONLY_ON_ROOT, .seq_show = cgroup_sane_behavior_show, }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT, + .seq_show = cgroup_controllers_show, + }, + { + .name = "cgroup.subtree_control", + .flags = CFTYPE_ONLY_ON_DFL, + .seq_show = cgroup_subtree_control_show, + .write_string = cgroup_subtree_control_write, + }, /* * Historical crazy stuff. These don't have "cgroup." prefix and @@ -3707,6 +4051,8 @@ static void offline_css(struct cgroup_subsys_state *css) css->flags &= ~CSS_ONLINE; css->cgroup->nr_css--; RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL); + + wake_up_all(&css->cgroup->offline_waitq); } /** -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy @ 2014-03-28 2:41 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:41 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo cgroup is switching away from multiple hierarchies and will use one unified default hierarchy where controllers can be dynamically enabled and disabled per subtree. The default hierarchy will serve as the unified hierarchy to which all controllers are attached and a css on the default hierarchy would need to also serve the tasks of descendant cgroups which don't have the controller enabled - ie. the tree may be collapsed from leaf towards root when viewed from specific controllers. This has been implemented through effective css in the previous patches. This patch finally implements dynamic subtree controller enable/disable on the default hierarchy via a new knob - "cgroup.subtree_control" which controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent, which is a competition between two different types of entities without any clear way to determine resource distribution between the two. Different controllers handle it differently and all the implemented behaviors are ambiguous, ad-hoc, cumbersome and/or just wrong. Having this structural constraints imposed from cgroup core removes the burden from controller implementations and enables showing one consistent behavior across all controllers. When a controller is enabled or disabled, css associations for the controller in the subtrees of each child should be updated. After enabling, the whole subtree of a child should point to the new css of the child. After disabling, the whole subtree of a child should point to the cgroup's css. This is implemented by first updating cgroup states such that cgroup_e_css() result points to the appropriate css and then invoking cgroup_update_dfl_csses() which migrates all tasks in the affected subtrees to the self cgroup on the default hierarchy. * When read, "cgroup.subtree_control" lists all the currently enabled controllers on the children of the cgroup. * White-space separated list of controller names prefixed with either '+' or '-' can be written to "cgroup.subtree_control". The ones prefixed with '+' are enabled on the controller and '-' disabled. * A controller can be enabled iff the parent's "cgroup.subtree_control" enables it and disabled iff no child's "cgroup.subtree_control" has it enabled. * If a cgroup has tasks, no controller can be enabled via "cgroup.subtree_control". Likewise, if "cgroup.subtree_control" has some controllers enabled, tasks can't be migrated into the cgroup. * All controllers which aren't bound on other hierarchies are automatically associated with the root cgroup of the default hierarchy. All the controllers which are bound to the default hierarchy are listed in the read-only file "cgroup.controllers" in the root directory. This is still experimental and there are some holes, one of which is that ->can_attach() failure during cgroup_update_dfl_csses() may leave the cgroups in an undefined state. The issues will be addressed by future patches. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 5 + kernel/cgroup.c | 346 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 351 insertions(+) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index b6f5e40..dee6f3c 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -21,6 +21,7 @@ #include <linux/percpu-refcount.h> #include <linux/seq_file.h> #include <linux/kernfs.h> +#include <linux/wait.h> #ifdef CONFIG_CGROUPS @@ -164,6 +165,7 @@ struct cgroup { struct cgroup *parent; /* my parent */ struct kernfs_node *kn; /* cgroup kernfs entry */ + struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */ /* * Monotonically increasing unique serial number which defines a @@ -216,6 +218,9 @@ struct cgroup { /* For css percpu_ref killing and RCU-protected deletion */ struct rcu_head rcu_head; struct work_struct destroy_work; + + /* used to wait for offlining of csses */ + wait_queue_head_t offline_waitq; }; #define MAX_CGROUP_ROOT_NAMELEN 64 diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 5faa115..4e958c7 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root, unsigned long ss_mask); static void cgroup_destroy_css_killed(struct cgroup *cgrp); static int cgroup_destroy_locked(struct cgroup *cgrp); +static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss); +static void kill_css(struct cgroup_subsys_state *css); static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[], bool is_add); static void cgroup_pidlist_destroy_all(struct cgroup *cgrp); @@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp) #define for_each_root(root) \ list_for_each_entry((root), &cgroup_roots, root_list) +/* iterate over child cgrps, lock should be held throughout iteration */ +#define cgroup_for_each_live_child(child, cgrp) \ + list_for_each_entry((child), &(cgrp)->children, sibling) \ + if (({ lockdep_assert_held(&cgroup_tree_mutex); \ + cgroup_is_dead(child); })) \ + ; \ + else + /** * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive. * @cgrp: the cgroup to be checked for liveness @@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) for_each_subsys(ss, ssid) INIT_LIST_HEAD(&cgrp->e_csets[ssid]); + + init_waitqueue_head(&cgrp->offline_waitq); } static void init_cgroup_root(struct cgroup_root *root, @@ -1937,6 +1949,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, lockdep_assert_held(&cgroup_mutex); + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent && + dst_cgrp->child_subsys_mask) + return -EBUSY; + /* look up the dst cset for each src cset and link it to src */ list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) { struct css_set *dst_cset; @@ -2302,6 +2322,317 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v) return 0; } +static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask) +{ + struct cgroup_subsys *ss; + bool printed = false; + int ssid; + + for_each_subsys(ss, ssid) { + if (ss_mask & (1 << ssid)) { + if (printed) + seq_putc(seq, ' '); + seq_printf(seq, "%s", ss->name); + printed = true; + } + } + if (printed) + seq_putc(seq, '\n'); +} + +/* show controllers which are currently attached to the default hierarchy */ +static int cgroup_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->root->subsys_mask); + return 0; +} + +/* show controllers which are enabled for a given cgroup's children */ +static int cgroup_subtree_control_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->child_subsys_mask); + return 0; +} + +/** + * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy + * @cgrp: root of the subtree to update csses for + * + * @cgrp's child_subsys_mask has changed and its subtree's (self excluded) + * css associations need to be updated accordingly. This function looks up + * all css_sets which are attached to the subtree, creates the matching + * updated css_sets and migrates the tasks to the new ones. + */ +static int cgroup_update_dfl_csses(struct cgroup *cgrp) +{ + LIST_HEAD(preloaded_csets); + struct cgroup_subsys_state *css; + struct css_set *src_cset; + int ret; + + lockdep_assert_held(&cgroup_tree_mutex); + lockdep_assert_held(&cgroup_mutex); + + /* look up all csses currently attached to @cgrp's subtree */ + down_read(&css_set_rwsem); + css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) { + struct cgrp_cset_link *link; + + /* self is not affected by child_subsys_mask change */ + if (css->cgroup == cgrp) + continue; + + list_for_each_entry(link, &css->cgroup->cset_links, cset_link) + cgroup_migrate_add_src(link->cset, cgrp, + &preloaded_csets); + } + up_read(&css_set_rwsem); + + /* NULL dst indicates self on default hierarchy */ + ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets); + if (ret) + goto out_finish; + + list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) { + struct task_struct *last_task = NULL, *task; + + /* src_csets precede dst_csets, break on the first dst_cset */ + if (!src_cset->mg_src_cgrp) + break; + + /* + * All tasks in src_cset need to be migrated to the + * matching dst_cset. Empty it process by process. We + * walk tasks but migrate processes. The leader might even + * belong to a different cset but such src_cset would also + * be among the target src_csets because the default + * hierarchy enforces per-process membership. + */ + while (true) { + down_read(&css_set_rwsem); + task = list_first_entry_or_null(&src_cset->tasks, + struct task_struct, cg_list); + if (task) { + task = task->group_leader; + WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp); + get_task_struct(task); + } + up_read(&css_set_rwsem); + + if (!task) + break; + + /* guard against possible infinite loop */ + if (WARN(last_task == task, + "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n")) + goto out_finish; + last_task = task; + + threadgroup_lock(task); + /* raced against de_thread() from another thread? */ + if (!thread_group_leader(task)) { + threadgroup_unlock(task); + put_task_struct(task); + continue; + } + + ret = cgroup_migrate(src_cset->dfl_cgrp, task, true); + + threadgroup_unlock(task); + put_task_struct(task); + + if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret)) + goto out_finish; + } + } + +out_finish: + cgroup_migrate_finish(&preloaded_csets); + return ret; +} + +/* change the enabled child controllers for a cgroup in the default hierarchy */ +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, + struct cftype *cft, char *buffer) +{ + unsigned long enable_req = 0, disable_req = 0, enable, disable; + struct cgroup *cgrp = dummy_css->cgroup, *child; + struct cgroup_subsys *ss; + char *tok, *p; + int ssid, ret; + + /* + * Parse input - white space separated list of subsystem names + * prefixed with either + or -. + */ + p = buffer; + while ((tok = strsep(&p, " \t\n"))) { + for_each_subsys(ss, ssid) { + if (ss->disabled || strcmp(tok + 1, ss->name)) + continue; + + if (*tok == '+') { + enable_req |= 1 << ssid; + disable_req &= ~(1 << ssid); + } else if (*tok == '-') { + disable_req |= 1 << ssid; + enable_req &= ~(1 << ssid); + } else { + return -EINVAL; + } + break; + } + if (ssid == CGROUP_SUBSYS_COUNT) + return -EINVAL; + } + + /* + * We're gonna grab cgroup_tree_mutex which nests outside kernfs + * active_ref. cgroup_lock_live_group() already provides enough + * protection. Ensure @cgrp stays accessible and break the + * active_ref protection. + */ + cgroup_get(cgrp); + kernfs_break_active_protection(cgrp->control_kn); +retry: + enable = enable_req; + disable = disable_req; + + mutex_lock(&cgroup_tree_mutex); + + for_each_subsys(ss, ssid) { + if (enable & (1 << ssid)) { + if (cgrp->child_subsys_mask & (1 << ssid)) { + enable &= ~(1 << ssid); + continue; + } + + /* + * Because css offlining is asynchronous, userland + * might try to re-enable the same controller while + * the previous instance is still around. In such + * cases, wait till it's gone using offline_waitq. + */ + cgroup_for_each_live_child(child, cgrp) { + wait_queue_t wait; + + if (!cgroup_css(child, ss)) + continue; + + prepare_to_wait(&child->offline_waitq, &wait, + TASK_UNINTERRUPTIBLE); + mutex_unlock(&cgroup_tree_mutex); + schedule(); + finish_wait(&child->offline_waitq, &wait); + goto retry; + } + + /* unavailable or not enabled on the parent? */ + if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) || + (cgrp->parent && + !(cgrp->parent->child_subsys_mask & (1 << ssid)))) { + ret = -ENOENT; + goto out_unlock_tree; + } + } else if (disable & (1 << ssid)) { + if (!(cgrp->child_subsys_mask & (1 << ssid))) { + disable &= ~(1 << ssid); + continue; + } + + /* a child has it enabled? */ + cgroup_for_each_live_child(child, cgrp) { + if (child->child_subsys_mask & (1 << ssid)) { + ret = -EBUSY; + goto out_unlock_tree; + } + } + } + } + + if (!enable && !disable) { + ret = 0; + goto out_unlock_tree; + } + + if (!cgroup_lock_live_group(cgrp)) { + ret = -ENODEV; + goto out_unlock_tree; + } + + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) { + ret = -EBUSY; + goto out_unlock; + } + + /* + * Create csses for enables and update child_subsys_mask. This + * changes cgroup_e_css() results which in turn makes the + * subsequent cgroup_update_dfl_csses() associate all tasks in the + * subtree to the updated csses. + */ + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + ret = create_css(child, ss); + if (ret) + goto err_undo_css; + } + } + + cgrp->child_subsys_mask |= enable; + cgrp->child_subsys_mask &= ~disable; + + ret = cgroup_update_dfl_csses(cgrp); + if (ret) + goto err_undo_css; + + /* all tasks are now migrated away from the old csses, kill them */ + for_each_subsys(ss, ssid) { + if (!(disable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) + kill_css(cgroup_css(child, ss)); + } + + kernfs_activate(cgrp->kn); + ret = 0; +out_unlock: + mutex_unlock(&cgroup_mutex); +out_unlock_tree: + mutex_unlock(&cgroup_tree_mutex); + kernfs_unbreak_active_protection(cgrp->control_kn); + cgroup_put(cgrp); + return ret; + +err_undo_css: + cgrp->child_subsys_mask &= ~enable; + cgrp->child_subsys_mask |= disable; + + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + struct cgroup_subsys_state *css = cgroup_css(child, ss); + if (css) + kill_css(css); + } + } + goto out_unlock; +} + static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -2442,6 +2773,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft) kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name), cgroup_file_mode(cft), 0, cft->kf_ops, cft, NULL, false, key); + if (cft->seq_show == cgroup_subtree_control_show) + cgrp->control_kn = kn; return PTR_ERR_OR_ZERO(kn); } @@ -3539,6 +3872,17 @@ static struct cftype cgroup_base_files[] = { .flags = CFTYPE_ONLY_ON_ROOT, .seq_show = cgroup_sane_behavior_show, }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT, + .seq_show = cgroup_controllers_show, + }, + { + .name = "cgroup.subtree_control", + .flags = CFTYPE_ONLY_ON_DFL, + .seq_show = cgroup_subtree_control_show, + .write_string = cgroup_subtree_control_write, + }, /* * Historical crazy stuff. These don't have "cgroup." prefix and @@ -3707,6 +4051,8 @@ static void offline_css(struct cgroup_subsys_state *css) css->flags &= ~CSS_ONLINE; css->cgroup->nr_css--; RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL); + + wake_up_all(&css->cgroup->offline_waitq); } /** -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
[parent not found: <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* [PATCH 01/12] cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA 944196278d3d ("cgroup: move ->subsys_mask from cgroupfs_root to cgroup") moved ->subsys_mask from cgroup_root to cgroup to prepare for the unified hierarhcy; however, it turns out that carrying the subsys_mask of the children in the parent, instead of itself, is a lot more natural. This patch restores cgroup_root->subsys_mask and morphs cgroup->subsys_mask into cgroup->child_subsys_mask. * Uses of root->cgrp.subsys_mask are restored to root->subsys_mask. * Remove automatic setting and clearing of cgrp->subsys_mask and instead just inherit ->child_subsys_mask from the parent during cgroup creation. Note that this doesn't affect any current behaviors. * Undo __kill_css() separation. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 7 ++++-- kernel/cgroup.c | 64 +++++++++++++++++++++----------------------------- 2 files changed, 32 insertions(+), 39 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 43d1ed3..e351a18 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -173,8 +173,8 @@ struct cgroup { */ u64 serial_nr; - /* The bitmask of subsystems attached to this cgroup */ - unsigned long subsys_mask; + /* the bitmask of subsystems enabled on the child cgroups */ + unsigned long child_subsys_mask; /* Private pointers for each registered subsystem */ struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT]; @@ -282,6 +282,9 @@ enum { struct cgroup_root { struct kernfs_root *kf_root; + /* The bitmask of subsystems attached to this hierarchy */ + unsigned long subsys_mask; + /* Unique id for this hierarchy. */ int hierarchy_id; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index e378cb2..3c9c623 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -529,7 +529,7 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset, * won't change, so no need for locking. */ for_each_subsys(ss, i) { - if (root->cgrp.subsys_mask & (1UL << i)) { + if (root->subsys_mask & (1UL << i)) { /* Subsystem is in this hierarchy. So we want * the subsystem state from the new * cgroup */ @@ -742,7 +742,7 @@ static void cgroup_destroy_root(struct cgroup_root *root) BUG_ON(!list_empty(&cgrp->children)); /* Rebind all subsystems back to the default hierarchy */ - rebind_subsystems(&cgrp_dfl_root, cgrp->subsys_mask); + rebind_subsystems(&cgrp_dfl_root, root->subsys_mask); /* * Release all the links from cset_links to this hierarchy's @@ -1050,8 +1050,11 @@ static int rebind_subsystems(struct cgroup_root *dst_root, ss->root = dst_root; css->cgroup = &dst_root->cgrp; - src_root->cgrp.subsys_mask &= ~(1 << ssid); - dst_root->cgrp.subsys_mask |= 1 << ssid; + src_root->subsys_mask &= ~(1 << ssid); + src_root->cgrp.child_subsys_mask &= ~(1 << ssid); + + dst_root->subsys_mask |= 1 << ssid; + dst_root->cgrp.child_subsys_mask |= 1 << ssid; if (ss->bind) ss->bind(css); @@ -1069,7 +1072,7 @@ static int cgroup_show_options(struct seq_file *seq, int ssid; for_each_subsys(ss, ssid) - if (root->cgrp.subsys_mask & (1 << ssid)) + if (root->subsys_mask & (1 << ssid)) seq_printf(seq, ",%s", ss->name); if (root->flags & CGRP_ROOT_SANE_BEHAVIOR) seq_puts(seq, ",sane_behavior"); @@ -1273,12 +1276,12 @@ static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data) if (ret) goto out_unlock; - if (opts.subsys_mask != root->cgrp.subsys_mask || opts.release_agent) + if (opts.subsys_mask != root->subsys_mask || opts.release_agent) pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n", task_tgid_nr(current), current->comm); - added_mask = opts.subsys_mask & ~root->cgrp.subsys_mask; - removed_mask = root->cgrp.subsys_mask & ~opts.subsys_mask; + added_mask = opts.subsys_mask & ~root->subsys_mask; + removed_mask = root->subsys_mask & ~opts.subsys_mask; /* Don't allow flags or name to change at remount */ if (((opts.flags ^ root->flags) & CGRP_ROOT_OPTION_MASK) || @@ -1534,7 +1537,7 @@ retry: * subsystems) then they must match. */ if ((opts.subsys_mask || opts.none) && - (opts.subsys_mask != root->cgrp.subsys_mask)) { + (opts.subsys_mask != root->subsys_mask)) { if (!name_match) continue; ret = -EBUSY; @@ -3640,8 +3643,6 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss) cgroup_get(cgrp); css_get(css->parent); - cgrp->subsys_mask |= 1 << ss->id; - if (ss->broken_hierarchy && !ss->warned_broken_hierarchy && parent->parent) { pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n", @@ -3755,13 +3756,15 @@ static long cgroup_create(struct cgroup *parent, const char *name, /* let's create and online css's */ for_each_subsys(ss, ssid) { - if (root->cgrp.subsys_mask & (1 << ssid)) { + if (parent->child_subsys_mask & (1 << ssid)) { err = create_css(cgrp, ss); if (err) goto err_destroy; } } + cgrp->child_subsys_mask = parent->child_subsys_mask; + kernfs_activate(kn); mutex_unlock(&cgroup_mutex); @@ -3857,7 +3860,16 @@ static void css_killed_ref_fn(struct percpu_ref *ref) queue_work(cgroup_destroy_wq, &css->destroy_work); } -static void __kill_css(struct cgroup_subsys_state *css) +/** + * kill_css - destroy a css + * @css: css to destroy + * + * This function initiates destruction of @css by removing cgroup interface + * files and putting its base reference. ->css_offline() will be invoked + * asynchronously once css_tryget() is guaranteed to fail and when the + * reference count reaches zero, @css will be released. + */ +static void kill_css(struct cgroup_subsys_state *css) { lockdep_assert_held(&cgroup_tree_mutex); @@ -3887,28 +3899,6 @@ static void __kill_css(struct cgroup_subsys_state *css) } /** - * kill_css - destroy a css - * @css: css to destroy - * - * This function initiates destruction of @css by removing cgroup interface - * files and putting its base reference. ->css_offline() will be invoked - * asynchronously once css_tryget() is guaranteed to fail and when the - * reference count reaches zero, @css will be released. - */ -static void kill_css(struct cgroup_subsys_state *css) -{ - struct cgroup *cgrp = css->cgroup; - - lockdep_assert_held(&cgroup_tree_mutex); - - /* if already killed, noop */ - if (cgrp->subsys_mask & (1 << css->ss->id)) { - cgrp->subsys_mask &= ~(1 << css->ss->id); - __kill_css(css); - } -} - -/** * cgroup_destroy_locked - the first stage of cgroup destruction * @cgrp: cgroup to be destroyed * @@ -4120,7 +4110,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss) BUG_ON(online_css(css)); - cgrp_dfl_root.cgrp.subsys_mask |= 1 << ss->id; + cgrp_dfl_root.subsys_mask |= 1 << ss->id; mutex_unlock(&cgroup_mutex); mutex_unlock(&cgroup_tree_mutex); @@ -4277,7 +4267,7 @@ int proc_cgroup_show(struct seq_file *m, void *v) seq_printf(m, "%d:", root->hierarchy_id); for_each_subsys(ss, ssid) - if (root->cgrp.subsys_mask & (1 << ssid)) + if (root->subsys_mask & (1 << ssid)) seq_printf(m, "%s%s", count++ ? "," : "", ss->name); if (strlen(root->name)) seq_printf(m, "%sname=%s", count ? "," : "", -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 01/12] cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo 944196278d3d ("cgroup: move ->subsys_mask from cgroupfs_root to cgroup") moved ->subsys_mask from cgroup_root to cgroup to prepare for the unified hierarhcy; however, it turns out that carrying the subsys_mask of the children in the parent, instead of itself, is a lot more natural. This patch restores cgroup_root->subsys_mask and morphs cgroup->subsys_mask into cgroup->child_subsys_mask. * Uses of root->cgrp.subsys_mask are restored to root->subsys_mask. * Remove automatic setting and clearing of cgrp->subsys_mask and instead just inherit ->child_subsys_mask from the parent during cgroup creation. Note that this doesn't affect any current behaviors. * Undo __kill_css() separation. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/cgroup.h | 7 ++++-- kernel/cgroup.c | 64 +++++++++++++++++++++----------------------------- 2 files changed, 32 insertions(+), 39 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 43d1ed3..e351a18 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -173,8 +173,8 @@ struct cgroup { */ u64 serial_nr; - /* The bitmask of subsystems attached to this cgroup */ - unsigned long subsys_mask; + /* the bitmask of subsystems enabled on the child cgroups */ + unsigned long child_subsys_mask; /* Private pointers for each registered subsystem */ struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT]; @@ -282,6 +282,9 @@ enum { struct cgroup_root { struct kernfs_root *kf_root; + /* The bitmask of subsystems attached to this hierarchy */ + unsigned long subsys_mask; + /* Unique id for this hierarchy. */ int hierarchy_id; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index e378cb2..3c9c623 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -529,7 +529,7 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset, * won't change, so no need for locking. */ for_each_subsys(ss, i) { - if (root->cgrp.subsys_mask & (1UL << i)) { + if (root->subsys_mask & (1UL << i)) { /* Subsystem is in this hierarchy. So we want * the subsystem state from the new * cgroup */ @@ -742,7 +742,7 @@ static void cgroup_destroy_root(struct cgroup_root *root) BUG_ON(!list_empty(&cgrp->children)); /* Rebind all subsystems back to the default hierarchy */ - rebind_subsystems(&cgrp_dfl_root, cgrp->subsys_mask); + rebind_subsystems(&cgrp_dfl_root, root->subsys_mask); /* * Release all the links from cset_links to this hierarchy's @@ -1050,8 +1050,11 @@ static int rebind_subsystems(struct cgroup_root *dst_root, ss->root = dst_root; css->cgroup = &dst_root->cgrp; - src_root->cgrp.subsys_mask &= ~(1 << ssid); - dst_root->cgrp.subsys_mask |= 1 << ssid; + src_root->subsys_mask &= ~(1 << ssid); + src_root->cgrp.child_subsys_mask &= ~(1 << ssid); + + dst_root->subsys_mask |= 1 << ssid; + dst_root->cgrp.child_subsys_mask |= 1 << ssid; if (ss->bind) ss->bind(css); @@ -1069,7 +1072,7 @@ static int cgroup_show_options(struct seq_file *seq, int ssid; for_each_subsys(ss, ssid) - if (root->cgrp.subsys_mask & (1 << ssid)) + if (root->subsys_mask & (1 << ssid)) seq_printf(seq, ",%s", ss->name); if (root->flags & CGRP_ROOT_SANE_BEHAVIOR) seq_puts(seq, ",sane_behavior"); @@ -1273,12 +1276,12 @@ static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data) if (ret) goto out_unlock; - if (opts.subsys_mask != root->cgrp.subsys_mask || opts.release_agent) + if (opts.subsys_mask != root->subsys_mask || opts.release_agent) pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n", task_tgid_nr(current), current->comm); - added_mask = opts.subsys_mask & ~root->cgrp.subsys_mask; - removed_mask = root->cgrp.subsys_mask & ~opts.subsys_mask; + added_mask = opts.subsys_mask & ~root->subsys_mask; + removed_mask = root->subsys_mask & ~opts.subsys_mask; /* Don't allow flags or name to change at remount */ if (((opts.flags ^ root->flags) & CGRP_ROOT_OPTION_MASK) || @@ -1534,7 +1537,7 @@ retry: * subsystems) then they must match. */ if ((opts.subsys_mask || opts.none) && - (opts.subsys_mask != root->cgrp.subsys_mask)) { + (opts.subsys_mask != root->subsys_mask)) { if (!name_match) continue; ret = -EBUSY; @@ -3640,8 +3643,6 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss) cgroup_get(cgrp); css_get(css->parent); - cgrp->subsys_mask |= 1 << ss->id; - if (ss->broken_hierarchy && !ss->warned_broken_hierarchy && parent->parent) { pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n", @@ -3755,13 +3756,15 @@ static long cgroup_create(struct cgroup *parent, const char *name, /* let's create and online css's */ for_each_subsys(ss, ssid) { - if (root->cgrp.subsys_mask & (1 << ssid)) { + if (parent->child_subsys_mask & (1 << ssid)) { err = create_css(cgrp, ss); if (err) goto err_destroy; } } + cgrp->child_subsys_mask = parent->child_subsys_mask; + kernfs_activate(kn); mutex_unlock(&cgroup_mutex); @@ -3857,7 +3860,16 @@ static void css_killed_ref_fn(struct percpu_ref *ref) queue_work(cgroup_destroy_wq, &css->destroy_work); } -static void __kill_css(struct cgroup_subsys_state *css) +/** + * kill_css - destroy a css + * @css: css to destroy + * + * This function initiates destruction of @css by removing cgroup interface + * files and putting its base reference. ->css_offline() will be invoked + * asynchronously once css_tryget() is guaranteed to fail and when the + * reference count reaches zero, @css will be released. + */ +static void kill_css(struct cgroup_subsys_state *css) { lockdep_assert_held(&cgroup_tree_mutex); @@ -3887,28 +3899,6 @@ static void __kill_css(struct cgroup_subsys_state *css) } /** - * kill_css - destroy a css - * @css: css to destroy - * - * This function initiates destruction of @css by removing cgroup interface - * files and putting its base reference. ->css_offline() will be invoked - * asynchronously once css_tryget() is guaranteed to fail and when the - * reference count reaches zero, @css will be released. - */ -static void kill_css(struct cgroup_subsys_state *css) -{ - struct cgroup *cgrp = css->cgroup; - - lockdep_assert_held(&cgroup_tree_mutex); - - /* if already killed, noop */ - if (cgrp->subsys_mask & (1 << css->ss->id)) { - cgrp->subsys_mask &= ~(1 << css->ss->id); - __kill_css(css); - } -} - -/** * cgroup_destroy_locked - the first stage of cgroup destruction * @cgrp: cgroup to be destroyed * @@ -4120,7 +4110,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss) BUG_ON(online_css(css)); - cgrp_dfl_root.cgrp.subsys_mask |= 1 << ss->id; + cgrp_dfl_root.subsys_mask |= 1 << ss->id; mutex_unlock(&cgroup_mutex); mutex_unlock(&cgroup_tree_mutex); @@ -4277,7 +4267,7 @@ int proc_cgroup_show(struct seq_file *m, void *v) seq_printf(m, "%d:", root->hierarchy_id); for_each_subsys(ss, ssid) - if (root->cgrp.subsys_mask & (1 << ssid)) + if (root->subsys_mask & (1 << ssid)) seq_printf(m, "%s%s", count++ ? "," : "", ss->name); if (strlen(root->name)) seq_printf(m, "%sname=%s", count ? "," : "", -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 02/12] cgroup: introduce effective cgroup_subsys_state 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA In the planned default unified hierarchy, controllers may get dynamically attached to and detached from a cgroup and a cgroup may not have csses for all the controllers associated with the hierarchy. When a cgroup doesn't have its own css for a given controller, the css of the nearest ancestor with the controller enabled will be used, which is called the effective css. This patch introduces cgroup_e_css() and for_each_e_css() to access the effective csses and convert compare_css_sets(), find_existing_css_set() and cgroup_migrate() to use the effective csses so that they can handle cgroups with partial csses correctly. This means that for two css_sets to be considered identical, they should have both matching csses and cgroups. compare_css_sets() already compares both, not for correctness but for optimization. As this now becomes a matter of correctness, update the comments accordingly. For all !default hierarchies, cgroup_e_css() always equals cgroup_css(), so this patch doesn't change behavior. While at it, fix incorrect locking comment for for_each_css(). Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- kernel/cgroup.c | 83 ++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 64 insertions(+), 19 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 3c9c623..ae1fe5a 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -208,6 +208,34 @@ static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp, return &cgrp->dummy_css; } +/** + * cgroup_e_css - obtain a cgroup's effective css for the specified subsystem + * @cgrp: the cgroup of interest + * @ss: the subsystem of interest (%NULL returns the dummy_css) + * + * Similar to cgroup_css() but returns the effctive css, which is defined + * as the matching css of the nearest ancestor including self which has @ss + * enabled. If @ss is associated with the hierarchy @cgrp is on, this + * function is guaranteed to return non-NULL css. + */ +static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp, + struct cgroup_subsys *ss) +{ + lockdep_assert_held(&cgroup_mutex); + + if (!ss) + return &cgrp->dummy_css; + + if (!(cgrp->root->subsys_mask & (1 << ss->id))) + return NULL; + + while (cgrp->parent && + !(cgrp->parent->child_subsys_mask & (1 << ss->id))) + cgrp = cgrp->parent; + + return cgroup_css(cgrp, ss); +} + /* convenient tests for these bits */ static inline bool cgroup_is_dead(const struct cgroup *cgrp) { @@ -273,7 +301,7 @@ static int notify_on_release(const struct cgroup *cgrp) * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end * @cgrp: the target cgroup to iterate css's of * - * Should be called under cgroup_mutex. + * Should be called under cgroup_[tree_]mutex. */ #define for_each_css(css, ssid, cgrp) \ for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++) \ @@ -284,6 +312,20 @@ static int notify_on_release(const struct cgroup *cgrp) else /** + * for_each_e_css - iterate all effective css's of a cgroup + * @css: the iteration cursor + * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end + * @cgrp: the target cgroup to iterate css's of + * + * Should be called under cgroup_[tree_]mutex. + */ +#define for_each_e_css(css, ssid, cgrp) \ + for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++) \ + if (!((css) = cgroup_e_css(cgrp, cgroup_subsys[(ssid)]))) \ + ; \ + else + +/** * for_each_subsys - iterate all enabled cgroup subsystems * @ss: the iteration cursor * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end @@ -452,20 +494,20 @@ static bool compare_css_sets(struct css_set *cset, { struct list_head *l1, *l2; - if (memcmp(template, cset->subsys, sizeof(cset->subsys))) { - /* Not all subsystems matched */ + /* + * On the default hierarchy, there can be csets which are + * associated with the same set of cgroups but different csses. + * Let's first ensure that csses match. + */ + if (memcmp(template, cset->subsys, sizeof(cset->subsys))) return false; - } /* * Compare cgroup pointers in order to distinguish between - * different cgroups in heirarchies with no subsystems. We - * could get by with just this check alone (and skip the - * memcmp above) but on most setups the memcmp check will - * avoid the need for this more expensive check on almost all - * candidates. + * different cgroups in hierarchies. As different cgroups may + * share the same effective css, this comparison is always + * necessary. */ - l1 = &cset->cgrp_links; l2 = &old_cset->cgrp_links; while (1) { @@ -530,13 +572,16 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset, */ for_each_subsys(ss, i) { if (root->subsys_mask & (1UL << i)) { - /* Subsystem is in this hierarchy. So we want - * the subsystem state from the new - * cgroup */ - template[i] = cgroup_css(cgrp, ss); + /* + * @ss is in this hierarchy, so we want the + * effective css from @cgrp. + */ + template[i] = cgroup_e_css(cgrp, ss); } else { - /* Subsystem is not in this hierarchy, so we - * don't want to change the subsystem state */ + /* + * @ss is not in this hierarchy, so we don't want + * to change the css. + */ template[i] = old_cset->subsys[i]; } } @@ -1968,7 +2013,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader, return 0; /* check that we can legitimately attach to the cgroup */ - for_each_css(css, i, cgrp) { + for_each_e_css(css, i, cgrp) { if (css->ss->can_attach) { ret = css->ss->can_attach(css, &tset); if (ret) { @@ -1998,7 +2043,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader, */ tset.csets = &tset.dst_csets; - for_each_css(css, i, cgrp) + for_each_e_css(css, i, cgrp) if (css->ss->attach) css->ss->attach(css, &tset); @@ -2006,7 +2051,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader, goto out_release_tset; out_cancel_attach: - for_each_css(css, i, cgrp) { + for_each_e_css(css, i, cgrp) { if (css == failed_css) break; if (css->ss->cancel_attach) -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 02/12] cgroup: introduce effective cgroup_subsys_state @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo In the planned default unified hierarchy, controllers may get dynamically attached to and detached from a cgroup and a cgroup may not have csses for all the controllers associated with the hierarchy. When a cgroup doesn't have its own css for a given controller, the css of the nearest ancestor with the controller enabled will be used, which is called the effective css. This patch introduces cgroup_e_css() and for_each_e_css() to access the effective csses and convert compare_css_sets(), find_existing_css_set() and cgroup_migrate() to use the effective csses so that they can handle cgroups with partial csses correctly. This means that for two css_sets to be considered identical, they should have both matching csses and cgroups. compare_css_sets() already compares both, not for correctness but for optimization. As this now becomes a matter of correctness, update the comments accordingly. For all !default hierarchies, cgroup_e_css() always equals cgroup_css(), so this patch doesn't change behavior. While at it, fix incorrect locking comment for for_each_css(). Signed-off-by: Tejun Heo <tj@kernel.org> --- kernel/cgroup.c | 83 ++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 64 insertions(+), 19 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 3c9c623..ae1fe5a 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -208,6 +208,34 @@ static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp, return &cgrp->dummy_css; } +/** + * cgroup_e_css - obtain a cgroup's effective css for the specified subsystem + * @cgrp: the cgroup of interest + * @ss: the subsystem of interest (%NULL returns the dummy_css) + * + * Similar to cgroup_css() but returns the effctive css, which is defined + * as the matching css of the nearest ancestor including self which has @ss + * enabled. If @ss is associated with the hierarchy @cgrp is on, this + * function is guaranteed to return non-NULL css. + */ +static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp, + struct cgroup_subsys *ss) +{ + lockdep_assert_held(&cgroup_mutex); + + if (!ss) + return &cgrp->dummy_css; + + if (!(cgrp->root->subsys_mask & (1 << ss->id))) + return NULL; + + while (cgrp->parent && + !(cgrp->parent->child_subsys_mask & (1 << ss->id))) + cgrp = cgrp->parent; + + return cgroup_css(cgrp, ss); +} + /* convenient tests for these bits */ static inline bool cgroup_is_dead(const struct cgroup *cgrp) { @@ -273,7 +301,7 @@ static int notify_on_release(const struct cgroup *cgrp) * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end * @cgrp: the target cgroup to iterate css's of * - * Should be called under cgroup_mutex. + * Should be called under cgroup_[tree_]mutex. */ #define for_each_css(css, ssid, cgrp) \ for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++) \ @@ -284,6 +312,20 @@ static int notify_on_release(const struct cgroup *cgrp) else /** + * for_each_e_css - iterate all effective css's of a cgroup + * @css: the iteration cursor + * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end + * @cgrp: the target cgroup to iterate css's of + * + * Should be called under cgroup_[tree_]mutex. + */ +#define for_each_e_css(css, ssid, cgrp) \ + for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++) \ + if (!((css) = cgroup_e_css(cgrp, cgroup_subsys[(ssid)]))) \ + ; \ + else + +/** * for_each_subsys - iterate all enabled cgroup subsystems * @ss: the iteration cursor * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end @@ -452,20 +494,20 @@ static bool compare_css_sets(struct css_set *cset, { struct list_head *l1, *l2; - if (memcmp(template, cset->subsys, sizeof(cset->subsys))) { - /* Not all subsystems matched */ + /* + * On the default hierarchy, there can be csets which are + * associated with the same set of cgroups but different csses. + * Let's first ensure that csses match. + */ + if (memcmp(template, cset->subsys, sizeof(cset->subsys))) return false; - } /* * Compare cgroup pointers in order to distinguish between - * different cgroups in heirarchies with no subsystems. We - * could get by with just this check alone (and skip the - * memcmp above) but on most setups the memcmp check will - * avoid the need for this more expensive check on almost all - * candidates. + * different cgroups in hierarchies. As different cgroups may + * share the same effective css, this comparison is always + * necessary. */ - l1 = &cset->cgrp_links; l2 = &old_cset->cgrp_links; while (1) { @@ -530,13 +572,16 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset, */ for_each_subsys(ss, i) { if (root->subsys_mask & (1UL << i)) { - /* Subsystem is in this hierarchy. So we want - * the subsystem state from the new - * cgroup */ - template[i] = cgroup_css(cgrp, ss); + /* + * @ss is in this hierarchy, so we want the + * effective css from @cgrp. + */ + template[i] = cgroup_e_css(cgrp, ss); } else { - /* Subsystem is not in this hierarchy, so we - * don't want to change the subsystem state */ + /* + * @ss is not in this hierarchy, so we don't want + * to change the css. + */ template[i] = old_cset->subsys[i]; } } @@ -1968,7 +2013,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader, return 0; /* check that we can legitimately attach to the cgroup */ - for_each_css(css, i, cgrp) { + for_each_e_css(css, i, cgrp) { if (css->ss->can_attach) { ret = css->ss->can_attach(css, &tset); if (ret) { @@ -1998,7 +2043,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader, */ tset.csets = &tset.dst_csets; - for_each_css(css, i, cgrp) + for_each_e_css(css, i, cgrp) if (css->ss->attach) css->ss->attach(css, &tset); @@ -2006,7 +2051,7 @@ static int cgroup_migrate(struct cgroup *cgrp, struct task_struct *leader, goto out_release_tset; out_cancel_attach: - for_each_css(css, i, cgrp) { + for_each_e_css(css, i, cgrp) { if (css == failed_css) break; if (css->ss->cancel_attach) -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 03/12] cgroup: implement cgroup->e_csets[] 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On the default unified hierarchy, a cgroup may be associated with csses of its ancestors, which means that a css of a given cgroup may be associated with css_sets of descendant cgroups. This means that we can't walk all tasks associated with a css by iterating the css_sets associated with the cgroup as there are css_sets which are pointing to the css but linked on the descendants. This patch adds per-subsystem list heads cgroup->e_csets[]. Any css_set which is pointing to a css is linked to css->cgroup->e_csets[$SUBSYS_ID] through css_set->e_cset_node[$SUBSYS_ID]. The lists are protected by css_set_rwsem and will allow us to walk all css_sets associated with a given css so that we can find out all associated tasks. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 18 ++++++++++++++++++ kernel/cgroup.c | 30 ++++++++++++++++++++++++++++-- 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index e351a18..e902eee 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -188,6 +188,15 @@ struct cgroup { struct list_head cset_links; /* + * On the default hierarchy, a css_set for a cgroup with some + * susbsys disabled will point to css's which are associated with + * the closest ancestor which has the subsys enabled. The + * following lists all css_sets which point to this cgroup's css + * for the given subsystem. + */ + struct list_head e_csets[CGROUP_SUBSYS_COUNT]; + + /* * Linked list running through all cgroups that can * potentially be reaped by the release agent. Protected by * release_list_lock @@ -369,6 +378,15 @@ struct css_set { struct cgroup *mg_src_cgrp; struct css_set *mg_dst_cset; + /* + * On the default hierarhcy, ->subsys[ssid] may point to a css + * attached to an ancestor instead of the cgroup this css_set is + * associated with. The following node is anchored at + * ->subsys[ssid]->cgroup->e_csets[ssid] and provides a way to + * iterate through all css's attached to a given cgroup. + */ + struct list_head e_cset_node[CGROUP_SUBSYS_COUNT]; + /* For RCU-protected deletion */ struct rcu_head rcu_head; }; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index ae1fe5a..2899a6f 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -425,6 +425,8 @@ static unsigned long css_set_hash(struct cgroup_subsys_state *css[]) static void put_css_set_locked(struct css_set *cset, bool taskexit) { struct cgrp_cset_link *link, *tmp_link; + struct cgroup_subsys *ss; + int ssid; lockdep_assert_held(&css_set_rwsem); @@ -432,6 +434,8 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit) return; /* This css_set is dead. unlink it and release cgroup refcounts */ + for_each_subsys(ss, ssid) + list_del(&cset->e_cset_node[ssid]); hash_del(&cset->hlist); css_set_count--; @@ -673,7 +677,9 @@ static struct css_set *find_css_set(struct css_set *old_cset, struct css_set *cset; struct list_head tmp_links; struct cgrp_cset_link *link; + struct cgroup_subsys *ss; unsigned long key; + int ssid; lockdep_assert_held(&cgroup_mutex); @@ -724,10 +730,14 @@ static struct css_set *find_css_set(struct css_set *old_cset, css_set_count++; - /* Add this cgroup group to the hash table */ + /* Add @cset to the hash table */ key = css_set_hash(cset->subsys); hash_add(css_set_table, &cset->hlist, key); + for_each_subsys(ss, ssid) + list_add_tail(&cset->e_cset_node[ssid], + &cset->subsys[ssid]->cgroup->e_csets[ssid]); + up_write(&css_set_rwsem); return cset; @@ -1028,7 +1038,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root, unsigned long ss_mask) { struct cgroup_subsys *ss; - int ssid, ret; + int ssid, i, ret; lockdep_assert_held(&cgroup_tree_mutex); lockdep_assert_held(&cgroup_mutex); @@ -1081,6 +1091,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root, for_each_subsys(ss, ssid) { struct cgroup_root *src_root; struct cgroup_subsys_state *css; + struct css_set *cset; if (!(ss_mask & (1 << ssid))) continue; @@ -1095,6 +1106,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root, ss->root = dst_root; css->cgroup = &dst_root->cgrp; + down_write(&css_set_rwsem); + hash_for_each(css_set_table, i, cset, hlist) + list_move_tail(&cset->e_cset_node[ss->id], + &dst_root->cgrp.e_csets[ss->id]); + up_write(&css_set_rwsem); + src_root->subsys_mask &= ~(1 << ssid); src_root->cgrp.child_subsys_mask &= ~(1 << ssid); @@ -1417,6 +1434,9 @@ out_unlock: static void init_cgroup_housekeeping(struct cgroup *cgrp) { + struct cgroup_subsys *ss; + int ssid; + atomic_set(&cgrp->refcnt, 1); INIT_LIST_HEAD(&cgrp->sibling); INIT_LIST_HEAD(&cgrp->children); @@ -1425,6 +1445,9 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) INIT_LIST_HEAD(&cgrp->pidlists); mutex_init(&cgrp->pidlist_mutex); cgrp->dummy_css.cgroup = cgrp; + + for_each_subsys(ss, ssid) + INIT_LIST_HEAD(&cgrp->e_csets[ssid]); } static void init_cgroup_root(struct cgroup_root *root, @@ -4224,6 +4247,9 @@ int __init cgroup_init(void) if (!ss->early_init) cgroup_init_subsys(ss); + list_add_tail(&init_css_set.e_cset_node[ssid], + &cgrp_dfl_root.cgrp.e_csets[ssid]); + /* * cftype registration needs kmalloc and can't be done * during early_init. Register base cftypes separately. -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 03/12] cgroup: implement cgroup->e_csets[] @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo On the default unified hierarchy, a cgroup may be associated with csses of its ancestors, which means that a css of a given cgroup may be associated with css_sets of descendant cgroups. This means that we can't walk all tasks associated with a css by iterating the css_sets associated with the cgroup as there are css_sets which are pointing to the css but linked on the descendants. This patch adds per-subsystem list heads cgroup->e_csets[]. Any css_set which is pointing to a css is linked to css->cgroup->e_csets[$SUBSYS_ID] through css_set->e_cset_node[$SUBSYS_ID]. The lists are protected by css_set_rwsem and will allow us to walk all css_sets associated with a given css so that we can find out all associated tasks. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/cgroup.h | 18 ++++++++++++++++++ kernel/cgroup.c | 30 ++++++++++++++++++++++++++++-- 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index e351a18..e902eee 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -188,6 +188,15 @@ struct cgroup { struct list_head cset_links; /* + * On the default hierarchy, a css_set for a cgroup with some + * susbsys disabled will point to css's which are associated with + * the closest ancestor which has the subsys enabled. The + * following lists all css_sets which point to this cgroup's css + * for the given subsystem. + */ + struct list_head e_csets[CGROUP_SUBSYS_COUNT]; + + /* * Linked list running through all cgroups that can * potentially be reaped by the release agent. Protected by * release_list_lock @@ -369,6 +378,15 @@ struct css_set { struct cgroup *mg_src_cgrp; struct css_set *mg_dst_cset; + /* + * On the default hierarhcy, ->subsys[ssid] may point to a css + * attached to an ancestor instead of the cgroup this css_set is + * associated with. The following node is anchored at + * ->subsys[ssid]->cgroup->e_csets[ssid] and provides a way to + * iterate through all css's attached to a given cgroup. + */ + struct list_head e_cset_node[CGROUP_SUBSYS_COUNT]; + /* For RCU-protected deletion */ struct rcu_head rcu_head; }; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index ae1fe5a..2899a6f 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -425,6 +425,8 @@ static unsigned long css_set_hash(struct cgroup_subsys_state *css[]) static void put_css_set_locked(struct css_set *cset, bool taskexit) { struct cgrp_cset_link *link, *tmp_link; + struct cgroup_subsys *ss; + int ssid; lockdep_assert_held(&css_set_rwsem); @@ -432,6 +434,8 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit) return; /* This css_set is dead. unlink it and release cgroup refcounts */ + for_each_subsys(ss, ssid) + list_del(&cset->e_cset_node[ssid]); hash_del(&cset->hlist); css_set_count--; @@ -673,7 +677,9 @@ static struct css_set *find_css_set(struct css_set *old_cset, struct css_set *cset; struct list_head tmp_links; struct cgrp_cset_link *link; + struct cgroup_subsys *ss; unsigned long key; + int ssid; lockdep_assert_held(&cgroup_mutex); @@ -724,10 +730,14 @@ static struct css_set *find_css_set(struct css_set *old_cset, css_set_count++; - /* Add this cgroup group to the hash table */ + /* Add @cset to the hash table */ key = css_set_hash(cset->subsys); hash_add(css_set_table, &cset->hlist, key); + for_each_subsys(ss, ssid) + list_add_tail(&cset->e_cset_node[ssid], + &cset->subsys[ssid]->cgroup->e_csets[ssid]); + up_write(&css_set_rwsem); return cset; @@ -1028,7 +1038,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root, unsigned long ss_mask) { struct cgroup_subsys *ss; - int ssid, ret; + int ssid, i, ret; lockdep_assert_held(&cgroup_tree_mutex); lockdep_assert_held(&cgroup_mutex); @@ -1081,6 +1091,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root, for_each_subsys(ss, ssid) { struct cgroup_root *src_root; struct cgroup_subsys_state *css; + struct css_set *cset; if (!(ss_mask & (1 << ssid))) continue; @@ -1095,6 +1106,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root, ss->root = dst_root; css->cgroup = &dst_root->cgrp; + down_write(&css_set_rwsem); + hash_for_each(css_set_table, i, cset, hlist) + list_move_tail(&cset->e_cset_node[ss->id], + &dst_root->cgrp.e_csets[ss->id]); + up_write(&css_set_rwsem); + src_root->subsys_mask &= ~(1 << ssid); src_root->cgrp.child_subsys_mask &= ~(1 << ssid); @@ -1417,6 +1434,9 @@ out_unlock: static void init_cgroup_housekeeping(struct cgroup *cgrp) { + struct cgroup_subsys *ss; + int ssid; + atomic_set(&cgrp->refcnt, 1); INIT_LIST_HEAD(&cgrp->sibling); INIT_LIST_HEAD(&cgrp->children); @@ -1425,6 +1445,9 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) INIT_LIST_HEAD(&cgrp->pidlists); mutex_init(&cgrp->pidlist_mutex); cgrp->dummy_css.cgroup = cgrp; + + for_each_subsys(ss, ssid) + INIT_LIST_HEAD(&cgrp->e_csets[ssid]); } static void init_cgroup_root(struct cgroup_root *root, @@ -4224,6 +4247,9 @@ int __init cgroup_init(void) if (!ss->early_init) cgroup_init_subsys(ss); + list_add_tail(&init_css_set.e_cset_node[ssid], + &cgrp_dfl_root.cgrp.e_csets[ssid]); + /* * cftype registration needs kmalloc and can't be done * during early_init. Register base cftypes separately. -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 04/12] cgroup: make css_next_child() skip missing csses [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> ` (2 preceding siblings ...) 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` Tejun Heo ` (10 subsequent siblings) 14 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA css_next_child() walks the children of the specified css. It does this by finding the next cgroup and then returning the requested css. On the default unified hierarchy, a cgroup may not have a css associated with it even if the hierarchy has the subsystem enabled. This patch updates css_next_child() so that it skips children without the requested css associated. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- kernel/cgroup.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 2899a6f..9d2d6ef 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2690,10 +2690,19 @@ css_next_child(struct cgroup_subsys_state *pos_css, break; } - if (&next->sibling == &cgrp->children) - return NULL; + /* + * @next, if not pointing to the head, can be dereferenced and is + * the next sibling; however, it might have @ss disabled. If so, + * fast-forward to the next enabled one. + */ + while (&next->sibling != &cgrp->children) { + struct cgroup_subsys_state *next_css = cgroup_css(next, parent_css->ss); - return cgroup_css(next, parent_css->ss); + if (next_css) + return next_css; + next = list_entry_rcu(next->sibling.next, struct cgroup, sibling); + } + return NULL; } /** -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 05/12] cgroup: reorganize css_task_iter 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA This patch reorganizes css_task_iter so that adding effective css support is easier. * s/->cset_link/->cset_pos/ and s/->task/->task_pos/ for consistency * ->origin_css is used to determine whether the iteration reached the last css_set. Replace it with explicit ->cset_head so that css_advance_task_iter() doesn't have to know the termination condition directly. * css_task_iter_next() currently assumes that it's walking list of cgrp_cset_link and reaches into the current cset through the current link to determine the termination conditions for task walking. As this won't always be true for effective css walking, add ->tasks_head and ->mg_tasks_head and use them to control task walking so that css_task_iter_next() doesn't have to know how css_sets are being walked. This patch doesn't make any behavior changes. The iteration logic stays unchanged after the patch. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 9 ++++++--- kernel/cgroup.c | 33 +++++++++++++++++---------------- 2 files changed, 23 insertions(+), 19 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index e902eee..6bbe143 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -842,9 +842,12 @@ css_next_descendant_post(struct cgroup_subsys_state *pos, /* A css_task_iter should be treated as an opaque object */ struct css_task_iter { - struct cgroup_subsys_state *origin_css; - struct list_head *cset_link; - struct list_head *task; + struct list_head *cset_pos; + struct list_head *cset_head; + + struct list_head *task_pos; + struct list_head *tasks_head; + struct list_head *mg_tasks_head; }; void css_task_iter_start(struct cgroup_subsys_state *css, diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 9d2d6ef..318b6a2 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2839,27 +2839,30 @@ css_next_descendant_post(struct cgroup_subsys_state *pos, */ static void css_advance_task_iter(struct css_task_iter *it) { - struct list_head *l = it->cset_link; + struct list_head *l = it->cset_pos; struct cgrp_cset_link *link; struct css_set *cset; /* Advance to the next non-empty css_set */ do { l = l->next; - if (l == &it->origin_css->cgroup->cset_links) { - it->cset_link = NULL; + if (l == it->cset_head) { + it->cset_pos = NULL; return; } link = list_entry(l, struct cgrp_cset_link, cset_link); cset = link->cset; } while (list_empty(&cset->tasks) && list_empty(&cset->mg_tasks)); - it->cset_link = l; + it->cset_pos = l; if (!list_empty(&cset->tasks)) - it->task = cset->tasks.next; + it->task_pos = cset->tasks.next; else - it->task = cset->mg_tasks.next; + it->task_pos = cset->mg_tasks.next; + + it->tasks_head = &cset->tasks; + it->mg_tasks_head = &cset->mg_tasks; } /** @@ -2885,8 +2888,8 @@ void css_task_iter_start(struct cgroup_subsys_state *css, down_read(&css_set_rwsem); - it->origin_css = css; - it->cset_link = &css->cgroup->cset_links; + it->cset_pos = &css->cgroup->cset_links; + it->cset_head = it->cset_pos; css_advance_task_iter(it); } @@ -2902,12 +2905,10 @@ void css_task_iter_start(struct cgroup_subsys_state *css, struct task_struct *css_task_iter_next(struct css_task_iter *it) { struct task_struct *res; - struct list_head *l = it->task; - struct cgrp_cset_link *link = list_entry(it->cset_link, - struct cgrp_cset_link, cset_link); + struct list_head *l = it->task_pos; /* If the iterator cg is NULL, we have no tasks */ - if (!it->cset_link) + if (!it->cset_pos) return NULL; res = list_entry(l, struct task_struct, cg_list); @@ -2918,13 +2919,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it) */ l = l->next; - if (l == &link->cset->tasks) - l = link->cset->mg_tasks.next; + if (l == it->tasks_head) + l = it->mg_tasks_head->next; - if (l == &link->cset->mg_tasks) + if (l == it->mg_tasks_head) css_advance_task_iter(it); else - it->task = l; + it->task_pos = l; return res; } -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 05/12] cgroup: reorganize css_task_iter @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo This patch reorganizes css_task_iter so that adding effective css support is easier. * s/->cset_link/->cset_pos/ and s/->task/->task_pos/ for consistency * ->origin_css is used to determine whether the iteration reached the last css_set. Replace it with explicit ->cset_head so that css_advance_task_iter() doesn't have to know the termination condition directly. * css_task_iter_next() currently assumes that it's walking list of cgrp_cset_link and reaches into the current cset through the current link to determine the termination conditions for task walking. As this won't always be true for effective css walking, add ->tasks_head and ->mg_tasks_head and use them to control task walking so that css_task_iter_next() doesn't have to know how css_sets are being walked. This patch doesn't make any behavior changes. The iteration logic stays unchanged after the patch. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/cgroup.h | 9 ++++++--- kernel/cgroup.c | 33 +++++++++++++++++---------------- 2 files changed, 23 insertions(+), 19 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index e902eee..6bbe143 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -842,9 +842,12 @@ css_next_descendant_post(struct cgroup_subsys_state *pos, /* A css_task_iter should be treated as an opaque object */ struct css_task_iter { - struct cgroup_subsys_state *origin_css; - struct list_head *cset_link; - struct list_head *task; + struct list_head *cset_pos; + struct list_head *cset_head; + + struct list_head *task_pos; + struct list_head *tasks_head; + struct list_head *mg_tasks_head; }; void css_task_iter_start(struct cgroup_subsys_state *css, diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 9d2d6ef..318b6a2 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2839,27 +2839,30 @@ css_next_descendant_post(struct cgroup_subsys_state *pos, */ static void css_advance_task_iter(struct css_task_iter *it) { - struct list_head *l = it->cset_link; + struct list_head *l = it->cset_pos; struct cgrp_cset_link *link; struct css_set *cset; /* Advance to the next non-empty css_set */ do { l = l->next; - if (l == &it->origin_css->cgroup->cset_links) { - it->cset_link = NULL; + if (l == it->cset_head) { + it->cset_pos = NULL; return; } link = list_entry(l, struct cgrp_cset_link, cset_link); cset = link->cset; } while (list_empty(&cset->tasks) && list_empty(&cset->mg_tasks)); - it->cset_link = l; + it->cset_pos = l; if (!list_empty(&cset->tasks)) - it->task = cset->tasks.next; + it->task_pos = cset->tasks.next; else - it->task = cset->mg_tasks.next; + it->task_pos = cset->mg_tasks.next; + + it->tasks_head = &cset->tasks; + it->mg_tasks_head = &cset->mg_tasks; } /** @@ -2885,8 +2888,8 @@ void css_task_iter_start(struct cgroup_subsys_state *css, down_read(&css_set_rwsem); - it->origin_css = css; - it->cset_link = &css->cgroup->cset_links; + it->cset_pos = &css->cgroup->cset_links; + it->cset_head = it->cset_pos; css_advance_task_iter(it); } @@ -2902,12 +2905,10 @@ void css_task_iter_start(struct cgroup_subsys_state *css, struct task_struct *css_task_iter_next(struct css_task_iter *it) { struct task_struct *res; - struct list_head *l = it->task; - struct cgrp_cset_link *link = list_entry(it->cset_link, - struct cgrp_cset_link, cset_link); + struct list_head *l = it->task_pos; /* If the iterator cg is NULL, we have no tasks */ - if (!it->cset_link) + if (!it->cset_pos) return NULL; res = list_entry(l, struct task_struct, cg_list); @@ -2918,13 +2919,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it) */ l = l->next; - if (l == &link->cset->tasks) - l = link->cset->mg_tasks.next; + if (l == it->tasks_head) + l = it->mg_tasks_head->next; - if (l == &link->cset->mg_tasks) + if (l == it->mg_tasks_head) css_advance_task_iter(it); else - it->task = l; + it->task_pos = l; return res; } -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 06/12] cgroup: teach css_task_iter about effective csses [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> ` (4 preceding siblings ...) 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` Tejun Heo ` (8 subsequent siblings) 14 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Currently, css_task_iter iterates tasks associated with a css by visiting each css_set associated with the owning cgroup and walking tasks of each of them. This works fine for !unified hierarchies as each cgroup has its own css for each associated subsystem on the hierarchy; however, on the planned unified hierarchy, a cgroup may not have csses associated and its tasks would be considered associated with the matching css of the nearest ancestor which has the subsystem enabled. This means that on the default unified hierarchy, just walking all tasks associated with a cgroup isn't enough to walk all tasks which are associated with the specified css. If any of its children doesn't have the matching css enabled, task iteration should also include all tasks from the subtree. We already added cgroup->e_csets[] to list all css_sets effectively associated with a given css and walk css_sets on that list instead to achieve such iteration. This patch updates css_task_iter iteration such that it walks css_sets on cgroup->e_csets[] instead of cgroup->cset_links if iteration is requested on an non-dummy css. Thanks to the previous iteration update, this change can be achieved with the addition of css_task_iter->ss and minimal updates to css_advance_task_iter() and css_task_iter_start(). Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 2 ++ kernel/cgroup.c | 18 +++++++++++++++--- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 6bbe143..cbfd8be 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -842,6 +842,8 @@ css_next_descendant_post(struct cgroup_subsys_state *pos, /* A css_task_iter should be treated as an opaque object */ struct css_task_iter { + struct cgroup_subsys *ss; + struct list_head *cset_pos; struct list_head *cset_head; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 318b6a2..e3aa2aa 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2850,8 +2850,14 @@ static void css_advance_task_iter(struct css_task_iter *it) it->cset_pos = NULL; return; } - link = list_entry(l, struct cgrp_cset_link, cset_link); - cset = link->cset; + + if (it->ss) { + cset = container_of(l, struct css_set, + e_cset_node[it->ss->id]); + } else { + link = list_entry(l, struct cgrp_cset_link, cset_link); + cset = link->cset; + } } while (list_empty(&cset->tasks) && list_empty(&cset->mg_tasks)); it->cset_pos = l; @@ -2888,7 +2894,13 @@ void css_task_iter_start(struct cgroup_subsys_state *css, down_read(&css_set_rwsem); - it->cset_pos = &css->cgroup->cset_links; + it->ss = css->ss; + + if (it->ss) + it->cset_pos = &css->cgroup->e_csets[css->ss->id]; + else + it->cset_pos = &css->cgroup->cset_links; + it->cset_head = it->cset_pos; css_advance_task_iter(it); -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 07/12] cgroup: cgroup->subsys[] should be cleared after the css is offlined 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA After a css finishes offlining, offline_css() mistakenly performs RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css) which just sets the cgroup->subsys[] pointer to the current value. The intention was to clear it after offline is complete, not reassign the same value. Update it to assign NULL instead of the current value. This makes cgroup_css() to return NULL once offline is complete. All the existing users of the function either can handle NULL return already or guarantee that the css doesn't get offlined. While this is a bugfix, as css lifetime is currently tied to the cgroup it belongs to, this bug doesn't cause any actual problems. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> --- kernel/cgroup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index e3aa2aa..c59946c 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -3692,7 +3692,7 @@ static void offline_css(struct cgroup_subsys_state *css) css->flags &= ~CSS_ONLINE; css->cgroup->nr_css--; - RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css); + RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL); } /** -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 07/12] cgroup: cgroup->subsys[] should be cleared after the css is offlined @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo After a css finishes offlining, offline_css() mistakenly performs RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css) which just sets the cgroup->subsys[] pointer to the current value. The intention was to clear it after offline is complete, not reassign the same value. Update it to assign NULL instead of the current value. This makes cgroup_css() to return NULL once offline is complete. All the existing users of the function either can handle NULL return already or guarantee that the css doesn't get offlined. While this is a bugfix, as css lifetime is currently tied to the cgroup it belongs to, this bug doesn't cause any actual problems. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> --- kernel/cgroup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index e3aa2aa..c59946c 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -3692,7 +3692,7 @@ static void offline_css(struct cgroup_subsys_state *css) css->flags &= ~CSS_ONLINE; css->cgroup->nr_css--; - RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css); + RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL); } /** -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 08/12] cgroup: allow cgroup creation and suppress automatic css creation in the unified hierarchy 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Now that effective css handling has been added and iterators updated accordingly, it's safe to allow cgroup creation in the default hierarchy. Unblock cgroup creation in the default hierarchy. As the default hierarchy will implement explicit enabling and disabling of controllers on each cgroup, suppress automatic css enabling on cgroup creation. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- kernel/cgroup.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index c59946c..30f48b0 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1115,8 +1115,10 @@ static int rebind_subsystems(struct cgroup_root *dst_root, src_root->subsys_mask &= ~(1 << ssid); src_root->cgrp.child_subsys_mask &= ~(1 << ssid); + /* default hierarchy doesn't enable controllers by default */ dst_root->subsys_mask |= 1 << ssid; - dst_root->cgrp.child_subsys_mask |= 1 << ssid; + if (dst_root != &cgrp_dfl_root) + dst_root->cgrp.child_subsys_mask |= 1 << ssid; if (ss->bind) ss->bind(css); @@ -3765,13 +3767,6 @@ static long cgroup_create(struct cgroup *parent, const char *name, struct cgroup_subsys *ss; struct kernfs_node *kn; - /* - * XXX: The default hierarchy isn't fully implemented yet. Block - * !root cgroup creation on it for now. - */ - if (root == &cgrp_dfl_root) - return -EINVAL; - /* allocate the cgroup and its ID, 0 is reserved for the root */ cgrp = kzalloc(sizeof(*cgrp), GFP_KERNEL); if (!cgrp) @@ -3853,7 +3848,12 @@ static long cgroup_create(struct cgroup *parent, const char *name, } } - cgrp->child_subsys_mask = parent->child_subsys_mask; + /* + * On the default hierarchy, a child doesn't automatically inherit + * child_subsys_mask from the parent. Each is configured manually. + */ + if (!cgroup_on_dfl(cgrp)) + cgrp->child_subsys_mask = parent->child_subsys_mask; kernfs_activate(kn); -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 08/12] cgroup: allow cgroup creation and suppress automatic css creation in the unified hierarchy @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo Now that effective css handling has been added and iterators updated accordingly, it's safe to allow cgroup creation in the default hierarchy. Unblock cgroup creation in the default hierarchy. As the default hierarchy will implement explicit enabling and disabling of controllers on each cgroup, suppress automatic css enabling on cgroup creation. Signed-off-by: Tejun Heo <tj@kernel.org> --- kernel/cgroup.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index c59946c..30f48b0 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1115,8 +1115,10 @@ static int rebind_subsystems(struct cgroup_root *dst_root, src_root->subsys_mask &= ~(1 << ssid); src_root->cgrp.child_subsys_mask &= ~(1 << ssid); + /* default hierarchy doesn't enable controllers by default */ dst_root->subsys_mask |= 1 << ssid; - dst_root->cgrp.child_subsys_mask |= 1 << ssid; + if (dst_root != &cgrp_dfl_root) + dst_root->cgrp.child_subsys_mask |= 1 << ssid; if (ss->bind) ss->bind(css); @@ -3765,13 +3767,6 @@ static long cgroup_create(struct cgroup *parent, const char *name, struct cgroup_subsys *ss; struct kernfs_node *kn; - /* - * XXX: The default hierarchy isn't fully implemented yet. Block - * !root cgroup creation on it for now. - */ - if (root == &cgrp_dfl_root) - return -EINVAL; - /* allocate the cgroup and its ID, 0 is reserved for the root */ cgrp = kzalloc(sizeof(*cgrp), GFP_KERNEL); if (!cgrp) @@ -3853,7 +3848,12 @@ static long cgroup_create(struct cgroup *parent, const char *name, } } - cgrp->child_subsys_mask = parent->child_subsys_mask; + /* + * On the default hierarchy, a child doesn't automatically inherit + * child_subsys_mask from the parent. Each is configured manually. + */ + if (!cgroup_on_dfl(cgrp)) + cgrp->child_subsys_mask = parent->child_subsys_mask; kernfs_activate(kn); -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 09/12] cgroup: add css_set->dfl_cgrp 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA To implement the unified hierarchy behavior, we'll need to be able to determine the associated cgroup on the default hierarchy from css_set. Let's add css_set->dfl_cgrp so that it can be accessed conveniently and efficiently. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 3 +++ kernel/cgroup.c | 4 ++++ 2 files changed, 7 insertions(+) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index cbfd8be..b6f5e40 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -354,6 +354,9 @@ struct css_set { */ struct list_head cgrp_links; + /* the default cgroup associated with this css_set */ + struct cgroup *dfl_cgrp; + /* * Set of subsystem states, one for each subsystem. This array is * immutable after creation apart from the init_css_set during diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 30f48b0..e714ccf 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -651,6 +651,10 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset, struct cgrp_cset_link *link; BUG_ON(list_empty(tmp_links)); + + if (cgroup_on_dfl(cgrp)) + cset->dfl_cgrp = cgrp; + link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link); link->cset = cset; link->cgrp = cgrp; -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 09/12] cgroup: add css_set->dfl_cgrp @ 2014-03-28 2:40 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo To implement the unified hierarchy behavior, we'll need to be able to determine the associated cgroup on the default hierarchy from css_set. Let's add css_set->dfl_cgrp so that it can be accessed conveniently and efficiently. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/cgroup.h | 3 +++ kernel/cgroup.c | 4 ++++ 2 files changed, 7 insertions(+) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index cbfd8be..b6f5e40 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -354,6 +354,9 @@ struct css_set { */ struct list_head cgrp_links; + /* the default cgroup associated with this css_set */ + struct cgroup *dfl_cgrp; + /* * Set of subsystem states, one for each subsystem. This array is * immutable after creation apart from the init_css_set during diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 30f48b0..e714ccf 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -651,6 +651,10 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset, struct cgrp_cset_link *link; BUG_ON(list_empty(tmp_links)); + + if (cgroup_on_dfl(cgrp)) + cset->dfl_cgrp = cgrp; + link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link); link->cset = cset; link->cgrp = cgrp; -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 10/12] cgroup: update subsystem rebind restrictions [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> ` (8 preceding siblings ...) 2014-03-28 2:40 ` Tejun Heo @ 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:41 ` [PATCH 11/12] cgroup: prepare migration path for unified hierarchy Tejun Heo ` (4 subsequent siblings) 14 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:40 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Because the default root couldn't have any non-root csses attached to it, rebinding away from it was always allowed; however, the default hierarchy will soon host the unified hierarchy and have non-root csses so the rebind restrictions need to be updated accordingly. Instead of special casing rebinding from the default hierarchy and then checking whether the source hierarchy has children cgroups, which implies non-root csses for !dfl hierarchies, simply check whether the source hierarchy has non-root csses for the subsystem using css_next_child(). Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- kernel/cgroup.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index e714ccf..0c67141 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1051,16 +1051,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root, if (!(ss_mask & (1 << ssid))) continue; - /* if @ss is on the dummy_root, we can always move it */ - if (ss->root == &cgrp_dfl_root) - continue; - - /* if @ss has non-root cgroups attached to it, can't move */ - if (!list_empty(&ss->root->cgrp.children)) + /* if @ss has non-root csses attached to it, can't move */ + if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss))) return -EBUSY; /* can't move between two non-dummy roots either */ - if (dst_root != &cgrp_dfl_root) + if (ss->root != &cgrp_dfl_root && dst_root != &cgrp_dfl_root) return -EBUSY; } -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 11/12] cgroup: prepare migration path for unified hierarchy [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> ` (9 preceding siblings ...) 2014-03-28 2:40 ` [PATCH 10/12] cgroup: update subsystem rebind restrictions Tejun Heo @ 2014-03-28 2:41 ` Tejun Heo 2014-03-28 2:41 ` [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy Tejun Heo ` (3 subsequent siblings) 14 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:41 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Unified hierarchy implementation would require re-migrating tasks onto the same cgroup on the default hierarchy to reflect updated effective csses. Update cgroup_migrate_prepare_dst() so that it accepts NULL as the destination cgrp. When NULL is specified, the destination is considered to be the cgroup on the default hierarchy associated with each css_set. After this change, the identity check in cgroup_migrate_add_src() isn't sufficient for noop detection as the associated csses may change without any cgroup association changing. The only way to tell whether a migration is noop or not is testing whether the source and destination csets are identical. The noop check in cgroup_migrate_add_src() is removed and cset identity test is added to cgroup_migreate_prepare_dst(). If it's detected that source and destination csets are identical, the cset is removed removed from @preloaded_csets and all the migration nodes are cleared which makes cgroup_migrate() ignore the cset. Also, make the function append the destination css_sets to @preloaded_list so that destination css_sets always come after source css_sets. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- kernel/cgroup.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 0c67141..5faa115 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1901,10 +1901,6 @@ static void cgroup_migrate_add_src(struct css_set *src_cset, src_cgrp = cset_cgroup_from_root(src_cset, dst_cgrp->root); - /* nothing to do if this cset already belongs to the cgroup */ - if (src_cgrp == dst_cgrp) - return; - if (!list_empty(&src_cset->mg_preload_node)) return; @@ -1919,13 +1915,14 @@ static void cgroup_migrate_add_src(struct css_set *src_cset, /** * cgroup_migrate_prepare_dst - prepare destination css_sets for migration - * @dst_cgrp: the destination cgroup + * @dst_cgrp: the destination cgroup (may be %NULL) * @preloaded_csets: list of preloaded source css_sets * * Tasks are about to be moved to @dst_cgrp and all the source css_sets * have been preloaded to @preloaded_csets. This function looks up and - * pins all destination css_sets, links each to its source, and put them on - * @preloaded_csets. + * pins all destination css_sets, links each to its source, and append them + * to @preloaded_csets. If @dst_cgrp is %NULL, the destination of each + * source css_set is assumed to be its cgroup on the default hierarchy. * * This function must be called after cgroup_migrate_add_src() has been * called on each migration source css_set. After migration is performed @@ -1936,19 +1933,34 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, struct list_head *preloaded_csets) { LIST_HEAD(csets); - struct css_set *src_cset; + struct css_set *src_cset, *tmp_cset; lockdep_assert_held(&cgroup_mutex); /* look up the dst cset for each src cset and link it to src */ - list_for_each_entry(src_cset, preloaded_csets, mg_preload_node) { + list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) { struct css_set *dst_cset; - dst_cset = find_css_set(src_cset, dst_cgrp); + dst_cset = find_css_set(src_cset, + dst_cgrp ?: src_cset->dfl_cgrp); if (!dst_cset) goto err; WARN_ON_ONCE(src_cset->mg_dst_cset || dst_cset->mg_dst_cset); + + /* + * If src cset equals dst, it's noop. Drop the src. + * cgroup_migrate() will skip the cset too. Note that we + * can't handle src == dst as some nodes are used by both. + */ + if (src_cset == dst_cset) { + src_cset->mg_src_cgrp = NULL; + list_del_init(&src_cset->mg_preload_node); + put_css_set(src_cset, false); + put_css_set(dst_cset, false); + continue; + } + src_cset->mg_dst_cset = dst_cset; if (list_empty(&dst_cset->mg_preload_node)) @@ -1957,7 +1969,7 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, put_css_set(dst_cset, false); } - list_splice(&csets, preloaded_csets); + list_splice_tail(&csets, preloaded_csets); return 0; err: cgroup_migrate_finish(&csets); -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> ` (10 preceding siblings ...) 2014-03-28 2:41 ` [PATCH 11/12] cgroup: prepare migration path for unified hierarchy Tejun Heo @ 2014-03-28 2:41 ` Tejun Heo 2014-04-14 15:45 ` Vivek Goyal ` (2 subsequent siblings) 14 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-03-28 2:41 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA cgroup is switching away from multiple hierarchies and will use one unified default hierarchy where controllers can be dynamically enabled and disabled per subtree. The default hierarchy will serve as the unified hierarchy to which all controllers are attached and a css on the default hierarchy would need to also serve the tasks of descendant cgroups which don't have the controller enabled - ie. the tree may be collapsed from leaf towards root when viewed from specific controllers. This has been implemented through effective css in the previous patches. This patch finally implements dynamic subtree controller enable/disable on the default hierarchy via a new knob - "cgroup.subtree_control" which controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent, which is a competition between two different types of entities without any clear way to determine resource distribution between the two. Different controllers handle it differently and all the implemented behaviors are ambiguous, ad-hoc, cumbersome and/or just wrong. Having this structural constraints imposed from cgroup core removes the burden from controller implementations and enables showing one consistent behavior across all controllers. When a controller is enabled or disabled, css associations for the controller in the subtrees of each child should be updated. After enabling, the whole subtree of a child should point to the new css of the child. After disabling, the whole subtree of a child should point to the cgroup's css. This is implemented by first updating cgroup states such that cgroup_e_css() result points to the appropriate css and then invoking cgroup_update_dfl_csses() which migrates all tasks in the affected subtrees to the self cgroup on the default hierarchy. * When read, "cgroup.subtree_control" lists all the currently enabled controllers on the children of the cgroup. * White-space separated list of controller names prefixed with either '+' or '-' can be written to "cgroup.subtree_control". The ones prefixed with '+' are enabled on the controller and '-' disabled. * A controller can be enabled iff the parent's "cgroup.subtree_control" enables it and disabled iff no child's "cgroup.subtree_control" has it enabled. * If a cgroup has tasks, no controller can be enabled via "cgroup.subtree_control". Likewise, if "cgroup.subtree_control" has some controllers enabled, tasks can't be migrated into the cgroup. * All controllers which aren't bound on other hierarchies are automatically associated with the root cgroup of the default hierarchy. All the controllers which are bound to the default hierarchy are listed in the read-only file "cgroup.controllers" in the root directory. This is still experimental and there are some holes, one of which is that ->can_attach() failure during cgroup_update_dfl_csses() may leave the cgroups in an undefined state. The issues will be addressed by future patches. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 5 + kernel/cgroup.c | 346 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 351 insertions(+) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index b6f5e40..dee6f3c 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -21,6 +21,7 @@ #include <linux/percpu-refcount.h> #include <linux/seq_file.h> #include <linux/kernfs.h> +#include <linux/wait.h> #ifdef CONFIG_CGROUPS @@ -164,6 +165,7 @@ struct cgroup { struct cgroup *parent; /* my parent */ struct kernfs_node *kn; /* cgroup kernfs entry */ + struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */ /* * Monotonically increasing unique serial number which defines a @@ -216,6 +218,9 @@ struct cgroup { /* For css percpu_ref killing and RCU-protected deletion */ struct rcu_head rcu_head; struct work_struct destroy_work; + + /* used to wait for offlining of csses */ + wait_queue_head_t offline_waitq; }; #define MAX_CGROUP_ROOT_NAMELEN 64 diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 5faa115..4e958c7 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root, unsigned long ss_mask); static void cgroup_destroy_css_killed(struct cgroup *cgrp); static int cgroup_destroy_locked(struct cgroup *cgrp); +static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss); +static void kill_css(struct cgroup_subsys_state *css); static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[], bool is_add); static void cgroup_pidlist_destroy_all(struct cgroup *cgrp); @@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp) #define for_each_root(root) \ list_for_each_entry((root), &cgroup_roots, root_list) +/* iterate over child cgrps, lock should be held throughout iteration */ +#define cgroup_for_each_live_child(child, cgrp) \ + list_for_each_entry((child), &(cgrp)->children, sibling) \ + if (({ lockdep_assert_held(&cgroup_tree_mutex); \ + cgroup_is_dead(child); })) \ + ; \ + else + /** * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive. * @cgrp: the cgroup to be checked for liveness @@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) for_each_subsys(ss, ssid) INIT_LIST_HEAD(&cgrp->e_csets[ssid]); + + init_waitqueue_head(&cgrp->offline_waitq); } static void init_cgroup_root(struct cgroup_root *root, @@ -1937,6 +1949,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, lockdep_assert_held(&cgroup_mutex); + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent && + dst_cgrp->child_subsys_mask) + return -EBUSY; + /* look up the dst cset for each src cset and link it to src */ list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) { struct css_set *dst_cset; @@ -2302,6 +2322,317 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v) return 0; } +static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask) +{ + struct cgroup_subsys *ss; + bool printed = false; + int ssid; + + for_each_subsys(ss, ssid) { + if (ss_mask & (1 << ssid)) { + if (printed) + seq_putc(seq, ' '); + seq_printf(seq, "%s", ss->name); + printed = true; + } + } + if (printed) + seq_putc(seq, '\n'); +} + +/* show controllers which are currently attached to the default hierarchy */ +static int cgroup_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->root->subsys_mask); + return 0; +} + +/* show controllers which are enabled for a given cgroup's children */ +static int cgroup_subtree_control_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->child_subsys_mask); + return 0; +} + +/** + * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy + * @cgrp: root of the subtree to update csses for + * + * @cgrp's child_subsys_mask has changed and its subtree's (self excluded) + * css associations need to be updated accordingly. This function looks up + * all css_sets which are attached to the subtree, creates the matching + * updated css_sets and migrates the tasks to the new ones. + */ +static int cgroup_update_dfl_csses(struct cgroup *cgrp) +{ + LIST_HEAD(preloaded_csets); + struct cgroup_subsys_state *css; + struct css_set *src_cset; + int ret; + + lockdep_assert_held(&cgroup_tree_mutex); + lockdep_assert_held(&cgroup_mutex); + + /* look up all csses currently attached to @cgrp's subtree */ + down_read(&css_set_rwsem); + css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) { + struct cgrp_cset_link *link; + + /* self is not affected by child_subsys_mask change */ + if (css->cgroup == cgrp) + continue; + + list_for_each_entry(link, &css->cgroup->cset_links, cset_link) + cgroup_migrate_add_src(link->cset, cgrp, + &preloaded_csets); + } + up_read(&css_set_rwsem); + + /* NULL dst indicates self on default hierarchy */ + ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets); + if (ret) + goto out_finish; + + list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) { + struct task_struct *last_task = NULL, *task; + + /* src_csets precede dst_csets, break on the first dst_cset */ + if (!src_cset->mg_src_cgrp) + break; + + /* + * All tasks in src_cset need to be migrated to the + * matching dst_cset. Empty it process by process. We + * walk tasks but migrate processes. The leader might even + * belong to a different cset but such src_cset would also + * be among the target src_csets because the default + * hierarchy enforces per-process membership. + */ + while (true) { + down_read(&css_set_rwsem); + task = list_first_entry_or_null(&src_cset->tasks, + struct task_struct, cg_list); + if (task) { + task = task->group_leader; + WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp); + get_task_struct(task); + } + up_read(&css_set_rwsem); + + if (!task) + break; + + /* guard against possible infinite loop */ + if (WARN(last_task == task, + "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n")) + goto out_finish; + last_task = task; + + threadgroup_lock(task); + /* raced against de_thread() from another thread? */ + if (!thread_group_leader(task)) { + threadgroup_unlock(task); + put_task_struct(task); + continue; + } + + ret = cgroup_migrate(src_cset->dfl_cgrp, task, true); + + threadgroup_unlock(task); + put_task_struct(task); + + if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret)) + goto out_finish; + } + } + +out_finish: + cgroup_migrate_finish(&preloaded_csets); + return ret; +} + +/* change the enabled child controllers for a cgroup in the default hierarchy */ +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, + struct cftype *cft, char *buffer) +{ + unsigned long enable_req = 0, disable_req = 0, enable, disable; + struct cgroup *cgrp = dummy_css->cgroup, *child; + struct cgroup_subsys *ss; + char *tok, *p; + int ssid, ret; + + /* + * Parse input - white space separated list of subsystem names + * prefixed with either + or -. + */ + p = buffer; + while ((tok = strsep(&p, " \t\n"))) { + for_each_subsys(ss, ssid) { + if (ss->disabled || strcmp(tok + 1, ss->name)) + continue; + + if (*tok == '+') { + enable_req |= 1 << ssid; + disable_req &= ~(1 << ssid); + } else if (*tok == '-') { + disable_req |= 1 << ssid; + enable_req &= ~(1 << ssid); + } else { + return -EINVAL; + } + break; + } + if (ssid == CGROUP_SUBSYS_COUNT) + return -EINVAL; + } + + /* + * We're gonna grab cgroup_tree_mutex which nests outside kernfs + * active_ref. cgroup_lock_live_group() already provides enough + * protection. Ensure @cgrp stays accessible and break the + * active_ref protection. + */ + cgroup_get(cgrp); + kernfs_break_active_protection(cgrp->control_kn); +retry: + enable = enable_req; + disable = disable_req; + + mutex_lock(&cgroup_tree_mutex); + + for_each_subsys(ss, ssid) { + if (enable & (1 << ssid)) { + if (cgrp->child_subsys_mask & (1 << ssid)) { + enable &= ~(1 << ssid); + continue; + } + + /* + * Because css offlining is asynchronous, userland + * might try to re-enable the same controller while + * the previous instance is still around. In such + * cases, wait till it's gone using offline_waitq. + */ + cgroup_for_each_live_child(child, cgrp) { + wait_queue_t wait; + + if (!cgroup_css(child, ss)) + continue; + + prepare_to_wait(&child->offline_waitq, &wait, + TASK_UNINTERRUPTIBLE); + mutex_unlock(&cgroup_tree_mutex); + schedule(); + finish_wait(&child->offline_waitq, &wait); + goto retry; + } + + /* unavailable or not enabled on the parent? */ + if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) || + (cgrp->parent && + !(cgrp->parent->child_subsys_mask & (1 << ssid)))) { + ret = -ENOENT; + goto out_unlock_tree; + } + } else if (disable & (1 << ssid)) { + if (!(cgrp->child_subsys_mask & (1 << ssid))) { + disable &= ~(1 << ssid); + continue; + } + + /* a child has it enabled? */ + cgroup_for_each_live_child(child, cgrp) { + if (child->child_subsys_mask & (1 << ssid)) { + ret = -EBUSY; + goto out_unlock_tree; + } + } + } + } + + if (!enable && !disable) { + ret = 0; + goto out_unlock_tree; + } + + if (!cgroup_lock_live_group(cgrp)) { + ret = -ENODEV; + goto out_unlock_tree; + } + + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) { + ret = -EBUSY; + goto out_unlock; + } + + /* + * Create csses for enables and update child_subsys_mask. This + * changes cgroup_e_css() results which in turn makes the + * subsequent cgroup_update_dfl_csses() associate all tasks in the + * subtree to the updated csses. + */ + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + ret = create_css(child, ss); + if (ret) + goto err_undo_css; + } + } + + cgrp->child_subsys_mask |= enable; + cgrp->child_subsys_mask &= ~disable; + + ret = cgroup_update_dfl_csses(cgrp); + if (ret) + goto err_undo_css; + + /* all tasks are now migrated away from the old csses, kill them */ + for_each_subsys(ss, ssid) { + if (!(disable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) + kill_css(cgroup_css(child, ss)); + } + + kernfs_activate(cgrp->kn); + ret = 0; +out_unlock: + mutex_unlock(&cgroup_mutex); +out_unlock_tree: + mutex_unlock(&cgroup_tree_mutex); + kernfs_unbreak_active_protection(cgrp->control_kn); + cgroup_put(cgrp); + return ret; + +err_undo_css: + cgrp->child_subsys_mask &= ~enable; + cgrp->child_subsys_mask |= disable; + + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + struct cgroup_subsys_state *css = cgroup_css(child, ss); + if (css) + kill_css(css); + } + } + goto out_unlock; +} + static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -2442,6 +2773,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft) kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name), cgroup_file_mode(cft), 0, cft->kf_ops, cft, NULL, false, key); + if (cft->seq_show == cgroup_subtree_control_show) + cgrp->control_kn = kn; return PTR_ERR_OR_ZERO(kn); } @@ -3539,6 +3872,17 @@ static struct cftype cgroup_base_files[] = { .flags = CFTYPE_ONLY_ON_ROOT, .seq_show = cgroup_sane_behavior_show, }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT, + .seq_show = cgroup_controllers_show, + }, + { + .name = "cgroup.subtree_control", + .flags = CFTYPE_ONLY_ON_DFL, + .seq_show = cgroup_subtree_control_show, + .write_string = cgroup_subtree_control_write, + }, /* * Historical crazy stuff. These don't have "cgroup." prefix and @@ -3707,6 +4051,8 @@ static void offline_css(struct cgroup_subsys_state *css) css->flags &= ~CSS_ONLINE; css->cgroup->nr_css--; RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL); + + wake_up_all(&css->cgroup->offline_waitq); } /** -- 1.8.5.3 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-03-28 2:40 ` Tejun Heo @ 2014-04-14 15:45 ` Vivek Goyal -1 siblings, 0 replies; 63+ messages in thread From: Vivek Goyal @ 2014-04-14 15:45 UTC (permalink / raw) To: Tejun Heo Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Thu, Mar 27, 2014 at 10:40:49PM -0400, Tejun Heo wrote: [..] > This patchset finally implements the default unified hierarchy. The > goal is providing enough flexibility while enforcing stricter common > structure where appropriate to address the above listed issues. > > Controllers which aren't bound to other hierarchies are > automatically attached to the unified hierarchy, Hi Tejun, So this patchset does not *enforce* a single hierarchy? So user space can still mount other hierarchies. > which is different in > that controllers are enabled explicitly for each subtree. > "cgroup.subtree_control" controls which controllers are enabled on the > child cgroups. Let's assume a hierarchy like the following. > > root - A - B - C > \ D > > root's "cgroup.subtree_control" determines which controllers are > enabled on A. A's on B. B's on C and D. This coincides with the > fact that controllers on the immediate sub-level are used to > distribute the resources of the parent. In fact, it's natural to > assume that resource control knobs of a child belong to its parent. > Enabling a controller in "cgroup.subtree_control" declares that > distribution of the respective resources of the cgroup will be > controlled. Note that this means that controller enable states are > shared among siblings. > > The default hierarchy has an extra restriction - only cgroups which > don't contain any task may have controllers enabled in > "cgroup.subtree_control". Combined with the other properties of the > default hierarchy, this guarantees that, from the view point of > controllers, tasks are only on the leaf cgroups. In other words, only > leaf csses may contain tasks. This rules out situations where child > cgroups compete against internal tasks of the parent. How does this work for root's tasks now? Given that task can only be in leaf cgroups, that means tasks can't be in / cgroup (If one wants to create some cgroups). Does that mean / will be empty and init system need to setup things right. Before one can create child cgroups, there will be some processes already running (init itlsef, kernel threads etc). I am assuming they will be running in /. So how would that work. Thanks Vivek ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-04-14 15:45 ` Vivek Goyal 0 siblings, 0 replies; 63+ messages in thread From: Vivek Goyal @ 2014-04-14 15:45 UTC (permalink / raw) To: Tejun Heo; +Cc: lizefan, containers, cgroups, linux-kernel On Thu, Mar 27, 2014 at 10:40:49PM -0400, Tejun Heo wrote: [..] > This patchset finally implements the default unified hierarchy. The > goal is providing enough flexibility while enforcing stricter common > structure where appropriate to address the above listed issues. > > Controllers which aren't bound to other hierarchies are > automatically attached to the unified hierarchy, Hi Tejun, So this patchset does not *enforce* a single hierarchy? So user space can still mount other hierarchies. > which is different in > that controllers are enabled explicitly for each subtree. > "cgroup.subtree_control" controls which controllers are enabled on the > child cgroups. Let's assume a hierarchy like the following. > > root - A - B - C > \ D > > root's "cgroup.subtree_control" determines which controllers are > enabled on A. A's on B. B's on C and D. This coincides with the > fact that controllers on the immediate sub-level are used to > distribute the resources of the parent. In fact, it's natural to > assume that resource control knobs of a child belong to its parent. > Enabling a controller in "cgroup.subtree_control" declares that > distribution of the respective resources of the cgroup will be > controlled. Note that this means that controller enable states are > shared among siblings. > > The default hierarchy has an extra restriction - only cgroups which > don't contain any task may have controllers enabled in > "cgroup.subtree_control". Combined with the other properties of the > default hierarchy, this guarantees that, from the view point of > controllers, tasks are only on the leaf cgroups. In other words, only > leaf csses may contain tasks. This rules out situations where child > cgroups compete against internal tasks of the parent. How does this work for root's tasks now? Given that task can only be in leaf cgroups, that means tasks can't be in / cgroup (If one wants to create some cgroups). Does that mean / will be empty and init system need to setup things right. Before one can create child cgroups, there will be some processes already running (init itlsef, kernel threads etc). I am assuming they will be running in /. So how would that work. Thanks Vivek ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <20140414154556.GA9552-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-04-14 15:45 ` Vivek Goyal @ 2014-04-14 17:52 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 17:52 UTC (permalink / raw) To: Vivek Goyal Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Hello, On Mon, Apr 14, 2014 at 11:45:56AM -0400, Vivek Goyal wrote: > So this patchset does not *enforce* a single hierarchy? So user space > can still mount other hierarchies. Nope, nothing is forced. Controllers may be moved between unified and traditional multiple hierarchies. > How does this work for root's tasks now? Given that task can only be > in leaf cgroups, that means tasks can't be in / cgroup (If one wants > to create some cgroups). Does that mean / will be empty and init system > need to setup things right. Root is exempt from the restriction. Root has always been special anyway. I'll soon post a document describing the design and restrictions of unified hierarchy along with an updated version of this patchset. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-04-14 17:52 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 17:52 UTC (permalink / raw) To: Vivek Goyal; +Cc: lizefan, containers, cgroups, linux-kernel Hello, On Mon, Apr 14, 2014 at 11:45:56AM -0400, Vivek Goyal wrote: > So this patchset does not *enforce* a single hierarchy? So user space > can still mount other hierarchies. Nope, nothing is forced. Controllers may be moved between unified and traditional multiple hierarchies. > How does this work for root's tasks now? Given that task can only be > in leaf cgroups, that means tasks can't be in / cgroup (If one wants > to create some cgroups). Does that mean / will be empty and init system > need to setup things right. Root is exempt from the restriction. Root has always been special anyway. I'll soon post a document describing the design and restrictions of unified hierarchy along with an updated version of this patchset. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <20140414175236.GB15249-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-04-14 17:52 ` Tejun Heo @ 2014-04-14 18:14 ` Vivek Goyal -1 siblings, 0 replies; 63+ messages in thread From: Vivek Goyal @ 2014-04-14 18:14 UTC (permalink / raw) To: Tejun Heo Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Mon, Apr 14, 2014 at 01:52:36PM -0400, Tejun Heo wrote: > Hello, > > On Mon, Apr 14, 2014 at 11:45:56AM -0400, Vivek Goyal wrote: > > So this patchset does not *enforce* a single hierarchy? So user space > > can still mount other hierarchies. > > Nope, nothing is forced. Controllers may be moved between unified and > traditional multiple hierarchies. So is this an intermediate mode before we move to single hiearchy *only* mode. AFAIK, you had mentioned that we will support legacy multiple hiearchy mode but single hiearchy is the new default mode. How does one figure out which one is unified hiearchy (out of multiple hierarchies). Is there a flag somewhere? Or sane flag will be used to mark unified hiearchy. > > > How does this work for root's tasks now? Given that task can only be > > in leaf cgroups, that means tasks can't be in / cgroup (If one wants > > to create some cgroups). Does that mean / will be empty and init system > > need to setup things right. > > Root is exempt from the restriction. Root has always been special > anyway. So task and child groups can co-exist in root *only*? > > I'll soon post a document describing the design and restrictions of > unified hierarchy along with an updated version of this patchset. That would help a lot in understanding various modes and that in turn will help with patch reviews too. Thanks Vivek ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-04-14 18:14 ` Vivek Goyal 0 siblings, 0 replies; 63+ messages in thread From: Vivek Goyal @ 2014-04-14 18:14 UTC (permalink / raw) To: Tejun Heo; +Cc: lizefan, containers, cgroups, linux-kernel On Mon, Apr 14, 2014 at 01:52:36PM -0400, Tejun Heo wrote: > Hello, > > On Mon, Apr 14, 2014 at 11:45:56AM -0400, Vivek Goyal wrote: > > So this patchset does not *enforce* a single hierarchy? So user space > > can still mount other hierarchies. > > Nope, nothing is forced. Controllers may be moved between unified and > traditional multiple hierarchies. So is this an intermediate mode before we move to single hiearchy *only* mode. AFAIK, you had mentioned that we will support legacy multiple hiearchy mode but single hiearchy is the new default mode. How does one figure out which one is unified hiearchy (out of multiple hierarchies). Is there a flag somewhere? Or sane flag will be used to mark unified hiearchy. > > > How does this work for root's tasks now? Given that task can only be > > in leaf cgroups, that means tasks can't be in / cgroup (If one wants > > to create some cgroups). Does that mean / will be empty and init system > > need to setup things right. > > Root is exempt from the restriction. Root has always been special > anyway. So task and child groups can co-exist in root *only*? > > I'll soon post a document describing the design and restrictions of > unified hierarchy along with an updated version of this patchset. That would help a lot in understanding various modes and that in turn will help with patch reviews too. Thanks Vivek ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <20140414181421.GC9552-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-04-14 18:14 ` Vivek Goyal @ 2014-04-14 19:21 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 19:21 UTC (permalink / raw) To: Vivek Goyal Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Hello, On Mon, Apr 14, 2014 at 02:14:21PM -0400, Vivek Goyal wrote: > So is this an intermediate mode before we move to single hiearchy *only* > mode. AFAIK, you had mentioned that we will support legacy multiple > hiearchy mode but single hiearchy is the new default mode. Hmmm? It's a separate special hierarchy which can co-exist with other hierarchies. There's no inherent intermediacy to it. As long as traditional hierarchies exist, it's gonna be like this. > How does one figure out which one is unified hiearchy (out of multiple > hierarchies). Is there a flag somewhere? Or sane flag will be used to > mark unified hiearchy. You can tell it multiple ways - from its mount options, whether cgroup.controllers exists, but it'll most likely be mounted on a distinctive mount point, wouldn't it? > So task and child groups can co-exist in root *only*? Note that the restriction doesn't apply if no controller is enabled but yes root is the only cgroup where tasks can exist and controllers are enabled at the same time. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-04-14 19:21 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 19:21 UTC (permalink / raw) To: Vivek Goyal; +Cc: lizefan, containers, cgroups, linux-kernel Hello, On Mon, Apr 14, 2014 at 02:14:21PM -0400, Vivek Goyal wrote: > So is this an intermediate mode before we move to single hiearchy *only* > mode. AFAIK, you had mentioned that we will support legacy multiple > hiearchy mode but single hiearchy is the new default mode. Hmmm? It's a separate special hierarchy which can co-exist with other hierarchies. There's no inherent intermediacy to it. As long as traditional hierarchies exist, it's gonna be like this. > How does one figure out which one is unified hiearchy (out of multiple > hierarchies). Is there a flag somewhere? Or sane flag will be used to > mark unified hiearchy. You can tell it multiple ways - from its mount options, whether cgroup.controllers exists, but it'll most likely be mounted on a distinctive mount point, wouldn't it? > So task and child groups can co-exist in root *only*? Note that the restriction doesn't apply if no controller is enabled but yes root is the only cgroup where tasks can exist and controllers are enabled at the same time. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <20140414192139.GB16835-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-04-14 19:21 ` Tejun Heo @ 2014-04-15 2:58 ` Li Zefan -1 siblings, 0 replies; 63+ messages in thread From: Li Zefan @ 2014-04-15 2:58 UTC (permalink / raw) To: Tejun Heo Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal on 2014/4/15 3:21, Tejun Heo wrote: > Hello, > > On Mon, Apr 14, 2014 at 02:14:21PM -0400, Vivek Goyal wrote: >> So is this an intermediate mode before we move to single hiearchy *only* >> mode. AFAIK, you had mentioned that we will support legacy multiple >> hiearchy mode but single hiearchy is the new default mode. > > Hmmm? It's a separate special hierarchy which can co-exist with other > hierarchies. There's no inherent intermediacy to it. As long as > traditional hierarchies exist, it's gonna be like this. > I know we should allow multiple hierarchies, but why we allow this even if sane_behavior is specified? Like this: # mount -t cgroup -o cpuset,__DEVEL__sane_behavior xxx /cgroup # mount -t cgroup -o __DEVEL__sane_behavior xxx /cgroup2/ ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-04-15 2:58 ` Li Zefan 0 siblings, 0 replies; 63+ messages in thread From: Li Zefan @ 2014-04-15 2:58 UTC (permalink / raw) To: Tejun Heo; +Cc: Vivek Goyal, containers, cgroups, linux-kernel on 2014/4/15 3:21, Tejun Heo wrote: > Hello, > > On Mon, Apr 14, 2014 at 02:14:21PM -0400, Vivek Goyal wrote: >> So is this an intermediate mode before we move to single hiearchy *only* >> mode. AFAIK, you had mentioned that we will support legacy multiple >> hiearchy mode but single hiearchy is the new default mode. > > Hmmm? It's a separate special hierarchy which can co-exist with other > hierarchies. There's no inherent intermediacy to it. As long as > traditional hierarchies exist, it's gonna be like this. > I know we should allow multiple hierarchies, but why we allow this even if sane_behavior is specified? Like this: # mount -t cgroup -o cpuset,__DEVEL__sane_behavior xxx /cgroup # mount -t cgroup -o __DEVEL__sane_behavior xxx /cgroup2/ ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <534CA050.8060709-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-04-15 2:58 ` Li Zefan @ 2014-04-15 4:57 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-15 4:57 UTC (permalink / raw) To: Li Zefan Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal Hello, On Tue, Apr 15, 2014 at 10:58:24AM +0800, Li Zefan wrote: > I know we should allow multiple hierarchies, but why we allow this even > if sane_behavior is specified? Oh, because we didn't have unified hierarchy before. The whole sane_behavior thing is gonna go away. It'll be absorbed into unified hierarchy and probably be handled as a separate filesystem type, so the current situation is transitional. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-04-15 4:57 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-15 4:57 UTC (permalink / raw) To: Li Zefan; +Cc: Vivek Goyal, containers, cgroups, linux-kernel Hello, On Tue, Apr 15, 2014 at 10:58:24AM +0800, Li Zefan wrote: > I know we should allow multiple hierarchies, but why we allow this even > if sane_behavior is specified? Oh, because we didn't have unified hierarchy before. The whole sane_behavior thing is gonna go away. It'll be absorbed into unified hierarchy and probably be handled as a separate filesystem type, so the current situation is transitional. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-04-14 17:52 ` Tejun Heo @ 2014-04-30 10:57 ` Raghavendra KT -1 siblings, 0 replies; 63+ messages in thread From: Raghavendra KT @ 2014-04-30 10:57 UTC (permalink / raw) To: Tejun Heo Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linux Kernel Mailing List, Vivek Goyal On Mon, Apr 14, 2014 at 11:22 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: >> How does this work for root's tasks now? Given that task can only be >> in leaf cgroups, that means tasks can't be in / cgroup (If one wants >> to create some cgroups). Does that mean / will be empty and init system >> need to setup things right. > > Root is exempt from the restriction. Root has always been special > anyway. > (I do not wish to Hijack this thread, but found a relevant context here to initiate discussion. But would discuss in a separate thread once I get a positive go on the topic) Hi Tejun, For some controllers like cpuset, when we have exclusive_cpu is set, what about having a knob in the cpuset controller to move the task to non-root (when option is set). Because all the way along, though we have freedom to make the cpusets exclusive and move tasks (say VMs) into them, making sure they do not interfere with each other, we can not prevent the other tasks spawned in a system eating into cpus of exclusive cpuset since they go to root automatically. Do you think having a knob, to make sure new tasks spawned go to say a default directory under root makes sense? I understand that we could easily have a userspace script which could achieve intended goal, but kernel solution would really make the exclusive cpusets have exclusive access to cpus it should have. (I also have a POC implemented for pre-unified hierarchy tree and understand some bit of complications involved in that and understand that we should not have complex policies in kernel for e.g. filtering tasks based on patterns etc..). ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-04-30 10:57 ` Raghavendra KT 0 siblings, 0 replies; 63+ messages in thread From: Raghavendra KT @ 2014-04-30 10:57 UTC (permalink / raw) To: Tejun Heo Cc: Vivek Goyal, lizefan, containers, cgroups, Linux Kernel Mailing List On Mon, Apr 14, 2014 at 11:22 PM, Tejun Heo <tj@kernel.org> wrote: >> How does this work for root's tasks now? Given that task can only be >> in leaf cgroups, that means tasks can't be in / cgroup (If one wants >> to create some cgroups). Does that mean / will be empty and init system >> need to setup things right. > > Root is exempt from the restriction. Root has always been special > anyway. > (I do not wish to Hijack this thread, but found a relevant context here to initiate discussion. But would discuss in a separate thread once I get a positive go on the topic) Hi Tejun, For some controllers like cpuset, when we have exclusive_cpu is set, what about having a knob in the cpuset controller to move the task to non-root (when option is set). Because all the way along, though we have freedom to make the cpusets exclusive and move tasks (say VMs) into them, making sure they do not interfere with each other, we can not prevent the other tasks spawned in a system eating into cpus of exclusive cpuset since they go to root automatically. Do you think having a knob, to make sure new tasks spawned go to say a default directory under root makes sense? I understand that we could easily have a userspace script which could achieve intended goal, but kernel solution would really make the exclusive cpusets have exclusive access to cpus it should have. (I also have a POC implemented for pre-unified hierarchy tree and understand some bit of complications involved in that and understand that we should not have complex policies in kernel for e.g. filtering tasks based on patterns etc..). ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <CAC4Lta0qd2vW4ENSOpzK+0i06MorzfmsZZZuW9xUtpZ_gWA9gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-04-30 10:57 ` Raghavendra KT @ 2014-05-02 15:12 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-05-02 15:12 UTC (permalink / raw) To: Raghavendra KT Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linux Kernel Mailing List, Vivek Goyal Hello, On Wed, Apr 30, 2014 at 04:27:51PM +0530, Raghavendra KT wrote: > For some controllers like cpuset, when we have exclusive_cpu is set, > what about having a knob in the cpuset controller > to move the task to non-root (when option is set). I'm doubtful. > Because all the way along, though we have freedom to make the cpusets > exclusive and move tasks (say VMs) into them, > making sure they do not interfere with each other, we can not prevent > the other tasks spawned in a system eating into cpus of > exclusive cpuset since they go to root automatically. I believe the right thing to do would be starting / confining other tasks in the appropriate non-root cgroups. cgroup already provides mechanisms to achieve that. The rest is upto userland. > Do you think having a knob, to make sure new tasks spawned go to say a > default directory under root makes sense? > > I understand that we could easily have a userspace script which could > achieve intended goal, but kernel solution > would really make the exclusive cpusets have exclusive access to cpus > it should have. This would just be a more reliable implementation of an ad-hoc mechanism when it can already be properly achieved by managing cgroups of all processes in the system. > (I also have a POC implemented for pre-unified hierarchy tree and > understand some bit of complications involved in that and > understand that we should not have complex policies in kernel for e.g. > filtering tasks based on patterns etc..). As I wrote above, I think this is something to be solved from userland. I'm very positively against adding this sort of hacky ad-hoc policies which serves one or a few use cases well while introducing possible long-term maintenance issues. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-05-02 15:12 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-05-02 15:12 UTC (permalink / raw) To: Raghavendra KT Cc: Vivek Goyal, lizefan, containers, cgroups, Linux Kernel Mailing List Hello, On Wed, Apr 30, 2014 at 04:27:51PM +0530, Raghavendra KT wrote: > For some controllers like cpuset, when we have exclusive_cpu is set, > what about having a knob in the cpuset controller > to move the task to non-root (when option is set). I'm doubtful. > Because all the way along, though we have freedom to make the cpusets > exclusive and move tasks (say VMs) into them, > making sure they do not interfere with each other, we can not prevent > the other tasks spawned in a system eating into cpus of > exclusive cpuset since they go to root automatically. I believe the right thing to do would be starting / confining other tasks in the appropriate non-root cgroups. cgroup already provides mechanisms to achieve that. The rest is upto userland. > Do you think having a knob, to make sure new tasks spawned go to say a > default directory under root makes sense? > > I understand that we could easily have a userspace script which could > achieve intended goal, but kernel solution > would really make the exclusive cpusets have exclusive access to cpus > it should have. This would just be a more reliable implementation of an ad-hoc mechanism when it can already be properly achieved by managing cgroups of all processes in the system. > (I also have a POC implemented for pre-unified hierarchy tree and > understand some bit of complications involved in that and > understand that we should not have complex policies in kernel for e.g. > filtering tasks based on patterns etc..). As I wrote above, I think this is something to be solved from userland. I'm very positively against adding this sort of hacky ad-hoc policies which serves one or a few use cases well while introducing possible long-term maintenance issues. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-05-05 16:37 ` Raghavendra KT 0 siblings, 0 replies; 63+ messages in thread From: Raghavendra KT @ 2014-05-05 16:37 UTC (permalink / raw) To: Tejun Heo Cc: Vivek Goyal, lizefan, containers, cgroups, Linux Kernel Mailing List On 5/2/14, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On Wed, Apr 30, 2014 at 04:27:51PM +0530, Raghavendra KT wrote: [...] >> Because all the way along, though we have freedom to make the cpusets >> exclusive and move tasks (say VMs) into them, >> making sure they do not interfere with each other, we can not prevent >> the other tasks spawned in a system eating into cpus of >> exclusive cpuset since they go to root automatically. > > I believe the right thing to do would be starting / confining other > tasks in the appropriate non-root cgroups. cgroup already provides > mechanisms to achieve that. The rest is upto userland. > Thanks Tejun for the reply. I agree. >> Do you think having a knob, to make sure new tasks spawned go to say a >> default directory under root makes sense? >> >> I understand that we could easily have a userspace script which could >> achieve intended goal, but kernel solution >> would really make the exclusive cpusets have exclusive access to cpus >> it should have. > > This would just be a more reliable implementation of an ad-hoc > mechanism when it can already be properly achieved by managing cgroups > of all processes in the system. Correct. Perhaps your answer clears my dilemma to go for a userspace since I donot have any strong reason except convenience to go for a kernel solution. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-05-05 16:37 ` Raghavendra KT 0 siblings, 0 replies; 63+ messages in thread From: Raghavendra KT @ 2014-05-05 16:37 UTC (permalink / raw) To: Tejun Heo Cc: Vivek Goyal, lizefan-hv44wF8Li93QT0dZR+AlfA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Kernel Mailing List On 5/2/14, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: > Hello, > > On Wed, Apr 30, 2014 at 04:27:51PM +0530, Raghavendra KT wrote: [...] >> Because all the way along, though we have freedom to make the cpusets >> exclusive and move tasks (say VMs) into them, >> making sure they do not interfere with each other, we can not prevent >> the other tasks spawned in a system eating into cpus of >> exclusive cpuset since they go to root automatically. > > I believe the right thing to do would be starting / confining other > tasks in the appropriate non-root cgroups. cgroup already provides > mechanisms to achieve that. The rest is upto userland. > Thanks Tejun for the reply. I agree. >> Do you think having a knob, to make sure new tasks spawned go to say a >> default directory under root makes sense? >> >> I understand that we could easily have a userspace script which could >> achieve intended goal, but kernel solution >> would really make the exclusive cpusets have exclusive access to cpus >> it should have. > > This would just be a more reliable implementation of an ad-hoc > mechanism when it can already be properly achieved by managing cgroups > of all processes in the system. Correct. Perhaps your answer clears my dilemma to go for a userspace since I donot have any strong reason except convenience to go for a kernel solution. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy 2014-03-28 2:40 ` Tejun Heo @ 2014-04-14 21:31 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 21:31 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Thu, Mar 27, 2014 at 10:40:49PM -0400, Tejun Heo wrote: > The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use > RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the > following git branch. > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified Branch renamed to review-unified-v1. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy @ 2014-04-14 21:31 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 21:31 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel On Thu, Mar 27, 2014 at 10:40:49PM -0400, Tejun Heo wrote: > The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use > RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the > following git branch. > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified Branch renamed to review-unified-v1. Thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> ` (13 preceding siblings ...) 2014-04-14 21:31 ` Tejun Heo @ 2014-04-15 22:05 ` Tejun Heo 14 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Date: Tue, 15 Apr 2014 18:02:12 -0400 cgroup_apply_cftypes() skip creating or removing files if the subsystem is attached to the default hierarchy, which led to missing files in the root of the default hierarchy. Skipping made sense when the default hierarchy was dummy; however, now that the default hierarchy is full functional and planned to be used as the unified hierarchy, it shouldn't be skipped over. Reported-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- kernel/cgroup.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 9fcdaa7..fbcc710 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2436,10 +2436,6 @@ static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add) lockdep_assert_held(&cgroup_tree_mutex); - /* don't bother if @ss isn't attached */ - if (ss->root == &cgrp_dfl_root) - return 0; - /* add/rm files for all cgroups created before */ css_for_each_descendant_pre(css, cgroup_css(root, ss)) { struct cgroup *cgrp = css->cgroup; -- 1.9.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> @ 2014-04-15 22:05 ` Tejun Heo 2014-03-28 2:40 ` Tejun Heo ` (13 subsequent siblings) 14 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel >From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Tue, 15 Apr 2014 18:02:12 -0400 cgroup_apply_cftypes() skip creating or removing files if the subsystem is attached to the default hierarchy, which led to missing files in the root of the default hierarchy. Skipping made sense when the default hierarchy was dummy; however, now that the default hierarchy is full functional and planned to be used as the unified hierarchy, it shouldn't be skipped over. Reported-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> --- kernel/cgroup.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 9fcdaa7..fbcc710 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2436,10 +2436,6 @@ static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add) lockdep_assert_held(&cgroup_tree_mutex); - /* don't bother if @ss isn't attached */ - if (ss->root == &cgrp_dfl_root) - return 0; - /* add/rm files for all cgroups created before */ css_for_each_descendant_pre(css, cgroup_css(root, ss)) { struct cgroup *cgrp = css->cgroup; -- 1.9.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy @ 2014-04-15 22:05 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Date: Tue, 15 Apr 2014 18:02:12 -0400 cgroup_apply_cftypes() skip creating or removing files if the subsystem is attached to the default hierarchy, which led to missing files in the root of the default hierarchy. Skipping made sense when the default hierarchy was dummy; however, now that the default hierarchy is full functional and planned to be used as the unified hierarchy, it shouldn't be skipped over. Reported-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- kernel/cgroup.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 9fcdaa7..fbcc710 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2436,10 +2436,6 @@ static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add) lockdep_assert_held(&cgroup_tree_mutex); - /* don't bother if @ss isn't attached */ - if (ss->root == &cgrp_dfl_root) - return 0; - /* add/rm files for all cgroups created before */ css_for_each_descendant_pre(css, cgroup_css(root, ss)) { struct cgroup *cgrp = css->cgroup; -- 1.9.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
[parent not found: <20140415220504.GA13099-9pTldWuhBndy/B6EtB590w@public.gmane.org>]
* Re: [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy 2014-04-15 22:05 ` Tejun Heo @ 2014-04-15 22:05 ` Tejun Heo -1 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Tue, Apr 15, 2014 at 06:05:04PM -0400, Tejun Heo wrote: > From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001 > From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > Date: Tue, 15 Apr 2014 18:02:12 -0400 > > cgroup_apply_cftypes() skip creating or removing files if the > subsystem is attached to the default hierarchy, which led to missing > files in the root of the default hierarchy. > > Skipping made sense when the default hierarchy was dummy; however, now > that the default hierarchy is full functional and planned to be used > as the unified hierarchy, it shouldn't be skipped over. > > Reported-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Ugh... wrong thread. Will repost. thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy @ 2014-04-15 22:05 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-15 22:05 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel On Tue, Apr 15, 2014 at 06:05:04PM -0400, Tejun Heo wrote: > From c7596b2f33e9f7a57cef6909c30fec268e921c3b Mon Sep 17 00:00:00 2001 > From: Tejun Heo <tj@kernel.org> > Date: Tue, 15 Apr 2014 18:02:12 -0400 > > cgroup_apply_cftypes() skip creating or removing files if the > subsystem is attached to the default hierarchy, which led to missing > files in the root of the default hierarchy. > > Skipping made sense when the default hierarchy was dummy; however, now > that the default hierarchy is full functional and planned to be used > as the unified hierarchy, it shouldn't be skipped over. > > Reported-by: Li Zefan <lizefan@huawei.com> > Signed-off-by: Tejun Heo <tj@kernel.org> Ugh... wrong thread. Will repost. thanks. -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCHSET cgroup/for-3.16] cgroup: implement unified hierarchy, v2 @ 2014-04-14 21:36 Tejun Heo [not found] ` <1397511430-2673-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> 2014-04-14 21:37 ` Tejun Heo 0 siblings, 2 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 21:36 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Hello, This is v2 of the unified hierarchy patchset. Changes from v1[1] are, * Rebased on top of v3.15-rc1 * Interface file "cgroup.controllers" which was only available in the root is now available in all cgroups. This allows, e.g., a sub-manager in charge of a subtree to tell which controllers are available to it. cgroup currently allows creating arbitrary number of hierarchies and any number of controllers may be associated with a given tree. This allows for huge amount of variance how tasks are associated with various cgroups and controllers; unfortunately, the variance is extreme to the extent that it unnecessarily complicates capabilities which can otherwise be straight-forward and hinders implementation of features which can benefit from coordination among different controllers. Here are some of the issues which we're facing with the current multiple hierarchies. * cgroup membership of a task can't be described in finite number of paths. As there can be arbitrary number of hierarchies, the key describing a task's cgroup membership can be arbitrarily long. This is painful when userland or other parts of the kernel needs to take cgroup membership into account and leads to proliferation of controllers which are just there to identify membership rather than actually control resources, which in turn exacerbates the problem. * Different controllers may or may not reside on the same hierarchy. Features or optimizations which can benefit from sharing the hierarchical organization either can't be implemented or becomes overly complicated. * Tasks of a process may belong to different cgroups, which doesn't make any sense for some controllers. Those controllers end up ignoring such configurations in their own ways leading to inconsistent behavior. In addition, in-process resource control fundamentally isn't something which belongs to cgroup. As it has to be visible to the binary for the process, it must be part of the stable programming interface which is easily accessible to the process proper in an easy race-free way. * The current cgroup allows cgroups which have child cgroups to have tasks in it. This means that the child cgroups end up competing against the internal tasks. This introduces inherent ambiguity as the two are separate types of entities and the latter doesn't have the same control knobs assigned to them. Different controllers are dealing with the issue in different ways. cpu treats internal tasks and child cgroups as equivalents, which makes giving a child cgroup a given ratio of the parent's cpu time difficult as the number of competing entities may fluctuate without any indication. blkio, in my misguided attempt to deal with the issue, introduced a whole duplicate set of knobs for internal tasks and deal with them as if they belong to a separate child cgroup making the interface and implementation a mess. memcg seems somewhat ambiguous on the issue but there are attempts to introduce ad-hoc modifications to tilt the way it's handled to suit specific use cases. This is an inherent problem. All of the solutions that different controllers came up with are unsatisfactory, the different behaviors greatly increases the level of inconsistency and complicates the controller implementations. This patchset finally implements the default unified hierarchy. The goal is providing enough flexibility while enforcing stricter common structure where appropriate to address the above listed issues. Controllers which aren't bound to other hierarchies are automatically attached to the unified hierarchy, which is different in that controllers are enabled explicitly for each subtree. "cgroup.subtree_control" controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent. This patchset contains the following twelve patches. 0001-cgroup-update-cgroup-subsys_mask-to-child_subsys_mas.patch 0002-cgroup-introduce-effective-cgroup_subsys_state.patch 0003-cgroup-implement-cgroup-e_csets.patch 0004-cgroup-make-css_next_child-skip-missing-csses.patch 0005-cgroup-reorganize-css_task_iter.patch 0006-cgroup-teach-css_task_iter-about-effective-csses.patch 0007-cgroup-cgroup-subsys-should-be-cleared-after-the-css.patch 0008-cgroup-allow-cgroup-creation-and-suppress-automatic-.patch 0009-cgroup-add-css_set-dfl_cgrp.patch 0010-cgroup-update-subsystem-rebind-restrictions.patch 0011-cgroup-prepare-migration-path-for-unified-hierarchy.patch 0012-cgroup-implement-dynamic-subtree-controller-enable-d.patch 0001 updates subsys_mask handling again to morph cgrp->subsys_mask to cgrp->child_subsys_mask. 0002-0003 introduce effective cgroup. The cgroup on the unified hierarchy a task belongs to when viewed from a controller. 0004-0007 update iterators to handle effective cgroup correctly. 0008-0011 prepare various paths for explicit controller enable/disable. 0012 implements explicit controller enable/disable. The patchset is on top of cgroup/for-3.15 01a971406177 ("cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c") and also available in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified-v2 diffstat follows. include/linux/cgroup.h | 44 ++- kernel/cgroup.c | 672 +++++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 604 insertions(+), 112 deletions(-) Thanks. -- tejun [1] http://lkml.kernel.org/g/1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <1397511430-2673-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy [not found] ` <1397511430-2673-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> @ 2014-04-14 21:37 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 21:37 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA cgroup is switching away from multiple hierarchies and will use one unified default hierarchy where controllers can be dynamically enabled and disabled per subtree. The default hierarchy will serve as the unified hierarchy to which all controllers are attached and a css on the default hierarchy would need to also serve the tasks of descendant cgroups which don't have the controller enabled - ie. the tree may be collapsed from leaf towards root when viewed from specific controllers. This has been implemented through effective css in the previous patches. This patch finally implements dynamic subtree controller enable/disable on the default hierarchy via a new knob - "cgroup.subtree_control" which controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent, which is a competition between two different types of entities without any clear way to determine resource distribution between the two. Different controllers handle it differently and all the implemented behaviors are ambiguous, ad-hoc, cumbersome and/or just wrong. Having this structural constraints imposed from cgroup core removes the burden from controller implementations and enables showing one consistent behavior across all controllers. When a controller is enabled or disabled, css associations for the controller in the subtrees of each child should be updated. After enabling, the whole subtree of a child should point to the new css of the child. After disabling, the whole subtree of a child should point to the cgroup's css. This is implemented by first updating cgroup states such that cgroup_e_css() result points to the appropriate css and then invoking cgroup_update_dfl_csses() which migrates all tasks in the affected subtrees to the self cgroup on the default hierarchy. * When read, "cgroup.subtree_control" lists all the currently enabled controllers on the children of the cgroup. * White-space separated list of controller names prefixed with either '+' or '-' can be written to "cgroup.subtree_control". The ones prefixed with '+' are enabled on the controller and '-' disabled. * A controller can be enabled iff the parent's "cgroup.subtree_control" enables it and disabled iff no child's "cgroup.subtree_control" has it enabled. * If a cgroup has tasks, no controller can be enabled via "cgroup.subtree_control". Likewise, if "cgroup.subtree_control" has some controllers enabled, tasks can't be migrated into the cgroup. * All controllers which aren't bound on other hierarchies are automatically associated with the root cgroup of the default hierarchy. All the controllers which are bound to the default hierarchy are listed in the read-only file "cgroup.controllers" in the root directory. * "cgroup.controllers" in all non-root cgroups is read-only file whose content is equal to that of "cgroup.subtree_control" of the parent. This indicates which controllers can be used in the cgroup's "cgroup.subtree_control". This is still experimental and there are some holes, one of which is that ->can_attach() failure during cgroup_update_dfl_csses() may leave the cgroups in an undefined state. The issues will be addressed by future patches. v2: Non-root cgroups now also have "cgroup.controllers". Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 5 + kernel/cgroup.c | 367 ++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 370 insertions(+), 2 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index c49d161..ada2392 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -21,6 +21,7 @@ #include <linux/percpu-refcount.h> #include <linux/seq_file.h> #include <linux/kernfs.h> +#include <linux/wait.h> #ifdef CONFIG_CGROUPS @@ -164,6 +165,7 @@ struct cgroup { struct cgroup *parent; /* my parent */ struct kernfs_node *kn; /* cgroup kernfs entry */ + struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */ /* * Monotonically increasing unique serial number which defines a @@ -216,6 +218,9 @@ struct cgroup { /* For css percpu_ref killing and RCU-protected deletion */ struct rcu_head rcu_head; struct work_struct destroy_work; + + /* used to wait for offlining of csses */ + wait_queue_head_t offline_waitq; }; #define MAX_CGROUP_ROOT_NAMELEN 64 diff --git a/kernel/cgroup.c b/kernel/cgroup.c index c7b8509..08c4439 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root, unsigned long ss_mask); static void cgroup_destroy_css_killed(struct cgroup *cgrp); static int cgroup_destroy_locked(struct cgroup *cgrp); +static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss); +static void kill_css(struct cgroup_subsys_state *css); static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[], bool is_add); static void cgroup_pidlist_destroy_all(struct cgroup *cgrp); @@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp) #define for_each_root(root) \ list_for_each_entry((root), &cgroup_roots, root_list) +/* iterate over child cgrps, lock should be held throughout iteration */ +#define cgroup_for_each_live_child(child, cgrp) \ + list_for_each_entry((child), &(cgrp)->children, sibling) \ + if (({ lockdep_assert_held(&cgroup_tree_mutex); \ + cgroup_is_dead(child); })) \ + ; \ + else + /** * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive. * @cgrp: the cgroup to be checked for liveness @@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) for_each_subsys(ss, ssid) INIT_LIST_HEAD(&cgrp->e_csets[ssid]); + + init_waitqueue_head(&cgrp->offline_waitq); } static void init_cgroup_root(struct cgroup_root *root, @@ -1938,6 +1950,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, lockdep_assert_held(&cgroup_mutex); + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent && + dst_cgrp->child_subsys_mask) + return -EBUSY; + /* look up the dst cset for each src cset and link it to src */ list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) { struct css_set *dst_cset; @@ -2303,6 +2323,326 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v) return 0; } +static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask) +{ + struct cgroup_subsys *ss; + bool printed = false; + int ssid; + + for_each_subsys(ss, ssid) { + if (ss_mask & (1 << ssid)) { + if (printed) + seq_putc(seq, ' '); + seq_printf(seq, "%s", ss->name); + printed = true; + } + } + if (printed) + seq_putc(seq, '\n'); +} + +/* show controllers which are currently attached to the default hierarchy */ +static int cgroup_root_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->root->subsys_mask); + return 0; +} + +/* show controllers which are enabled from the parent */ +static int cgroup_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->parent->child_subsys_mask); + return 0; +} + +/* show controllers which are enabled for a given cgroup's children */ +static int cgroup_subtree_control_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->child_subsys_mask); + return 0; +} + +/** + * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy + * @cgrp: root of the subtree to update csses for + * + * @cgrp's child_subsys_mask has changed and its subtree's (self excluded) + * css associations need to be updated accordingly. This function looks up + * all css_sets which are attached to the subtree, creates the matching + * updated css_sets and migrates the tasks to the new ones. + */ +static int cgroup_update_dfl_csses(struct cgroup *cgrp) +{ + LIST_HEAD(preloaded_csets); + struct cgroup_subsys_state *css; + struct css_set *src_cset; + int ret; + + lockdep_assert_held(&cgroup_tree_mutex); + lockdep_assert_held(&cgroup_mutex); + + /* look up all csses currently attached to @cgrp's subtree */ + down_read(&css_set_rwsem); + css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) { + struct cgrp_cset_link *link; + + /* self is not affected by child_subsys_mask change */ + if (css->cgroup == cgrp) + continue; + + list_for_each_entry(link, &css->cgroup->cset_links, cset_link) + cgroup_migrate_add_src(link->cset, cgrp, + &preloaded_csets); + } + up_read(&css_set_rwsem); + + /* NULL dst indicates self on default hierarchy */ + ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets); + if (ret) + goto out_finish; + + list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) { + struct task_struct *last_task = NULL, *task; + + /* src_csets precede dst_csets, break on the first dst_cset */ + if (!src_cset->mg_src_cgrp) + break; + + /* + * All tasks in src_cset need to be migrated to the + * matching dst_cset. Empty it process by process. We + * walk tasks but migrate processes. The leader might even + * belong to a different cset but such src_cset would also + * be among the target src_csets because the default + * hierarchy enforces per-process membership. + */ + while (true) { + down_read(&css_set_rwsem); + task = list_first_entry_or_null(&src_cset->tasks, + struct task_struct, cg_list); + if (task) { + task = task->group_leader; + WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp); + get_task_struct(task); + } + up_read(&css_set_rwsem); + + if (!task) + break; + + /* guard against possible infinite loop */ + if (WARN(last_task == task, + "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n")) + goto out_finish; + last_task = task; + + threadgroup_lock(task); + /* raced against de_thread() from another thread? */ + if (!thread_group_leader(task)) { + threadgroup_unlock(task); + put_task_struct(task); + continue; + } + + ret = cgroup_migrate(src_cset->dfl_cgrp, task, true); + + threadgroup_unlock(task); + put_task_struct(task); + + if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret)) + goto out_finish; + } + } + +out_finish: + cgroup_migrate_finish(&preloaded_csets); + return ret; +} + +/* change the enabled child controllers for a cgroup in the default hierarchy */ +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, + struct cftype *cft, char *buffer) +{ + unsigned long enable_req = 0, disable_req = 0, enable, disable; + struct cgroup *cgrp = dummy_css->cgroup, *child; + struct cgroup_subsys *ss; + char *tok, *p; + int ssid, ret; + + /* + * Parse input - white space separated list of subsystem names + * prefixed with either + or -. + */ + p = buffer; + while ((tok = strsep(&p, " \t\n"))) { + for_each_subsys(ss, ssid) { + if (ss->disabled || strcmp(tok + 1, ss->name)) + continue; + + if (*tok == '+') { + enable_req |= 1 << ssid; + disable_req &= ~(1 << ssid); + } else if (*tok == '-') { + disable_req |= 1 << ssid; + enable_req &= ~(1 << ssid); + } else { + return -EINVAL; + } + break; + } + if (ssid == CGROUP_SUBSYS_COUNT) + return -EINVAL; + } + + /* + * We're gonna grab cgroup_tree_mutex which nests outside kernfs + * active_ref. cgroup_lock_live_group() already provides enough + * protection. Ensure @cgrp stays accessible and break the + * active_ref protection. + */ + cgroup_get(cgrp); + kernfs_break_active_protection(cgrp->control_kn); +retry: + enable = enable_req; + disable = disable_req; + + mutex_lock(&cgroup_tree_mutex); + + for_each_subsys(ss, ssid) { + if (enable & (1 << ssid)) { + if (cgrp->child_subsys_mask & (1 << ssid)) { + enable &= ~(1 << ssid); + continue; + } + + /* + * Because css offlining is asynchronous, userland + * might try to re-enable the same controller while + * the previous instance is still around. In such + * cases, wait till it's gone using offline_waitq. + */ + cgroup_for_each_live_child(child, cgrp) { + wait_queue_t wait; + + if (!cgroup_css(child, ss)) + continue; + + prepare_to_wait(&child->offline_waitq, &wait, + TASK_UNINTERRUPTIBLE); + mutex_unlock(&cgroup_tree_mutex); + schedule(); + finish_wait(&child->offline_waitq, &wait); + goto retry; + } + + /* unavailable or not enabled on the parent? */ + if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) || + (cgrp->parent && + !(cgrp->parent->child_subsys_mask & (1 << ssid)))) { + ret = -ENOENT; + goto out_unlock_tree; + } + } else if (disable & (1 << ssid)) { + if (!(cgrp->child_subsys_mask & (1 << ssid))) { + disable &= ~(1 << ssid); + continue; + } + + /* a child has it enabled? */ + cgroup_for_each_live_child(child, cgrp) { + if (child->child_subsys_mask & (1 << ssid)) { + ret = -EBUSY; + goto out_unlock_tree; + } + } + } + } + + if (!enable && !disable) { + ret = 0; + goto out_unlock_tree; + } + + if (!cgroup_lock_live_group(cgrp)) { + ret = -ENODEV; + goto out_unlock_tree; + } + + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) { + ret = -EBUSY; + goto out_unlock; + } + + /* + * Create csses for enables and update child_subsys_mask. This + * changes cgroup_e_css() results which in turn makes the + * subsequent cgroup_update_dfl_csses() associate all tasks in the + * subtree to the updated csses. + */ + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + ret = create_css(child, ss); + if (ret) + goto err_undo_css; + } + } + + cgrp->child_subsys_mask |= enable; + cgrp->child_subsys_mask &= ~disable; + + ret = cgroup_update_dfl_csses(cgrp); + if (ret) + goto err_undo_css; + + /* all tasks are now migrated away from the old csses, kill them */ + for_each_subsys(ss, ssid) { + if (!(disable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) + kill_css(cgroup_css(child, ss)); + } + + kernfs_activate(cgrp->kn); + ret = 0; +out_unlock: + mutex_unlock(&cgroup_mutex); +out_unlock_tree: + mutex_unlock(&cgroup_tree_mutex); + kernfs_unbreak_active_protection(cgrp->control_kn); + cgroup_put(cgrp); + return ret; + +err_undo_css: + cgrp->child_subsys_mask &= ~enable; + cgrp->child_subsys_mask |= disable; + + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + struct cgroup_subsys_state *css = cgroup_css(child, ss); + if (css) + kill_css(css); + } + } + goto out_unlock; +} + static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -2462,9 +2802,14 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft) return PTR_ERR(kn); ret = cgroup_kn_set_ugid(kn); - if (ret) + if (ret) { kernfs_remove(kn); - return ret; + return ret; + } + + if (cft->seq_show == cgroup_subtree_control_show) + cgrp->control_kn = kn; + return 0; } /** @@ -3561,6 +3906,22 @@ static struct cftype cgroup_base_files[] = { .flags = CFTYPE_ONLY_ON_ROOT, .seq_show = cgroup_sane_behavior_show, }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT, + .seq_show = cgroup_root_controllers_show, + }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT, + .seq_show = cgroup_controllers_show, + }, + { + .name = "cgroup.subtree_control", + .flags = CFTYPE_ONLY_ON_DFL, + .seq_show = cgroup_subtree_control_show, + .write_string = cgroup_subtree_control_write, + }, /* * Historical crazy stuff. These don't have "cgroup." prefix and @@ -3729,6 +4090,8 @@ static void offline_css(struct cgroup_subsys_state *css) css->flags &= ~CSS_ONLINE; css->cgroup->nr_css--; RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL); + + wake_up_all(&css->cgroup->offline_waitq); } /** -- 1.9.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy [not found] ` <1397511430-2673-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> @ 2014-04-14 21:37 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 21:37 UTC (permalink / raw) To: lizefan; +Cc: containers, cgroups, linux-kernel, Tejun Heo cgroup is switching away from multiple hierarchies and will use one unified default hierarchy where controllers can be dynamically enabled and disabled per subtree. The default hierarchy will serve as the unified hierarchy to which all controllers are attached and a css on the default hierarchy would need to also serve the tasks of descendant cgroups which don't have the controller enabled - ie. the tree may be collapsed from leaf towards root when viewed from specific controllers. This has been implemented through effective css in the previous patches. This patch finally implements dynamic subtree controller enable/disable on the default hierarchy via a new knob - "cgroup.subtree_control" which controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent, which is a competition between two different types of entities without any clear way to determine resource distribution between the two. Different controllers handle it differently and all the implemented behaviors are ambiguous, ad-hoc, cumbersome and/or just wrong. Having this structural constraints imposed from cgroup core removes the burden from controller implementations and enables showing one consistent behavior across all controllers. When a controller is enabled or disabled, css associations for the controller in the subtrees of each child should be updated. After enabling, the whole subtree of a child should point to the new css of the child. After disabling, the whole subtree of a child should point to the cgroup's css. This is implemented by first updating cgroup states such that cgroup_e_css() result points to the appropriate css and then invoking cgroup_update_dfl_csses() which migrates all tasks in the affected subtrees to the self cgroup on the default hierarchy. * When read, "cgroup.subtree_control" lists all the currently enabled controllers on the children of the cgroup. * White-space separated list of controller names prefixed with either '+' or '-' can be written to "cgroup.subtree_control". The ones prefixed with '+' are enabled on the controller and '-' disabled. * A controller can be enabled iff the parent's "cgroup.subtree_control" enables it and disabled iff no child's "cgroup.subtree_control" has it enabled. * If a cgroup has tasks, no controller can be enabled via "cgroup.subtree_control". Likewise, if "cgroup.subtree_control" has some controllers enabled, tasks can't be migrated into the cgroup. * All controllers which aren't bound on other hierarchies are automatically associated with the root cgroup of the default hierarchy. All the controllers which are bound to the default hierarchy are listed in the read-only file "cgroup.controllers" in the root directory. * "cgroup.controllers" in all non-root cgroups is read-only file whose content is equal to that of "cgroup.subtree_control" of the parent. This indicates which controllers can be used in the cgroup's "cgroup.subtree_control". This is still experimental and there are some holes, one of which is that ->can_attach() failure during cgroup_update_dfl_csses() may leave the cgroups in an undefined state. The issues will be addressed by future patches. v2: Non-root cgroups now also have "cgroup.controllers". Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/cgroup.h | 5 + kernel/cgroup.c | 367 ++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 370 insertions(+), 2 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index c49d161..ada2392 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -21,6 +21,7 @@ #include <linux/percpu-refcount.h> #include <linux/seq_file.h> #include <linux/kernfs.h> +#include <linux/wait.h> #ifdef CONFIG_CGROUPS @@ -164,6 +165,7 @@ struct cgroup { struct cgroup *parent; /* my parent */ struct kernfs_node *kn; /* cgroup kernfs entry */ + struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */ /* * Monotonically increasing unique serial number which defines a @@ -216,6 +218,9 @@ struct cgroup { /* For css percpu_ref killing and RCU-protected deletion */ struct rcu_head rcu_head; struct work_struct destroy_work; + + /* used to wait for offlining of csses */ + wait_queue_head_t offline_waitq; }; #define MAX_CGROUP_ROOT_NAMELEN 64 diff --git a/kernel/cgroup.c b/kernel/cgroup.c index c7b8509..08c4439 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root, unsigned long ss_mask); static void cgroup_destroy_css_killed(struct cgroup *cgrp); static int cgroup_destroy_locked(struct cgroup *cgrp); +static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss); +static void kill_css(struct cgroup_subsys_state *css); static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[], bool is_add); static void cgroup_pidlist_destroy_all(struct cgroup *cgrp); @@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp) #define for_each_root(root) \ list_for_each_entry((root), &cgroup_roots, root_list) +/* iterate over child cgrps, lock should be held throughout iteration */ +#define cgroup_for_each_live_child(child, cgrp) \ + list_for_each_entry((child), &(cgrp)->children, sibling) \ + if (({ lockdep_assert_held(&cgroup_tree_mutex); \ + cgroup_is_dead(child); })) \ + ; \ + else + /** * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive. * @cgrp: the cgroup to be checked for liveness @@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) for_each_subsys(ss, ssid) INIT_LIST_HEAD(&cgrp->e_csets[ssid]); + + init_waitqueue_head(&cgrp->offline_waitq); } static void init_cgroup_root(struct cgroup_root *root, @@ -1938,6 +1950,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, lockdep_assert_held(&cgroup_mutex); + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent && + dst_cgrp->child_subsys_mask) + return -EBUSY; + /* look up the dst cset for each src cset and link it to src */ list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) { struct css_set *dst_cset; @@ -2303,6 +2323,326 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v) return 0; } +static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask) +{ + struct cgroup_subsys *ss; + bool printed = false; + int ssid; + + for_each_subsys(ss, ssid) { + if (ss_mask & (1 << ssid)) { + if (printed) + seq_putc(seq, ' '); + seq_printf(seq, "%s", ss->name); + printed = true; + } + } + if (printed) + seq_putc(seq, '\n'); +} + +/* show controllers which are currently attached to the default hierarchy */ +static int cgroup_root_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->root->subsys_mask); + return 0; +} + +/* show controllers which are enabled from the parent */ +static int cgroup_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->parent->child_subsys_mask); + return 0; +} + +/* show controllers which are enabled for a given cgroup's children */ +static int cgroup_subtree_control_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->child_subsys_mask); + return 0; +} + +/** + * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy + * @cgrp: root of the subtree to update csses for + * + * @cgrp's child_subsys_mask has changed and its subtree's (self excluded) + * css associations need to be updated accordingly. This function looks up + * all css_sets which are attached to the subtree, creates the matching + * updated css_sets and migrates the tasks to the new ones. + */ +static int cgroup_update_dfl_csses(struct cgroup *cgrp) +{ + LIST_HEAD(preloaded_csets); + struct cgroup_subsys_state *css; + struct css_set *src_cset; + int ret; + + lockdep_assert_held(&cgroup_tree_mutex); + lockdep_assert_held(&cgroup_mutex); + + /* look up all csses currently attached to @cgrp's subtree */ + down_read(&css_set_rwsem); + css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) { + struct cgrp_cset_link *link; + + /* self is not affected by child_subsys_mask change */ + if (css->cgroup == cgrp) + continue; + + list_for_each_entry(link, &css->cgroup->cset_links, cset_link) + cgroup_migrate_add_src(link->cset, cgrp, + &preloaded_csets); + } + up_read(&css_set_rwsem); + + /* NULL dst indicates self on default hierarchy */ + ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets); + if (ret) + goto out_finish; + + list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) { + struct task_struct *last_task = NULL, *task; + + /* src_csets precede dst_csets, break on the first dst_cset */ + if (!src_cset->mg_src_cgrp) + break; + + /* + * All tasks in src_cset need to be migrated to the + * matching dst_cset. Empty it process by process. We + * walk tasks but migrate processes. The leader might even + * belong to a different cset but such src_cset would also + * be among the target src_csets because the default + * hierarchy enforces per-process membership. + */ + while (true) { + down_read(&css_set_rwsem); + task = list_first_entry_or_null(&src_cset->tasks, + struct task_struct, cg_list); + if (task) { + task = task->group_leader; + WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp); + get_task_struct(task); + } + up_read(&css_set_rwsem); + + if (!task) + break; + + /* guard against possible infinite loop */ + if (WARN(last_task == task, + "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n")) + goto out_finish; + last_task = task; + + threadgroup_lock(task); + /* raced against de_thread() from another thread? */ + if (!thread_group_leader(task)) { + threadgroup_unlock(task); + put_task_struct(task); + continue; + } + + ret = cgroup_migrate(src_cset->dfl_cgrp, task, true); + + threadgroup_unlock(task); + put_task_struct(task); + + if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret)) + goto out_finish; + } + } + +out_finish: + cgroup_migrate_finish(&preloaded_csets); + return ret; +} + +/* change the enabled child controllers for a cgroup in the default hierarchy */ +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, + struct cftype *cft, char *buffer) +{ + unsigned long enable_req = 0, disable_req = 0, enable, disable; + struct cgroup *cgrp = dummy_css->cgroup, *child; + struct cgroup_subsys *ss; + char *tok, *p; + int ssid, ret; + + /* + * Parse input - white space separated list of subsystem names + * prefixed with either + or -. + */ + p = buffer; + while ((tok = strsep(&p, " \t\n"))) { + for_each_subsys(ss, ssid) { + if (ss->disabled || strcmp(tok + 1, ss->name)) + continue; + + if (*tok == '+') { + enable_req |= 1 << ssid; + disable_req &= ~(1 << ssid); + } else if (*tok == '-') { + disable_req |= 1 << ssid; + enable_req &= ~(1 << ssid); + } else { + return -EINVAL; + } + break; + } + if (ssid == CGROUP_SUBSYS_COUNT) + return -EINVAL; + } + + /* + * We're gonna grab cgroup_tree_mutex which nests outside kernfs + * active_ref. cgroup_lock_live_group() already provides enough + * protection. Ensure @cgrp stays accessible and break the + * active_ref protection. + */ + cgroup_get(cgrp); + kernfs_break_active_protection(cgrp->control_kn); +retry: + enable = enable_req; + disable = disable_req; + + mutex_lock(&cgroup_tree_mutex); + + for_each_subsys(ss, ssid) { + if (enable & (1 << ssid)) { + if (cgrp->child_subsys_mask & (1 << ssid)) { + enable &= ~(1 << ssid); + continue; + } + + /* + * Because css offlining is asynchronous, userland + * might try to re-enable the same controller while + * the previous instance is still around. In such + * cases, wait till it's gone using offline_waitq. + */ + cgroup_for_each_live_child(child, cgrp) { + wait_queue_t wait; + + if (!cgroup_css(child, ss)) + continue; + + prepare_to_wait(&child->offline_waitq, &wait, + TASK_UNINTERRUPTIBLE); + mutex_unlock(&cgroup_tree_mutex); + schedule(); + finish_wait(&child->offline_waitq, &wait); + goto retry; + } + + /* unavailable or not enabled on the parent? */ + if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) || + (cgrp->parent && + !(cgrp->parent->child_subsys_mask & (1 << ssid)))) { + ret = -ENOENT; + goto out_unlock_tree; + } + } else if (disable & (1 << ssid)) { + if (!(cgrp->child_subsys_mask & (1 << ssid))) { + disable &= ~(1 << ssid); + continue; + } + + /* a child has it enabled? */ + cgroup_for_each_live_child(child, cgrp) { + if (child->child_subsys_mask & (1 << ssid)) { + ret = -EBUSY; + goto out_unlock_tree; + } + } + } + } + + if (!enable && !disable) { + ret = 0; + goto out_unlock_tree; + } + + if (!cgroup_lock_live_group(cgrp)) { + ret = -ENODEV; + goto out_unlock_tree; + } + + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) { + ret = -EBUSY; + goto out_unlock; + } + + /* + * Create csses for enables and update child_subsys_mask. This + * changes cgroup_e_css() results which in turn makes the + * subsequent cgroup_update_dfl_csses() associate all tasks in the + * subtree to the updated csses. + */ + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + ret = create_css(child, ss); + if (ret) + goto err_undo_css; + } + } + + cgrp->child_subsys_mask |= enable; + cgrp->child_subsys_mask &= ~disable; + + ret = cgroup_update_dfl_csses(cgrp); + if (ret) + goto err_undo_css; + + /* all tasks are now migrated away from the old csses, kill them */ + for_each_subsys(ss, ssid) { + if (!(disable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) + kill_css(cgroup_css(child, ss)); + } + + kernfs_activate(cgrp->kn); + ret = 0; +out_unlock: + mutex_unlock(&cgroup_mutex); +out_unlock_tree: + mutex_unlock(&cgroup_tree_mutex); + kernfs_unbreak_active_protection(cgrp->control_kn); + cgroup_put(cgrp); + return ret; + +err_undo_css: + cgrp->child_subsys_mask &= ~enable; + cgrp->child_subsys_mask |= disable; + + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + struct cgroup_subsys_state *css = cgroup_css(child, ss); + if (css) + kill_css(css); + } + } + goto out_unlock; +} + static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -2462,9 +2802,14 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft) return PTR_ERR(kn); ret = cgroup_kn_set_ugid(kn); - if (ret) + if (ret) { kernfs_remove(kn); - return ret; + return ret; + } + + if (cft->seq_show == cgroup_subtree_control_show) + cgrp->control_kn = kn; + return 0; } /** @@ -3561,6 +3906,22 @@ static struct cftype cgroup_base_files[] = { .flags = CFTYPE_ONLY_ON_ROOT, .seq_show = cgroup_sane_behavior_show, }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT, + .seq_show = cgroup_root_controllers_show, + }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT, + .seq_show = cgroup_controllers_show, + }, + { + .name = "cgroup.subtree_control", + .flags = CFTYPE_ONLY_ON_DFL, + .seq_show = cgroup_subtree_control_show, + .write_string = cgroup_subtree_control_write, + }, /* * Historical crazy stuff. These don't have "cgroup." prefix and @@ -3729,6 +4090,8 @@ static void offline_css(struct cgroup_subsys_state *css) css->flags &= ~CSS_ONLINE; css->cgroup->nr_css--; RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL); + + wake_up_all(&css->cgroup->offline_waitq); } /** -- 1.9.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy @ 2014-04-14 21:37 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-14 21:37 UTC (permalink / raw) To: lizefan-hv44wF8Li93QT0dZR+AlfA Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo cgroup is switching away from multiple hierarchies and will use one unified default hierarchy where controllers can be dynamically enabled and disabled per subtree. The default hierarchy will serve as the unified hierarchy to which all controllers are attached and a css on the default hierarchy would need to also serve the tasks of descendant cgroups which don't have the controller enabled - ie. the tree may be collapsed from leaf towards root when viewed from specific controllers. This has been implemented through effective css in the previous patches. This patch finally implements dynamic subtree controller enable/disable on the default hierarchy via a new knob - "cgroup.subtree_control" which controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent, which is a competition between two different types of entities without any clear way to determine resource distribution between the two. Different controllers handle it differently and all the implemented behaviors are ambiguous, ad-hoc, cumbersome and/or just wrong. Having this structural constraints imposed from cgroup core removes the burden from controller implementations and enables showing one consistent behavior across all controllers. When a controller is enabled or disabled, css associations for the controller in the subtrees of each child should be updated. After enabling, the whole subtree of a child should point to the new css of the child. After disabling, the whole subtree of a child should point to the cgroup's css. This is implemented by first updating cgroup states such that cgroup_e_css() result points to the appropriate css and then invoking cgroup_update_dfl_csses() which migrates all tasks in the affected subtrees to the self cgroup on the default hierarchy. * When read, "cgroup.subtree_control" lists all the currently enabled controllers on the children of the cgroup. * White-space separated list of controller names prefixed with either '+' or '-' can be written to "cgroup.subtree_control". The ones prefixed with '+' are enabled on the controller and '-' disabled. * A controller can be enabled iff the parent's "cgroup.subtree_control" enables it and disabled iff no child's "cgroup.subtree_control" has it enabled. * If a cgroup has tasks, no controller can be enabled via "cgroup.subtree_control". Likewise, if "cgroup.subtree_control" has some controllers enabled, tasks can't be migrated into the cgroup. * All controllers which aren't bound on other hierarchies are automatically associated with the root cgroup of the default hierarchy. All the controllers which are bound to the default hierarchy are listed in the read-only file "cgroup.controllers" in the root directory. * "cgroup.controllers" in all non-root cgroups is read-only file whose content is equal to that of "cgroup.subtree_control" of the parent. This indicates which controllers can be used in the cgroup's "cgroup.subtree_control". This is still experimental and there are some holes, one of which is that ->can_attach() failure during cgroup_update_dfl_csses() may leave the cgroups in an undefined state. The issues will be addressed by future patches. v2: Non-root cgroups now also have "cgroup.controllers". Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- include/linux/cgroup.h | 5 + kernel/cgroup.c | 367 ++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 370 insertions(+), 2 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index c49d161..ada2392 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -21,6 +21,7 @@ #include <linux/percpu-refcount.h> #include <linux/seq_file.h> #include <linux/kernfs.h> +#include <linux/wait.h> #ifdef CONFIG_CGROUPS @@ -164,6 +165,7 @@ struct cgroup { struct cgroup *parent; /* my parent */ struct kernfs_node *kn; /* cgroup kernfs entry */ + struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */ /* * Monotonically increasing unique serial number which defines a @@ -216,6 +218,9 @@ struct cgroup { /* For css percpu_ref killing and RCU-protected deletion */ struct rcu_head rcu_head; struct work_struct destroy_work; + + /* used to wait for offlining of csses */ + wait_queue_head_t offline_waitq; }; #define MAX_CGROUP_ROOT_NAMELEN 64 diff --git a/kernel/cgroup.c b/kernel/cgroup.c index c7b8509..08c4439 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -182,6 +182,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root, unsigned long ss_mask); static void cgroup_destroy_css_killed(struct cgroup *cgrp); static int cgroup_destroy_locked(struct cgroup *cgrp); +static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss); +static void kill_css(struct cgroup_subsys_state *css); static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[], bool is_add); static void cgroup_pidlist_destroy_all(struct cgroup *cgrp); @@ -338,6 +340,14 @@ static int notify_on_release(const struct cgroup *cgrp) #define for_each_root(root) \ list_for_each_entry((root), &cgroup_roots, root_list) +/* iterate over child cgrps, lock should be held throughout iteration */ +#define cgroup_for_each_live_child(child, cgrp) \ + list_for_each_entry((child), &(cgrp)->children, sibling) \ + if (({ lockdep_assert_held(&cgroup_tree_mutex); \ + cgroup_is_dead(child); })) \ + ; \ + else + /** * cgroup_lock_live_group - take cgroup_mutex and check that cgrp is alive. * @cgrp: the cgroup to be checked for liveness @@ -1450,6 +1460,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) for_each_subsys(ss, ssid) INIT_LIST_HEAD(&cgrp->e_csets[ssid]); + + init_waitqueue_head(&cgrp->offline_waitq); } static void init_cgroup_root(struct cgroup_root *root, @@ -1938,6 +1950,14 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp, lockdep_assert_held(&cgroup_mutex); + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && dst_cgrp->parent && + dst_cgrp->child_subsys_mask) + return -EBUSY; + /* look up the dst cset for each src cset and link it to src */ list_for_each_entry_safe(src_cset, tmp_cset, preloaded_csets, mg_preload_node) { struct css_set *dst_cset; @@ -2303,6 +2323,326 @@ static int cgroup_sane_behavior_show(struct seq_file *seq, void *v) return 0; } +static void cgroup_print_ss_mask(struct seq_file *seq, unsigned int ss_mask) +{ + struct cgroup_subsys *ss; + bool printed = false; + int ssid; + + for_each_subsys(ss, ssid) { + if (ss_mask & (1 << ssid)) { + if (printed) + seq_putc(seq, ' '); + seq_printf(seq, "%s", ss->name); + printed = true; + } + } + if (printed) + seq_putc(seq, '\n'); +} + +/* show controllers which are currently attached to the default hierarchy */ +static int cgroup_root_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->root->subsys_mask); + return 0; +} + +/* show controllers which are enabled from the parent */ +static int cgroup_controllers_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->parent->child_subsys_mask); + return 0; +} + +/* show controllers which are enabled for a given cgroup's children */ +static int cgroup_subtree_control_show(struct seq_file *seq, void *v) +{ + struct cgroup *cgrp = seq_css(seq)->cgroup; + + cgroup_print_ss_mask(seq, cgrp->child_subsys_mask); + return 0; +} + +/** + * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy + * @cgrp: root of the subtree to update csses for + * + * @cgrp's child_subsys_mask has changed and its subtree's (self excluded) + * css associations need to be updated accordingly. This function looks up + * all css_sets which are attached to the subtree, creates the matching + * updated css_sets and migrates the tasks to the new ones. + */ +static int cgroup_update_dfl_csses(struct cgroup *cgrp) +{ + LIST_HEAD(preloaded_csets); + struct cgroup_subsys_state *css; + struct css_set *src_cset; + int ret; + + lockdep_assert_held(&cgroup_tree_mutex); + lockdep_assert_held(&cgroup_mutex); + + /* look up all csses currently attached to @cgrp's subtree */ + down_read(&css_set_rwsem); + css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) { + struct cgrp_cset_link *link; + + /* self is not affected by child_subsys_mask change */ + if (css->cgroup == cgrp) + continue; + + list_for_each_entry(link, &css->cgroup->cset_links, cset_link) + cgroup_migrate_add_src(link->cset, cgrp, + &preloaded_csets); + } + up_read(&css_set_rwsem); + + /* NULL dst indicates self on default hierarchy */ + ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets); + if (ret) + goto out_finish; + + list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) { + struct task_struct *last_task = NULL, *task; + + /* src_csets precede dst_csets, break on the first dst_cset */ + if (!src_cset->mg_src_cgrp) + break; + + /* + * All tasks in src_cset need to be migrated to the + * matching dst_cset. Empty it process by process. We + * walk tasks but migrate processes. The leader might even + * belong to a different cset but such src_cset would also + * be among the target src_csets because the default + * hierarchy enforces per-process membership. + */ + while (true) { + down_read(&css_set_rwsem); + task = list_first_entry_or_null(&src_cset->tasks, + struct task_struct, cg_list); + if (task) { + task = task->group_leader; + WARN_ON_ONCE(!task_css_set(task)->mg_src_cgrp); + get_task_struct(task); + } + up_read(&css_set_rwsem); + + if (!task) + break; + + /* guard against possible infinite loop */ + if (WARN(last_task == task, + "cgroup: update_dfl_csses failed to make progress, aborting in inconsistent state\n")) + goto out_finish; + last_task = task; + + threadgroup_lock(task); + /* raced against de_thread() from another thread? */ + if (!thread_group_leader(task)) { + threadgroup_unlock(task); + put_task_struct(task); + continue; + } + + ret = cgroup_migrate(src_cset->dfl_cgrp, task, true); + + threadgroup_unlock(task); + put_task_struct(task); + + if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret)) + goto out_finish; + } + } + +out_finish: + cgroup_migrate_finish(&preloaded_csets); + return ret; +} + +/* change the enabled child controllers for a cgroup in the default hierarchy */ +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, + struct cftype *cft, char *buffer) +{ + unsigned long enable_req = 0, disable_req = 0, enable, disable; + struct cgroup *cgrp = dummy_css->cgroup, *child; + struct cgroup_subsys *ss; + char *tok, *p; + int ssid, ret; + + /* + * Parse input - white space separated list of subsystem names + * prefixed with either + or -. + */ + p = buffer; + while ((tok = strsep(&p, " \t\n"))) { + for_each_subsys(ss, ssid) { + if (ss->disabled || strcmp(tok + 1, ss->name)) + continue; + + if (*tok == '+') { + enable_req |= 1 << ssid; + disable_req &= ~(1 << ssid); + } else if (*tok == '-') { + disable_req |= 1 << ssid; + enable_req &= ~(1 << ssid); + } else { + return -EINVAL; + } + break; + } + if (ssid == CGROUP_SUBSYS_COUNT) + return -EINVAL; + } + + /* + * We're gonna grab cgroup_tree_mutex which nests outside kernfs + * active_ref. cgroup_lock_live_group() already provides enough + * protection. Ensure @cgrp stays accessible and break the + * active_ref protection. + */ + cgroup_get(cgrp); + kernfs_break_active_protection(cgrp->control_kn); +retry: + enable = enable_req; + disable = disable_req; + + mutex_lock(&cgroup_tree_mutex); + + for_each_subsys(ss, ssid) { + if (enable & (1 << ssid)) { + if (cgrp->child_subsys_mask & (1 << ssid)) { + enable &= ~(1 << ssid); + continue; + } + + /* + * Because css offlining is asynchronous, userland + * might try to re-enable the same controller while + * the previous instance is still around. In such + * cases, wait till it's gone using offline_waitq. + */ + cgroup_for_each_live_child(child, cgrp) { + wait_queue_t wait; + + if (!cgroup_css(child, ss)) + continue; + + prepare_to_wait(&child->offline_waitq, &wait, + TASK_UNINTERRUPTIBLE); + mutex_unlock(&cgroup_tree_mutex); + schedule(); + finish_wait(&child->offline_waitq, &wait); + goto retry; + } + + /* unavailable or not enabled on the parent? */ + if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) || + (cgrp->parent && + !(cgrp->parent->child_subsys_mask & (1 << ssid)))) { + ret = -ENOENT; + goto out_unlock_tree; + } + } else if (disable & (1 << ssid)) { + if (!(cgrp->child_subsys_mask & (1 << ssid))) { + disable &= ~(1 << ssid); + continue; + } + + /* a child has it enabled? */ + cgroup_for_each_live_child(child, cgrp) { + if (child->child_subsys_mask & (1 << ssid)) { + ret = -EBUSY; + goto out_unlock_tree; + } + } + } + } + + if (!enable && !disable) { + ret = 0; + goto out_unlock_tree; + } + + if (!cgroup_lock_live_group(cgrp)) { + ret = -ENODEV; + goto out_unlock_tree; + } + + /* + * Except for the root, child_subsys_mask must be zero for a cgroup + * with tasks so that child cgroups don't compete against tasks. + */ + if (enable && cgrp->parent && !list_empty(&cgrp->cset_links)) { + ret = -EBUSY; + goto out_unlock; + } + + /* + * Create csses for enables and update child_subsys_mask. This + * changes cgroup_e_css() results which in turn makes the + * subsequent cgroup_update_dfl_csses() associate all tasks in the + * subtree to the updated csses. + */ + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + ret = create_css(child, ss); + if (ret) + goto err_undo_css; + } + } + + cgrp->child_subsys_mask |= enable; + cgrp->child_subsys_mask &= ~disable; + + ret = cgroup_update_dfl_csses(cgrp); + if (ret) + goto err_undo_css; + + /* all tasks are now migrated away from the old csses, kill them */ + for_each_subsys(ss, ssid) { + if (!(disable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) + kill_css(cgroup_css(child, ss)); + } + + kernfs_activate(cgrp->kn); + ret = 0; +out_unlock: + mutex_unlock(&cgroup_mutex); +out_unlock_tree: + mutex_unlock(&cgroup_tree_mutex); + kernfs_unbreak_active_protection(cgrp->control_kn); + cgroup_put(cgrp); + return ret; + +err_undo_css: + cgrp->child_subsys_mask &= ~enable; + cgrp->child_subsys_mask |= disable; + + for_each_subsys(ss, ssid) { + if (!(enable & (1 << ssid))) + continue; + + cgroup_for_each_live_child(child, cgrp) { + struct cgroup_subsys_state *css = cgroup_css(child, ss); + if (css) + kill_css(css); + } + } + goto out_unlock; +} + static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -2462,9 +2802,14 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft) return PTR_ERR(kn); ret = cgroup_kn_set_ugid(kn); - if (ret) + if (ret) { kernfs_remove(kn); - return ret; + return ret; + } + + if (cft->seq_show == cgroup_subtree_control_show) + cgrp->control_kn = kn; + return 0; } /** @@ -3561,6 +3906,22 @@ static struct cftype cgroup_base_files[] = { .flags = CFTYPE_ONLY_ON_ROOT, .seq_show = cgroup_sane_behavior_show, }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_ONLY_ON_ROOT, + .seq_show = cgroup_root_controllers_show, + }, + { + .name = "cgroup.controllers", + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT, + .seq_show = cgroup_controllers_show, + }, + { + .name = "cgroup.subtree_control", + .flags = CFTYPE_ONLY_ON_DFL, + .seq_show = cgroup_subtree_control_show, + .write_string = cgroup_subtree_control_write, + }, /* * Historical crazy stuff. These don't have "cgroup." prefix and @@ -3729,6 +4090,8 @@ static void offline_css(struct cgroup_subsys_state *css) css->flags &= ~CSS_ONLINE; css->cgroup->nr_css--; RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL); + + wake_up_all(&css->cgroup->offline_waitq); } /** -- 1.9.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy [not found] ` <1397511430-2673-13-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> @ 2014-04-17 18:03 ` Raghavendra KT 0 siblings, 0 replies; 63+ messages in thread From: Raghavendra KT @ 2014-04-17 18:03 UTC (permalink / raw) To: Tejun Heo; +Cc: lizefan, containers, cgroups, Linux Kernel Mailing List On Tue, Apr 15, 2014 at 3:07 AM, Tejun Heo <tj@kernel.org> wrote: [...] > * White-space separated list of controller names prefixed with either > '+' or '-' can be written to "cgroup.subtree_control". The ones > prefixed with '+' are enabled on the controller and '-' disabled. > [...] > + > +/* change the enabled child controllers for a cgroup in the default hierarchy */ > +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, > + struct cftype *cft, char *buffer) > +{ > + unsigned long enable_req = 0, disable_req = 0, enable, disable; > + struct cgroup *cgrp = dummy_css->cgroup, *child; > + struct cgroup_subsys *ss; > + char *tok, *p; > + int ssid, ret; > + > + /* > + * Parse input - white space separated list of subsystem names > + * prefixed with either + or -. > + */ > + p = buffer; > + while ((tok = strsep(&p, " \t\n"))) { > + for_each_subsys(ss, ssid) { > + if (ss->disabled || strcmp(tok + 1, ss->name)) > + continue; > + > + if (*tok == '+') { > + enable_req |= 1 << ssid; > + disable_req &= ~(1 << ssid); > + } else if (*tok == '-') { > + disable_req |= 1 << ssid; > + enable_req &= ~(1 << ssid); > + } else { > + return -EINVAL; > + } > + break; > + } > + if (ssid == CGROUP_SUBSYS_COUNT) > + return -EINVAL; > + } I undertsand that with the above parsing we could do echo "-blkio +blkio" > cgroup.subtree_control and honor last enable/disable. It confused me while testing. do you think we should return -EINVAL in such case? ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy @ 2014-04-17 18:03 ` Raghavendra KT 0 siblings, 0 replies; 63+ messages in thread From: Raghavendra KT @ 2014-04-17 18:03 UTC (permalink / raw) To: Tejun Heo Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Kernel Mailing List On Tue, Apr 15, 2014 at 3:07 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: [...] > * White-space separated list of controller names prefixed with either > '+' or '-' can be written to "cgroup.subtree_control". The ones > prefixed with '+' are enabled on the controller and '-' disabled. > [...] > + > +/* change the enabled child controllers for a cgroup in the default hierarchy */ > +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, > + struct cftype *cft, char *buffer) > +{ > + unsigned long enable_req = 0, disable_req = 0, enable, disable; > + struct cgroup *cgrp = dummy_css->cgroup, *child; > + struct cgroup_subsys *ss; > + char *tok, *p; > + int ssid, ret; > + > + /* > + * Parse input - white space separated list of subsystem names > + * prefixed with either + or -. > + */ > + p = buffer; > + while ((tok = strsep(&p, " \t\n"))) { > + for_each_subsys(ss, ssid) { > + if (ss->disabled || strcmp(tok + 1, ss->name)) > + continue; > + > + if (*tok == '+') { > + enable_req |= 1 << ssid; > + disable_req &= ~(1 << ssid); > + } else if (*tok == '-') { > + disable_req |= 1 << ssid; > + enable_req &= ~(1 << ssid); > + } else { > + return -EINVAL; > + } > + break; > + } > + if (ssid == CGROUP_SUBSYS_COUNT) > + return -EINVAL; > + } I undertsand that with the above parsing we could do echo "-blkio +blkio" > cgroup.subtree_control and honor last enable/disable. It confused me while testing. do you think we should return -EINVAL in such case? ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy [not found] ` <CAC4Lta0tD=FbtVpGnLw2sKMY69+AnqYqaOvCQdFs88JeR5Pemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-04-18 20:41 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-18 20:41 UTC (permalink / raw) To: Raghavendra KT; +Cc: lizefan, containers, cgroups, Linux Kernel Mailing List Hello, On Thu, Apr 17, 2014 at 11:33:50PM +0530, Raghavendra KT wrote: > On Tue, Apr 15, 2014 at 3:07 AM, Tejun Heo <tj@kernel.org> wrote: > I undertsand that with the above parsing we could do > echo "-blkio +blkio" > cgroup.subtree_control and honor last enable/disable. Yeah, and the behavior is described in the following document. http://lkml.kernel.org/g/20140416145248.GD1257@htj.dyndns.org > It confused me while testing. do you think we should return -EINVAL in > such case? Hmmm... I think whatever comes later wins is a pretty clear and easy rule, no? -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy @ 2014-04-18 20:41 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-18 20:41 UTC (permalink / raw) To: Raghavendra KT Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Kernel Mailing List Hello, On Thu, Apr 17, 2014 at 11:33:50PM +0530, Raghavendra KT wrote: > On Tue, Apr 15, 2014 at 3:07 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: > I undertsand that with the above parsing we could do > echo "-blkio +blkio" > cgroup.subtree_control and honor last enable/disable. Yeah, and the behavior is described in the following document. http://lkml.kernel.org/g/20140416145248.GD1257-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org > It confused me while testing. do you think we should return -EINVAL in > such case? Hmmm... I think whatever comes later wins is a pretty clear and easy rule, no? -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <20140418204108.GL23576-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy 2014-04-18 20:41 ` Tejun Heo @ 2014-04-21 8:17 ` Raghavendra K T -1 siblings, 0 replies; 63+ messages in thread From: Raghavendra K T @ 2014-04-21 8:17 UTC (permalink / raw) To: Tejun Heo Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linux Kernel Mailing List On 04/19/2014 02:11 AM, Tejun Heo wrote: > Hello, > > On Thu, Apr 17, 2014 at 11:33:50PM +0530, Raghavendra KT wrote: >> On Tue, Apr 15, 2014 at 3:07 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: >> I undertsand that with the above parsing we could do >> echo "-blkio +blkio" > cgroup.subtree_control and honor last enable/disable. > > Yeah, and the behavior is described in the following document. > Hello Tejun, Thanks for the kind reply. The document has helped me a lot while reviewing,testing the unified hierarchy code. > http://lkml.kernel.org/g/20140416145248.GD1257-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org > >> It confused me while testing. do you think we should return -EINVAL in >> such case? > > Hmmm... I think whatever comes later wins is a pretty clear and easy > rule, no? Very much correct. and if there is a feeling to check multiple entry later I would happy to update the code. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy @ 2014-04-21 8:17 ` Raghavendra K T 0 siblings, 0 replies; 63+ messages in thread From: Raghavendra K T @ 2014-04-21 8:17 UTC (permalink / raw) To: Tejun Heo; +Cc: lizefan, containers, cgroups, Linux Kernel Mailing List On 04/19/2014 02:11 AM, Tejun Heo wrote: > Hello, > > On Thu, Apr 17, 2014 at 11:33:50PM +0530, Raghavendra KT wrote: >> On Tue, Apr 15, 2014 at 3:07 AM, Tejun Heo <tj@kernel.org> wrote: >> I undertsand that with the above parsing we could do >> echo "-blkio +blkio" > cgroup.subtree_control and honor last enable/disable. > > Yeah, and the behavior is described in the following document. > Hello Tejun, Thanks for the kind reply. The document has helped me a lot while reviewing,testing the unified hierarchy code. > http://lkml.kernel.org/g/20140416145248.GD1257@htj.dyndns.org > >> It confused me while testing. do you think we should return -EINVAL in >> such case? > > Hmmm... I think whatever comes later wins is a pretty clear and easy > rule, no? Very much correct. and if there is a feeling to check multiple entry later I would happy to update the code. ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <CAC4Lta0tD=FbtVpGnLw2sKMY69+AnqYqaOvCQdFs88JeR5Pemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy [not found] ` <CAC4Lta0tD=FbtVpGnLw2sKMY69+AnqYqaOvCQdFs88JeR5Pemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-04-18 20:41 ` Tejun Heo 0 siblings, 0 replies; 63+ messages in thread From: Tejun Heo @ 2014-04-18 20:41 UTC (permalink / raw) To: Raghavendra KT Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linux Kernel Mailing List Hello, On Thu, Apr 17, 2014 at 11:33:50PM +0530, Raghavendra KT wrote: > On Tue, Apr 15, 2014 at 3:07 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: > I undertsand that with the above parsing we could do > echo "-blkio +blkio" > cgroup.subtree_control and honor last enable/disable. Yeah, and the behavior is described in the following document. http://lkml.kernel.org/g/20140416145248.GD1257-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org > It confused me while testing. do you think we should return -EINVAL in > such case? Hmmm... I think whatever comes later wins is a pretty clear and easy rule, no? -- tejun ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <1397511430-2673-13-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy [not found] ` <1397511430-2673-13-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> @ 2014-04-17 18:03 ` Raghavendra KT 0 siblings, 0 replies; 63+ messages in thread From: Raghavendra KT @ 2014-04-17 18:03 UTC (permalink / raw) To: Tejun Heo Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linux Kernel Mailing List On Tue, Apr 15, 2014 at 3:07 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: [...] > * White-space separated list of controller names prefixed with either > '+' or '-' can be written to "cgroup.subtree_control". The ones > prefixed with '+' are enabled on the controller and '-' disabled. > [...] > + > +/* change the enabled child controllers for a cgroup in the default hierarchy */ > +static int cgroup_subtree_control_write(struct cgroup_subsys_state *dummy_css, > + struct cftype *cft, char *buffer) > +{ > + unsigned long enable_req = 0, disable_req = 0, enable, disable; > + struct cgroup *cgrp = dummy_css->cgroup, *child; > + struct cgroup_subsys *ss; > + char *tok, *p; > + int ssid, ret; > + > + /* > + * Parse input - white space separated list of subsystem names > + * prefixed with either + or -. > + */ > + p = buffer; > + while ((tok = strsep(&p, " \t\n"))) { > + for_each_subsys(ss, ssid) { > + if (ss->disabled || strcmp(tok + 1, ss->name)) > + continue; > + > + if (*tok == '+') { > + enable_req |= 1 << ssid; > + disable_req &= ~(1 << ssid); > + } else if (*tok == '-') { > + disable_req |= 1 << ssid; > + enable_req &= ~(1 << ssid); > + } else { > + return -EINVAL; > + } > + break; > + } > + if (ssid == CGROUP_SUBSYS_COUNT) > + return -EINVAL; > + } I undertsand that with the above parsing we could do echo "-blkio +blkio" > cgroup.subtree_control and honor last enable/disable. It confused me while testing. do you think we should return -EINVAL in such case? ^ permalink raw reply [flat|nested] 63+ messages in thread
end of thread, other threads:[~2014-05-05 16:37 UTC | newest] Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-03-28 2:40 [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy Tejun Heo 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` [PATCH 04/12] cgroup: make css_next_child() skip missing csses Tejun Heo 2014-03-28 2:40 ` [PATCH 06/12] cgroup: teach css_task_iter about effective csses Tejun Heo 2014-03-28 2:40 ` [PATCH 10/12] cgroup: update subsystem rebind restrictions Tejun Heo 2014-03-28 2:41 ` [PATCH 11/12] cgroup: prepare migration path for unified hierarchy Tejun Heo 2014-03-28 2:41 ` [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy Tejun Heo 2014-03-28 2:41 ` Tejun Heo [not found] ` <1395974461-12735-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> 2014-03-28 2:40 ` [PATCH 01/12] cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask Tejun Heo 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` [PATCH 02/12] cgroup: introduce effective cgroup_subsys_state Tejun Heo 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` [PATCH 03/12] cgroup: implement cgroup->e_csets[] Tejun Heo 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` [PATCH 04/12] cgroup: make css_next_child() skip missing csses Tejun Heo 2014-03-28 2:40 ` [PATCH 05/12] cgroup: reorganize css_task_iter Tejun Heo 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` [PATCH 06/12] cgroup: teach css_task_iter about effective csses Tejun Heo 2014-03-28 2:40 ` [PATCH 07/12] cgroup: cgroup->subsys[] should be cleared after the css is offlined Tejun Heo 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` [PATCH 08/12] cgroup: allow cgroup creation and suppress automatic css creation in the unified hierarchy Tejun Heo 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` [PATCH 09/12] cgroup: add css_set->dfl_cgrp Tejun Heo 2014-03-28 2:40 ` Tejun Heo 2014-03-28 2:40 ` [PATCH 10/12] cgroup: update subsystem rebind restrictions Tejun Heo 2014-03-28 2:41 ` [PATCH 11/12] cgroup: prepare migration path for unified hierarchy Tejun Heo 2014-03-28 2:41 ` [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy Tejun Heo 2014-04-14 15:45 ` [PATCHSET cgroup/for-3.15] cgroup: implement unified hierarchy Vivek Goyal 2014-04-14 15:45 ` Vivek Goyal [not found] ` <20140414154556.GA9552-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2014-04-14 17:52 ` Tejun Heo 2014-04-14 17:52 ` Tejun Heo [not found] ` <20140414175236.GB15249-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 2014-04-14 18:14 ` Vivek Goyal 2014-04-14 18:14 ` Vivek Goyal [not found] ` <20140414181421.GC9552-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2014-04-14 19:21 ` Tejun Heo 2014-04-14 19:21 ` Tejun Heo [not found] ` <20140414192139.GB16835-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 2014-04-15 2:58 ` Li Zefan 2014-04-15 2:58 ` Li Zefan [not found] ` <534CA050.8060709-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> 2014-04-15 4:57 ` Tejun Heo 2014-04-15 4:57 ` Tejun Heo 2014-04-30 10:57 ` Raghavendra KT 2014-04-30 10:57 ` Raghavendra KT [not found] ` <CAC4Lta0qd2vW4ENSOpzK+0i06MorzfmsZZZuW9xUtpZ_gWA9gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-05-02 15:12 ` Tejun Heo 2014-05-02 15:12 ` Tejun Heo 2014-05-05 16:37 ` Raghavendra KT 2014-05-05 16:37 ` Raghavendra KT 2014-04-14 21:31 ` Tejun Heo 2014-04-14 21:31 ` Tejun Heo 2014-04-15 22:05 ` [PATCH 0.5/12] cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy Tejun Heo 2014-04-15 22:05 ` Tejun Heo 2014-04-15 22:05 ` Tejun Heo [not found] ` <20140415220504.GA13099-9pTldWuhBndy/B6EtB590w@public.gmane.org> 2014-04-15 22:05 ` Tejun Heo 2014-04-15 22:05 ` Tejun Heo 2014-04-14 21:36 [PATCHSET cgroup/for-3.16] cgroup: implement unified hierarchy, v2 Tejun Heo [not found] ` <1397511430-2673-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> 2014-04-14 21:37 ` [PATCH 12/12] cgroup: implement dynamic subtree controller enable/disable on the default hierarchy Tejun Heo 2014-04-14 21:37 ` Tejun Heo 2014-04-14 21:37 ` Tejun Heo 2014-04-17 18:03 ` Raghavendra KT 2014-04-17 18:03 ` Raghavendra KT 2014-04-18 20:41 ` Tejun Heo 2014-04-18 20:41 ` Tejun Heo [not found] ` <20140418204108.GL23576-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 2014-04-21 8:17 ` Raghavendra K T 2014-04-21 8:17 ` Raghavendra K T [not found] ` <CAC4Lta0tD=FbtVpGnLw2sKMY69+AnqYqaOvCQdFs88JeR5Pemw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-04-18 20:41 ` Tejun Heo [not found] ` <1397511430-2673-13-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> 2014-04-17 18:03 ` Raghavendra KT
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.