All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] cgroup: Introducing bypass mode
@ 2017-07-21 20:34 ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	guro, Waiman Long

 v1->v2:
  - Remove relax no-internal-process constraint patch as this feature
    is in the thread mode v4 patch.
  - Remove subtree root mode patch.
  - Remove the skip dying css patch as I can no longer reproduce the
    problem.
  - Rework the bypass mode so that write to "cgroup.controllers"
    to enable or disable controller interface files is only allowed
    if the parent grants bypass mode to children by writing the
    '#'-prefixed controller to "cgroup.subtree_control".
  - Add a patch to disable subdirectory creation on an invalid domain.

 v1 patch - https://lkml.org/lkml/2017/6/14/551

This patchset introduces new capability to the cgroup v2 core to give
more freedom and flexibility to controllers so that they can shape
their own unique views of the virtual cgroup hierarchies that can
best suit thier own use cases.

This patchset is layered on top of the "review-cgroup2-cpu-on-v4"
branch of Tejun's cgroup git tree.

Patch 1 disables subdirectory creation when a cgroup is an invalid
domain.

Patch 2 introduces a new bypass mode that allows a controller to
be disabled in a cgroup, but re-enabled again in its children. This
is enabled by writing the controller name prefixed with '#' to the
"cgroup.subtree_control" file. Then all its children will have this
controller in bypass mode.

Patch 3 extends the bypass mode mechanism to allow those child cgroups
that are put into the bypass mode by their parent to re-enable the
controller by writing the controller name with the '+' prefix to the
"cgroup.controllers" file if they choose to. So setting bypass mode
in "cgroup.subtree_control" effectively delegates the authority to
enable or disable a controller to its children.

Patch 4 extends the debug controller to expose additional controller
masks introduced by this patchset.

Waiman Long (4):
  cgroup: Child cgroup creation not allowed on invalid domain
  cgroup: Allow bypass mode in subtree_control
  cgroup: Allow reenabling of controller in bypass mode
  cgroup: Make debug controller report new controller masks

 Documentation/cgroup-v2.txt |  90 +++++++++++++---
 include/linux/cgroup-defs.h |  19 +++-
 kernel/cgroup/cgroup.c      | 251 +++++++++++++++++++++++++++++++++++---------
 kernel/cgroup/debug.c       |   2 +
 4 files changed, 288 insertions(+), 74 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 0/4] cgroup: Introducing bypass mode
@ 2017-07-21 20:34 ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	guro-b10kYP2dOMg, Waiman Long

 v1->v2:
  - Remove relax no-internal-process constraint patch as this feature
    is in the thread mode v4 patch.
  - Remove subtree root mode patch.
  - Remove the skip dying css patch as I can no longer reproduce the
    problem.
  - Rework the bypass mode so that write to "cgroup.controllers"
    to enable or disable controller interface files is only allowed
    if the parent grants bypass mode to children by writing the
    '#'-prefixed controller to "cgroup.subtree_control".
  - Add a patch to disable subdirectory creation on an invalid domain.

 v1 patch - https://lkml.org/lkml/2017/6/14/551

This patchset introduces new capability to the cgroup v2 core to give
more freedom and flexibility to controllers so that they can shape
their own unique views of the virtual cgroup hierarchies that can
best suit thier own use cases.

This patchset is layered on top of the "review-cgroup2-cpu-on-v4"
branch of Tejun's cgroup git tree.

Patch 1 disables subdirectory creation when a cgroup is an invalid
domain.

Patch 2 introduces a new bypass mode that allows a controller to
be disabled in a cgroup, but re-enabled again in its children. This
is enabled by writing the controller name prefixed with '#' to the
"cgroup.subtree_control" file. Then all its children will have this
controller in bypass mode.

Patch 3 extends the bypass mode mechanism to allow those child cgroups
that are put into the bypass mode by their parent to re-enable the
controller by writing the controller name with the '+' prefix to the
"cgroup.controllers" file if they choose to. So setting bypass mode
in "cgroup.subtree_control" effectively delegates the authority to
enable or disable a controller to its children.

Patch 4 extends the debug controller to expose additional controller
masks introduced by this patchset.

Waiman Long (4):
  cgroup: Child cgroup creation not allowed on invalid domain
  cgroup: Allow bypass mode in subtree_control
  cgroup: Allow reenabling of controller in bypass mode
  cgroup: Make debug controller report new controller masks

 Documentation/cgroup-v2.txt |  90 +++++++++++++---
 include/linux/cgroup-defs.h |  19 +++-
 kernel/cgroup/cgroup.c      | 251 +++++++++++++++++++++++++++++++++++---------
 kernel/cgroup/debug.c       |   2 +
 4 files changed, 288 insertions(+), 74 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain
@ 2017-07-21 20:34   ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	guro, Waiman Long

When thread mode is used, it is possible that some cgroups may be
in an invalid state. Currently users may not be aware that they are
invalid until they try to migrate tasks over. This patch disallows
child cgroup creation on invalid domain. This adds one more failure
point in reminding users that they are dealing with invalid domains.
It also minimizes the number of invalid domains outstanding as much
as possible.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cgroup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e9a377d..5fc8133 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4664,6 +4664,14 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 	if (!parent)
 		return -ENODEV;
 
+	/*
+	 * New cgroup creation isn't allowed on an invalid domain parent.
+	 */
+	if (!cgroup_is_threaded(parent) && !cgroup_is_valid_domain(parent)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
 	cgrp = cgroup_create(parent);
 	if (IS_ERR(cgrp)) {
 		ret = PTR_ERR(cgrp);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain
@ 2017-07-21 20:34   ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	guro-b10kYP2dOMg, Waiman Long

When thread mode is used, it is possible that some cgroups may be
in an invalid state. Currently users may not be aware that they are
invalid until they try to migrate tasks over. This patch disallows
child cgroup creation on invalid domain. This adds one more failure
point in reminding users that they are dealing with invalid domains.
It also minimizes the number of invalid domains outstanding as much
as possible.

Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup/cgroup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e9a377d..5fc8133 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4664,6 +4664,14 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 	if (!parent)
 		return -ENODEV;
 
+	/*
+	 * New cgroup creation isn't allowed on an invalid domain parent.
+	 */
+	if (!cgroup_is_threaded(parent) && !cgroup_is_valid_domain(parent)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
 	cgrp = cgroup_create(parent);
 	if (IS_ERR(cgrp)) {
 		ret = PTR_ERR(cgrp);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
  2017-07-21 20:34 ` Waiman Long
  (?)
  (?)
@ 2017-07-21 20:34 ` Waiman Long
  2017-07-22 13:50     ` Tejun Heo
  -1 siblings, 1 reply; 24+ messages in thread
From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	guro, Waiman Long

The special prefix '#' attached to a controller name can now be written
into the cgroup.subtree_control file to set that controller in bypass
mode in all the child cgroups. The controller will show up in the
children's cgroup.controllers file, but the corresponding control knobs
will be absent. However, that controller can be enabled or bypassed
in its children by writing to their respective subtree_control files.

This mode can be useful to non-domain controllers or controllers where
there are costs to each additional layer of hierarchy. This mode will
also allow more freedom in how each controller can shape its effective
hierarchy independent of each others.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt |  67 +++++++++++++++++-----
 include/linux/cgroup-defs.h |  12 ++--
 kernel/cgroup/cgroup.c      | 136 ++++++++++++++++++++++++++++----------------
 3 files changed, 145 insertions(+), 70 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 43f9811..f17a74b 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -24,8 +24,9 @@ v1 is available under Documentation/cgroup-v1/.
      2-3. [Un]populated Notification
      2-4. Controlling Controllers
        2-4-1. Enabling and Disabling
-       2-4-2. Top-down Constraint
-       2-4-3. No Internal Process Constraint
+       2-4-2. Cgroup Hierarchy
+       2-4-3. Top-down Constraint
+       2-4-4. No Internal Process Constraint
      2-5. Delegation
        2-5-1. Model of Delegation
        2-5-2. Delegation Containment
@@ -362,10 +363,15 @@ disabled by writing to the "cgroup.subtree_control" file::
 
   # echo "+cpu +memory -io" > cgroup.subtree_control
 
+The prefixes '+', '-' and '#' are used to enable, disable or put a
+controller in the bypass mode respectively.  In the bypass mode, a
+controller is disabled in a cgroup, but it can be enabled again in its
+child cgroups as it will still be listed in "cgroup.controllers".
+
 Only controllers which are listed in "cgroup.controllers" can be
-enabled.  When multiple operations are specified as above, either they
-all succeed or fail.  If multiple operations on the same controller
-are specified, the last one is effective.
+enabled or bypassed.  When multiple operations are specified as above,
+either they all succeed or fail.  If multiple operations on the same
+controller are specified, the last one is effective.
 
 Enabling a controller in a cgroup indicates that the distribution of
 the target resource across its immediate children will be controlled.
@@ -390,16 +396,47 @@ controller interface files - anything which doesn't start with
 "cgroup." are owned by the parent rather than the cgroup itself.
 
 
+Cgroup Hierarchy
+~~~~~~~~~~~~~~~~
+
+The hierarchy as seen from each controller's perspective can be different
+from that of another controller depending on which controllers are enabled
+or disabled in various nodes within the full cgroup hierarchy.
+
+In the example below::
+
+  A(+) - B(#) - C(#) - D(#) - E(+) - F(-)
+              \ G(+)
+
+The '+', '-' and '#' markers in parenthesis indicate a controller in
+enabled, disabled or bypass mode respectively.
+
+Disabling a controller in a cgroup effectively collapses it to its
+parent from that controller's point of view.  In this case, the
+effective hierarchy as seen from that particular controller will be::
+
+  A|B|C|D - E|F
+          \ G
+
+There are three effective node clusters. The top cluster contains
+(A, B, C, D) with two leaf clusters (E, F) and (G).
+
+Disabling controllers from the leaves up and using bypass mode in
+the middle layers allow controllers to have their own unique views
+of the cgroup hierarchy that can best fit their own need.
+
+
 Top-down Constraint
 ~~~~~~~~~~~~~~~~~~~
 
 Resources are distributed top-down and a cgroup can further distribute
 a resource only if the resource has been distributed to it from the
 parent.  This means that all non-root "cgroup.subtree_control" files
-can only contain controllers which are enabled in the parent's
-"cgroup.subtree_control" file.  A controller can be enabled only if
-the parent has the controller enabled and a controller can't be
-disabled if one or more children have it enabled.
+can only contain controllers which are enabled or bypassed in the parent's
+"cgroup.subtree_control" file.  A controller can be enabled or bypassed
+only if the parent has the controller enabled or bypassed and the
+state of a controller can't be changed if one or more children have
+it enabled or bypassed.
 
 
 No Internal Process Constraint
@@ -836,12 +873,12 @@ All cgroup core files are prefixed with "cgroup."
 	which are enabled to control resource distribution from the
 	cgroup to its children.
 
-	Space separated list of controllers prefixed with '+' or '-'
-	can be written to enable or disable controllers.  A controller
-	name prefixed with '+' enables the controller and '-'
-	disables.  If a controller appears more than once on the list,
-	the last one is effective.  When multiple enable and disable
-	operations are specified, either all succeed or all fail.
+	Space separated list of controllers prefixed with '+', '-' or
+	'#' can be written to enable, disable or bypass controllers
+	respectively.  If a controller appears more than once on
+	the list, the last one is effective.  When multiple enable,
+	disable or bypass operations are specified, either all succeed
+	or all fail.
 
   cgroup.events
 	A read-only flat-keyed file which exists on non-root cgroups.
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 9d74195..3cac6d0 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -295,16 +295,18 @@ struct cgroup {
 	struct cgroup_file events_file;	/* handle for "cgroup.events" */
 
 	/*
-	 * The bitmask of subsystems enabled on the child cgroups.
-	 * ->subtree_control is the one configured through
-	 * "cgroup.subtree_control" while ->child_ss_mask is the effective
-	 * one which may have more subsystems enabled.  Controller knobs
-	 * are made available iff it's enabled in ->subtree_control.
+	 * The bitmask of subsystems enabled or bypassed on the child cgroups.
+	 * ->subtree_control and ->subtree_bypass are the one configured
+	 * through "cgroup.subtree_control" while ->subtree_ss_mask is the
+	 * effective one which may have more subsystems enabled.  Controller
+	 * knobs are made available iff it's enabled in ->subtree_ss_mask.
 	 */
 	u16 subtree_control;
 	u16 subtree_ss_mask;
+	u16 subtree_bypass;
 	u16 old_subtree_control;
 	u16 old_subtree_ss_mask;
+	u16 old_subtree_bypass;
 
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 5fc8133..1e7feae 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -365,7 +365,8 @@ static bool cgroup_can_be_thread_root(struct cgroup *cgrp)
 		return false;
 
 	/* and no domain controllers can be enabled */
-	if (cgrp->subtree_control & ~cgrp_dfl_threaded_ss_mask)
+	if ((cgrp->subtree_control|cgrp->subtree_bypass) &
+	    ~cgrp_dfl_threaded_ss_mask)
 		return false;
 
 	return true;
@@ -387,7 +388,8 @@ bool cgroup_is_thread_root(struct cgroup *cgrp)
 	 * enabled is a thread root.
 	 */
 	if (cgroup_has_tasks(cgrp) &&
-	    (cgrp->subtree_control & cgrp_dfl_threaded_ss_mask))
+	   ((cgrp->subtree_control|cgrp->subtree_bypass)
+	   & cgrp_dfl_threaded_ss_mask))
 		return true;
 
 	return false;
@@ -412,7 +414,7 @@ static bool cgroup_is_valid_domain(struct cgroup *cgrp)
 }
 
 /* subsystems visibly enabled on a cgroup */
-static u16 cgroup_control(struct cgroup *cgrp)
+static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	u16 root_ss_mask = cgrp->root->subsys_mask;
@@ -420,6 +422,9 @@ static u16 cgroup_control(struct cgroup *cgrp)
 	if (parent) {
 		u16 ss_mask = parent->subtree_control;
 
+		if (show_bypass)
+			ss_mask |= parent->subtree_bypass;
+
 		/* threaded cgroups can only have threaded controllers */
 		if (cgroup_is_threaded(cgrp))
 			ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -433,13 +438,17 @@ static u16 cgroup_control(struct cgroup *cgrp)
 }
 
 /* subsystems enabled on a cgroup */
-static u16 cgroup_ss_mask(struct cgroup *cgrp)
+static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 
 	if (parent) {
 		u16 ss_mask = parent->subtree_ss_mask;
 
+
+		if (show_bypass)
+			ss_mask |= parent->subtree_bypass;
+
 		/* threaded cgroups can only have threaded controllers */
 		if (cgroup_is_threaded(cgrp))
 			ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -492,7 +501,7 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
 	 * This function is used while updating css associations and thus
 	 * can't test the csses directly.  Test ss_mask.
 	 */
-	while (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) {
+	while (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id))) {
 		cgrp = cgroup_parent(cgrp);
 		if (!cgrp)
 			return NULL;
@@ -2355,7 +2364,7 @@ int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp)
 		return 0;
 
 	/* apply no-internal-process constraint */
-	if (dst_cgrp->subtree_control)
+	if (dst_cgrp->subtree_control|dst_cgrp->subtree_bypass)
 		return -EBUSY;
 
 	return 0;
@@ -2653,15 +2662,18 @@ void cgroup_procs_write_finish(struct task_struct *task)
 			ss->post_attach();
 }
 
-static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
+static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask,
+				 u16 bypass_mask)
 {
 	struct cgroup_subsys *ss;
 	bool printed = false;
 	int ssid;
 
-	do_each_subsys_mask(ss, ssid, ss_mask) {
+	do_each_subsys_mask(ss, ssid, ss_mask|bypass_mask) {
 		if (printed)
 			seq_putc(seq, ' ');
+		if (!(ss_mask & (1 << ssid)))
+			seq_putc(seq, '#');
 		seq_printf(seq, "%s", ss->name);
 		printed = true;
 	} while_each_subsys_mask();
@@ -2673,8 +2685,10 @@ static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
 static int cgroup_controllers_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct cgroup *parent = cgroup_parent(cgrp);
+	u16 bypass = parent ? parent->subtree_bypass : 0;
 
-	cgroup_print_ss_mask(seq, cgroup_control(cgrp));
+	cgroup_print_ss_mask(seq, cgroup_control(cgrp, false), bypass);
 	return 0;
 }
 
@@ -2683,7 +2697,7 @@ static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
 
-	cgroup_print_ss_mask(seq, cgrp->subtree_control);
+	cgroup_print_ss_mask(seq, cgrp->subtree_control, cgrp->subtree_bypass);
 	return 0;
 }
 
@@ -2796,6 +2810,7 @@ static void cgroup_save_control(struct cgroup *cgrp)
 	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
 		dsct->old_subtree_control = dsct->subtree_control;
 		dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
+		dsct->old_subtree_bypass  = dsct->subtree_bypass;
 	}
 }
 
@@ -2813,10 +2828,13 @@ static void cgroup_propagate_control(struct cgroup *cgrp)
 	struct cgroup_subsys_state *d_css;
 
 	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
-		dsct->subtree_control &= cgroup_control(dsct);
+		u16 mask = cgroup_control(dsct, true);
+
+		dsct->subtree_control &= mask;
+		dsct->subtree_bypass  &= mask;
 		dsct->subtree_ss_mask =
 			cgroup_calc_subtree_ss_mask(dsct->subtree_control,
-						    cgroup_ss_mask(dsct));
+						cgroup_ss_mask(dsct, true));
 	}
 }
 
@@ -2835,6 +2853,7 @@ static void cgroup_restore_control(struct cgroup *cgrp)
 	cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
 		dsct->subtree_control = dsct->old_subtree_control;
 		dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
+		dsct->subtree_bypass  = dsct->old_subtree_bypass;
 	}
 }
 
@@ -2843,9 +2862,9 @@ static bool css_visible(struct cgroup_subsys_state *css)
 	struct cgroup_subsys *ss = css->ss;
 	struct cgroup *cgrp = css->cgroup;
 
-	if (cgroup_control(cgrp) & (1 << ss->id))
+	if (cgroup_control(cgrp, false) & (1 << ss->id))
 		return true;
-	if (!(cgroup_ss_mask(cgrp) & (1 << ss->id)))
+	if (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id)))
 		return false;
 	return cgroup_on_dfl(cgrp) && ss->implicit_on_dfl;
 }
@@ -2876,7 +2895,7 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
 
 			WARN_ON_ONCE(css && percpu_ref_is_dying(&css->refcnt));
 
-			if (!(cgroup_ss_mask(dsct) & (1 << ss->id)))
+			if (!(cgroup_ss_mask(dsct, false) & (1 << ss->id)))
 				continue;
 
 			if (!css) {
@@ -2926,7 +2945,7 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
 				continue;
 
 			if (css->parent &&
-			    !(cgroup_ss_mask(dsct) & (1 << ss->id))) {
+			    !(cgroup_ss_mask(dsct, false) & (1 << ss->id))) {
 				kill_css(css);
 			} else if (!css_visible(css)) {
 				css_clear_dir(css);
@@ -3038,7 +3057,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 					    char *buf, size_t nbytes,
 					    loff_t off)
 {
-	u16 enable = 0, disable = 0;
+	u16 enable = 0, disable = 0, bypass = 0;
+	u16 child_enable = 0;
 	struct cgroup *cgrp, *child;
 	struct cgroup_subsys *ss;
 	char *tok;
@@ -3059,10 +3079,16 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 
 			if (*tok == '+') {
 				enable |= 1 << ssid;
+				bypass &= ~(1 << ssid);
 				disable &= ~(1 << ssid);
 			} else if (*tok == '-') {
 				disable |= 1 << ssid;
 				enable &= ~(1 << ssid);
+				bypass &= ~(1 << ssid);
+			} else if (*tok == '#') {
+				bypass |= 1 << ssid;
+				enable &= ~(1 << ssid);
+				disable &= ~(1 << ssid);
 			} else {
 				return -EINVAL;
 			}
@@ -3076,35 +3102,35 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	if (!cgrp)
 		return -ENODEV;
 
-	for_each_subsys(ss, ssid) {
-		if (enable & (1 << ssid)) {
-			if (cgrp->subtree_control & (1 << ssid)) {
-				enable &= ~(1 << ssid);
-				continue;
-			}
+	/*
+	 * Cannot use controllers that aren't allowed.
+	 */
+	if (~cgroup_control(cgrp, true) & (enable|disable|bypass)) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
 
-			if (!(cgroup_control(cgrp) & (1 << ssid))) {
-				ret = -ENOENT;
-				goto out_unlock;
-			}
-		} else if (disable & (1 << ssid)) {
-			if (!(cgrp->subtree_control & (1 << ssid))) {
-				disable &= ~(1 << ssid);
-				continue;
-			}
+	/*
+	 * Strip out redundant bits.
+	 */
+	enable  &= ~cgrp->subtree_control;
+	bypass  &= ~cgrp->subtree_bypass;
+	disable &= (cgrp->subtree_control|cgrp->subtree_bypass);
 
-			/* a child has it enabled? */
-			cgroup_for_each_live_child(child, cgrp) {
-				if (child->subtree_control & (1 << ssid)) {
-					ret = -EBUSY;
-					goto out_unlock;
-				}
-			}
-		}
+	if (!(enable|bypass|disable)) {
+		ret = 0;
+		goto out_unlock;
 	}
 
-	if (!enable && !disable) {
-		ret = 0;
+
+	cgroup_for_each_live_child(child, cgrp)
+		child_enable |= child->subtree_control|child->subtree_bypass;
+
+	/*
+	 * Cannot change the state of a controller if enabled in children.
+	 */
+	if ((enable|bypass|disable) & child_enable) {
+		ret = -EBUSY;
 		goto out_unlock;
 	}
 
@@ -3116,7 +3142,9 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	cgroup_save_control(cgrp);
 
 	cgrp->subtree_control |= enable;
-	cgrp->subtree_control &= ~disable;
+	cgrp->subtree_control &= ~(bypass|disable);
+	cgrp->subtree_bypass  |= bypass;
+	cgrp->subtree_bypass  &= ~(enable|disable);
 
 	ret = cgroup_apply_control(cgrp);
 
@@ -4441,7 +4469,8 @@ static void css_release(struct percpu_ref *ref)
 }
 
 static void init_and_link_css(struct cgroup_subsys_state *css,
-			      struct cgroup_subsys *ss, struct cgroup *cgrp)
+			      struct cgroup_subsys *ss, struct cgroup *cgrp,
+			      struct cgroup_subsys_state *parent_css)
 {
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -4456,8 +4485,8 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 	css->serial_nr = css_serial_nr_next++;
 	atomic_set(&css->online_cnt, 0);
 
-	if (cgroup_parent(cgrp)) {
-		css->parent = cgroup_css(cgroup_parent(cgrp), ss);
+	if (parent_css) {
+		css->parent = parent_css;
 		css_get(css->parent);
 	}
 
@@ -4520,19 +4549,26 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 					      struct cgroup_subsys *ss)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
-	struct cgroup_subsys_state *parent_css = cgroup_css(parent, ss);
+	struct cgroup_subsys_state *parent_css = NULL;
 	struct cgroup_subsys_state *css;
 	int err;
 
 	lockdep_assert_held(&cgroup_mutex);
 
+	/*
+	 * As cgroup may be in bypass mode, need to skip over ancestor
+	 * cgroups with NULL CSS.
+	 */
+	for (; parent && !parent_css; parent = cgroup_parent(parent))
+		parent_css = cgroup_css(parent, ss);
+
 	css = ss->css_alloc(parent_css);
 	if (!css)
 		css = ERR_PTR(-ENOMEM);
 	if (IS_ERR(css))
 		return css;
 
-	init_and_link_css(css, ss, cgrp);
+	init_and_link_css(css, ss, cgrp, parent_css);
 
 	err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
 	if (err)
@@ -4634,7 +4670,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	 * subtree_control from the parent.  Each is configured manually.
 	 */
 	if (!cgroup_on_dfl(cgrp))
-		cgrp->subtree_control = cgroup_control(cgrp);
+		cgrp->subtree_control = cgroup_control(cgrp, false);
 
 	if (parent)
 		cgroup_bpf_inherit(cgrp, parent);
@@ -4921,7 +4957,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 	css = ss->css_alloc(cgroup_css(&cgrp_dfl_root.cgrp, ss));
 	/* We don't handle early failures gracefully */
 	BUG_ON(IS_ERR(css));
-	init_and_link_css(css, ss, &cgrp_dfl_root.cgrp);
+	init_and_link_css(css, ss, &cgrp_dfl_root.cgrp, NULL);
 
 	/*
 	 * Root csses are never destroyed and we can't initialize
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/4] cgroup: Allow reenabling of controller in bypass mode
  2017-07-21 20:34 ` Waiman Long
                   ` (2 preceding siblings ...)
  (?)
@ 2017-07-21 20:34 ` Waiman Long
  -1 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	guro, Waiman Long

Controllers set to bypass mode in the parent's "cgroup.subtree_control"
can now be optionally enabled by writing the controller name with the
'+' prefix to "cgroup.controllers". Using the '#' prefix will reset it
back to the bypass state.

This capability increases the flexibility each controller has in
shaping the effective cgroup hierarchy to best suit its need.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt |  23 +++++++++-
 include/linux/cgroup-defs.h |   7 +++
 kernel/cgroup/cgroup.c      | 109 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index f17a74b..efb69c4 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -395,6 +395,18 @@ prefixed controller interface files from C and D.  This means that the
 controller interface files - anything which doesn't start with
 "cgroup." are owned by the parent rather than the cgroup itself.
 
+Once a controller is put into bypass mode in "cgroup.subtree_control",
+the cgroup's children can optionally enable this controller by writing
+the controller name with the '+ prefix into "cgroup.controllers".
+In this case, the controller interface files are considered to be
+owned by the child cgroup itself, not by its parent.  Therefore,
+setting the bypass mode in "cgroup.subtree_control" means delegating
+the authority of enabling or disabling the controller interface files
+to its children.  Writing the controller name with the '#' prefix into
+"cgroup.controllers" resets the state back to bypass mode.  The state
+of a controller cannot be changed if it is enabled or bypassed in its
+"cgroup.subtree_control".
+
 
 Cgroup Hierarchy
 ~~~~~~~~~~~~~~~~
@@ -859,11 +871,18 @@ All cgroup core files are prefixed with "cgroup."
 	should be granted along with the containing directory.
 
   cgroup.controllers
-	A read-only space separated values file which exists on all
+	A read-write space separated values file which exists on all
 	cgroups.
 
 	It shows space separated list of all controllers available to
-	the cgroup.  The controllers are not ordered.
+	the cgroup.  Controller names with '#' prefix are in bypass
+	mode.  The controllers are not ordered.
+
+	When a controller is set into bypass mode in its parent's
+	"cgroup.subtree_control", its name prefixed with '+' or '#'
+	can be written to enable it or reset it back to bypass mode
+	respectively.  Controllers not in bypass mode are not allowed
+	to be written.
 
   cgroup.subtree_control
 	A read-write space separated values file which exists on all
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 3cac6d0..25c2ac8 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -308,6 +308,13 @@ struct cgroup {
 	u16 old_subtree_ss_mask;
 	u16 old_subtree_bypass;
 
+	/*
+	 * The bitmask of subsystems that are set in its parent's
+	 * ->subtree_bypass and explictly enabled in this cgroup.
+	 */
+	u16 enable_ss_mask;
+	u16 old_enable_ss_mask;
+
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1e7feae..358d8b3 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -420,7 +420,7 @@ static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
 	u16 root_ss_mask = cgrp->root->subsys_mask;
 
 	if (parent) {
-		u16 ss_mask = parent->subtree_control;
+		u16 ss_mask = parent->subtree_control|cgrp->enable_ss_mask;
 
 		if (show_bypass)
 			ss_mask |= parent->subtree_bypass;
@@ -443,7 +443,7 @@ static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
 	struct cgroup *parent = cgroup_parent(cgrp);
 
 	if (parent) {
-		u16 ss_mask = parent->subtree_ss_mask;
+		u16 ss_mask = parent->subtree_ss_mask|cgrp->enable_ss_mask;
 
 
 		if (show_bypass)
@@ -2811,6 +2811,7 @@ static void cgroup_save_control(struct cgroup *cgrp)
 		dsct->old_subtree_control = dsct->subtree_control;
 		dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
 		dsct->old_subtree_bypass  = dsct->subtree_bypass;
+		dsct->old_enable_ss_mask  = dsct->enable_ss_mask;
 	}
 }
 
@@ -2854,6 +2855,7 @@ static void cgroup_restore_control(struct cgroup *cgrp)
 		dsct->subtree_control = dsct->old_subtree_control;
 		dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
 		dsct->subtree_bypass  = dsct->old_subtree_bypass;
+		dsct->enable_ss_mask  = dsct->old_enable_ss_mask;
 	}
 }
 
@@ -3124,7 +3126,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 
 
 	cgroup_for_each_live_child(child, cgrp)
-		child_enable |= child->subtree_control|child->subtree_bypass;
+		child_enable |= child->subtree_control|child->subtree_bypass|
+				child->enable_ss_mask;
 
 	/*
 	 * Cannot change the state of a controller if enabled in children.
@@ -3157,6 +3160,105 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+/*
+ * Change bypass status of controllers for a cgroup in the default hierarchy.
+ */
+static ssize_t cgroup_controllers_write(struct kernfs_open_file *of,
+					char *buf, size_t nbytes,
+					loff_t off)
+{
+	u16 enable = 0, bypass = 0;
+	struct cgroup *cgrp, *parent;
+	struct cgroup_subsys *ss;
+	char *tok;
+	int ssid, ret;
+
+	/*
+	 * Parse input - space separated list of subsystem names prefixed
+	 * with either + or #.
+	 */
+	buf = strstrip(buf);
+	while ((tok = strsep(&buf, " "))) {
+		if (tok[0] == '\0')
+			continue;
+		do_each_subsys_mask(ss, ssid, ~cgrp_dfl_inhibit_ss_mask) {
+			if (!cgroup_ssid_enabled(ssid) ||
+			    strcmp(tok + 1, ss->name))
+				continue;
+
+			if (*tok == '+') {
+				enable |= 1 << ssid;
+				bypass &= ~(1 << ssid);
+			} else if (*tok == '#') {
+				bypass |= 1 << ssid;
+				enable &= ~(1 << ssid);
+			} else {
+				return -EINVAL;
+			}
+			break;
+		} while_each_subsys_mask();
+		if (ssid == CGROUP_SUBSYS_COUNT)
+			return -EINVAL;
+	}
+
+	cgrp = cgroup_kn_lock_live(of->kn, true);
+	if (!cgrp)
+		return -ENODEV;
+
+	/*
+	 * Write to root cgroup's controllers file is not allowed.
+	 */
+	parent = cgroup_parent(cgrp);
+	if (!parent) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/*
+	 * Only controllers set into bypass mode in the parent cgroup
+	 * can be specified here.
+	 */
+	if (~parent->subtree_bypass & (enable|bypass)) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	/*
+	 * Mask off irrelevant bits.
+	 */
+	enable &= ~cgrp->enable_ss_mask;
+	bypass &=  cgrp->enable_ss_mask;
+
+	if (!(enable|bypass)) {
+		ret = 0;
+		goto out_unlock;
+	}
+
+	/*
+	 * We cannot change the bypass state of a controller that is enabled
+	 * in subtree_control.
+	 */
+	if ((cgrp->subtree_control|cgrp->subtree_bypass) & (enable|bypass)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/* Save and update control masks and prepare csses */
+	cgroup_save_control(cgrp);
+
+	cgrp->enable_ss_mask |= enable;
+	cgrp->enable_ss_mask &= ~bypass;
+
+	ret = cgroup_apply_control(cgrp);
+	cgroup_finalize_control(cgrp, ret);
+	kernfs_activate(cgrp->kn);
+	ret = 0;
+
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+	return ret ?: nbytes;
+}
+
 static int cgroup_enable_threaded(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
@@ -4322,6 +4424,7 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
 	{
 		.name = "cgroup.controllers",
 		.seq_show = cgroup_controllers_show,
+		.write = cgroup_controllers_write,
 	},
 	{
 		.name = "cgroup.subtree_control",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/4] cgroup: Make debug controller report new controller masks
  2017-07-21 20:34 ` Waiman Long
                   ` (3 preceding siblings ...)
  (?)
@ 2017-07-21 20:34 ` Waiman Long
  -1 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	guro, Waiman Long

The newly added cgroup controller masks (subtree_bypass and
enable_ss_mask) are now being reported in the debug.masks controller
file.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/debug.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c
index f661b4c..5f35a76 100644
--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -262,6 +262,8 @@ static int cgroup_masks_read(struct seq_file *seq, void *v)
 
 	cgroup_masks_read_one(seq, "subtree_control", cgrp->subtree_control);
 	cgroup_masks_read_one(seq, "subtree_ss_mask", cgrp->subtree_ss_mask);
+	cgroup_masks_read_one(seq, "subtree_bypass",  cgrp->subtree_bypass);
+	cgroup_masks_read_one(seq, "enable_ss_mask",  cgrp->enable_ss_mask);
 
 	cgroup_kn_unlock(of->kn);
 	return 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain
  2017-07-21 20:34   ` Waiman Long
  (?)
@ 2017-07-22 13:43   ` Tejun Heo
  2017-07-23 12:18       ` Tejun Heo
  2017-07-24 17:48     ` [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain Waiman Long
  -1 siblings, 2 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-22 13:43 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

Hello, Waiman.

On Fri, Jul 21, 2017 at 04:34:50PM -0400, Waiman Long wrote:
> When thread mode is used, it is possible that some cgroups may be
> in an invalid state. Currently users may not be aware that they are
> invalid until they try to migrate tasks over. This patch disallows
> child cgroup creation on invalid domain. This adds one more failure
> point in reminding users that they are dealing with invalid domains.
> It also minimizes the number of invalid domains outstanding as much
> as possible.

It's a bit inconsistent because we can reach the same forbidden state
by turning a sibling cgroup threaded.  Please consider the following.

     A
    / \
   B   C
        \
	 D

Let's say all are domains and we make B threaded.  A becomes the
threaded domain, C and D become invalid, which is the configuration
you're trying to prevent.  We can either enabling threaded on B too or
relax type modifications further so that people can make C threaded
which makes sense given that that would lead to a topology which has
to supported anyway (if C were threaded before D was created, it'd
look the same).

So, I'm leaning more towards relaxing restrictions and tightening it,
and given that we have to expose invalid state anyway, I think there's
actual benefit in doing so as it gives more flexibility while building
the hierarchy.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
@ 2017-07-22 13:50     ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-22 13:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

Hello, Waiman.

On Fri, Jul 21, 2017 at 04:34:51PM -0400, Waiman Long wrote:
> The special prefix '#' attached to a controller name can now be written
> into the cgroup.subtree_control file to set that controller in bypass
> mode in all the child cgroups. The controller will show up in the
> children's cgroup.controllers file, but the corresponding control knobs
> will be absent. However, that controller can be enabled or bypassed
> in its children by writing to their respective subtree_control files.
> 
> This mode can be useful to non-domain controllers or controllers where
> there are costs to each additional layer of hierarchy. This mode will
> also allow more freedom in how each controller can shape its effective
> hierarchy independent of each others.

While this continues to be an interesting idea.  I'm still having a
bit of hard time with the change.  The biggest blocks are

* As raised a couple times before, how would this work in terms of
  resource ownership and delegation?  The last time we spoke about
  this, I felt that we were mostly talking past each other.  I think
  it'd really help to think about / explain how this would work with
  delegation to clarify who owns what.

* While the idea is interesting, I think we need more concrete
  usecases to justify the addition and make sure that we aren't doing
  something misguided.  Can you please illustrate / give examples of
  how this would be useful?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
@ 2017-07-22 13:50     ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-22 13:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	guro-b10kYP2dOMg

Hello, Waiman.

On Fri, Jul 21, 2017 at 04:34:51PM -0400, Waiman Long wrote:
> The special prefix '#' attached to a controller name can now be written
> into the cgroup.subtree_control file to set that controller in bypass
> mode in all the child cgroups. The controller will show up in the
> children's cgroup.controllers file, but the corresponding control knobs
> will be absent. However, that controller can be enabled or bypassed
> in its children by writing to their respective subtree_control files.
> 
> This mode can be useful to non-domain controllers or controllers where
> there are costs to each additional layer of hierarchy. This mode will
> also allow more freedom in how each controller can shape its effective
> hierarchy independent of each others.

While this continues to be an interesting idea.  I'm still having a
bit of hard time with the change.  The biggest blocks are

* As raised a couple times before, how would this work in terms of
  resource ownership and delegation?  The last time we spoke about
  this, I felt that we were mostly talking past each other.  I think
  it'd really help to think about / explain how this would work with
  delegation to clarify who owns what.

* While the idea is interesting, I think we need more concrete
  usecases to justify the addition and make sure that we aren't doing
  something misguided.  Can you please illustrate / give examples of
  how this would be useful?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode
@ 2017-07-23 12:18       ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-23 12:18 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

cgroup_enable_threaded() checks that the cgroup doesn't have any tasks
or children and fails the operation if so.  This test is unnecessary
because the first part is already checked by
cgroup_can_be_thread_root() and the latter is unnecessary.  The latter
actually cause a behavioral oddity.  Please consider the following
hierarchy.  All cgroups are domains.

    A
   / \
  B   C
       \
        D

If B is made threaded, C and D becomes invalid domains.  Due to the no
children restriction, threaded mode can't be enabled on C.  For C and
D, the only thing the user can do is removal.

There is no reason for this restriction.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
---
Hello,

So, the only thing inconsistent there was the extra restriction when
enabling threaded mode when all the necessary conditions are already
checked by when verifying the parent's domain.

Thanks.

 Documentation/cgroup-v2.txt |    5 +++--
 kernel/cgroup/cgroup.c      |    7 -------
 2 files changed, 3 insertions(+), 9 deletions(-)

--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -274,8 +274,9 @@ thread mode, the following conditions mu
 - As the cgroup will join the parent's resource domain.  The parent
   must either be a valid (threaded) domain or a threaded cgroup.
 
-- The cgroup must be empty.  No enabled controllers, child cgroups or
-  processes.
+- When the parent is an unthreaded domain, it must not have any domain
+  controllers enabled or populated domain children.  The root is
+  exempt from this requirement.
 
 Topology-wise, a cgroup can be in an invalid state.  Please consider
 the following toplogy::
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3147,13 +3147,6 @@ static int cgroup_enable_threaded(struct
 		return -EOPNOTSUPP;
 
 	/*
-	 * Allow enabling thread mode only on empty cgroups to avoid
-	 * implicit migrations and recursive operations.
-	 */
-	if (cgroup_has_tasks(cgrp) || css_has_online_children(&cgrp->self))
-		return -EBUSY;
-
-	/*
 	 * The following shouldn't cause actual migrations and should
 	 * always succeed.
 	 */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode
@ 2017-07-23 12:18       ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-23 12:18 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	guro-b10kYP2dOMg

cgroup_enable_threaded() checks that the cgroup doesn't have any tasks
or children and fails the operation if so.  This test is unnecessary
because the first part is already checked by
cgroup_can_be_thread_root() and the latter is unnecessary.  The latter
actually cause a behavioral oddity.  Please consider the following
hierarchy.  All cgroups are domains.

    A
   / \
  B   C
       \
        D

If B is made threaded, C and D becomes invalid domains.  Due to the no
children restriction, threaded mode can't be enabled on C.  For C and
D, the only thing the user can do is removal.

There is no reason for this restriction.  Remove it.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
Hello,

So, the only thing inconsistent there was the extra restriction when
enabling threaded mode when all the necessary conditions are already
checked by when verifying the parent's domain.

Thanks.

 Documentation/cgroup-v2.txt |    5 +++--
 kernel/cgroup/cgroup.c      |    7 -------
 2 files changed, 3 insertions(+), 9 deletions(-)

--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -274,8 +274,9 @@ thread mode, the following conditions mu
 - As the cgroup will join the parent's resource domain.  The parent
   must either be a valid (threaded) domain or a threaded cgroup.
 
-- The cgroup must be empty.  No enabled controllers, child cgroups or
-  processes.
+- When the parent is an unthreaded domain, it must not have any domain
+  controllers enabled or populated domain children.  The root is
+  exempt from this requirement.
 
 Topology-wise, a cgroup can be in an invalid state.  Please consider
 the following toplogy::
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3147,13 +3147,6 @@ static int cgroup_enable_threaded(struct
 		return -EOPNOTSUPP;
 
 	/*
-	 * Allow enabling thread mode only on empty cgroups to avoid
-	 * implicit migrations and recursive operations.
-	 */
-	if (cgroup_has_tasks(cgrp) || css_has_online_children(&cgrp->self))
-		return -EBUSY;
-
-	/*
 	 * The following shouldn't cause actual migrations and should
 	 * always succeed.
 	 */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain
  2017-07-22 13:43   ` Tejun Heo
  2017-07-23 12:18       ` Tejun Heo
@ 2017-07-24 17:48     ` Waiman Long
  1 sibling, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-24 17:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

On 07/22/2017 09:43 AM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Fri, Jul 21, 2017 at 04:34:50PM -0400, Waiman Long wrote:
>> When thread mode is used, it is possible that some cgroups may be
>> in an invalid state. Currently users may not be aware that they are
>> invalid until they try to migrate tasks over. This patch disallows
>> child cgroup creation on invalid domain. This adds one more failure
>> point in reminding users that they are dealing with invalid domains.
>> It also minimizes the number of invalid domains outstanding as much
>> as possible.
> It's a bit inconsistent because we can reach the same forbidden state
> by turning a sibling cgroup threaded.  Please consider the following.
>
>      A
>     / \
>    B   C
>         \
> 	 D
>
> Let's say all are domains and we make B threaded.  A becomes the
> threaded domain, C and D become invalid, which is the configuration
> you're trying to prevent.  We can either enabling threaded on B too or
> relax type modifications further so that people can make C threaded
> which makes sense given that that would lead to a topology which has
> to supported anyway (if C were threaded before D was created, it'd
> look the same).
>
> So, I'm leaning more towards relaxing restrictions and tightening it,
> and given that we have to expose invalid state anyway, I think there's
> actual benefit in doing so as it gives more flexibility while building
> the hierarchy.

Yes, I totally understand that we could have this happened in a sibling
node. It is just my idea to make users become more aware that they are
dealing with an invalid domain cgroup. It was just a suggestion on my
part and I am totally fine if it is not merged.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
@ 2017-07-24 18:20       ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-24 18:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

On 07/22/2017 09:50 AM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Fri, Jul 21, 2017 at 04:34:51PM -0400, Waiman Long wrote:
>> The special prefix '#' attached to a controller name can now be written
>> into the cgroup.subtree_control file to set that controller in bypass
>> mode in all the child cgroups. The controller will show up in the
>> children's cgroup.controllers file, but the corresponding control knobs
>> will be absent. However, that controller can be enabled or bypassed
>> in its children by writing to their respective subtree_control files.
>>
>> This mode can be useful to non-domain controllers or controllers where
>> there are costs to each additional layer of hierarchy. This mode will
>> also allow more freedom in how each controller can shape its effective
>> hierarchy independent of each others.
> While this continues to be an interesting idea.  I'm still having a
> bit of hard time with the change.  The biggest blocks are
>
> * As raised a couple times before, how would this work in terms of
>   resource ownership and delegation?  The last time we spoke about
>   this, I felt that we were mostly talking past each other.  I think
>   it'd really help to think about / explain how this would work with
>   delegation to clarify who owns what.

As said in patch 3, enabling bypass mode at subtree_control delegate the
authority of enabling controllers to the children. The children own the
resource control files directly. It will be more straight forward to
explain if bypass mode can only be used consistently from the root down.
Having a mix of regular enable and bypass down the tree will be more
tricky to talk about.

> * While the idea is interesting, I think we need more concrete
>   usecases to justify the addition and make sure that we aren't doing
>   something misguided.  Can you please illustrate / give examples of
>   how this would be useful?

Bypass mode targets mainly non-domain controllers and controllers that
have cost associated with each additional level of hierarchy (e.g. cpu).
I believe the end goal of cgroup v2 is to have all controllers migrated
to it eventually. Consider the following:

    A
   / \
  B   C
 / \ / \
D  E F G

Controller X may want (A, B, C) to be controlled as one group with one
set of control files whereas D, E, F, G will have their own control
files. Controller Y may want all of them have their own control files.
Bypass mode allows us to do that. With more and more controllers enabled
in v2, the chance of this kind of configuration conflicts is going up.

I am willing to take a more limited form of bypass mode that have to be
either enabled (+) or bypass (#) only from the root down for the time
being and then consider allowing their mixing later on if you think it
is more acceptable to you.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
@ 2017-07-24 18:20       ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-24 18:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	guro-b10kYP2dOMg

On 07/22/2017 09:50 AM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Fri, Jul 21, 2017 at 04:34:51PM -0400, Waiman Long wrote:
>> The special prefix '#' attached to a controller name can now be written
>> into the cgroup.subtree_control file to set that controller in bypass
>> mode in all the child cgroups. The controller will show up in the
>> children's cgroup.controllers file, but the corresponding control knobs
>> will be absent. However, that controller can be enabled or bypassed
>> in its children by writing to their respective subtree_control files.
>>
>> This mode can be useful to non-domain controllers or controllers where
>> there are costs to each additional layer of hierarchy. This mode will
>> also allow more freedom in how each controller can shape its effective
>> hierarchy independent of each others.
> While this continues to be an interesting idea.  I'm still having a
> bit of hard time with the change.  The biggest blocks are
>
> * As raised a couple times before, how would this work in terms of
>   resource ownership and delegation?  The last time we spoke about
>   this, I felt that we were mostly talking past each other.  I think
>   it'd really help to think about / explain how this would work with
>   delegation to clarify who owns what.

As said in patch 3, enabling bypass mode at subtree_control delegate the
authority of enabling controllers to the children. The children own the
resource control files directly. It will be more straight forward to
explain if bypass mode can only be used consistently from the root down.
Having a mix of regular enable and bypass down the tree will be more
tricky to talk about.

> * While the idea is interesting, I think we need more concrete
>   usecases to justify the addition and make sure that we aren't doing
>   something misguided.  Can you please illustrate / give examples of
>   how this would be useful?

Bypass mode targets mainly non-domain controllers and controllers that
have cost associated with each additional level of hierarchy (e.g. cpu).
I believe the end goal of cgroup v2 is to have all controllers migrated
to it eventually. Consider the following:

    A
   / \
  B   C
 / \ / \
D  E F G

Controller X may want (A, B, C) to be controlled as one group with one
set of control files whereas D, E, F, G will have their own control
files. Controller Y may want all of them have their own control files.
Bypass mode allows us to do that. With more and more controllers enabled
in v2, the chance of this kind of configuration conflicts is going up.

I am willing to take a more limited form of bypass mode that have to be
either enabled (+) or bypass (#) only from the root down for the time
being and then consider allowing their mixing later on if you think it
is more acceptable to you.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode
@ 2017-07-24 19:14         ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-24 19:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

On 07/23/2017 08:18 AM, Tejun Heo wrote:
> cgroup_enable_threaded() checks that the cgroup doesn't have any tasks
> or children and fails the operation if so.  This test is unnecessary
> because the first part is already checked by
> cgroup_can_be_thread_root() and the latter is unnecessary.  The latter
> actually cause a behavioral oddity.  Please consider the following
> hierarchy.  All cgroups are domains.
>
>     A
>    / \
>   B   C
>        \
>         D
>
> If B is made threaded, C and D becomes invalid domains.  Due to the no
> children restriction, threaded mode can't be enabled on C.  For C and
> D, the only thing the user can do is removal.
>
> There is no reason for this restriction.  Remove it.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Waiman Long <longman@redhat.com>
> ---
> Hello,
>
> So, the only thing inconsistent there was the extra restriction when
> enabling threaded mode when all the necessary conditions are already
> checked by when verifying the parent's domain.
>
> Thanks.
>
>  Documentation/cgroup-v2.txt |    5 +++--
>  kernel/cgroup/cgroup.c      |    7 -------
>  2 files changed, 3 insertions(+), 9 deletions(-)
>
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -274,8 +274,9 @@ thread mode, the following conditions mu
>  - As the cgroup will join the parent's resource domain.  The parent
>    must either be a valid (threaded) domain or a threaded cgroup.
>  
> -- The cgroup must be empty.  No enabled controllers, child cgroups or
> -  processes.
> +- When the parent is an unthreaded domain, it must not have any domain
> +  controllers enabled or populated domain children.  The root is
> +  exempt from this requirement.
>  
>  Topology-wise, a cgroup can be in an invalid state.  Please consider
>  the following toplogy::
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3147,13 +3147,6 @@ static int cgroup_enable_threaded(struct
>  		return -EOPNOTSUPP;
>  
>  	/*
> -	 * Allow enabling thread mode only on empty cgroups to avoid
> -	 * implicit migrations and recursive operations.
> -	 */
> -	if (cgroup_has_tasks(cgrp) || css_has_online_children(&cgrp->self))
> -		return -EBUSY;
> -
> -	/*
>  	 * The following shouldn't cause actual migrations and should
>  	 * always succeed.
>  	 */

Acked-by: Waiman Long <longman@redhat.com>

While you are at it, the one comment that I have with
cgroup_enable_threaded() is that it does not check to see if the cgroup
parent exists. I thought for a while the first time I saw it, but then
realized that cgroup.type wasn't show in root. I think a comment about
this will make the code less puzzling to causal reader.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode
@ 2017-07-24 19:14         ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-24 19:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	guro-b10kYP2dOMg

On 07/23/2017 08:18 AM, Tejun Heo wrote:
> cgroup_enable_threaded() checks that the cgroup doesn't have any tasks
> or children and fails the operation if so.  This test is unnecessary
> because the first part is already checked by
> cgroup_can_be_thread_root() and the latter is unnecessary.  The latter
> actually cause a behavioral oddity.  Please consider the following
> hierarchy.  All cgroups are domains.
>
>     A
>    / \
>   B   C
>        \
>         D
>
> If B is made threaded, C and D becomes invalid domains.  Due to the no
> children restriction, threaded mode can't be enabled on C.  For C and
> D, the only thing the user can do is removal.
>
> There is no reason for this restriction.  Remove it.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> Hello,
>
> So, the only thing inconsistent there was the extra restriction when
> enabling threaded mode when all the necessary conditions are already
> checked by when verifying the parent's domain.
>
> Thanks.
>
>  Documentation/cgroup-v2.txt |    5 +++--
>  kernel/cgroup/cgroup.c      |    7 -------
>  2 files changed, 3 insertions(+), 9 deletions(-)
>
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -274,8 +274,9 @@ thread mode, the following conditions mu
>  - As the cgroup will join the parent's resource domain.  The parent
>    must either be a valid (threaded) domain or a threaded cgroup.
>  
> -- The cgroup must be empty.  No enabled controllers, child cgroups or
> -  processes.
> +- When the parent is an unthreaded domain, it must not have any domain
> +  controllers enabled or populated domain children.  The root is
> +  exempt from this requirement.
>  
>  Topology-wise, a cgroup can be in an invalid state.  Please consider
>  the following toplogy::
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3147,13 +3147,6 @@ static int cgroup_enable_threaded(struct
>  		return -EOPNOTSUPP;
>  
>  	/*
> -	 * Allow enabling thread mode only on empty cgroups to avoid
> -	 * implicit migrations and recursive operations.
> -	 */
> -	if (cgroup_has_tasks(cgrp) || css_has_online_children(&cgrp->self))
> -		return -EBUSY;
> -
> -	/*
>  	 * The following shouldn't cause actual migrations and should
>  	 * always succeed.
>  	 */

Acked-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

While you are at it, the one comment that I have with
cgroup_enable_threaded() is that it does not check to see if the cgroup
parent exists. I thought for a while the first time I saw it, but then
realized that cgroup.type wasn't show in root. I think a comment about
this will make the code less puzzling to causal reader.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
  2017-07-24 18:20       ` Waiman Long
  (?)
@ 2017-07-25 17:13       ` Tejun Heo
  2017-07-25 19:10           ` Waiman Long
  2017-08-01 14:29         ` Waiman Long
  -1 siblings, 2 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-25 17:13 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

Hello, Waiman.

On Mon, Jul 24, 2017 at 02:20:59PM -0400, Waiman Long wrote:
> As said in patch 3, enabling bypass mode at subtree_control delegate the
> authority of enabling controllers to the children. The children own the
> resource control files directly. It will be more straight forward to

But that doesn't work at all because such child would end up
controlling the distribution of an ancestor's resources.  It breaks a
fundamental property of the hierarchy.

> explain if bypass mode can only be used consistently from the root down.
> Having a mix of regular enable and bypass down the tree will be more
> tricky to talk about.

Hmmm... it isn't just being tricky.  As proposed, it is in direct
conflict with the basic semantics of the resource hierarchy.

> > * While the idea is interesting, I think we need more concrete
> >   usecases to justify the addition and make sure that we aren't doing
> >   something misguided.  Can you please illustrate / give examples of
> >   how this would be useful?
> 
> Bypass mode targets mainly non-domain controllers and controllers that
> have cost associated with each additional level of hierarchy (e.g. cpu).
> I believe the end goal of cgroup v2 is to have all controllers migrated
> to it eventually. Consider the following:
> 
>     A
>    / \
>   B   C
>  / \ / \
> D  E F G
> 
> Controller X may want (A, B, C) to be controlled as one group with one
> set of control files whereas D, E, F, G will have their own control
> files. Controller Y may want all of them have their own control files.
> Bypass mode allows us to do that. With more and more controllers enabled
> in v2, the chance of this kind of configuration conflicts is going up.

I think I understand what it wants to do but I think it's still
lacking justfications given how invasive the change is to the basic
operation and usage.  We need more than one can think of this and it
can help with certain hypothetical use cases.  ie. along the line of
what the actual use cases are, what our overhead looks like and why,
and why the problem can't be solved in a different, hopefully less
intrusive, way.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode
  2017-07-23 12:18       ` Tejun Heo
  (?)
  (?)
@ 2017-07-25 17:16       ` Tejun Heo
  -1 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-25 17:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

On Sun, Jul 23, 2017 at 08:18:26AM -0400, Tejun Heo wrote:
> cgroup_enable_threaded() checks that the cgroup doesn't have any tasks
> or children and fails the operation if so.  This test is unnecessary
> because the first part is already checked by
> cgroup_can_be_thread_root() and the latter is unnecessary.  The latter
> actually cause a behavioral oddity.  Please consider the following
> hierarchy.  All cgroups are domains.
> 
>     A
>    / \
>   B   C
>        \
>         D
> 
> If B is made threaded, C and D becomes invalid domains.  Due to the no
> children restriction, threaded mode can't be enabled on C.  For C and
> D, the only thing the user can do is removal.
> 
> There is no reason for this restriction.  Remove it.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Waiman Long <longman@redhat.com>

Applied to cgroup/for-4.14 with Waiman's ack.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH cgroup/for-4.14] cgroup: add comment to cgroup_enable_threaded()
@ 2017-07-25 17:22           ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-25 17:22 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

>From c705a00d77457b44ba3790fdf0627ecb8593a254 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 25 Jul 2017 13:20:18 -0400

Explain cgroup_enable_threaded() and note that the function can never
be called on the root cgroup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Waiman Long <longman@redhat.com>
---
Hello, Waiman.

Good point.  Added comment explaining what's going on.  Applying the
patch to cgroup/for-4.14.

Thanks.

 kernel/cgroup/cgroup.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e0a558c..85f6a11 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3129,6 +3129,15 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+/**
+ * cgroup_enable_threaded - make @cgrp threaded
+ * @cgrp: the target cgroup
+ *
+ * Called when "threaded" is written to the cgroup.type interface file and
+ * tries to make @cgrp threaded and join the parent's resource domain.
+ * This function is never called on the root cgroup as cgroup.type doesn't
+ * exist on it.
+ */
 static int cgroup_enable_threaded(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH cgroup/for-4.14] cgroup: add comment to cgroup_enable_threaded()
@ 2017-07-25 17:22           ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2017-07-25 17:22 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	guro-b10kYP2dOMg

From c705a00d77457b44ba3790fdf0627ecb8593a254 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Tue, 25 Jul 2017 13:20:18 -0400

Explain cgroup_enable_threaded() and note that the function can never
be called on the root cgroup.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Suggested-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
Hello, Waiman.

Good point.  Added comment explaining what's going on.  Applying the
patch to cgroup/for-4.14.

Thanks.

 kernel/cgroup/cgroup.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e0a558c..85f6a11 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3129,6 +3129,15 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+/**
+ * cgroup_enable_threaded - make @cgrp threaded
+ * @cgrp: the target cgroup
+ *
+ * Called when "threaded" is written to the cgroup.type interface file and
+ * tries to make @cgrp threaded and join the parent's resource domain.
+ * This function is never called on the root cgroup as cgroup.type doesn't
+ * exist on it.
+ */
 static int cgroup_enable_threaded(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
@ 2017-07-25 19:10           ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-25 19:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

On 07/25/2017 01:13 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, Jul 24, 2017 at 02:20:59PM -0400, Waiman Long wrote:
>> As said in patch 3, enabling bypass mode at subtree_control delegate the
>> authority of enabling controllers to the children. The children own the
>> resource control files directly. It will be more straight forward to
> But that doesn't work at all because such child would end up
> controlling the distribution of an ancestor's resources.  It breaks a
> fundamental property of the hierarchy.
>
>> explain if bypass mode can only be used consistently from the root down.
>> Having a mix of regular enable and bypass down the tree will be more
>> tricky to talk about.
> Hmmm... it isn't just being tricky.  As proposed, it is in direct
> conflict with the basic semantics of the resource hierarchy.
>
>>> * While the idea is interesting, I think we need more concrete
>>>   usecases to justify the addition and make sure that we aren't doing
>>>   something misguided.  Can you please illustrate / give examples of
>>>   how this would be useful?
>> Bypass mode targets mainly non-domain controllers and controllers that
>> have cost associated with each additional level of hierarchy (e.g. cpu).
>> I believe the end goal of cgroup v2 is to have all controllers migrated
>> to it eventually. Consider the following:
>>
>>     A
>>    / \
>>   B   C
>>  / \ / \
>> D  E F G
>>
>> Controller X may want (A, B, C) to be controlled as one group with one
>> set of control files whereas D, E, F, G will have their own control
>> files. Controller Y may want all of them have their own control files.
>> Bypass mode allows us to do that. With more and more controllers enabled
>> in v2, the chance of this kind of configuration conflicts is going up.
> I think I understand what it wants to do but I think it's still
> lacking justfications given how invasive the change is to the basic
> operation and usage.  We need more than one can think of this and it
> can help with certain hypothetical use cases.  ie. along the line of
> what the actual use cases are, what our overhead looks like and why,
> and why the problem can't be solved in a different, hopefully less
> intrusive, way.

As I said above that bypass mode can be useful for non-domain
controllers. For example, controllers like net_cls, net_prio just
provides an ID for classification. There is no resource for the parent
to control or distribute. We can, of course, make them implicit like
perf_event and activate them in all the cgroups. Alternatively, we can
use bypass and what let whatever cgroups that need it activate one for
themselves to avoid proliferation of unused IDs.

Another use case is the cpu controller. As discussed a while before,
scheduler intensive workload wills suffer with each additional level of
hierarchy even if nothing is running at the same time. Image the
following hierarchy:

    R
   / \
  A   B
 / \ / \
X  Y W Z

Supoose that memory consumption is the bottleneck and we use memory
controller to distribute memory resources to tasks in X, Y, W, Z. Also
suppose we have sufficient CPU resources available that we don't care
much about how much CPU they uses. Now, if tasks in cgroup X want to use
cpu controller to restrict the amount of CPU available to a subgroup of
tasks. Currently, the only way to do that is to have cpu controller
enabled all the way down from the root. Now all the tasks in Y, W & Z
will have to suffer the additional performance overhead of this extra
levels of cpu controller hierarchy. We also need to figure out how much
CPU resource will need to be partitioned among cgroups.

With bypass mode, you only need to activate the CPU controllers where it
is needed. The tasks in the other cgroups can just compete directly with
each other without worrying about resource partition and hierarchical
overhead. This is one example of the conflict of hierarchy problem I
mentioned before.

Even though I wrote that the children own the control files in their
cgroup in bypass mode, it is mainly a conceptual framework for
discussion purpose from my perspective. In reality, it is a matter of
who has the permission to write to cgroup.controller to re-enable a
bypassed controller and the corresponding controller files. So we can
define it either ways (by parent or by child) to  best fit the narrative
that we want to convey to the cgroup users. I don't care much about the
narrative. I care more about what capability and flexibility that can be
made available to the cgroup users.

In your nsdelegate mount option patch, only cgroup.procs and
cgroup.subtree_control are to be written by delegatees. So unless we
extend it to other control files, those other files are still
practically owned by the parent.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
@ 2017-07-25 19:10           ` Waiman Long
  0 siblings, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-07-25 19:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	guro-b10kYP2dOMg

On 07/25/2017 01:13 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, Jul 24, 2017 at 02:20:59PM -0400, Waiman Long wrote:
>> As said in patch 3, enabling bypass mode at subtree_control delegate the
>> authority of enabling controllers to the children. The children own the
>> resource control files directly. It will be more straight forward to
> But that doesn't work at all because such child would end up
> controlling the distribution of an ancestor's resources.  It breaks a
> fundamental property of the hierarchy.
>
>> explain if bypass mode can only be used consistently from the root down.
>> Having a mix of regular enable and bypass down the tree will be more
>> tricky to talk about.
> Hmmm... it isn't just being tricky.  As proposed, it is in direct
> conflict with the basic semantics of the resource hierarchy.
>
>>> * While the idea is interesting, I think we need more concrete
>>>   usecases to justify the addition and make sure that we aren't doing
>>>   something misguided.  Can you please illustrate / give examples of
>>>   how this would be useful?
>> Bypass mode targets mainly non-domain controllers and controllers that
>> have cost associated with each additional level of hierarchy (e.g. cpu).
>> I believe the end goal of cgroup v2 is to have all controllers migrated
>> to it eventually. Consider the following:
>>
>>     A
>>    / \
>>   B   C
>>  / \ / \
>> D  E F G
>>
>> Controller X may want (A, B, C) to be controlled as one group with one
>> set of control files whereas D, E, F, G will have their own control
>> files. Controller Y may want all of them have their own control files.
>> Bypass mode allows us to do that. With more and more controllers enabled
>> in v2, the chance of this kind of configuration conflicts is going up.
> I think I understand what it wants to do but I think it's still
> lacking justfications given how invasive the change is to the basic
> operation and usage.  We need more than one can think of this and it
> can help with certain hypothetical use cases.  ie. along the line of
> what the actual use cases are, what our overhead looks like and why,
> and why the problem can't be solved in a different, hopefully less
> intrusive, way.

As I said above that bypass mode can be useful for non-domain
controllers. For example, controllers like net_cls, net_prio just
provides an ID for classification. There is no resource for the parent
to control or distribute. We can, of course, make them implicit like
perf_event and activate them in all the cgroups. Alternatively, we can
use bypass and what let whatever cgroups that need it activate one for
themselves to avoid proliferation of unused IDs.

Another use case is the cpu controller. As discussed a while before,
scheduler intensive workload wills suffer with each additional level of
hierarchy even if nothing is running at the same time. Image the
following hierarchy:

    R
   / \
  A   B
 / \ / \
X  Y W Z

Supoose that memory consumption is the bottleneck and we use memory
controller to distribute memory resources to tasks in X, Y, W, Z. Also
suppose we have sufficient CPU resources available that we don't care
much about how much CPU they uses. Now, if tasks in cgroup X want to use
cpu controller to restrict the amount of CPU available to a subgroup of
tasks. Currently, the only way to do that is to have cpu controller
enabled all the way down from the root. Now all the tasks in Y, W & Z
will have to suffer the additional performance overhead of this extra
levels of cpu controller hierarchy. We also need to figure out how much
CPU resource will need to be partitioned among cgroups.

With bypass mode, you only need to activate the CPU controllers where it
is needed. The tasks in the other cgroups can just compete directly with
each other without worrying about resource partition and hierarchical
overhead. This is one example of the conflict of hierarchy problem I
mentioned before.

Even though I wrote that the children own the control files in their
cgroup in bypass mode, it is mainly a conceptual framework for
discussion purpose from my perspective. In reality, it is a matter of
who has the permission to write to cgroup.controller to re-enable a
bypassed controller and the corresponding controller files. So we can
define it either ways (by parent or by child) to  best fit the narrative
that we want to convey to the cgroup users. I don't care much about the
narrative. I care more about what capability and flexibility that can be
made available to the cgroup users.

In your nsdelegate mount option patch, only cgroup.procs and
cgroup.subtree_control are to be written by delegatees. So unless we
extend it to other control files, those other files are still
practically owned by the parent.

Cheers,
Longman




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control
  2017-07-25 17:13       ` Tejun Heo
  2017-07-25 19:10           ` Waiman Long
@ 2017-08-01 14:29         ` Waiman Long
  1 sibling, 0 replies; 24+ messages in thread
From: Waiman Long @ 2017-08-01 14:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro

On 07/25/2017 01:13 PM, Tejun Heo wrote:
>>
>> Bypass mode targets mainly non-domain controllers and controllers that
>> have cost associated with each additional level of hierarchy (e.g. cpu).
>> I believe the end goal of cgroup v2 is to have all controllers migrated
>> to it eventually. Consider the following:
>>
>>     A
>>    / \
>>   B   C
>>  / \ / \
>> D  E F G
>>
>> Controller X may want (A, B, C) to be controlled as one group with one
>> set of control files whereas D, E, F, G will have their own control
>> files. Controller Y may want all of them have their own control files.
>> Bypass mode allows us to do that. With more and more controllers enabled
>> in v2, the chance of this kind of configuration conflicts is going up.
> I think I understand what it wants to do but I think it's still
> lacking justfications given how invasive the change is to the basic
> operation and usage.  We need more than one can think of this and it

I don't think it is as invasive as you have suggested. The latest patch
doesn't change the semantics of the cgroup control operation as long as
the bypass mode isn't used in subtree_control.

A major feature enabled by this mode is that instead of enabling a
controller in all its children, the bypass mode allows us to choose
which children will have the controller enabled. As for the ownership
question, I am going to change the description so that control files in
the children are still going to be owned by the parent to minimize
changes to basic operation.

> can help with certain hypothetical use cases.  ie. along the line of
> what the actual use cases are, what our overhead looks like and why,
> and why the problem can't be solved in a different, hopefully less
> intrusive, way.

A common use case is an application that want to use cpuset, for
example, to bind some worker threads to individual cpus. At the same
time, the application may also want to use cpu controller to limit the
amount of cpu consumed by some other threads. Right now, the only way to
do that with the current v2 control scheme is to create child cgroups
with both cpu and cpuset controllers enabled and put the desired
processes or threads into those child cgroups. The cost of enabling
cpuset on a task that need cpu controller is negligible. However, the
cost of enabling cpu controller on tasks that only need cpuset can be
noticeable. The performance difference may become a concern for users to
move from cgroup v1 to v2. If our goal is to retire cgroup v1
eventually, we need to make cgroup v2 as performant as cgroup v1.

The cpu controller also have a higher memory cost than other cgroup
controller due to the fact that it requires percpu runqueue structure
that is pretty big especially on system with many cpus. As a result,
more memory will be wasted on tasks that only need cpuset.

With bypass mode, you can put tasks that need cpu controller into a
child cgroups with only cpu controller enabled and put tasks that need
cpuset into child cgroups with only cpuset enabled, similar to what
users can now do on cgroup v1.

I think there is enough justification to have the bypass mode feature in
cgroup v2. I would like to know what other concern do you have with this
feature.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-08-01 14:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-21 20:34 [PATCH v2 0/4] cgroup: Introducing bypass mode Waiman Long
2017-07-21 20:34 ` Waiman Long
2017-07-21 20:34 ` [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain Waiman Long
2017-07-21 20:34   ` Waiman Long
2017-07-22 13:43   ` Tejun Heo
2017-07-23 12:18     ` [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode Tejun Heo
2017-07-23 12:18       ` Tejun Heo
2017-07-24 19:14       ` Waiman Long
2017-07-24 19:14         ` Waiman Long
2017-07-25 17:22         ` [PATCH cgroup/for-4.14] cgroup: add comment to cgroup_enable_threaded() Tejun Heo
2017-07-25 17:22           ` Tejun Heo
2017-07-25 17:16       ` [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode Tejun Heo
2017-07-24 17:48     ` [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain Waiman Long
2017-07-21 20:34 ` [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control Waiman Long
2017-07-22 13:50   ` Tejun Heo
2017-07-22 13:50     ` Tejun Heo
2017-07-24 18:20     ` Waiman Long
2017-07-24 18:20       ` Waiman Long
2017-07-25 17:13       ` Tejun Heo
2017-07-25 19:10         ` Waiman Long
2017-07-25 19:10           ` Waiman Long
2017-08-01 14:29         ` Waiman Long
2017-07-21 20:34 ` [PATCH v2 3/4] cgroup: Allow reenabling of controller in bypass mode Waiman Long
2017-07-21 20:34 ` [PATCH v2 4/4] cgroup: Make debug controller report new controller masks Waiman Long

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.