linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/5] cgroup: Introducing bypass mode
@ 2018-11-20 17:51 Waiman Long
  2018-11-20 17:51 ` [PATCH v4 1/5] cgroup: subtree_control bypass mode for bypassable controllers Waiman Long
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Waiman Long @ 2018-11-20 17:51 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet
  Cc: cgroups, linux-kernel, linux-doc, Roman Gushchin, Jens Axboe,
	Andrew Morton, Dennis Zhou, Shakeel Butt, Waiman Long

 v4:
  - Rebased to the latest for-4.21 branch of cgroup tree.
  - Make each controller explicitly opt in to become bypassable by
    setting the bypassable cgroup_subsys flag. Currently, only cpu
    controller is made bypassable.
  - Break out the cgroup_v2.rst documentation update as separate patch.

 v3:
  - Remove invalid cgroup subdirectory creation patch.
  - Add use cases for the bypass mode and removing statements about
    control files ownership in cgroup-v2.txt.
  - Restrict bypass mode to non-domain (threaded) controllers only.

 v3 patch - https://lkml.org/lkml/2017/8/9/604

This patchset introduces a new bypass mode to the cgroup v2 core to
give more freedom and flexibility to controllers which choose to become
bypassable the freedom to shape their own unique views of the virtual
cgroup hierarchies that can best suit thier own use cases.

Because of the inherent performance overhead in enabling cpu controller,
it is made bypassable so that the controller only needs to be enabled
at those cgroups that really need it instead of in every cgroups at a
given layer if at least one of them needs it.

The cpu controller performance problem is one of the major issues
in migrating from cgroup v1 to v2.

For example,

  R - A(+) - B(#) - C(+)
                  \ D(#)

where "+" means the controller is enabled and "#" means the controller
is bypassed. For this controller's perspective, the cgroups are
equivalent to:

  R - A|B|D - C

Underneath the root R, cgoups A, B and D are controlled by one set of
knobs and cgroup C is controlled by another set of knobs as a child of
cgroups A|B|C.

This patchset is layered on top of the "for-4.21" branch of Tejun's
cgroup git tree.

Patch 1 introduces a new bypass mode that allows a bypassable
controller to be disabled in a cgroup, but to be re-enabled again in its
children. This is enabled by writing the controller name prefixed with
'#' to the "cgroup.subtree_control" file. Then all its children will
have this controller in bypass mode.

Patch 2 extends the bypass mode mechanism to allow those child cgroups
that are put into the bypass mode for a particular bypassable controller
by their parent to be re-enabled again by writing the controller name
with the '+' prefix to the "cgroup.controllers" file.

Patch 3 extends the debug controller to expose additional controller
masks introduced by this patchset.

Patch 4 makes the cpu controller bypassable.

Patch 5 documents the new bypass mode in cgroup-v2.rst file.

Waiman Long (5):
  cgroup: subtree_control bypass mode for bypassable controllers
  cgroup: Allow reenabling of controller in bypass mode
  cgroup: Make debug controller report new controller masks
  sched/core: Make cpu cgroup controller bypassable
  cgroup: Document bypass mode

 Documentation/admin-guide/cgroup-v2.rst |  66 +++++---
 include/linux/cgroup-defs.h             |  26 +++-
 kernel/cgroup/cgroup.c                  | 256 +++++++++++++++++++++++++-------
 kernel/cgroup/debug.c                   |   2 +
 kernel/sched/core.c                     |   1 +
 5 files changed, 276 insertions(+), 75 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/5] cgroup: subtree_control bypass mode for bypassable controllers
  2018-11-20 17:51 [PATCH v4 0/5] cgroup: Introducing bypass mode Waiman Long
@ 2018-11-20 17:51 ` Waiman Long
  2018-11-29 11:18   ` Dan Carpenter
  2018-11-20 17:51 ` [PATCH v4 2/5] cgroup: Allow reenabling of controller in bypass mode Waiman Long
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2018-11-20 17:51 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet
  Cc: cgroups, linux-kernel, linux-doc, Roman Gushchin, Jens Axboe,
	Andrew Morton, Dennis Zhou, Shakeel Butt, Waiman Long

A controller in the default hierarchy is bypassable if the bypassable
flag is set in its cgroup_subsys structure.  The special prefix '#'
attached to a bypassable controller name can now be written into the
cgroup.subtree_control file to set that controller in bypass mode in
all the child cgroups. The controller will show up in the children's
cgroup.controllers file, but the corresponding control knobs will be
absent. However, that controller can be enabled or bypassed in its
children by writing to their respective subtree_control files.

In term of resource control, a bypassed cgroup was lumped into its
nearest non-bypassed ancestor. So all the tasks in that ancestor as
well as those in its bypassed direct descendants are controlled by the
same set of control knobs.

This mode is useful to those bypassable controllers where there are
costs to each additional layer of hierarchy. This mode will also allow
more freedom in how each controller can shape its effective hierarchy
independent of the others.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/cgroup-defs.h |  19 ++++--
 kernel/cgroup/cgroup.c      | 149 +++++++++++++++++++++++++++++---------------
 2 files changed, 113 insertions(+), 55 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8fcbae1..5bff798 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -375,16 +375,18 @@ struct cgroup {
 	struct cgroup_file events_file;	/* handle for "cgroup.events" */
 
 	/*
-	 * The bitmask of subsystems enabled on the child cgroups.
-	 * ->subtree_control is the one configured through
-	 * "cgroup.subtree_control" while ->child_ss_mask is the effective
-	 * one which may have more subsystems enabled.  Controller knobs
-	 * are made available iff it's enabled in ->subtree_control.
+	 * The bitmask of subsystems enabled or bypassed on the child cgroups.
+	 * ->subtree_control and ->subtree_bypass are the one configured
+	 * through "cgroup.subtree_control" while ->subtree_ss_mask is the
+	 * effective one which may have more subsystems enabled.  Controller
+	 * knobs are made available iff it's enabled in ->subtree_ss_mask.
 	 */
 	u16 subtree_control;
 	u16 subtree_ss_mask;
+	u16 subtree_bypass;
 	u16 old_subtree_control;
 	u16 old_subtree_ss_mask;
+	u16 old_subtree_bypass;
 
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
@@ -647,6 +649,13 @@ struct cgroup_subsys {
 	bool broken_hierarchy:1;
 	bool warned_broken_hierarchy:1;
 
+	/*
+	 * If %true, the controller, on the default hierarchy, can be
+	 * bypassable. IOW, it can have a virtual hierarchy that is
+	 * different from the default hierarchy.
+	 */
+	bool bypassable:1;
+
 	/* the following two fields are initialized automtically during boot */
 	int id;
 	const char *name;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e06994f..a361c10 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -175,6 +175,9 @@ struct cgroup_subsys *cgroup_subsys[] = {
 /* some controllers can be threaded on the default hierarchy */
 static u16 cgrp_dfl_threaded_ss_mask;
 
+/* some controllers can be bypassable on the default hierarchy */
+static u16 cgrp_dfl_bypass_ss_mask;
+
 /* The list of hierarchy roots */
 LIST_HEAD(cgroup_roots);
 static int cgroup_root_count;
@@ -366,7 +369,8 @@ static bool cgroup_can_be_thread_root(struct cgroup *cgrp)
 		return false;
 
 	/* and no domain controllers can be enabled */
-	if (cgrp->subtree_control & ~cgrp_dfl_threaded_ss_mask)
+	if ((cgrp->subtree_control|cgrp->subtree_bypass) &
+	    ~cgrp_dfl_threaded_ss_mask)
 		return false;
 
 	return true;
@@ -388,7 +392,8 @@ bool cgroup_is_thread_root(struct cgroup *cgrp)
 	 * enabled is a thread root.
 	 */
 	if (cgroup_has_tasks(cgrp) &&
-	    (cgrp->subtree_control & cgrp_dfl_threaded_ss_mask))
+	   ((cgrp->subtree_control|cgrp->subtree_bypass)
+	   & cgrp_dfl_threaded_ss_mask))
 		return true;
 
 	return false;
@@ -413,7 +418,7 @@ static bool cgroup_is_valid_domain(struct cgroup *cgrp)
 }
 
 /* subsystems visibly enabled on a cgroup */
-static u16 cgroup_control(struct cgroup *cgrp)
+static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	u16 root_ss_mask = cgrp->root->subsys_mask;
@@ -421,6 +426,9 @@ static u16 cgroup_control(struct cgroup *cgrp)
 	if (parent) {
 		u16 ss_mask = parent->subtree_control;
 
+		if (show_bypass)
+			ss_mask |= parent->subtree_bypass;
+
 		/* threaded cgroups can only have threaded controllers */
 		if (cgroup_is_threaded(cgrp))
 			ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -434,13 +442,17 @@ static u16 cgroup_control(struct cgroup *cgrp)
 }
 
 /* subsystems enabled on a cgroup */
-static u16 cgroup_ss_mask(struct cgroup *cgrp)
+static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 
 	if (parent) {
 		u16 ss_mask = parent->subtree_ss_mask;
 
+
+		if (show_bypass)
+			ss_mask |= parent->subtree_bypass;
+
 		/* threaded cgroups can only have threaded controllers */
 		if (cgroup_is_threaded(cgrp))
 			ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -515,7 +527,7 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
 	 * This function is used while updating css associations and thus
 	 * can't test the csses directly.  Test ss_mask.
 	 */
-	while (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) {
+	while (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id))) {
 		cgrp = cgroup_parent(cgrp);
 		if (!cgrp)
 			return NULL;
@@ -2414,7 +2426,7 @@ int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp)
 		return 0;
 
 	/* apply no-internal-process constraint */
-	if (dst_cgrp->subtree_control)
+	if (dst_cgrp->subtree_control|dst_cgrp->subtree_bypass)
 		return -EBUSY;
 
 	return 0;
@@ -2712,15 +2724,18 @@ void cgroup_procs_write_finish(struct task_struct *task)
 			ss->post_attach();
 }
 
-static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
+static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask,
+				 u16 bypass_mask)
 {
 	struct cgroup_subsys *ss;
 	bool printed = false;
 	int ssid;
 
-	do_each_subsys_mask(ss, ssid, ss_mask) {
+	do_each_subsys_mask(ss, ssid, ss_mask|bypass_mask) {
 		if (printed)
 			seq_putc(seq, ' ');
+		if (!(ss_mask & (1 << ssid)))
+			seq_putc(seq, '#');
 		seq_printf(seq, "%s", ss->name);
 		printed = true;
 	} while_each_subsys_mask();
@@ -2732,8 +2747,10 @@ static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
 static int cgroup_controllers_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct cgroup *parent = cgroup_parent(cgrp);
+	u16 bypass = parent ? parent->subtree_bypass : 0;
 
-	cgroup_print_ss_mask(seq, cgroup_control(cgrp));
+	cgroup_print_ss_mask(seq, cgroup_control(cgrp, false), bypass);
 	return 0;
 }
 
@@ -2742,7 +2759,7 @@ static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
 
-	cgroup_print_ss_mask(seq, cgrp->subtree_control);
+	cgroup_print_ss_mask(seq, cgrp->subtree_control, cgrp->subtree_bypass);
 	return 0;
 }
 
@@ -2856,6 +2873,7 @@ static void cgroup_save_control(struct cgroup *cgrp)
 	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
 		dsct->old_subtree_control = dsct->subtree_control;
 		dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
+		dsct->old_subtree_bypass  = dsct->subtree_bypass;
 		dsct->old_dom_cgrp = dsct->dom_cgrp;
 	}
 }
@@ -2874,10 +2892,13 @@ static void cgroup_propagate_control(struct cgroup *cgrp)
 	struct cgroup_subsys_state *d_css;
 
 	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
-		dsct->subtree_control &= cgroup_control(dsct);
+		u16 mask = cgroup_control(dsct, true);
+
+		dsct->subtree_control &= mask;
+		dsct->subtree_bypass  &= mask;
 		dsct->subtree_ss_mask =
 			cgroup_calc_subtree_ss_mask(dsct->subtree_control,
-						    cgroup_ss_mask(dsct));
+						cgroup_ss_mask(dsct, true));
 	}
 }
 
@@ -2897,6 +2918,7 @@ static void cgroup_restore_control(struct cgroup *cgrp)
 	cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
 		dsct->subtree_control = dsct->old_subtree_control;
 		dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
+		dsct->subtree_bypass  = dsct->old_subtree_bypass;
 		dsct->dom_cgrp = dsct->old_dom_cgrp;
 	}
 }
@@ -2906,9 +2928,9 @@ static bool css_visible(struct cgroup_subsys_state *css)
 	struct cgroup_subsys *ss = css->ss;
 	struct cgroup *cgrp = css->cgroup;
 
-	if (cgroup_control(cgrp) & (1 << ss->id))
+	if (cgroup_control(cgrp, false) & (1 << ss->id))
 		return true;
-	if (!(cgroup_ss_mask(cgrp) & (1 << ss->id)))
+	if (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id)))
 		return false;
 	return cgroup_on_dfl(cgrp) && ss->implicit_on_dfl;
 }
@@ -2939,7 +2961,7 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
 
 			WARN_ON_ONCE(css && percpu_ref_is_dying(&css->refcnt));
 
-			if (!(cgroup_ss_mask(dsct) & (1 << ss->id)))
+			if (!(cgroup_ss_mask(dsct, false) & (1 << ss->id)))
 				continue;
 
 			if (!css) {
@@ -2989,7 +3011,7 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
 				continue;
 
 			if (css->parent &&
-			    !(cgroup_ss_mask(dsct) & (1 << ss->id))) {
+			    !(cgroup_ss_mask(dsct, false) & (1 << ss->id))) {
 				kill_css(css);
 			} else if (!css_visible(css)) {
 				css_clear_dir(css);
@@ -3101,7 +3123,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 					    char *buf, size_t nbytes,
 					    loff_t off)
 {
-	u16 enable = 0, disable = 0;
+	u16 enable = 0, disable = 0, bypass = 0;
+	u16 child_enable = 0;
 	struct cgroup *cgrp, *child;
 	struct cgroup_subsys *ss;
 	char *tok;
@@ -3122,10 +3145,16 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 
 			if (*tok == '+') {
 				enable |= 1 << ssid;
+				bypass &= ~(1 << ssid);
 				disable &= ~(1 << ssid);
 			} else if (*tok == '-') {
 				disable |= 1 << ssid;
 				enable &= ~(1 << ssid);
+				bypass &= ~(1 << ssid);
+			} else if (*tok == '#') {
+				bypass |= 1 << ssid;
+				enable &= ~(1 << ssid);
+				disable &= ~(1 << ssid);
 			} else {
 				return -EINVAL;
 			}
@@ -3139,35 +3168,42 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	if (!cgrp)
 		return -ENODEV;
 
-	for_each_subsys(ss, ssid) {
-		if (enable & (1 << ssid)) {
-			if (cgrp->subtree_control & (1 << ssid)) {
-				enable &= ~(1 << ssid);
-				continue;
-			}
+	/*
+	 * Cannot use controllers that aren't allowed.
+	 */
+	if (~cgroup_control(cgrp, true) & (enable|disable|bypass)) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
 
-			if (!(cgroup_control(cgrp) & (1 << ssid))) {
-				ret = -ENOENT;
-				goto out_unlock;
-			}
-		} else if (disable & (1 << ssid)) {
-			if (!(cgrp->subtree_control & (1 << ssid))) {
-				disable &= ~(1 << ssid);
-				continue;
-			}
+	/*
+	 * Strip out redundant bits.
+	 */
+	enable  &= ~cgrp->subtree_control;
+	bypass  &= ~cgrp->subtree_bypass;
+	disable &= (cgrp->subtree_control|cgrp->subtree_bypass);
 
-			/* a child has it enabled? */
-			cgroup_for_each_live_child(child, cgrp) {
-				if (child->subtree_control & (1 << ssid)) {
-					ret = -EBUSY;
-					goto out_unlock;
-				}
-			}
-		}
+	if (!(enable|bypass|disable)) {
+		ret = 0;
+		goto out_unlock;
 	}
 
-	if (!enable && !disable) {
-		ret = 0;
+	/*
+	 * Only bypassable controllers can be bypassed.
+	 */
+	if (bypass & ~cgrp_dfl_bypass_ss_mask) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	cgroup_for_each_live_child(child, cgrp)
+		child_enable |= child->subtree_control|child->subtree_bypass;
+
+	/*
+	 * Cannot change the state of a controller if enabled in children.
+	 */
+	if ((enable|bypass|disable) & child_enable) {
+		ret = -EBUSY;
 		goto out_unlock;
 	}
 
@@ -3179,7 +3215,9 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	cgroup_save_control(cgrp);
 
 	cgrp->subtree_control |= enable;
-	cgrp->subtree_control &= ~disable;
+	cgrp->subtree_control &= ~(bypass|disable);
+	cgrp->subtree_bypass  |= bypass;
+	cgrp->subtree_bypass  &= ~(enable|disable);
 
 	ret = cgroup_apply_control(cgrp);
 	cgroup_finalize_control(cgrp, ret);
@@ -4727,7 +4765,8 @@ static void css_release(struct percpu_ref *ref)
 }
 
 static void init_and_link_css(struct cgroup_subsys_state *css,
-			      struct cgroup_subsys *ss, struct cgroup *cgrp)
+			      struct cgroup_subsys *ss, struct cgroup *cgrp,
+			      struct cgroup_subsys_state *parent_css)
 {
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -4743,8 +4782,8 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 	css->serial_nr = css_serial_nr_next++;
 	atomic_set(&css->online_cnt, 0);
 
-	if (cgroup_parent(cgrp)) {
-		css->parent = cgroup_css(cgroup_parent(cgrp), ss);
+	if (parent_css) {
+		css->parent = parent_css;
 		css_get(css->parent);
 	}
 
@@ -4807,19 +4846,26 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 					      struct cgroup_subsys *ss)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
-	struct cgroup_subsys_state *parent_css = cgroup_css(parent, ss);
+	struct cgroup_subsys_state *parent_css = NULL;
 	struct cgroup_subsys_state *css;
 	int err;
 
 	lockdep_assert_held(&cgroup_mutex);
 
+	/*
+	 * As cgroup may be in bypass mode, need to skip over ancestor
+	 * cgroups with NULL CSS.
+	 */
+	for (; parent && !parent_css; parent = cgroup_parent(parent))
+		parent_css = cgroup_css(parent, ss);
+
 	css = ss->css_alloc(parent_css);
 	if (!css)
 		css = ERR_PTR(-ENOMEM);
 	if (IS_ERR(css))
 		return css;
 
-	init_and_link_css(css, ss, cgrp);
+	init_and_link_css(css, ss, cgrp, parent_css);
 
 	err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
 	if (err)
@@ -4941,7 +4987,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	 * subtree_control from the parent.  Each is configured manually.
 	 */
 	if (!cgroup_on_dfl(cgrp))
-		cgrp->subtree_control = cgroup_control(cgrp);
+		cgrp->subtree_control = cgroup_control(cgrp, false);
 
 	cgroup_propagate_control(cgrp);
 
@@ -5254,7 +5300,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 	css = ss->css_alloc(cgroup_css(&cgrp_dfl_root.cgrp, ss));
 	/* We don't handle early failures gracefully */
 	BUG_ON(IS_ERR(css));
-	init_and_link_css(css, ss, &cgrp_dfl_root.cgrp);
+	init_and_link_css(css, ss, &cgrp_dfl_root.cgrp, NULL);
 
 	/*
 	 * Root csses are never destroyed and we can't initialize
@@ -5412,6 +5458,9 @@ int __init cgroup_init(void)
 		if (ss->threaded)
 			cgrp_dfl_threaded_ss_mask |= 1 << ss->id;
 
+		if (ss->bypassable)
+			cgrp_dfl_bypass_ss_mask |= 1 << ss->id;
+
 		if (ss->dfl_cftypes == ss->legacy_cftypes) {
 			WARN_ON(cgroup_add_cftypes(ss, ss->dfl_cftypes));
 		} else {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 2/5] cgroup: Allow reenabling of controller in bypass mode
  2018-11-20 17:51 [PATCH v4 0/5] cgroup: Introducing bypass mode Waiman Long
  2018-11-20 17:51 ` [PATCH v4 1/5] cgroup: subtree_control bypass mode for bypassable controllers Waiman Long
@ 2018-11-20 17:51 ` Waiman Long
  2018-11-20 17:51 ` [PATCH v4 3/5] cgroup: Make debug controller report new controller masks Waiman Long
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2018-11-20 17:51 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet
  Cc: cgroups, linux-kernel, linux-doc, Roman Gushchin, Jens Axboe,
	Andrew Morton, Dennis Zhou, Shakeel Butt, Waiman Long

Bypassable controllers set to bypass mode in the parent's
"cgroup.subtree_control" can now be optionally enabled by writing the
controller name with the '+' prefix to "cgroup.controllers". Using the
'#' prefix will reset it back to the bypass state.

This capability allows a cgroup parent to individually enable bypassable
controllers in a subset of its children instead of either all or none
of them. This increases the flexibility each controller has in shaping
the effective cgroup hierarchy to best suit its need.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/cgroup-defs.h |   7 +++
 kernel/cgroup/cgroup.c      | 109 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 113 insertions(+), 3 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5bff798..ab1b355 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -388,6 +388,13 @@ struct cgroup {
 	u16 old_subtree_ss_mask;
 	u16 old_subtree_bypass;
 
+	/*
+	 * The bitmask of subsystems that are set in its parent's
+	 * ->subtree_bypass and explicitly enabled in this cgroup.
+	 */
+	u16 enable_ss_mask;
+	u16 old_enable_ss_mask;
+
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a361c10..fa538f2 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -424,7 +424,7 @@ static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
 	u16 root_ss_mask = cgrp->root->subsys_mask;
 
 	if (parent) {
-		u16 ss_mask = parent->subtree_control;
+		u16 ss_mask = parent->subtree_control|cgrp->enable_ss_mask;
 
 		if (show_bypass)
 			ss_mask |= parent->subtree_bypass;
@@ -447,7 +447,7 @@ static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
 	struct cgroup *parent = cgroup_parent(cgrp);
 
 	if (parent) {
-		u16 ss_mask = parent->subtree_ss_mask;
+		u16 ss_mask = parent->subtree_ss_mask|cgrp->enable_ss_mask;
 
 
 		if (show_bypass)
@@ -2874,6 +2874,7 @@ static void cgroup_save_control(struct cgroup *cgrp)
 		dsct->old_subtree_control = dsct->subtree_control;
 		dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
 		dsct->old_subtree_bypass  = dsct->subtree_bypass;
+		dsct->old_enable_ss_mask  = dsct->enable_ss_mask;
 		dsct->old_dom_cgrp = dsct->dom_cgrp;
 	}
 }
@@ -2919,6 +2920,7 @@ static void cgroup_restore_control(struct cgroup *cgrp)
 		dsct->subtree_control = dsct->old_subtree_control;
 		dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
 		dsct->subtree_bypass  = dsct->old_subtree_bypass;
+		dsct->enable_ss_mask  = dsct->old_enable_ss_mask;
 		dsct->dom_cgrp = dsct->old_dom_cgrp;
 	}
 }
@@ -3197,7 +3199,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	}
 
 	cgroup_for_each_live_child(child, cgrp)
-		child_enable |= child->subtree_control|child->subtree_bypass;
+		child_enable |= child->subtree_control|child->subtree_bypass|
+				child->enable_ss_mask;
 
 	/*
 	 * Cannot change the state of a controller if enabled in children.
@@ -3230,6 +3233,105 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+/*
+ * Change bypass status of controllers for a cgroup in the default hierarchy.
+ */
+static ssize_t cgroup_controllers_write(struct kernfs_open_file *of,
+					char *buf, size_t nbytes,
+					loff_t off)
+{
+	u16 enable = 0, bypass = 0;
+	struct cgroup *cgrp, *parent;
+	struct cgroup_subsys *ss;
+	char *tok;
+	int ssid, ret;
+
+	/*
+	 * Parse input - space separated list of subsystem names prefixed
+	 * with either + or #.
+	 */
+	buf = strstrip(buf);
+	while ((tok = strsep(&buf, " "))) {
+		if (tok[0] == '\0')
+			continue;
+		do_each_subsys_mask(ss, ssid, ~cgrp_dfl_inhibit_ss_mask) {
+			if (!cgroup_ssid_enabled(ssid) ||
+			    strcmp(tok + 1, ss->name))
+				continue;
+
+			if (*tok == '+') {
+				enable |= 1 << ssid;
+				bypass &= ~(1 << ssid);
+			} else if (*tok == '#') {
+				bypass |= 1 << ssid;
+				enable &= ~(1 << ssid);
+			} else {
+				return -EINVAL;
+			}
+			break;
+		} while_each_subsys_mask();
+		if (ssid == CGROUP_SUBSYS_COUNT)
+			return -EINVAL;
+	}
+
+	cgrp = cgroup_kn_lock_live(of->kn, true);
+	if (!cgrp)
+		return -ENODEV;
+
+	/*
+	 * Write to root cgroup's controllers file is not allowed.
+	 */
+	parent = cgroup_parent(cgrp);
+	if (!parent) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/*
+	 * Only controllers set into bypass mode in the parent cgroup
+	 * can be specified here.
+	 */
+	if (~parent->subtree_bypass & (enable|bypass)) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	/*
+	 * Mask off irrelevant bits.
+	 */
+	enable &= ~cgrp->enable_ss_mask;
+	bypass &=  cgrp->enable_ss_mask;
+
+	if (!(enable|bypass)) {
+		ret = 0;
+		goto out_unlock;
+	}
+
+	/*
+	 * We cannot change the bypass state of a controller that is enabled
+	 * in subtree_control.
+	 */
+	if ((cgrp->subtree_control|cgrp->subtree_bypass) & (enable|bypass)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	/* Save and update control masks and prepare csses */
+	cgroup_save_control(cgrp);
+
+	cgrp->enable_ss_mask |= enable;
+	cgrp->enable_ss_mask &= ~bypass;
+
+	ret = cgroup_apply_control(cgrp);
+	cgroup_finalize_control(cgrp, ret);
+	kernfs_activate(cgrp->kn);
+	ret = 0;
+
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+	return ret ?: nbytes;
+}
+
 /**
  * cgroup_enable_threaded - make @cgrp threaded
  * @cgrp: the target cgroup
@@ -4573,6 +4675,7 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
 	{
 		.name = "cgroup.controllers",
 		.seq_show = cgroup_controllers_show,
+		.write = cgroup_controllers_write,
 	},
 	{
 		.name = "cgroup.subtree_control",
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 3/5] cgroup: Make debug controller report new controller masks
  2018-11-20 17:51 [PATCH v4 0/5] cgroup: Introducing bypass mode Waiman Long
  2018-11-20 17:51 ` [PATCH v4 1/5] cgroup: subtree_control bypass mode for bypassable controllers Waiman Long
  2018-11-20 17:51 ` [PATCH v4 2/5] cgroup: Allow reenabling of controller in bypass mode Waiman Long
@ 2018-11-20 17:51 ` Waiman Long
  2018-11-20 17:51 ` [PATCH v4 4/5] sched/core: Make cpu cgroup controller bypassable Waiman Long
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2018-11-20 17:51 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet
  Cc: cgroups, linux-kernel, linux-doc, Roman Gushchin, Jens Axboe,
	Andrew Morton, Dennis Zhou, Shakeel Butt, Waiman Long

The newly added cgroup controller masks (subtree_bypass and
enable_ss_mask) are now being reported in the debug.masks controller
file.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/debug.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c
index 5f1b873..176bb9e 100644
--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -263,6 +263,8 @@ static int cgroup_masks_read(struct seq_file *seq, void *v)
 
 	cgroup_masks_read_one(seq, "subtree_control", cgrp->subtree_control);
 	cgroup_masks_read_one(seq, "subtree_ss_mask", cgrp->subtree_ss_mask);
+	cgroup_masks_read_one(seq, "subtree_bypass",  cgrp->subtree_bypass);
+	cgroup_masks_read_one(seq, "enable_ss_mask",  cgrp->enable_ss_mask);
 
 	cgroup_kn_unlock(of->kn);
 	return 0;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 4/5] sched/core: Make cpu cgroup controller bypassable
  2018-11-20 17:51 [PATCH v4 0/5] cgroup: Introducing bypass mode Waiman Long
                   ` (2 preceding siblings ...)
  2018-11-20 17:51 ` [PATCH v4 3/5] cgroup: Make debug controller report new controller masks Waiman Long
@ 2018-11-20 17:51 ` Waiman Long
  2018-11-20 17:51 ` [PATCH v4 5/5] cgroup: Document bypass mode Waiman Long
  2018-11-21 14:27 ` [PATCH v4 0/5] cgroup: Introducing " Michael Kerrisk
  5 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2018-11-20 17:51 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet
  Cc: cgroups, linux-kernel, linux-doc, Roman Gushchin, Jens Axboe,
	Andrew Morton, Dennis Zhou, Shakeel Butt, Waiman Long

Make the cpu cgroup controller bypassable in the default hierarchy so
that cpu controller could be activated only in those cgroups that really
need it instead of all the cgroups down to the lowest level that need it.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f12225f..ba4550c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7000,6 +7000,7 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.dfl_cftypes	= cpu_files,
 	.early_init	= true,
 	.threaded	= true,
+	.bypassable	= true,
 };
 
 #endif	/* CONFIG_CGROUP_SCHED */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 5/5] cgroup: Document bypass mode
  2018-11-20 17:51 [PATCH v4 0/5] cgroup: Introducing bypass mode Waiman Long
                   ` (3 preceding siblings ...)
  2018-11-20 17:51 ` [PATCH v4 4/5] sched/core: Make cpu cgroup controller bypassable Waiman Long
@ 2018-11-20 17:51 ` Waiman Long
  2018-11-21 14:27 ` [PATCH v4 0/5] cgroup: Introducing " Michael Kerrisk
  5 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2018-11-20 17:51 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet
  Cc: cgroups, linux-kernel, linux-doc, Roman Gushchin, Jens Axboe,
	Andrew Morton, Dennis Zhou, Shakeel Butt, Waiman Long

The cgroup-v2.rst file is updated to document the new bypass mode.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 66 ++++++++++++++++++++++++---------
 1 file changed, 48 insertions(+), 18 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 07e0613..17e6584 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -372,9 +372,16 @@ disabled by writing to the "cgroup.subtree_control" file::
 
   # echo "+cpu +memory -io" > cgroup.subtree_control
 
-Only controllers which are listed in "cgroup.controllers" can be
-enabled.  When multiple operations are specified as above, either they
-all succeed or fail.  If multiple operations on the same controller
+The prefixes '+', '-' and '#' are used to enable, disable or put
+a controller in the bypass mode respectively.  In the bypass mode,
+a controller is disabled in a cgroup, but it can be enabled again in
+its child cgroups as it will still be listed in "cgroup.controllers".
+Bypass mode can only be used on bypassable controllers.  Currently,
+only the cpu controller is bypassable.
+
+Only controllers which are listed in "cgroup.controllers" can be enabled
+or bypassed.  When multiple operations are specified as above, either
+they all succeed or fail.  If multiple operations on the same controller
 are specified, the last one is effective.
 
 Enabling a controller in a cgroup indicates that the distribution of
@@ -399,6 +406,20 @@ prefixed controller interface files from C and D.  This means that the
 controller interface files - anything which doesn't start with
 "cgroup." are owned by the parent rather than the cgroup itself.
 
+Once a bypassable controller is put into bypass mode in
+"cgroup.subtree_control", that controller can optionally be enabled
+again in child cgroups by writing the controller name with the '+ prefix
+into "cgroup.controllers".  Writing the controller name with the '#'
+prefix into "cgroup.controllers" resets the state back to bypass mode.
+The state of a bypassable controller cannot be changed anymore if it
+is enabled or bypassed in its "cgroup.subtree_control".
+
+The use of bypass mode thus allows a cgroup parent to have the ability
+to selectively enable a bypassable controller in a subset of its child
+cgroups instead of in either all or none of them. In other words, a
+bypassable can be enabled only on the cgroups that actually need it,
+if desired.
+
 
 Top-down Constraint
 ~~~~~~~~~~~~~~~~~~~
@@ -406,10 +427,11 @@ Top-down Constraint
 Resources are distributed top-down and a cgroup can further distribute
 a resource only if the resource has been distributed to it from the
 parent.  This means that all non-root "cgroup.subtree_control" files
-can only contain controllers which are enabled in the parent's
-"cgroup.subtree_control" file.  A controller can be enabled only if
-the parent has the controller enabled and a controller can't be
-disabled if one or more children have it enabled.
+can only contain controllers which are enabled or bypassed in the
+parent's "cgroup.subtree_control" file.  A controller can be enabled
+or bypassed only if the parent has the controller enabled or bypassed
+and the state of a controller can't be changed if one or more children
+have it enabled or bypassed.
 
 
 No Internal Process Constraint
@@ -834,11 +856,18 @@ All cgroup core files are prefixed with "cgroup."
 	should be granted along with the containing directory.
 
   cgroup.controllers
-	A read-only space separated values file which exists on all
+	A read-write space separated values file which exists on all
 	cgroups.
 
 	It shows space separated list of all controllers available to
-	the cgroup.  The controllers are not ordered.
+	the cgroup.  Controller names with '#' prefix are in bypass mode.
+	The controllers are not ordered.
+
+	When a controller is set into bypass mode in its parent's
+	"cgroup.subtree_control", its name prefixed with '+' or '#'
+	can be written to enable it or reset it back to bypass mode
+	respectively.  Controllers not in bypass mode are not allowed
+	to be written.
 
   cgroup.subtree_control
 	A read-write space separated values file which exists on all
@@ -848,12 +877,12 @@ All cgroup core files are prefixed with "cgroup."
 	which are enabled to control resource distribution from the
 	cgroup to its children.
 
-	Space separated list of controllers prefixed with '+' or '-'
-	can be written to enable or disable controllers.  A controller
-	name prefixed with '+' enables the controller and '-'
-	disables.  If a controller appears more than once on the list,
-	the last one is effective.  When multiple enable and disable
-	operations are specified, either all succeed or all fail.
+	Space separated list of controllers prefixed with '+', '-' or
+	'#' can be written to enable, disable or bypass controllers
+	respectively.  If a controller appears more than once on
+	the list, the last one is effective.  When multiple enable,
+	disable or bypass operations are specified, either all succeed
+	or all fail.
 
   cgroup.events
 	A read-only flat-keyed file which exists on non-root cgroups.
@@ -904,17 +933,18 @@ Controllers
 CPU
 ---
 
-The "cpu" controllers regulates distribution of CPU cycles.  This
+The "cpu" controller regulates distribution of CPU cycles.  This
 controller implements weight and absolute bandwidth limit models for
 normal scheduling policy and absolute bandwidth allocation model for
-realtime scheduling policy.
+realtime scheduling policy.  The cpu controller is bypassable.
 
 WARNING: cgroup2 doesn't yet support control of realtime processes and
 the cpu controller can only be enabled when all RT processes are in
 the root cgroup.  Be aware that system management software may already
 have placed RT processes into nonroot cgroups during the system boot
 process, and these processes may need to be moved to the root cgroup
-before the cpu controller can be enabled.
+before the cpu controller can be enabled unless bypass mode is used
+in those non-root cgroups.
 
 
 CPU Interface Files
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/5] cgroup: Introducing bypass mode
  2018-11-20 17:51 [PATCH v4 0/5] cgroup: Introducing bypass mode Waiman Long
                   ` (4 preceding siblings ...)
  2018-11-20 17:51 ` [PATCH v4 5/5] cgroup: Document bypass mode Waiman Long
@ 2018-11-21 14:27 ` Michael Kerrisk
  5 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk @ 2018-11-21 14:27 UTC (permalink / raw)
  To: longman
  Cc: Tejun Heo, lizefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	Jonathan Corbet, cgroups, Linux Kernel, linux-doc, guro, axboe,
	Andrew Morton, dennis, shakeelb, Linux API, Michael Kerrisk

[CC += linux-api@vger.kernel.org]

Hi Waiman,

Since this is a kernel-user-space API change, for all future
iterations of this patch, please CC linux-api@.

The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to linux-api@vger.kernel.org, so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html

Thanks,

Michael
On Tue, Nov 20, 2018 at 8:45 PM Waiman Long <longman@redhat.com> wrote:
>
>  v4:
>   - Rebased to the latest for-4.21 branch of cgroup tree.
>   - Make each controller explicitly opt in to become bypassable by
>     setting the bypassable cgroup_subsys flag. Currently, only cpu
>     controller is made bypassable.
>   - Break out the cgroup_v2.rst documentation update as separate patch.
>
>  v3:
>   - Remove invalid cgroup subdirectory creation patch.
>   - Add use cases for the bypass mode and removing statements about
>     control files ownership in cgroup-v2.txt.
>   - Restrict bypass mode to non-domain (threaded) controllers only.
>
>  v3 patch - https://lkml.org/lkml/2017/8/9/604
>
> This patchset introduces a new bypass mode to the cgroup v2 core to
> give more freedom and flexibility to controllers which choose to become
> bypassable the freedom to shape their own unique views of the virtual
> cgroup hierarchies that can best suit thier own use cases.
>
> Because of the inherent performance overhead in enabling cpu controller,
> it is made bypassable so that the controller only needs to be enabled
> at those cgroups that really need it instead of in every cgroups at a
> given layer if at least one of them needs it.
>
> The cpu controller performance problem is one of the major issues
> in migrating from cgroup v1 to v2.
>
> For example,
>
>   R - A(+) - B(#) - C(+)
>                   \ D(#)
>
> where "+" means the controller is enabled and "#" means the controller
> is bypassed. For this controller's perspective, the cgroups are
> equivalent to:
>
>   R - A|B|D - C
>
> Underneath the root R, cgoups A, B and D are controlled by one set of
> knobs and cgroup C is controlled by another set of knobs as a child of
> cgroups A|B|C.
>
> This patchset is layered on top of the "for-4.21" branch of Tejun's
> cgroup git tree.
>
> Patch 1 introduces a new bypass mode that allows a bypassable
> controller to be disabled in a cgroup, but to be re-enabled again in its
> children. This is enabled by writing the controller name prefixed with
> '#' to the "cgroup.subtree_control" file. Then all its children will
> have this controller in bypass mode.
>
> Patch 2 extends the bypass mode mechanism to allow those child cgroups
> that are put into the bypass mode for a particular bypassable controller
> by their parent to be re-enabled again by writing the controller name
> with the '+' prefix to the "cgroup.controllers" file.
>
> Patch 3 extends the debug controller to expose additional controller
> masks introduced by this patchset.
>
> Patch 4 makes the cpu controller bypassable.
>
> Patch 5 documents the new bypass mode in cgroup-v2.rst file.
>
> Waiman Long (5):
>   cgroup: subtree_control bypass mode for bypassable controllers
>   cgroup: Allow reenabling of controller in bypass mode
>   cgroup: Make debug controller report new controller masks
>   sched/core: Make cpu cgroup controller bypassable
>   cgroup: Document bypass mode
>
>  Documentation/admin-guide/cgroup-v2.rst |  66 +++++---
>  include/linux/cgroup-defs.h             |  26 +++-
>  kernel/cgroup/cgroup.c                  | 256 +++++++++++++++++++++++++-------
>  kernel/cgroup/debug.c                   |   2 +
>  kernel/sched/core.c                     |   1 +
>  5 files changed, 276 insertions(+), 75 deletions(-)
>
> --
> 1.8.3.1
>


-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/5] cgroup: subtree_control bypass mode for bypassable controllers
  2018-11-20 17:51 ` [PATCH v4 1/5] cgroup: subtree_control bypass mode for bypassable controllers Waiman Long
@ 2018-11-29 11:18   ` Dan Carpenter
  2018-12-10 19:56     ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Carpenter @ 2018-11-29 11:18 UTC (permalink / raw)
  To: kbuild, Waiman Long
  Cc: kbuild-all, Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet, cgroups, linux-kernel, linux-doc,
	Roman Gushchin, Jens Axboe, Andrew Morton, Dennis Zhou,
	Shakeel Butt, Waiman Long

Hi Waiman,

Thank you for the patch! Perhaps something to improve:

url:    https://github.com/0day-ci/linux/commits/Waiman-Long/cgroup-Introducing-bypass-mode/20181123-030552
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next

smatch warnings:
kernel/cgroup/cgroup.c:4893 css_create() error: we previously assumed 'parent' could be null (see line 4864)

# https://github.com/0day-ci/linux/commit/8b68fd4330e043645667a5d3306398f8f88f9ff2
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 8b68fd4330e043645667a5d3306398f8f88f9ff2
vim +/parent +4893 kernel/cgroup/cgroup.c

a31f2d3ff kernel/cgroup.c        Tejun Heo        2012-11-19  4840  
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4841  /**
6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4842   * css_create - create a cgroup_subsys_state
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4843   * @cgrp: the cgroup new css will be associated with
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4844   * @ss: the subsys of new css
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4845   *
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4846   * Create a new css associated with @cgrp - @ss pair.  On success, the new
6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4847   * css is online and installed in @cgrp.  This function doesn't create the
6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4848   * interface files.  Returns 0 on success, -errno on failure.
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4849   */
6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4850  static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4851  					      struct cgroup_subsys *ss)
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4852  {
d51f39b05 kernel/cgroup.c        Tejun Heo        2014-05-16  4853  	struct cgroup *parent = cgroup_parent(cgrp);
8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4854  	struct cgroup_subsys_state *parent_css = NULL;
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4855  	struct cgroup_subsys_state *css;
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4856  	int err;
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4857  
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4858  	lockdep_assert_held(&cgroup_mutex);
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4859  
8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4860  	/*
8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4861  	 * As cgroup may be in bypass mode, need to skip over ancestor
8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4862  	 * cgroups with NULL CSS.
8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4863  	 */
8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20 @4864  	for (; parent && !parent_css; parent = cgroup_parent(parent))
                                                                               ^^^^^^^^^^^^^^^^^^^^^
8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4865  		parent_css = cgroup_css(parent, ss);

When we exit this loop it means either parent is NULL or parent_css
is non-NULL.

8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4866  
1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4867  	css = ss->css_alloc(parent_css);
e7e15b87f kernel/cgroup.c        Tejun Heo        2016-06-21  4868  	if (!css)
e7e15b87f kernel/cgroup.c        Tejun Heo        2016-06-21  4869  		css = ERR_PTR(-ENOMEM);
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4870  	if (IS_ERR(css))
6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4871  		return css;
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4872  
8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4873  	init_and_link_css(css, ss, cgrp, parent_css);
a2bed8209 kernel/cgroup.c        Tejun Heo        2014-05-04  4874  
2aad2a86f kernel/cgroup.c        Tejun Heo        2014-09-24  4875  	err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4876  	if (err)
3eb59ec64 kernel/cgroup.c        Li Zefan         2014-03-18  4877  		goto err_free_css;
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4878  
cf780b7dc kernel/cgroup.c        Vladimir Davydov 2015-08-03  4879  	err = cgroup_idr_alloc(&ss->css_idr, NULL, 2, 0, GFP_KERNEL);
15a4c835e kernel/cgroup.c        Tejun Heo        2014-05-04  4880  	if (err < 0)
b00c52dae kernel/cgroup.c        Wenwei Tao       2016-05-13  4881  		goto err_free_css;
15a4c835e kernel/cgroup.c        Tejun Heo        2014-05-04  4882  	css->id = err;
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4883  
15a4c835e kernel/cgroup.c        Tejun Heo        2014-05-04  4884  	/* @css is ready to be brought online now, make it visible */
1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4885  	list_add_tail_rcu(&css->sibling, &parent_css->children);
15a4c835e kernel/cgroup.c        Tejun Heo        2014-05-04  4886  	cgroup_idr_replace(&ss->css_idr, css, css->id);
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4887  
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4888  	err = online_css(css);
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4889  	if (err)
1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4890  		goto err_list_del;
944196278 kernel/cgroup.c        Tejun Heo        2014-03-19  4891  
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4892  	if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
d51f39b05 kernel/cgroup.c        Tejun Heo        2014-05-16 @4893  	    cgroup_parent(parent)) {
                                                                                          ^^^^^^
We dereference parent inside the function, but I don't know for sure
if this is reachable when "parent" is NULL.

ed3d261b5 kernel/cgroup.c        Joe Perches      2014-04-25  4894  		pr_warn("%s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4895  			current->comm, current->pid, ss->name);
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4896  		if (!strcmp(ss->name, "memory"))
ed3d261b5 kernel/cgroup.c        Joe Perches      2014-04-25  4897  			pr_warn("\"memory\" requires setting use_hierarchy to 1 on the root\n");
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4898  		ss->warned_broken_hierarchy = true;
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4899  	}
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4900  
6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4901  	return css;
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4902  
1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4903  err_list_del:
1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4904  	list_del_rcu(&css->sibling);
3eb59ec64 kernel/cgroup.c        Li Zefan         2014-03-18  4905  err_free_css:
8f53470ba kernel/cgroup/cgroup.c Tejun Heo        2018-04-26  4906  	list_del_rcu(&css->rstat_css_node);
8f36aaec9 kernel/cgroup/cgroup.c Tejun Heo        2018-03-14  4907  	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
8f36aaec9 kernel/cgroup/cgroup.c Tejun Heo        2018-03-14  4908  	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4909  	return ERR_PTR(err);
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4910  }
c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4911  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/5] cgroup: subtree_control bypass mode for bypassable controllers
  2018-11-29 11:18   ` Dan Carpenter
@ 2018-12-10 19:56     ` Waiman Long
  0 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2018-12-10 19:56 UTC (permalink / raw)
  To: Dan Carpenter, kbuild
  Cc: kbuild-all, Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet, cgroups, linux-kernel, linux-doc,
	Roman Gushchin, Jens Axboe, Andrew Morton, Dennis Zhou,
	Shakeel Butt

On 11/29/2018 06:18 AM, Dan Carpenter wrote:
> Hi Waiman,
>
> Thank you for the patch! Perhaps something to improve:
>
> url:    https://github.com/0day-ci/linux/commits/Waiman-Long/cgroup-Introducing-bypass-mode/20181123-030552
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
>
> smatch warnings:
> kernel/cgroup/cgroup.c:4893 css_create() error: we previously assumed 'parent' could be null (see line 4864)
>
> # https://github.com/0day-ci/linux/commit/8b68fd4330e043645667a5d3306398f8f88f9ff2
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 8b68fd4330e043645667a5d3306398f8f88f9ff2
> vim +/parent +4893 kernel/cgroup/cgroup.c
>
> a31f2d3ff kernel/cgroup.c        Tejun Heo        2012-11-19  4840  
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4841  /**
> 6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4842   * css_create - create a cgroup_subsys_state
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4843   * @cgrp: the cgroup new css will be associated with
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4844   * @ss: the subsys of new css
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4845   *
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4846   * Create a new css associated with @cgrp - @ss pair.  On success, the new
> 6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4847   * css is online and installed in @cgrp.  This function doesn't create the
> 6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4848   * interface files.  Returns 0 on success, -errno on failure.
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4849   */
> 6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4850  static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
> 6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4851  					      struct cgroup_subsys *ss)
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4852  {
> d51f39b05 kernel/cgroup.c        Tejun Heo        2014-05-16  4853  	struct cgroup *parent = cgroup_parent(cgrp);
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4854  	struct cgroup_subsys_state *parent_css = NULL;
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4855  	struct cgroup_subsys_state *css;
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4856  	int err;
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4857  
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4858  	lockdep_assert_held(&cgroup_mutex);
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4859  
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4860  	/*
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4861  	 * As cgroup may be in bypass mode, need to skip over ancestor
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4862  	 * cgroups with NULL CSS.
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4863  	 */
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20 @4864  	for (; parent && !parent_css; parent = cgroup_parent(parent))
>                                                                                ^^^^^^^^^^^^^^^^^^^^^
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4865  		parent_css = cgroup_css(parent, ss);
>
> When we exit this loop it means either parent is NULL or parent_css
> is non-NULL.

Sorry for the bug. I should have a temp variable to go up
cgroup_parent() iteration chain and leave parent alone. Thanks for
spotting that.

> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4866  
> 1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4867  	css = ss->css_alloc(parent_css);
> e7e15b87f kernel/cgroup.c        Tejun Heo        2016-06-21  4868  	if (!css)
> e7e15b87f kernel/cgroup.c        Tejun Heo        2016-06-21  4869  		css = ERR_PTR(-ENOMEM);
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4870  	if (IS_ERR(css))
> 6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4871  		return css;
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4872  
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long      2018-11-20  4873  	init_and_link_css(css, ss, cgrp, parent_css);
> a2bed8209 kernel/cgroup.c        Tejun Heo        2014-05-04  4874  
> 2aad2a86f kernel/cgroup.c        Tejun Heo        2014-09-24  4875  	err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4876  	if (err)
> 3eb59ec64 kernel/cgroup.c        Li Zefan         2014-03-18  4877  		goto err_free_css;
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4878  
> cf780b7dc kernel/cgroup.c        Vladimir Davydov 2015-08-03  4879  	err = cgroup_idr_alloc(&ss->css_idr, NULL, 2, 0, GFP_KERNEL);
> 15a4c835e kernel/cgroup.c        Tejun Heo        2014-05-04  4880  	if (err < 0)
> b00c52dae kernel/cgroup.c        Wenwei Tao       2016-05-13  4881  		goto err_free_css;
> 15a4c835e kernel/cgroup.c        Tejun Heo        2014-05-04  4882  	css->id = err;
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4883  
> 15a4c835e kernel/cgroup.c        Tejun Heo        2014-05-04  4884  	/* @css is ready to be brought online now, make it visible */
> 1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4885  	list_add_tail_rcu(&css->sibling, &parent_css->children);
> 15a4c835e kernel/cgroup.c        Tejun Heo        2014-05-04  4886  	cgroup_idr_replace(&ss->css_idr, css, css->id);
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4887  
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4888  	err = online_css(css);
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4889  	if (err)
> 1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4890  		goto err_list_del;
> 944196278 kernel/cgroup.c        Tejun Heo        2014-03-19  4891  
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4892  	if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
> d51f39b05 kernel/cgroup.c        Tejun Heo        2014-05-16 @4893  	    cgroup_parent(parent)) {
>                                                                                           ^^^^^^
> We dereference parent inside the function, but I don't know for sure
> if this is reachable when "parent" is NULL.
>
> ed3d261b5 kernel/cgroup.c        Joe Perches      2014-04-25  4894  		pr_warn("%s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4895  			current->comm, current->pid, ss->name);
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4896  		if (!strcmp(ss->name, "memory"))
> ed3d261b5 kernel/cgroup.c        Joe Perches      2014-04-25  4897  			pr_warn("\"memory\" requires setting use_hierarchy to 1 on the root\n");
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4898  		ss->warned_broken_hierarchy = true;
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4899  	}
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4900  
> 6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4901  	return css;
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4902  
> 1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4903  err_list_del:
> 1fed1b2e3 kernel/cgroup.c        Tejun Heo        2014-05-16  4904  	list_del_rcu(&css->sibling);
> 3eb59ec64 kernel/cgroup.c        Li Zefan         2014-03-18  4905  err_free_css:
> 8f53470ba kernel/cgroup/cgroup.c Tejun Heo        2018-04-26  4906  	list_del_rcu(&css->rstat_css_node);
> 8f36aaec9 kernel/cgroup/cgroup.c Tejun Heo        2018-03-14  4907  	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
> 8f36aaec9 kernel/cgroup/cgroup.c Tejun Heo        2018-03-14  4908  	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
> 6cd0f5bba kernel/cgroup.c        Tejun Heo        2016-03-03  4909  	return ERR_PTR(err);
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4910  }
> c81c925ad kernel/cgroup.c        Tejun Heo        2013-12-06  4911  
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Cheers,
Longman


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-12-10 19:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-20 17:51 [PATCH v4 0/5] cgroup: Introducing bypass mode Waiman Long
2018-11-20 17:51 ` [PATCH v4 1/5] cgroup: subtree_control bypass mode for bypassable controllers Waiman Long
2018-11-29 11:18   ` Dan Carpenter
2018-12-10 19:56     ` Waiman Long
2018-11-20 17:51 ` [PATCH v4 2/5] cgroup: Allow reenabling of controller in bypass mode Waiman Long
2018-11-20 17:51 ` [PATCH v4 3/5] cgroup: Make debug controller report new controller masks Waiman Long
2018-11-20 17:51 ` [PATCH v4 4/5] sched/core: Make cpu cgroup controller bypassable Waiman Long
2018-11-20 17:51 ` [PATCH v4 5/5] cgroup: Document bypass mode Waiman Long
2018-11-21 14:27 ` [PATCH v4 0/5] cgroup: Introducing " Michael Kerrisk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).