* [PATCHSET] cgroup, perf_event: make perf_event work on v2 hierarchy
@ 2016-01-07 22:29 ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2016-01-07 22:29 UTC (permalink / raw)
  To: lizefan, hannes, a.p.zijlstra, mingo, acme
  Cc: linux-kernel, cgroups, kernel-team

Hello,

perf_event is special in that while it primarily is there to identify
cgroup membership of tasks, it also keeps some per-cgroup state, so
unlike other controllers which are purely for identification, it can't
be replaced by v2 hierarchy membership tests.  At the same time, as it
is primarily used for tagging, doesn't incur meaningful runtime
overhead and doesn't have any interface, it's awkward to control it
via the usual "cgroup.subtree_control" mechanism on the v2 hierarchy.

This patchset makes perf_event implicitly enabled on the v2 hierarchy
so that cgroup v2 path automatically works with "perf record --cgroup"
as long as perf_event controller is available and not mounted on a
legacy hierarchy.  "perf record" is updated so that it searches for
both v1 hierarchy w/ perf_event on it and v2 hierarchy.  The v1
hierarchy is used if it exists; otherwise, v2 is used, making the
transition transparent to users.

This patchset contains the following six patches.

 0001-cgroup-s-cgrp_dfl_root_-cgrp_dfl_.patch
 0002-cgroup-convert-cgroup_subsys-flag-fields-to-bool-bit.patch
 0003-cgroup-make-css_tryget_online_from_dir-also-recogniz.patch
 0004-cgroup-use-subtree_control-when-testing-no-internal-.patch
 0005-cgroup-implement-cgroup_subsys-implicit_on_dfl.patch
 0006-cgroup-perf_event-make-perf_event-controller-work-on.patch

0001-0004 are prep patches.

0005 implements implicit controller enabling.

0006 marks perf_event as implicitly enabled on v2 and updates "perf
record".

This patchset is on top of cgroup/for-4.5 and available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup2-perf_event

diffstat follows.  Thanks.

 Documentation/cgroup.txt    |   13 ++++
 include/linux/cgroup-defs.h |   21 ++++++-
 kernel/cgroup.c             |  129 ++++++++++++++++++++++++++++++++++----------
 kernel/cpuset.c             |    2 
 kernel/events/core.c        |    6 ++
 kernel/sched/core.c         |    2 
 kernel/sched/cpuacct.c      |    2 
 tools/perf/util/cgroup.c    |   26 ++++++--
 8 files changed, 160 insertions(+), 41 deletions(-)

--
tejun

* [PATCH 1/6] cgroup: s/cgrp_dfl_root_/cgrp_dfl_/
  2016-01-07 22:29 ` Tejun Heo
  (?)
@ 2016-01-07 22:29 ` Tejun Heo
  -1 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2016-01-07 22:29 UTC (permalink / raw)
  To: lizefan, hannes, a.p.zijlstra, mingo, acme
  Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

These var names are unnecessarily unwieldy and another similar
variable will be added.  Let's shorten them.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 122ec55..f542264 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -177,10 +177,10 @@ EXPORT_SYMBOL_GPL(cgrp_dfl_root);
  * The default hierarchy always exists but is hidden until mounted for the
  * first time.  This is for backward compatibility.
  */
-static bool cgrp_dfl_root_visible;
+static bool cgrp_dfl_visible;
 
 /* some controllers are not supported in the default hierarchy */
-static unsigned long cgrp_dfl_root_inhibit_ss_mask;
+static unsigned long cgrp_dfl_inhibit_ss_mask;
 
 /* The list of hierarchy roots */
 
@@ -1473,7 +1473,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 	/* skip creating root files on dfl_root for inhibited subsystems */
 	tmp_ss_mask = ss_mask;
 	if (dst_root == &cgrp_dfl_root)
-		tmp_ss_mask &= ~cgrp_dfl_root_inhibit_ss_mask;
+		tmp_ss_mask &= ~cgrp_dfl_inhibit_ss_mask;
 
 	for_each_subsys_which(ss, ssid, &tmp_ss_mask) {
 		struct cgroup *scgrp = &ss->root->cgrp;
@@ -1490,7 +1490,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 		 * Just warn about it and continue.
 		 */
 		if (dst_root == &cgrp_dfl_root) {
-			if (cgrp_dfl_root_visible) {
+			if (cgrp_dfl_visible) {
 				pr_warn("failed to create files (%d) while rebinding 0x%lx to default root\n",
 					ret, ss_mask);
 				pr_warn("you may retry by moving them to a different hierarchy and unbinding\n");
@@ -1991,7 +1991,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 			pr_err("cgroup2: unknown option \"%s\"\n", (char *)data);
 			return ERR_PTR(-EINVAL);
 		}
-		cgrp_dfl_root_visible = true;
+		cgrp_dfl_visible = true;
 		root = &cgrp_dfl_root;
 		cgroup_get(&root->cgrp);
 		goto out_mount;
@@ -2842,7 +2842,7 @@ static int cgroup_root_controllers_show(struct seq_file *seq, void *v)
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
 
 	cgroup_print_ss_mask(seq, cgrp->root->subsys_mask &
-			     ~cgrp_dfl_root_inhibit_ss_mask);
+			     ~cgrp_dfl_inhibit_ss_mask);
 	return 0;
 }
 
@@ -2944,7 +2944,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	 */
 	buf = strstrip(buf);
 	while ((tok = strsep(&buf, " "))) {
-		unsigned long tmp_ss_mask = ~cgrp_dfl_root_inhibit_ss_mask;
+		unsigned long tmp_ss_mask = ~cgrp_dfl_inhibit_ss_mask;
 
 		if (tok[0] == '\0')
 			continue;
@@ -5312,7 +5312,7 @@ int __init cgroup_init(void)
 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
 		if (!ss->dfl_cftypes)
-			cgrp_dfl_root_inhibit_ss_mask |= 1 << ss->id;
+			cgrp_dfl_inhibit_ss_mask |= 1 << ss->id;
 
 		if (ss->dfl_cftypes == ss->legacy_cftypes) {
 			WARN_ON(cgroup_add_cftypes(ss, ss->dfl_cftypes));
@@ -5383,7 +5383,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 		struct cgroup *cgrp;
 		int ssid, count = 0;
 
-		if (root == &cgrp_dfl_root && !cgrp_dfl_root_visible)
+		if (root == &cgrp_dfl_root && !cgrp_dfl_visible)
 			continue;
 
 		seq_printf(m, "%d:", root->hierarchy_id);
-- 
2.5.0


* [PATCH 2/6] cgroup: convert cgroup_subsys flag fields to bool bitfields
  2016-01-07 22:29 ` Tejun Heo
  (?)
  (?)
@ 2016-01-07 22:29 ` Tejun Heo
  -1 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2016-01-07 22:29 UTC (permalink / raw)
  To: lizefan, hannes, a.p.zijlstra, mingo, acme
  Cc: linux-kernel, cgroups, kernel-team, Tejun Heo, Peter Zijlstra

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/cgroup-defs.h | 6 +++---
 kernel/cpuset.c             | 2 +-
 kernel/sched/core.c         | 2 +-
 kernel/sched/cpuacct.c      | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 7f334c2..dc6f0ce 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -440,7 +440,7 @@ struct cgroup_subsys {
 	void (*free)(struct task_struct *task);
 	void (*bind)(struct cgroup_subsys_state *root_css);
 
-	int early_init;
+	bool early_init:1;
 
 	/*
 	 * If %false, this subsystem is properly hierarchical -
@@ -454,8 +454,8 @@ struct cgroup_subsys {
 	 * cases.  Eventually, all subsystems will be made properly
 	 * hierarchical and this will go away.
 	 */
-	bool broken_hierarchy;
-	bool warned_broken_hierarchy;
+	bool broken_hierarchy:1;
+	bool warned_broken_hierarchy:1;
 
 	/* the following two fields are initialized automtically during boot */
 	int id;
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3e945fc..0cf412b 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2065,7 +2065,7 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
 	.attach		= cpuset_attach,
 	.bind		= cpuset_bind,
 	.legacy_cftypes	= files,
-	.early_init	= 1,
+	.early_init	= true,
 };
 
 /**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7d2271..9a5b368 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8576,7 +8576,7 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.can_attach	= cpu_cgroup_can_attach,
 	.attach		= cpu_cgroup_attach,
 	.legacy_cftypes	= cpu_files,
-	.early_init	= 1,
+	.early_init	= true,
 };
 
 #endif	/* CONFIG_CGROUP_SCHED */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index dd7cbb5..2ddaebf 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -279,5 +279,5 @@ struct cgroup_subsys cpuacct_cgrp_subsys = {
 	.css_alloc	= cpuacct_css_alloc,
 	.css_free	= cpuacct_css_free,
 	.legacy_cftypes	= files,
-	.early_init	= 1,
+	.early_init	= true,
 };
-- 
2.5.0
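
The conversion above can be illustrated outside the kernel.  What
follows is a minimal userspace sketch, not kernel code: the two structs
are hypothetical stand-ins that only borrow the field names from the
patch.  It shows the two properties the change relies on: one-bit bool
fields can share storage, and a bool bitfield still normalizes any
non-zero value to 1 (an "int x:1" field would not).  Exact struct
sizes are implementation-defined, so the size check is only a
"not larger" comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields before the patch. */
struct flags_plain {
	int early_init;
	bool broken_hierarchy;
	bool warned_broken_hierarchy;
};

/* Hypothetical stand-in for the fields after the patch. */
struct flags_packed {
	bool early_init:1;
	bool broken_hierarchy:1;
	bool warned_broken_hierarchy:1;
};

/* Returns true when the bitfield layout is no larger than the plain one. */
static bool packed_is_not_larger(void)
{
	return sizeof(struct flags_packed) <= sizeof(struct flags_plain);
}

/* Stores an int into a bool bitfield and returns what was kept. */
static int store_into_flag(int value)
{
	struct flags_packed f = { .early_init = false };

	f.early_init = value;	/* converted to bool (0 or 1) before storing */
	return f.early_init;
}
```

This is why initializers such as ".early_init = 1" keep working after
the type change, although the patch switches them to "true" for
clarity.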


* [PATCH 3/6] cgroup: make css_tryget_online_from_dir() also recognize cgroup2 fs
  2016-01-07 22:29 ` Tejun Heo
                   ` (2 preceding siblings ...)
  (?)
@ 2016-01-07 22:29 ` Tejun Heo
  -1 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2016-01-07 22:29 UTC (permalink / raw)
  To: lizefan, hannes, a.p.zijlstra, mingo, acme
  Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

The function currently returns -EBADF for a directory on the default
hierarchy.  Make it also recognize cgroup2_fs_type.  This will be used
for perf_event cgroup2 support.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f542264..8a92043 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5748,12 +5748,13 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
 						       struct cgroup_subsys *ss)
 {
 	struct kernfs_node *kn = kernfs_node_from_dentry(dentry);
+	struct file_system_type *s_type = dentry->d_sb->s_type;
 	struct cgroup_subsys_state *css = NULL;
 	struct cgroup *cgrp;
 
 	/* is @dentry a cgroup dir? */
-	if (dentry->d_sb->s_type != &cgroup_fs_type || !kn ||
-	    kernfs_type(kn) != KERNFS_DIR)
+	if ((s_type != &cgroup_fs_type && s_type != &cgroup2_fs_type) ||
+	    !kn || kernfs_type(kn) != KERNFS_DIR)
 		return ERR_PTR(-EBADF);
 
 	rcu_read_lock();
-- 
2.5.0
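
For comparison, userspace can make the same "v1 or v2" decision from
the filesystem magic number.  The sketch below is an assumption-laden
analog, not the kernel check: is_cgroup_fs_magic() is a hypothetical
helper, and the SKETCH_* constants repeat the values of
CGROUP_SUPER_MAGIC and CGROUP2_SUPER_MAGIC from <linux/magic.h> so that
the block stands alone.

```c
#include <stdbool.h>

/* Values as defined in <linux/magic.h>; repeated for self-containment. */
#define SKETCH_CGROUP_SUPER_MAGIC	0x27e0ebL
#define SKETCH_CGROUP2_SUPER_MAGIC	0x63677270L

/*
 * Hypothetical helper: accept both the legacy cgroup filesystem and
 * cgroup2, mirroring how the patched check accepts both fs types.
 */
static bool is_cgroup_fs_magic(long f_type)
{
	return f_type == SKETCH_CGROUP_SUPER_MAGIC ||
	       f_type == SKETCH_CGROUP2_SUPER_MAGIC;
}
```

A caller could pass the f_type field that statfs(2) fills in for a
directory; any other filesystem is rejected, just as the kernel
function returns -EBADF for a non-cgroup dentry.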


* [PATCH 4/6] cgroup: use ->subtree_control when testing no internal process rule
  2016-01-07 22:29 ` Tejun Heo
                   ` (3 preceding siblings ...)
  (?)
@ 2016-01-07 22:29 ` Tejun Heo
  2016-02-23 14:59     ` Tejun Heo
  -1 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2016-01-07 22:29 UTC (permalink / raw)
  To: lizefan, hannes, a.p.zijlstra, mingo, acme
  Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

No internal process rule is enforced by cgroup_migrate_prepare_dst()
during process migration.  It tests whether the target cgroup's
->child_subsys_mask is zero which is different from "subtree_control"
write path which tests ->subtree_control.  This hasn't mattered
because up until now, both ->child_subsys_mask and ->subtree_control
are zero or non-zero at the same time.  However, with the planned
addition of implicit controllers, this will no longer be true.

This patch prepares for the change by making
cgroup_migrate_prepare_dst() test ->subtree_control instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 8a92043..8bc83fd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2512,11 +2512,11 @@ static int cgroup_migrate_prepare_dst(struct cgroup *dst_cgrp,
 	lockdep_assert_held(&cgroup_mutex);
 
 	/*
-	 * Except for the root, child_subsys_mask must be zero for a cgroup
+	 * Except for the root, subtree_control must be zero for a cgroup
 	 * with tasks so that child cgroups don't compete against tasks.
 	 */
 	if (dst_cgrp && cgroup_on_dfl(dst_cgrp) && cgroup_parent(dst_cgrp) &&
-	    dst_cgrp->child_subsys_mask)
+	    dst_cgrp->subtree_control)
 		return -EBUSY;
 
 	/* look up the dst cset for each src cset and link it to src */
-- 
2.5.0
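
Why the two fields can diverge is easier to see with a toy model.  The
sketch below is not kernel code: struct toy_cgroup and the TOY_* bits
are hypothetical, and it assumes, as the description says, that an
implicit controller appears in ->child_subsys_mask without ever being
written to ->subtree_control.

```c
#include <stdbool.h>

#define TOY_PERF_EVENT	(1UL << 0)	/* hypothetical implicit controller */
#define TOY_MEMORY	(1UL << 1)	/* hypothetical explicit controller */

/* Toy model; the field names mirror the patch, nothing else does. */
struct toy_cgroup {
	bool has_parent;			/* false for the root */
	unsigned long subtree_control;		/* enabled by the user */
	unsigned long child_subsys_mask;	/* effective, may add implicit */
};

/* Test before the patch: refuse migration if any effective bit is set. */
static bool old_check_blocks(const struct toy_cgroup *dst)
{
	return dst->has_parent && dst->child_subsys_mask;
}

/* Test after the patch: only user-enabled controllers count. */
static bool new_check_blocks(const struct toy_cgroup *dst)
{
	return dst->has_parent && dst->subtree_control;
}
```

With only an implicit controller set, the old test would refuse every
migration into a non-root cgroup, while the new test lets it through
and still refuses once the user enables a controller for the children.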


* [PATCH 5/6] cgroup: implement cgroup_subsys->implicit_on_dfl
  2016-01-07 22:29 ` Tejun Heo
                   ` (4 preceding siblings ...)
  (?)
@ 2016-01-07 22:29 ` Tejun Heo
  -1 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2016-01-07 22:29 UTC (permalink / raw)
  To: lizefan, hannes, a.p.zijlstra, mingo, acme
  Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

Some controllers, perf_event for now and possibly freezer in the
future, don't really make sense to control explicitly through
"cgroup.subtree_control".  For example, the primary role of perf_event
is identifying the cgroups of tasks; however, because the controller
also keeps a small amount of state per cgroup, it can't be replaced
with simple cgroup membership tests.

This patch implements cgroup_subsys->implicit_on_dfl flag.  When set,
the controller is implicitly enabled on all cgroups on the v2
hierarchy so that utility type controllers such as perf_event can be
enabled and function transparently.

An implicit controller doesn't show up in "cgroup.controllers" or
"cgroup.subtree_control", is exempt from no internal process rule and
can be stolen from the default hierarchy even if there are non-root
csses.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup-defs.h |  15 +++++++
 kernel/cgroup.c             | 106 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 104 insertions(+), 17 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index dc6f0ce..e7f69ef7 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -443,6 +443,21 @@ struct cgroup_subsys {
 	bool early_init:1;
 
 	/*
+	 * If %true, the controller, on the default hierarchy, doesn't show
+	 * up in "cgroup.controllers" or "cgroup.subtree_control", is
+	 * implicitly enabled on all cgroups on the default hierarchy, and
+	 * bypasses the "no internal process" constraint.  This is for
+	 * utility type controllers which are transparent to userland.
+	 *
+	 * For an implicit controller to be available to the default
+	 * hierarchy, it shouldn't be attached to a legacy hierarchy when
+	 * the default hierarchy is first mounted.  A legacy hierarchy can
+	 * steal an implicit controller but it can't be given back to the
+	 * default hierarchy afterwards.
+	 */
+	bool implicit_on_dfl:1;
+
+	/*
 	 * If %false, this subsystem is properly hierarchical -
 	 * configuration, resource accounting and restriction on a parent
 	 * cgroup cover those of its children.  If %true, hierarchy support
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 8bc83fd..bd86106 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -182,6 +182,9 @@ static bool cgrp_dfl_visible;
 /* some controllers are not supported in the default hierarchy */
 static unsigned long cgrp_dfl_inhibit_ss_mask;
 
+/* some controllers are implicitly enabled on the default hierarchy */
+static unsigned long cgrp_dfl_implicit_ss_mask;
+
 /* The list of hierarchy roots */
 
 static LIST_HEAD(cgroup_roots);
@@ -215,6 +218,7 @@ static struct file_system_type cgroup2_fs_type;
 static struct cftype cgroup_dfl_base_files[];
 static struct cftype cgroup_legacy_base_files[];
 
+static int cgroup_update_dfl_csses(struct cgroup *cgrp);
 static int rebind_subsystems(struct cgroup_root *dst_root,
 			     unsigned long ss_mask);
 static void css_task_iter_advance(struct css_task_iter *it);
@@ -1272,6 +1276,8 @@ static unsigned long cgroup_calc_child_subsys_mask(struct cgroup *cgrp,
 	if (!cgroup_on_dfl(cgrp))
 		return cur_ss_mask;
 
+	cur_ss_mask |= cgrp_dfl_implicit_ss_mask;
+
 	while (true) {
 		unsigned long new_ss_mask = cur_ss_mask;
 
@@ -1450,6 +1456,32 @@ static int css_populate_dir(struct cgroup_subsys_state *css,
 	return ret;
 }
 
+static void steal_implicit_ss(struct cgroup_subsys *ss)
+{
+	struct cgroup_subsys_state *css;
+
+	if (cgrp_dfl_visible) {
+		pr_info("%s is detached from and won't be available on v2 hierarchy\n",
+			ss->name);
+		cgrp_dfl_implicit_ss_mask &= ~(1 << ss->id);
+	}
+
+	/*
+	 * With @ss cleared from the implicit_ss_mask, the following will
+	 * clear it from ->child_subsys_mask on all dfl cgroups.
+	 */
+	css_for_each_descendant_post(css, cgroup_css(&cgrp_dfl_root.cgrp, ss))
+		cgroup_refresh_child_subsys_mask(css->cgroup);
+
+	/* migrate all processes to the root */
+	WARN_ON_ONCE(cgroup_update_dfl_csses(&cgrp_dfl_root.cgrp));
+
+	/* and kill all !root csses */
+	css_for_each_descendant_post(css, cgroup_css(&cgrp_dfl_root.cgrp, ss))
+		if (css->parent)
+			kill_css(css);
+}
+
 static int rebind_subsystems(struct cgroup_root *dst_root,
 			     unsigned long ss_mask)
 {
@@ -1461,8 +1493,13 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 	lockdep_assert_held(&cgroup_mutex);
 
 	for_each_subsys_which(ss, ssid, &ss_mask) {
-		/* if @ss has non-root csses attached to it, can't move */
-		if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss)))
+		/*
+		 * If @ss has non-root csses attached to it, can't move.
+		 * If @ss is an implicit controller on dfl, it can be
+		 * stolen and is exempt from this rule.
+		 */
+		if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss)) &&
+		    !(ss->implicit_on_dfl && ss->root == &cgrp_dfl_root))
 			return -EBUSY;
 
 		/* can't move between two non-dummy roots either */
@@ -1470,6 +1507,11 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 			return -EBUSY;
 	}
 
+	/* steal implicit controllers on dfl */
+	for_each_subsys_which(ss, ssid, &ss_mask)
+		if (ss->implicit_on_dfl && ss->root == &cgrp_dfl_root)
+			steal_implicit_ss(ss);
+
 	/* skip creating root files on dfl_root for inhibited subsystems */
 	tmp_ss_mask = ss_mask;
 	if (dst_root == &cgrp_dfl_root)
@@ -1973,9 +2015,9 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	struct super_block *pinned_sb = NULL;
 	struct cgroup_subsys *ss;
 	struct cgroup_root *root;
-	struct cgroup_sb_opts opts;
+	struct cgroup_sb_opts opts = { };
 	struct dentry *dentry;
-	int ret;
+	int ret = 0;
 	int i;
 	bool new_sb;
 
@@ -1986,18 +2028,41 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	if (!use_task_css_set_links)
 		cgroup_enable_task_cg_lists();
 
+	mutex_lock(&cgroup_mutex);
+
 	if (is_v2) {
 		if (data) {
 			pr_err("cgroup2: unknown option \"%s\"\n", (char *)data);
-			return ERR_PTR(-EINVAL);
+			ret = -EINVAL;
+			goto out_unlock;
 		}
-		cgrp_dfl_visible = true;
+
 		root = &cgrp_dfl_root;
 		cgroup_get(&root->cgrp);
-		goto out_mount;
-	}
 
-	mutex_lock(&cgroup_mutex);
+		/*
+		 * dfl hierarchy is being mounted for the first time.
+		 * Toggle its visibility and see which implicit controllers
+		 * are available.
+		 */
+		if (!cgrp_dfl_visible) {
+			cgrp_dfl_visible = true;
+
+			for_each_subsys_which(ss, i, &cgrp_dfl_implicit_ss_mask) {
+				if (ss->root == &cgrp_dfl_root)
+					continue;
+
+				pr_info("%s is busy and won't be available on v2 hierarchy\n",
+					ss->name);
+				cgrp_dfl_implicit_ss_mask &= ~(1 << ss->id);
+			}
+
+			/* apply implicit_ss_mask to the root */
+			cgroup_refresh_child_subsys_mask(&cgrp_dfl_root.cgrp);
+		}
+
+		goto out_unlock;
+	}
 
 	/* First find the desired set of subsystems */
 	ret = parse_cgroupfs_options(data, &opts);
@@ -2114,7 +2179,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 
 	if (ret)
 		return ERR_PTR(ret);
-out_mount:
+
 	dentry = kernfs_mount(fs_type, flags, root->kf_root,
 			      is_v2 ? CGROUP2_SUPER_MAGIC : CGROUP_SUPER_MAGIC,
 			      &new_sb);
@@ -2826,6 +2891,8 @@ static void cgroup_print_ss_mask(struct seq_file *seq, unsigned long ss_mask)
 	bool printed = false;
 	int ssid;
 
+	ss_mask &= ~cgrp_dfl_implicit_ss_mask;
+
 	for_each_subsys_which(ss, ssid, &ss_mask) {
 		if (printed)
 			seq_putc(seq, ' ');
@@ -2944,7 +3011,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	 */
 	buf = strstrip(buf);
 	while ((tok = strsep(&buf, " "))) {
-		unsigned long tmp_ss_mask = ~cgrp_dfl_inhibit_ss_mask;
+		unsigned long tmp_ss_mask = ~(cgrp_dfl_inhibit_ss_mask |
+					      cgrp_dfl_implicit_ss_mask);
 
 		if (tok[0] == '\0')
 			continue;
@@ -4970,8 +5038,10 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	/* let's create and online css's */
 	for_each_subsys(ss, ssid) {
 		if (parent->child_subsys_mask & (1 << ssid)) {
-			ret = create_css(cgrp, ss,
-					 parent->subtree_control & (1 << ssid));
+			bool visible = ss->implicit_on_dfl ||
+				(parent->subtree_control & (1 << ssid));
+
+			ret = create_css(cgrp, ss, visible);
 			if (ret)
 				goto out_destroy;
 		}
@@ -4981,10 +5051,10 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	 * On the default hierarchy, a child doesn't automatically inherit
 	 * subtree_control from the parent.  Each is configured manually.
 	 */
-	if (!cgroup_on_dfl(cgrp)) {
+	if (!cgroup_on_dfl(cgrp))
 		cgrp->subtree_control = parent->subtree_control;
-		cgroup_refresh_child_subsys_mask(cgrp);
-	}
+
+	cgroup_refresh_child_subsys_mask(cgrp);
 
 	kernfs_activate(kn);
 
@@ -5311,7 +5381,9 @@ int __init cgroup_init(void)
 
 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
-		if (!ss->dfl_cftypes)
+		if (ss->implicit_on_dfl)
+			cgrp_dfl_implicit_ss_mask |= 1 << ss->id;
+		else if (!ss->dfl_cftypes)
 			cgrp_dfl_inhibit_ss_mask |= 1 << ss->id;
 
 		if (ss->dfl_cftypes == ss->legacy_cftypes) {
-- 
2.5.0
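
The mask handling in this patch can be summarized with a small model.
This is a simplified sketch under stated assumptions, not the kernel
logic: struct demo_dfl_state, the DEMO_* bits and the demo_*() helpers
are all hypothetical, controller dependencies are ignored, and
"stealing" is reduced to clearing the implicit bit.

```c
#include <stdbool.h>

#define DEMO_MEMORY	(1UL << 0)	/* ordinary v2 controller */
#define DEMO_CPUSET	(1UL << 1)	/* pretend: inhibited on v2 */
#define DEMO_PERF	(1UL << 2)	/* pretend: implicit_on_dfl */

/* Hypothetical snapshot of the three default-hierarchy masks. */
struct demo_dfl_state {
	unsigned long subsys_mask;	/* attached to the default root */
	unsigned long inhibit_mask;	/* not supported on v2 */
	unsigned long implicit_mask;	/* enabled without user action */
};

/* What the root "cgroup.controllers" would list. */
static unsigned long demo_visible(const struct demo_dfl_state *s)
{
	return s->subsys_mask & ~s->inhibit_mask & ~s->implicit_mask;
}

/* Whether a "cgroup.subtree_control" write may name this controller. */
static bool demo_accepts_write(const struct demo_dfl_state *s,
			       unsigned long bit)
{
	return (bit & ~(s->inhibit_mask | s->implicit_mask)) != 0;
}

/* Effective mask for children: user-enabled plus implicit controllers. */
static unsigned long demo_child_mask(const struct demo_dfl_state *s,
				     unsigned long subtree_control)
{
	return subtree_control | s->implicit_mask;
}

/* A legacy hierarchy takes the controller; it stops being implicit. */
static void demo_steal(struct demo_dfl_state *s, unsigned long bit)
{
	s->implicit_mask &= ~bit;
}
```

In this model an implicit controller is active on every cgroup yet
never listed or writable, and once stolen it disappears from the
effective mask, which matches the behavior the description lays out.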


* [PATCH 6/6] cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
  2016-01-07 22:29 ` Tejun Heo
                   ` (5 preceding siblings ...)
  (?)
@ 2016-01-07 22:29 ` Tejun Heo
  -1 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2016-01-07 22:29 UTC (permalink / raw)
  To: lizefan, hannes, a.p.zijlstra, mingo, acme
  Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

perf_event is a utility controller whose primary role is identifying
cgroup membership to filter perf events; however, because it also
tracks some per-css state, it can't be replaced by pure cgroup
membership test.  Mark the controller as implicitly enabled on the
default hierarchy so that perf events can always be filtered based on
cgroup v2 path as long as the controller is not mounted on a legacy
hierarchy.

"perf record" is updated accordingly so that it searches for both v1
and v2 hierarchies.  A v1 hierarchy is used if perf_event is mounted
on it; otherwise, it uses the v2 hierarchy.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
---
 Documentation/cgroup.txt | 13 +++++++++++++
 kernel/events/core.c     |  6 ++++++
 tools/perf/util/cgroup.c | 26 +++++++++++++++++++-------
 3 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/Documentation/cgroup.txt b/Documentation/cgroup.txt
index 31d1f7b..721e43e 100644
--- a/Documentation/cgroup.txt
+++ b/Documentation/cgroup.txt
@@ -47,6 +47,8 @@ CONTENTS
   5-3. IO
     5-3-1. IO Interface Files
     5-3-2. Writeback
+  5-4. Misc
+    5-4-1. perf_event
 P. Information on Kernel Programming
   P-1. Filesystem Support for Writeback
 D. Deprecated v1 Core Features
@@ -1013,6 +1015,17 @@ writeback as follows.
 	vm.dirty[_background]_ratio.
 
 
+5-4. Misc
+
+5-4-1. perf_event
+
+perf_event controller, if not mounted on a legacy hierarchy on the
+first mount of cgroup v2, is automatically enabled on the v2 hierarchy
+so that perf events can always be filtered by cgroup v2 path.  The
+controller can still be moved to a legacy hierarchy after v2 hierarchy
+is populated.
+
+
 P. Information on Kernel Programming
 
 This section contains kernel programming information in the areas
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 026305d..02ad5b3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9469,5 +9469,11 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
 	.attach		= perf_cgroup_attach,
+	/*
+	 * Implicitly enable on dfl hierarchy so that perf events can
+	 * always be filtered by cgroup2 path as long as perf_event
+	 * controller is not mounted on a legacy hierarchy.
+	 */
+	.implicit_on_dfl = true,
 };
 #endif /* CONFIG_CGROUP_PERF */
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 32e12ec..54b5bf4 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -12,8 +12,8 @@ cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
 	FILE *fp;
 	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
+	char path_v1[PATH_MAX + 1], path_v2[PATH_MAX + 1], *path;
 	char *token, *saved_ptr = NULL;
-	int found = 0;
 
 	fp = fopen("/proc/mounts", "r");
 	if (!fp)
@@ -24,31 +24,43 @@ cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 	 * and inspect every cgroupfs mount point to find one that has
 	 * perf_event subsystem
 	 */
+	path_v1[0] = '\0';
+	path_v2[0] = '\0';
+
 	while (fscanf(fp, "%*s %"STR(PATH_MAX)"s %"STR(PATH_MAX)"s %"
 				STR(PATH_MAX)"s %*d %*d\n",
 				mountpoint, type, tokens) == 3) {
 
-		if (!strcmp(type, "cgroup")) {
+		if (!path_v1[0] && !strcmp(type, "cgroup")) {
 
 			token = strtok_r(tokens, ",", &saved_ptr);
 
 			while (token != NULL) {
 				if (!strcmp(token, "perf_event")) {
-					found = 1;
+					strcpy(path_v1, mountpoint);
 					break;
 				}
 				token = strtok_r(NULL, ",", &saved_ptr);
 			}
 		}
-		if (found)
+
+		if (!path_v2[0] && !strcmp(type, "cgroup2"))
+			strcpy(path_v2, mountpoint);
+
+		if (path_v1[0] && path_v2[0])
 			break;
 	}
 	fclose(fp);
-	if (!found)
+
+	if (path_v1[0])
+		path = path_v1;
+	else if (path_v2[0])
+		path = path_v2;
+	else
 		return -1;
 
-	if (strlen(mountpoint) < maxlen) {
-		strcpy(buf, mountpoint);
+	if (strlen(path) < maxlen) {
+		strcpy(buf, path);
 		return 0;
 	}
 	return -1;
-- 
2.5.0
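
The selection rule in the perf change (prefer a v1 mount that carries
perf_event, otherwise fall back to the first cgroup2 mount) can be
exercised without touching /proc/mounts.  The sketch below is a
hypothetical, testable variant and not the code from the patch:
pick_perf_cgroup_mount() and has_option() are invented names, the
input is a mounts-style string, the buffer sizes are arbitrary, and
each line is assumed to have at least four fields.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the comma-separated option list contains @name. */
static int has_option(const char *opts, const char *name)
{
	size_t len = strlen(name);

	while (*opts) {
		size_t n = strcspn(opts, ",");

		if (n == len && !strncmp(opts, name, len))
			return 1;
		opts += n;
		if (*opts == ',')
			opts++;
	}
	return 0;
}

/*
 * Hypothetical helper: scan /proc/mounts-style text and copy the chosen
 * mount point into @buf.  Returns 0 on success, -1 if nothing matches
 * or @buf is too small.
 */
static int pick_perf_cgroup_mount(const char *mounts, char *buf,
				  size_t maxlen)
{
	char v1[256] = "", v2[256] = "";
	char mnt[256], type[64], opts[512];
	const char *line = mounts;
	const char *path;

	while (line && *line) {
		if (sscanf(line, "%*s %255s %63s %511s",
			   mnt, type, opts) == 3) {
			/* first v1 hierarchy with perf_event on it */
			if (!v1[0] && !strcmp(type, "cgroup") &&
			    has_option(opts, "perf_event"))
				strcpy(v1, mnt);
			/* first v2 hierarchy */
			if (!v2[0] && !strcmp(type, "cgroup2"))
				strcpy(v2, mnt);
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	path = v1[0] ? v1 : (v2[0] ? v2 : NULL);
	if (!path || strlen(path) >= maxlen)
		return -1;
	strcpy(buf, path);
	return 0;
}
```

Because v1 wins whenever it is present, the order of the lines in the
mount table does not affect the result.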


* Re: [PATCH 4/6] cgroup: use ->subtree_control when testing no internal process rule
@ 2016-02-23 14:59     ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2016-02-23 14:59 UTC (permalink / raw)
  To: lizefan, hannes, a.p.zijlstra, mingo, acme
  Cc: linux-kernel, cgroups, kernel-team

On Thu, Jan 07, 2016 at 05:29:48PM -0500, Tejun Heo wrote:
> No internal process rule is enforced by cgroup_migrate_prepare_dst()
> during process migration.  It tests whether the target cgroup's
> ->child_subsys_mask is zero which is different from "subtree_control"
> write path which tests ->subtree_control.  This hasn't mattered
> because up until now, both ->child_subsys_mask and ->subtree_control
> are zero or non-zero at the same time.  However, with the planned
> addition of implicit controllers, this will no longer be true.
> 
> This patch prepares for the change by making
> cgorup_migrate_prepare_dst() test ->subtree_control instead.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Applying 1-4 to cgroup/for-4.6.  Will post refreshed version of
perf_event update soon.

Thanks.

-- 
tejun
