* [RFC 0/4] cgroup hierarchy controls and stats
From: Roman Gushchin @ 2017-08-02 16:55 UTC (permalink / raw)
  To: cgroups; +Cc: Roman Gushchin

Creating cgroup hierarchies of an unreasonable size can affect
system performance. A user might want to limit the size
of the cgroup hierarchy.

This patchset adds the ability to control and monitor cgroup
hierarchy size.

Patch 1 implements tracking of live and dying descendant cgroups
        on each cgroup level.
Patch 2 adds cgroup.max.descendants and cgroup.max.depth interfaces
        to set up hierarchy limits.
Patch 3 adds cgroup.stat interface with simple hierarchy stats.
Patch 4 is a trivial cleanup.

Roman Gushchin (4):
  cgroup: keep track of number of descent cgroups
  cgroup: implement hierarchy limits
  cgroup: add cgroup.stat interface with basic hierarchy stats
  cgroup: re-use the parent pointer in cgroup_destroy_locked()

 Documentation/cgroup-v2.txt |  32 +++++++++
 include/linux/cgroup-defs.h |  13 ++++
 kernel/cgroup/cgroup.c      | 163 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 205 insertions(+), 3 deletions(-)

--
2.13.3


* [RFC 1/4] cgroup: keep track of number of descent cgroups
From: Roman Gushchin @ 2017-08-02 16:55 UTC (permalink / raw)
  To: cgroups
  Cc: Roman Gushchin, Tejun Heo, Zefan Li, Waiman Long,
	Johannes Weiner, kernel-team, linux-doc, linux-kernel

Keep track of the number of online and dying descendant cgroups.

This data will be used later to add the ability to control the cgroup
hierarchy (limit the depth and the number of descendant cgroups)
and to display hierarchy stats.

Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/cgroup-defs.h |  8 ++++++++
 kernel/cgroup/cgroup.c      | 19 +++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 9d741959f218..58b4c425a155 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -274,6 +274,14 @@ struct cgroup {
 	int level;
 
 	/*
+	 * Keep track of total numbers of visible and dying descent cgroups.
+	 * Dying cgroups are cgroups which were deleted by a user,
+	 * but are still existing because someone else is holding a reference.
+	 */
+	int nr_descendants;
+	int nr_dying_descendants;
+
+	/*
 	 * Each non-empty css_set associated with this cgroup contributes
 	 * one to nr_populated_csets.  The counter is zero iff this cgroup
 	 * doesn't have any tasks.
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 85f6a112344b..cfdbb1e780de 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4408,9 +4408,15 @@ static void css_release_work_fn(struct work_struct *work)
 		if (ss->css_released)
 			ss->css_released(css);
 	} else {
+		struct cgroup *tcgrp;
+
 		/* cgroup release path */
 		trace_cgroup_release(cgrp);
 
+		for (tcgrp = cgroup_parent(cgrp); tcgrp;
+		     tcgrp = cgroup_parent(tcgrp))
+			tcgrp->nr_dying_descendants--;
+
 		cgroup_idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
 		cgrp->id = -1;
 
@@ -4609,9 +4615,13 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	cgrp->root = root;
 	cgrp->level = level;
 
-	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
+	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
 		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
 
+		if (tcgrp != cgrp)
+			tcgrp->nr_descendants++;
+	}
+
 	if (notify_on_release(parent))
 		set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
 
@@ -4817,7 +4827,7 @@ static void kill_css(struct cgroup_subsys_state *css)
 static int cgroup_destroy_locked(struct cgroup *cgrp)
 	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
 {
-	struct cgroup *parent = cgroup_parent(cgrp);
+	struct cgroup *tcgrp, *parent = cgroup_parent(cgrp);
 	struct cgroup_subsys_state *css;
 	struct cgrp_cset_link *link;
 	int ssid;
@@ -4865,6 +4875,11 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	if (parent && cgroup_is_threaded(cgrp))
 		parent->nr_threaded_children--;
 
+	for (tcgrp = cgroup_parent(cgrp); tcgrp; tcgrp = cgroup_parent(tcgrp)) {
+		tcgrp->nr_descendants--;
+		tcgrp->nr_dying_descendants++;
+	}
+
 	cgroup1_check_for_release(cgroup_parent(cgrp));
 
 	/* put the base reference */
-- 
2.13.3
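
For illustration only, the counter bookkeeping in the patch above can be modelled in user space. This is a minimal sketch, not kernel code: struct node and the node_*() helpers are hypothetical stand-ins for struct cgroup, cgroup_create(), cgroup_destroy_locked() and the release path in css_release_work_fn(), and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup; only the counters are modelled. */
struct node {
	struct node *parent;
	int nr_descendants;		/* live (visible) descendants */
	int nr_dying_descendants;	/* deleted, but still referenced */
};

/* Like the cgroup_create() hunk: every ancestor gains one live descendant. */
static void node_create(struct node *n, struct node *parent)
{
	struct node *t;

	assert(n != NULL);
	n->parent = parent;
	n->nr_descendants = 0;
	n->nr_dying_descendants = 0;

	for (t = parent; t; t = t->parent)
		t->nr_descendants++;
}

/* Like cgroup_destroy_locked(): the node moves from "live" to "dying". */
static void node_destroy(struct node *n)
{
	struct node *t;

	for (t = n->parent; t; t = t->parent) {
		assert(t->nr_descendants > 0);
		t->nr_descendants--;
		t->nr_dying_descendants++;
	}
}

/* Like the release path: the last reference is gone, so stop counting it. */
static void node_release(struct node *n)
{
	struct node *t;

	for (t = n->parent; t; t = t->parent) {
		assert(t->nr_dying_descendants > 0);
		t->nr_dying_descendants--;
	}
}
```

Each operation walks the whole ancestor chain, so its cost is proportional to the depth of the node, and a node is never counted as its own descendant.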

* [RFC 2/4] cgroup: implement hierarchy limits
From: Roman Gushchin @ 2017-08-02 16:55 UTC (permalink / raw)
  To: cgroups
  Cc: Roman Gushchin, Tejun Heo, Zefan Li, Waiman Long,
	Johannes Weiner, kernel-team, linux-doc, linux-kernel

Creating cgroup hierarchies of unreasonable size can affect
overall system performance. A user might want to limit the
size of the cgroup hierarchy. This is especially important if a user
is delegating a cgroup sub-tree.

To address this issue, introduce the ability to control
the size of the cgroup hierarchy.

The cgroup.max.descendants control file sets the maximum
allowed number of descendant cgroups.
The cgroup.max.depth file controls the maximum depth of the cgroup
tree. Both are single-value read-write files that default to "max".

The control files exist on each hierarchy level (including root).
When a new cgroup is about to be created, the descendant count and
depth limits are checked on each ancestor level; the cgroup is
created only if none of them is exceeded.

Only live cgroups are counted; removed (dying) cgroups are
ignored.

Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/cgroup-v2.txt |  14 +++++
 include/linux/cgroup-defs.h |   5 ++
 kernel/cgroup/cgroup.c      | 126 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 145 insertions(+)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dec5afdaa36d..46ec3f76211c 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -854,6 +854,20 @@ All cgroup core files are prefixed with "cgroup."
 		1 if the cgroup or its descendants contains any live
 		processes; otherwise, 0.
 
+  cgroup.max.descendants
+	A read-write single value file.  The default is "max".
+
+	Maximum allowed number of descendant cgroups.
+	If the actual number of descendants is equal or larger,
+	an attempt to create a new cgroup in the hierarchy will fail.
+
+  cgroup.max.depth
+	A read-write single value file.  The default is "max".
+
+	Maximum allowed descent depth below the current cgroup.
+	If the actual descent depth is equal or larger,
+	an attempt to create a new child cgroup will fail.
+
 
 Controllers
 ===========
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 58b4c425a155..59e4ad9e7bac 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -273,13 +273,18 @@ struct cgroup {
 	 */
 	int level;
 
+	/* Maximum allowed descent tree depth */
+	int max_depth;
+
 	/*
 	 * Keep track of total numbers of visible and dying descent cgroups.
 	 * Dying cgroups are cgroups which were deleted by a user,
 	 * but are still existing because someone else is holding a reference.
+	 * max_descendants is a maximum allowed number of descent cgroups.
 	 */
 	int nr_descendants;
 	int nr_dying_descendants;
+	int max_descendants;
 
 	/*
 	 * Each non-empty css_set associated with this cgroup contributes
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index cfdbb1e780de..9d53d69e44bb 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1827,6 +1827,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	cgrp->self.cgroup = cgrp;
 	cgrp->self.flags |= CSS_ONLINE;
 	cgrp->dom_cgrp = cgrp;
+	cgrp->max_descendants = INT_MAX;
+	cgrp->max_depth = INT_MAX;
 
 	for_each_subsys(ss, ssid)
 		INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
@@ -3209,6 +3211,92 @@ static ssize_t cgroup_type_write(struct kernfs_open_file *of, char *buf,
 	return ret ?: nbytes;
 }
 
+static int cgroup_max_descendants_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	int descendants = READ_ONCE(cgrp->max_descendants);
+
+	if (descendants == INT_MAX)
+		seq_puts(seq, "max\n");
+	else
+		seq_printf(seq, "%d\n", descendants);
+
+	return 0;
+}
+
+static ssize_t cgroup_max_descendants_write(struct kernfs_open_file *of,
+					   char *buf, size_t nbytes, loff_t off)
+{
+	struct cgroup *cgrp;
+	int descendants;
+	ssize_t ret;
+
+	buf = strstrip(buf);
+	if (!strcmp(buf, "max")) {
+		descendants = INT_MAX;
+	} else {
+		ret = kstrtoint(buf, 0, &descendants);
+		if (ret)
+			return ret;
+	}
+
+	if (descendants < 0 || descendants > INT_MAX)
+		return -ERANGE;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENOENT;
+
+	cgrp->max_descendants = descendants;
+
+	cgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
+
+static int cgroup_max_depth_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	int depth = READ_ONCE(cgrp->max_depth);
+
+	if (depth == INT_MAX)
+		seq_puts(seq, "max\n");
+	else
+		seq_printf(seq, "%d\n", depth);
+
+	return 0;
+}
+
+static ssize_t cgroup_max_depth_write(struct kernfs_open_file *of,
+				      char *buf, size_t nbytes, loff_t off)
+{
+	struct cgroup *cgrp;
+	ssize_t ret;
+	int depth;
+
+	buf = strstrip(buf);
+	if (!strcmp(buf, "max")) {
+		depth = INT_MAX;
+	} else {
+		ret = kstrtoint(buf, 0, &depth);
+		if (ret)
+			return ret;
+	}
+
+	if (depth < 0 || depth > INT_MAX)
+		return -ERANGE;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENOENT;
+
+	cgrp->max_depth = depth;
+
+	cgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
+
 static int cgroup_events_show(struct seq_file *seq, void *v)
 {
 	seq_printf(seq, "populated %d\n",
@@ -4309,6 +4397,16 @@ static struct cftype cgroup_base_files[] = {
 		.file_offset = offsetof(struct cgroup, events_file),
 		.seq_show = cgroup_events_show,
 	},
+	{
+		.name = "cgroup.max.descendants",
+		.seq_show = cgroup_max_descendants_show,
+		.write = cgroup_max_descendants_write,
+	},
+	{
+		.name = "cgroup.max.depth",
+		.seq_show = cgroup_max_depth_show,
+		.write = cgroup_max_depth_write,
+	},
 	{ }	/* terminate */
 };
 
@@ -4662,6 +4760,29 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	return ERR_PTR(ret);
 }
 
+static bool cgroup_check_hierarchy_limits(struct cgroup *parent)
+{
+	struct cgroup *cgroup;
+	bool ret = false;
+	int level = 1;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	for (cgroup = parent; cgroup; cgroup = cgroup_parent(cgroup)) {
+		if (cgroup->nr_descendants >= cgroup->max_descendants)
+			goto fail;
+
+		if (level > cgroup->max_depth)
+			goto fail;
+
+		level++;
+	}
+
+	ret = true;
+fail:
+	return ret;
+}
+
 int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 {
 	struct cgroup *parent, *cgrp;
@@ -4676,6 +4797,11 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 	if (!parent)
 		return -ENODEV;
 
+	if (!cgroup_check_hierarchy_limits(parent)) {
+		ret = -EAGAIN;
+		goto out_unlock;
+	}
+
 	cgrp = cgroup_create(parent);
 	if (IS_ERR(cgrp)) {
 		ret = PTR_ERR(cgrp);
-- 
2.13.3
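
For illustration only, the limit check performed on mkdir can be modelled in user space. This is a minimal sketch under stated assumptions: struct node, node_init(), may_create_child() and node_create() are hypothetical names, INT_MAX plays the role of "max" as in the patch, and no locking is modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup with the limit fields only. */
struct node {
	struct node *parent;
	int nr_descendants;
	int max_descendants;	/* INT_MAX stands for "max" */
	int max_depth;		/* INT_MAX stands for "max" */
};

static void node_init(struct node *n, struct node *parent)
{
	assert(n != NULL);
	n->parent = parent;
	n->nr_descendants = 0;
	n->max_descendants = INT_MAX;
	n->max_depth = INT_MAX;
}

/* Same walk as cgroup_check_hierarchy_limits(): check every ancestor. */
static bool may_create_child(const struct node *parent)
{
	const struct node *n;
	int level = 1;	/* depth of the would-be child below n */

	for (n = parent; n; n = n->parent) {
		if (n->nr_descendants >= n->max_descendants)
			return false;
		if (level > n->max_depth)
			return false;
		level++;
	}
	return true;
}

/* Check the limits first, then account the new node on all ancestors. */
static bool node_create(struct node *n, struct node *parent)
{
	struct node *t;

	if (!may_create_child(parent))
		return false;

	node_init(n, parent);
	for (t = parent; t; t = t->parent)
		t->nr_descendants++;
	return true;
}
```

With max_depth set to 1 on the root, children of the root can be created but grandchildren cannot; with max_descendants set to 2, a third descendant anywhere below the root is refused.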

* [RFC 3/4] cgroup: add cgroup.stat interface with basic hierarchy stats
From: Roman Gushchin @ 2017-08-02 16:55 UTC (permalink / raw)
  To: cgroups
  Cc: Roman Gushchin, Tejun Heo, Zefan Li, Waiman Long,
	Johannes Weiner, kernel-team, linux-doc, linux-kernel

A cgroup can consume resources even after being deleted by a user.
For example, writing back dirty pages should be accounted and
limited, even though the corresponding cgroup might contain no
processes and might already have been deleted by a user.

In the current implementation a cgroup can remain in such a "dying"
state for an undefined amount of time, for instance if a memory cgroup
contains a page mlocked by a process belonging to another cgroup.

Although the lifecycle of a dying cgroup is out of the user's control,
it's important to have some insight into what's going on under the hood.

In particular, it's handy to have a counter that makes it possible
to detect css leaks.

To solve this problem, add a cgroup.stat interface to
the base cgroup control files with the following metrics:

nr_descendants		total number of visible descendant cgroups
nr_dying_descendants	total number of dying descendant cgroups

Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/cgroup-v2.txt | 18 ++++++++++++++++++
 kernel/cgroup/cgroup.c      | 16 ++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 46ec3f76211c..dc44785dc0fa 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -868,6 +868,24 @@ All cgroup core files are prefixed with "cgroup."
 	If the actual descent depth is equal or larger,
 	an attempt to create a new child cgroup will fail.
 
+  cgroup.stat
+	A read-only flat-keyed file with the following entries:
+
+	  nr_descendants
+		Total number of visible descendant cgroups.
+
+	  nr_dying_descendants
+		Total number of dying descendant cgroups. A cgroup becomes
+		dying after being deleted by a user. The cgroup will remain
+		in dying state for some undefined time (which can depend
+		on system load) before being completely destroyed.
+
+		A process can't enter a dying cgroup under any circumstances,
+		and a dying cgroup can't revive.
+
+		A dying cgroup can consume system resources not exceeding
+		limits, which were active at the moment of cgroup deletion.
+
 
 Controllers
 ===========
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 9d53d69e44bb..f58e1fe8bebd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3304,6 +3304,18 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
 	return 0;
 }
 
+static int cgroup_stats_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgroup = seq_css(seq)->cgroup;
+
+	seq_printf(seq, "nr_descendants %d\n",
+		   cgroup->nr_descendants);
+	seq_printf(seq, "nr_dying_descendants %d\n",
+		   cgroup->nr_dying_descendants);
+
+	return 0;
+}
+
 static int cgroup_file_open(struct kernfs_open_file *of)
 {
 	struct cftype *cft = of->kn->priv;
@@ -4407,6 +4419,10 @@ static struct cftype cgroup_base_files[] = {
 		.seq_show = cgroup_max_depth_show,
 		.write = cgroup_max_depth_write,
 	},
+	{
+		.name = "cgroup.stat",
+		.seq_show = cgroup_stats_show,
+	},
 	{ }	/* terminate */
 };
 
-- 
2.13.3
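
For illustration only, a user-space consumer could parse the flat-keyed output of cgroup.stat as sketched below. The struct cgroup_stat type and the parse_cgroup_stat() helper are hypothetical names, not part of the patch; the input is assumed to be the file contents already read into a NUL-terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two metrics exposed by cgroup.stat. */
struct cgroup_stat {
	int nr_descendants;
	int nr_dying_descendants;
};

/*
 * Parse "key value" lines. Unknown keys are skipped, so the parser keeps
 * working if more entries are added later. Returns the number of
 * recognized keys.
 */
static int parse_cgroup_stat(const char *text, struct cgroup_stat *st)
{
	char key[64];
	int val, consumed, found = 0;

	assert(text != NULL && st != NULL);

	/* %n records how many characters were consumed by this line. */
	while (sscanf(text, "%63s %d%n", key, &val, &consumed) == 2) {
		if (!strcmp(key, "nr_descendants")) {
			st->nr_descendants = val;
			found++;
		} else if (!strcmp(key, "nr_dying_descendants")) {
			st->nr_dying_descendants = val;
			found++;
		}
		text += consumed;
	}
	return found;
}
```

A monitoring tool could call this periodically and flag a cgroup whose nr_dying_descendants keeps growing, which is the css leak signal the commit message refers to.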

* [RFC 4/4] cgroup: re-use the parent pointer in cgroup_destroy_locked()
From: Roman Gushchin @ 2017-08-02 16:55 UTC (permalink / raw)
  To: cgroups
  Cc: Roman Gushchin, Tejun Heo, Zefan Li, Waiman Long,
	Johannes Weiner, kernel-team, linux-kernel

As we already have a pointer to the parent cgroup in
cgroup_destroy_locked(), we don't need to calculate it again
to pass as an argument for cgroup1_check_for_release().

Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel-team@fb.com
Cc: linux-kernel@vger.kernel.org
---
 kernel/cgroup/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f58e1fe8bebd..2d9de4ec7727 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5022,7 +5022,7 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 		tcgrp->nr_dying_descendants++;
 	}
 
-	cgroup1_check_for_release(cgroup_parent(cgrp));
+	cgroup1_check_for_release(parent);
 
 	/* put the base reference */
 	percpu_ref_kill(&cgrp->self.refcnt);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC 2/4] cgroup: implement hierarchy limits
@ 2017-08-02 18:44     ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2017-08-02 18:44 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: cgroups, Zefan Li, Waiman Long, Johannes Weiner, kernel-team,
	linux-doc, linux-kernel

Hello, Roman.

Generally looks good to me.  One minor nit.

On Wed, Aug 02, 2017 at 05:55:30PM +0100, Roman Gushchin wrote:
> +static ssize_t cgroup_max_descendants_write(struct kernfs_open_file *of,
> +					   char *buf, size_t nbytes, loff_t off)
> +{
> +	struct cgroup *cgrp;
> +	int descendants;
> +	ssize_t ret;
> +
> +	buf = strstrip(buf);
> +	if (!strcmp(buf, "max")) {
> +		descendants = INT_MAX;
> +	} else {
> +		ret = kstrtouint(buf, 0, &descendants);
                     ^^^^^^^^^^^
		     shouldn't this be kstrtoint?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread
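
To make the nit above concrete, here is a userspace sketch of the parsing
rule the write handler is aiming for: accept "max" or a non-negative
value that fits in an int. This is only an illustration under stated
assumptions: parse_limit() is a hypothetical helper, it uses strtol()
instead of the kernel's kstrtoint(), and it expects the caller to have
stripped surrounding whitespace already.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace helper: parse "max" or a non-negative integer
 * into *out. Returns 0 on success, -EINVAL for malformed input and
 * -ERANGE for values that do not fit in [0, INT_MAX].
 */
static int parse_limit(const char *buf, int *out)
{
	char *end;
	long val;

	if (!strcmp(buf, "max")) {
		*out = INT_MAX;
		return 0;
	}

	errno = 0;
	val = strtol(buf, &end, 0);
	if (end == buf || *end != '\0')
		return -EINVAL;
	if (errno == ERANGE || val < 0 || val > INT_MAX)
		return -ERANGE;

	*out = (int)val;
	return 0;
}
```

Parsing as a signed value keeps the parser and the destination type in
agreement. An unsigned parser would accept values above INT_MAX, which
would then appear negative once stored in an int.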

* Re: [RFC 4/4] cgroup: re-use the parent pointer in cgroup_destroy_locked()
  2017-08-02 16:55   ` Roman Gushchin
  (?)
@ 2017-08-02 18:50   ` Tejun Heo
  -1 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2017-08-02 18:50 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: cgroups, Zefan Li, Waiman Long, Johannes Weiner, kernel-team,
	linux-kernel

On Wed, Aug 02, 2017 at 05:55:32PM +0100, Roman Gushchin wrote:
> As we already have a pointer to the parent cgroup in
> cgroup_destroy_locked(), we don't need to calculate it again
> to pass as an argument for cgroup1_check_for_release().

Except for the minor nit earlier, everything in the series looks good
to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/4] cgroup: implement hierarchy limits
  2017-08-02 18:44     ` Tejun Heo
@ 2017-08-02 18:55       ` Roman Gushchin
  -1 siblings, 0 replies; 16+ messages in thread
From: Roman Gushchin @ 2017-08-02 18:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups, Zefan Li, Waiman Long, Johannes Weiner, kernel-team,
	linux-doc, linux-kernel

On Wed, Aug 02, 2017 at 11:44:17AM -0700, Tejun Heo wrote:
> Hello, Roman.
> 
> Generally looks good to me.  One minor nit.
> 
> On Wed, Aug 02, 2017 at 05:55:30PM +0100, Roman Gushchin wrote:
> > +static ssize_t cgroup_max_descendants_write(struct kernfs_open_file *of,
> > +					   char *buf, size_t nbytes, loff_t off)
> > +{
> > +	struct cgroup *cgrp;
> > +	int descendants;
> > +	ssize_t ret;
> > +
> > +	buf = strstrip(buf);
> > +	if (!strcmp(buf, "max")) {
> > +		descendants = INT_MAX;
> > +	} else {
> > +		ret = kstrtouint(buf, 0, &descendants);
>                      ^^^^^^^^^^^
> 		     shouldn't this be kstrtoint?

Hi, Tejun!

Of course, it should.
Please, find an updated version below.

Thank you!

Roman

--

From c6492640f7a70b88e4b573c6f04081e3c82ce8fa Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Fri, 28 Jul 2017 18:28:44 +0100
Subject: [PATCH 2/4] cgroup: implement hierarchy limits

Creating cgroup hierarchies of unreasonable size can affect
overall system performance. A user might want to limit the
size of the cgroup hierarchy. This is especially important if a user
is delegating some cgroup sub-tree.

To address this issue, introduce an ability to control
the size of the cgroup hierarchy.

The cgroup.max.descendants control file sets the maximum
allowed number of descendant cgroups.
The cgroup.max.depth file controls the maximum depth of the cgroup
tree. Both are single value r/w files, with "max" as the default value.

The control files exist on each hierarchy level (including root).
When a new cgroup is created, we check the total descendants
and depth limits on each level, and if none of them are exceeded,
a new cgroup is created.

Only live cgroups are counted; removed (dying) cgroups are
ignored.

Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/cgroup-v2.txt |  14 +++++
 include/linux/cgroup-defs.h |   5 ++
 kernel/cgroup/cgroup.c      | 126 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 145 insertions(+)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dec5afdaa36d..46ec3f76211c 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -854,6 +854,20 @@ All cgroup core files are prefixed with "cgroup."
 		1 if the cgroup or its descendants contains any live
 		processes; otherwise, 0.
 
+  cgroup.max.descendants
+	A read-write single value file.  The default is "max".
+
+	Maximum allowed number of descendant cgroups.
+	If the actual number of descendants is equal to or larger,
+	an attempt to create a new cgroup in the hierarchy will fail.
+
+  cgroup.max.depth
+	A read-write single value file.  The default is "max".
+
+	Maximum allowed descent depth below the current cgroup.
+	If the actual descent depth is equal to or larger,
+	an attempt to create a new child cgroup will fail.
+
 
 Controllers
 ===========
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 58b4c425a155..59e4ad9e7bac 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -273,13 +273,18 @@ struct cgroup {
 	 */
 	int level;
 
+	/* Maximum allowed descent tree depth */
+	int max_depth;
+
 	/*
 	 * Keep track of total numbers of visible and dying descent cgroups.
 	 * Dying cgroups are cgroups which were deleted by a user,
 	 * but are still existing because someone else is holding a reference.
+	 * max_descendants is the maximum allowed number of descendant cgroups.
 	 */
 	int nr_descendants;
 	int nr_dying_descendants;
+	int max_descendants;
 
 	/*
 	 * Each non-empty css_set associated with this cgroup contributes
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index cfdbb1e780de..0fd9134e1720 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1827,6 +1827,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	cgrp->self.cgroup = cgrp;
 	cgrp->self.flags |= CSS_ONLINE;
 	cgrp->dom_cgrp = cgrp;
+	cgrp->max_descendants = INT_MAX;
+	cgrp->max_depth = INT_MAX;
 
 	for_each_subsys(ss, ssid)
 		INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
@@ -3209,6 +3211,92 @@ static ssize_t cgroup_type_write(struct kernfs_open_file *of, char *buf,
 	return ret ?: nbytes;
 }
 
+static int cgroup_max_descendants_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	int descendants = READ_ONCE(cgrp->max_descendants);
+
+	if (descendants == INT_MAX)
+		seq_puts(seq, "max\n");
+	else
+		seq_printf(seq, "%d\n", descendants);
+
+	return 0;
+}
+
+static ssize_t cgroup_max_descendants_write(struct kernfs_open_file *of,
+					   char *buf, size_t nbytes, loff_t off)
+{
+	struct cgroup *cgrp;
+	int descendants;
+	ssize_t ret;
+
+	buf = strstrip(buf);
+	if (!strcmp(buf, "max")) {
+		descendants = INT_MAX;
+	} else {
+		ret = kstrtoint(buf, 0, &descendants);
+		if (ret)
+			return ret;
+	}
+
+	if (descendants < 0 || descendants > INT_MAX)
+		return -ERANGE;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENOENT;
+
+	cgrp->max_descendants = descendants;
+
+	cgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
+
+static int cgroup_max_depth_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	int depth = READ_ONCE(cgrp->max_depth);
+
+	if (depth == INT_MAX)
+		seq_puts(seq, "max\n");
+	else
+		seq_printf(seq, "%d\n", depth);
+
+	return 0;
+}
+
+static ssize_t cgroup_max_depth_write(struct kernfs_open_file *of,
+				      char *buf, size_t nbytes, loff_t off)
+{
+	struct cgroup *cgrp;
+	ssize_t ret;
+	int depth;
+
+	buf = strstrip(buf);
+	if (!strcmp(buf, "max")) {
+		depth = INT_MAX;
+	} else {
+		ret = kstrtoint(buf, 0, &depth);
+		if (ret)
+			return ret;
+	}
+
+	if (depth < 0 || depth > INT_MAX)
+		return -ERANGE;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENOENT;
+
+	cgrp->max_depth = depth;
+
+	cgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
+
 static int cgroup_events_show(struct seq_file *seq, void *v)
 {
 	seq_printf(seq, "populated %d\n",
@@ -4309,6 +4397,16 @@ static struct cftype cgroup_base_files[] = {
 		.file_offset = offsetof(struct cgroup, events_file),
 		.seq_show = cgroup_events_show,
 	},
+	{
+		.name = "cgroup.max.descendants",
+		.seq_show = cgroup_max_descendants_show,
+		.write = cgroup_max_descendants_write,
+	},
+	{
+		.name = "cgroup.max.depth",
+		.seq_show = cgroup_max_depth_show,
+		.write = cgroup_max_depth_write,
+	},
 	{ }	/* terminate */
 };
 
@@ -4662,6 +4760,29 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	return ERR_PTR(ret);
 }
 
+static bool cgroup_check_hierarchy_limits(struct cgroup *parent)
+{
+	struct cgroup *cgroup;
+	int ret = false;
+	int level = 1;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	for (cgroup = parent; cgroup; cgroup = cgroup_parent(cgroup)) {
+		if (cgroup->nr_descendants >= cgroup->max_descendants)
+			goto fail;
+
+		if (level > cgroup->max_depth)
+			goto fail;
+
+		level++;
+	}
+
+	ret = true;
+fail:
+	return ret;
+}
+
 int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 {
 	struct cgroup *parent, *cgrp;
@@ -4676,6 +4797,11 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 	if (!parent)
 		return -ENODEV;
 
+	if (!cgroup_check_hierarchy_limits(parent)) {
+		ret = -EAGAIN;
+		goto out_unlock;
+	}
+
 	cgrp = cgroup_create(parent);
 	if (IS_ERR(cgrp)) {
 		ret = PTR_ERR(cgrp);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread
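
The ancestor walk in cgroup_check_hierarchy_limits() above can be
modelled in userspace as follows. This is a simplified sketch, not the
kernel code: struct node and may_create_child() are hypothetical
stand-ins for struct cgroup and the check itself, and no locking is
modelled.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct cgroup. */
struct node {
	struct node *parent;
	int nr_descendants;	/* live descendants currently below this node */
	int max_descendants;	/* INT_MAX means "max" (no limit) */
	int max_depth;		/* INT_MAX means "max" (no limit) */
};

/*
 * Return true if a new child may be created under @parent.
 * Every ancestor is checked: its descendant count must be below its
 * limit, and the new child's depth relative to that ancestor (level)
 * must not exceed the ancestor's max_depth.
 */
static bool may_create_child(const struct node *parent)
{
	const struct node *n;
	int level = 1;

	for (n = parent; n; n = n->parent) {
		if (n->nr_descendants >= n->max_descendants)
			return false;
		if (level > n->max_depth)
			return false;
		level++;
	}

	return true;
}
```

With this model, a max_depth of 1 on a node permits children but not
grandchildren, and a max_depth of 0 forbids creating any child below it.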

* Re: [RFC 0/4] cgroup hierarchy controls and stats
       [not found] ` <20170802165532.22277-1-guro-b10kYP2dOMg@public.gmane.org>
@ 2017-08-02 19:06   ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2017-08-02 19:06 UTC (permalink / raw)
  To: Roman Gushchin; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

On Wed, Aug 02, 2017 at 05:55:28PM +0100, Roman Gushchin wrote:
> Creating cgroup hierarchies of an unreasonable size can affect
> system performance. A user might want to limit the size
> of the cgroup hierarchy.
> 
> This patchset implements an ability to control and monitor cgroup
> hierarchy size.
> 
> Patch 1 implements tracking of live and dying descendant cgroups
>         on each cgroup level.
> Patch 2 adds cgroup.max.descendants and cgroup.max.depth interfaces
>         to set up hierarchy limits.
> Patch 3 adds cgroup.stat interface with simple hierarchy stats.
> Patch 4 is a trivial cleanup.

Applied 1-4 to cgroup/for-4.14.  We *might* have further discussions
on the specifics of the interface but in terms of feature and internal
implementation, I don't think there's anything which can be
controversial.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-08-02 19:06 UTC | newest]

Thread overview: 16+ messages
2017-08-02 16:55 [RFC 0/4] cgroup hierarchy controls and stats Roman Gushchin
2017-08-02 16:55 ` [RFC 1/4] cgroup: keep track of number of descent cgroups Roman Gushchin
2017-08-02 16:55   ` Roman Gushchin
2017-08-02 16:55 ` [RFC 2/4] cgroup: implement hierarchy limits Roman Gushchin
2017-08-02 16:55   ` Roman Gushchin
2017-08-02 18:44   ` Tejun Heo
2017-08-02 18:44     ` Tejun Heo
2017-08-02 18:55     ` Roman Gushchin
2017-08-02 18:55       ` Roman Gushchin
2017-08-02 16:55 ` [RFC 3/4] cgroup: add cgroup.stat interface with basic hierarchy stats Roman Gushchin
2017-08-02 16:55   ` Roman Gushchin
2017-08-02 16:55 ` [RFC 4/4] cgroup: re-use the parent pointer in cgroup_destroy_locked() Roman Gushchin
2017-08-02 16:55   ` Roman Gushchin
2017-08-02 18:50   ` Tejun Heo
     [not found] ` <20170802165532.22277-1-guro-b10kYP2dOMg@public.gmane.org>
2017-08-02 19:06   ` [RFC 0/4] cgroup hierarchy controls and stats Tejun Heo
2017-08-02 18:39 Roman Gushchin