* [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
From: Tejun Heo @ 2017-06-10 14:03 UTC
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds

Hello,

This is v2 of the cgroup2 thread mode patchset.  The changes from the
last take[L] are:

* Added support for mixed thread mode for the root cgroup.  This
  allows the root cgroup to serve as both a thread root and a parent
  to domain cgroups, so users can use thread mode without any nesting
  while not interfering with domain-level cgroup operations.  Note
  that this simply makes use of the fact that the system root has
  always been exempt from the no-internal-process constraint.  The
  whole resource hierarchy still follows the same basic rules
  w.r.t. resource domains.

* Thread mode enable / disable now piggybacks on the existing control
  mask update mechanism rather than implementing a manual css_set
  update mechanism of its own.  This makes the mechanism simpler and
  more flexible.

* Fixes and cleanups, including a fix from Waiman.

It is largely based on the discussions that we had at the plumbers
last year.  Here's the rough outline.

* Thread mode is explicitly enabled on a cgroup by writing "enable"
  into its "cgroup.threads" file.  The cgroup shouldn't have any child
  cgroups or enabled controllers.

* Once enabled, an arbitrary sub-hierarchy can be created and threads
  can be put anywhere in the subtree by writing TIDs into the
  "cgroup.threads" file.  Process granularity and the
  no-internal-process constraint don't apply in a threaded subtree.

* To be used in a threaded subtree, controllers should explicitly
  declare thread mode support and should be able to handle internal
  competition in some way.

* The root of a threaded subtree serves as the resource domain for the
  whole subtree.  This is where all the controllers are guaranteed to
  have common ground, and where resource consumption in the threaded
  subtree which isn't tied to a specific thread is charged.
  Non-threaded controllers never see beyond the thread root and can
  assume that all controllers will follow the same rules up to that
  point.

* The root cgroup can enable thread mode at any time, and a
  first-level child can opt in to the thread subtree anchored at the
  root by writing "join" to its "cgroup.threads" file, start its own
  thread subtree, or just be a normal cgroup.

This allows threaded controllers to implement thread-granular resource
control without getting in the way of system-level resource
partitioning.
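
As a rough illustration of the interface outlined above, the following
user-space sketch writes a value such as "enable", "join" or a TID
into a cgroup control file.  The helper name cgroup_write_file() and
any directory passed to it are assumptions made for illustration; only
the "cgroup.threads" file name and the "enable" / "join" strings come
from the outline, and the interface may well change in later
revisions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patchset): write @val into the
 * control file @name under the cgroup directory @dir.
 * Returns 0 on success or a negative errno value.
 */
int cgroup_write_file(const char *dir, const char *name, const char *val)
{
	char path[4096];
	size_t len;
	ssize_t written;
	int fd, n, err = 0;

	assert(dir && name && val);
	len = strlen(val);

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	/* Control files already exist in cgroupfs, so no O_CREAT here. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* Hand the whole value to the file in a single write(). */
	written = write(fd, val, len);
	if (written < 0)
		err = -errno;
	else if ((size_t)written != len)
		err = -EIO;

	close(fd);
	return err;
}
```

Under these assumptions, enabling thread mode on a cgroup at a
hypothetical /sys/fs/cgroup/app would be
cgroup_write_file("/sys/fs/cgroup/app", "cgroup.threads", "enable"),
and moving a thread would write its TID as a decimal string into the
"cgroup.threads" file of the target cgroup.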

This patchset contains the following ten patches.  For more details on
the interface and behavior, please refer to 0007.

 0001-cgroup-separate-out-cgroup_has_tasks.patch
 0002-cgroup-reorganize-cgroup.procs-task-write-path.patch
 0003-cgroup-Fix-reference-counting-bug-in-cgroup_procs_wr.patch
 0004-cgroup-add-flags-to-css_task_iter_start-and-implemen.patch
 0005-cgroup-introduce-cgroup-proc_cgrp-and-threaded-css_s.patch
 0006-cgroup-implement-CSS_TASK_ITER_THREADED.patch
 0007-cgroup-implement-cgroup-v2-thread-support.patch
 0008-sched-Misc-preps-for-cgroup-unified-hierarchy-interf.patch
 0009-sched-Implement-interface-for-cgroup-unified-hierarc.patch
 0010-sched-Make-cpu-cpuacct-threaded-controllers.patch

0001-0007 implement cgroup2 thread mode.  0008-0010 enable the CPU
controller on cgroup2 and mark cpu and cpuacct as supporting thread
mode.  0008-0010 are included for reference.

The patchset is based on the current master 179145e6312b ("Merge tag
'iommu-fixes-v4.12-rc4' of
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu") and is also
available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup2-threads-v2

diffstat follows.

 Documentation/cgroup-v2.txt     |   99 +++++
 include/linux/cgroup-defs.h     |   34 +
 include/linux/cgroup.h          |   12 
 kernel/cgroup/cgroup-internal.h |    8 
 kernel/cgroup/cgroup-v1.c       |   64 +++
 kernel/cgroup/cgroup.c          |  727 ++++++++++++++++++++++++++++++++--------
 kernel/cgroup/cpuset.c          |    6 
 kernel/cgroup/freezer.c         |    6 
 kernel/cgroup/pids.c            |    1 
 kernel/events/core.c            |    1 
 kernel/sched/core.c             |  150 ++++++++
 kernel/sched/cpuacct.c          |   53 ++
 kernel/sched/cpuacct.h          |    5 
 mm/memcontrol.c                 |    2 
 net/core/netclassid_cgroup.c    |    2 
 15 files changed, 992 insertions(+), 178 deletions(-)

Thanks.

--
tejun

[L] http://lkml.kernel.org/r/20170202200632.13992-1-tj@kernel.org

* [PATCH 01/10] cgroup: separate out cgroup_has_tasks()
From: Tejun Heo @ 2017-06-10 14:03 UTC
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

Separate out the cgroup_has_tasks() test from
cgroup_subtree_control_write().  This will be used by the following
changes.

This patch doesn't cause any behavior changes.
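
The shape of the separated-out test can be modeled in user space as
follows.  This is only a sketch: the toy_* types are stand-ins
invented for illustration, and the css_set_lock locking done by the
real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a css_set: only tracks how many tasks it holds. */
struct toy_cset {
	int nr_tasks;
	struct toy_cset *next;	/* next cset linked to the same cgroup */
};

/* Toy stand-in for a cgroup: a singly linked list of csets. */
struct toy_cgroup {
	struct toy_cset *csets;
};

/* Plays the role of css_set_populated(): true if the cset has tasks. */
static bool toy_cset_populated(const struct toy_cset *cset)
{
	return cset->nr_tasks > 0;
}

/*
 * Same shape as cgroup_has_tasks(): linked csets may be empty (for
 * example when pinned by a namespace), so each one has to be checked.
 */
bool toy_cgroup_has_tasks(const struct toy_cgroup *cgrp)
{
	const struct toy_cset *cset;

	assert(cgrp != NULL);
	for (cset = cgrp->csets; cset; cset = cset->next)
		if (toy_cset_populated(cset))
			return true;
	return false;
}
```

A caller can then fold the check into a single condition, as the
subtree_control write path does in the diff below.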

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup/cgroup.c | 49 +++++++++++++++++++++++++++----------------------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8d4e85eae42c..dcd120af4084 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -581,6 +581,30 @@ static bool css_set_populated(struct css_set *cset)
 	return !list_empty(&cset->tasks) || !list_empty(&cset->mg_tasks);
 }
 
+static bool cgroup_has_tasks(struct cgroup *cgrp)
+{
+	struct cgrp_cset_link *link;
+	bool has_tasks = false;
+
+	/*
+	 * Because namespaces pin csets too, @cgrp->cset_links
+	 * might not be empty even when @cgrp is empty.  Walk and
+	 * verify each cset.
+	 */
+	spin_lock_irq(&css_set_lock);
+
+	list_for_each_entry(link, &cgrp->cset_links, cset_link) {
+		if (css_set_populated(link->cset)) {
+			has_tasks = true;
+			break;
+		}
+	}
+
+	spin_unlock_irq(&css_set_lock);
+
+	return has_tasks;
+}
+
 /**
  * cgroup_update_populated - updated populated count of a cgroup
  * @cgrp: the target cgroup
@@ -2886,28 +2910,9 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	 * Except for the root, subtree_control must be zero for a cgroup
 	 * with tasks so that child cgroups don't compete against tasks.
 	 */
-	if (enable && cgroup_parent(cgrp)) {
-		struct cgrp_cset_link *link;
-
-		/*
-		 * Because namespaces pin csets too, @cgrp->cset_links
-		 * might not be empty even when @cgrp is empty.  Walk and
-		 * verify each cset.
-		 */
-		spin_lock_irq(&css_set_lock);
-
-		ret = 0;
-		list_for_each_entry(link, &cgrp->cset_links, cset_link) {
-			if (css_set_populated(link->cset)) {
-				ret = -EBUSY;
-				break;
-			}
-		}
-
-		spin_unlock_irq(&css_set_lock);
-
-		if (ret)
-			goto out_unlock;
+	if (enable && cgroup_parent(cgrp) && cgroup_has_tasks(cgrp)) {
+		ret = -EBUSY;
+		goto out_unlock;
 	}
 
 	/* save and update control masks and prepare csses */
-- 
2.13.0

* [PATCH 02/10] cgroup: reorganize cgroup.procs / task write path
From: Tejun Heo @ 2017-06-10 14:03 UTC
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

Currently, writes to the "cgroup.procs" and "cgroup.tasks" files are
all handled by __cgroup_procs_write() on both v1 and v2.  This patch
reorganizes the write path so that there are common helper functions
that the different write paths use.
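
The common start helper returns either a task or an error encoded in
the pointer, which callers unpack with PTR_ERR_OR_ZERO().  The
user-space sketch below imitates that convention.  All toy_* names
are hypothetical, the task table replaces find_task_by_vpid(), and the
locking and the "PID 0 means current" case of the real helper are
omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO	4095

/* User-space imitations of the kernel's ERR_PTR() style helpers. */
static inline void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

static inline long toy_ptr_err_or_zero(const void *ptr)
{
	return toy_is_err(ptr) ? (long)(intptr_t)ptr : 0;
}

struct toy_task {
	long pid;
};

/*
 * Shaped like cgroup_procs_write_start(): parse a PID from @buf and
 * return either the matching task or an errno encoded in the pointer.
 */
struct toy_task *toy_write_start(const char *buf, struct toy_task *tasks,
				 size_t nr_tasks)
{
	char *end;
	long pid;
	size_t i;

	assert(buf && tasks);

	errno = 0;
	pid = strtol(buf, &end, 0);
	if (errno || end == buf || pid < 0)
		return toy_err_ptr(-EINVAL);

	/* Tolerate trailing blanks and newlines, reject anything else. */
	while (*end == ' ' || *end == '\n')
		end++;
	if (*end != '\0')
		return toy_err_ptr(-EINVAL);

	for (i = 0; i < nr_tasks; i++)
		if (tasks[i].pid == pid)
			return &tasks[i];

	return toy_err_ptr(-ESRCH);
}
```

A caller would test the result with toy_ptr_err_or_zero() and bail out
on a non-zero value before touching the task, which is the pattern
both the v1 and v2 write paths follow in the diff below.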

While this somewhat increases LOC, the different paths are no longer
intertwined and each path has more flexibility to implement different
behaviors which will be necessary for the planned v2 thread support.
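
One example of the paths differing: in this patch the v1 path compares
credentials, while the v2 path checks write permission on the
"cgroup.procs" file of the common ancestor of the source and
destination cgroups.  The ancestor walk can be sketched in user space
as below.  The toy_cgrp type and function names are invented for
illustration, and the NULL return for unrelated trees is an addition
of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cgrp {
	const char *name;
	struct toy_cgrp *parent;	/* NULL for the root */
};

/* True if @cgrp is @ancestor itself or lies somewhere below it. */
static bool toy_is_descendant(const struct toy_cgrp *cgrp,
			      const struct toy_cgrp *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return true;
	return false;
}

/*
 * Walk up from the source cgroup until the destination is inside the
 * current subtree.  Returns NULL if the two are not in the same tree.
 */
struct toy_cgrp *toy_common_ancestor(struct toy_cgrp *src,
				     const struct toy_cgrp *dst)
{
	assert(src && dst);
	while (src && !toy_is_descendant(dst, src))
		src = src->parent;
	return src;
}
```

For siblings the result is their parent; when one cgroup is an
ancestor of the other, the result is that ancestor.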

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup/cgroup-internal.h |   8 +-
 kernel/cgroup/cgroup-v1.c       |  58 ++++++++++++--
 kernel/cgroup/cgroup.c          | 163 +++++++++++++++++++++-------------------
 3 files changed, 142 insertions(+), 87 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 00f4d6bf048f..f0a0dba97bad 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -180,10 +180,10 @@ int cgroup_migrate(struct task_struct *leader, bool threadgroup,
 
 int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
 		       bool threadgroup);
-ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
-			     size_t nbytes, loff_t off, bool threadgroup);
-ssize_t cgroup_procs_write(struct kernfs_open_file *of, char *buf, size_t nbytes,
-			   loff_t off);
+struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
+	__acquires(&cgroup_threadgroup_rwsem);
+void cgroup_procs_write_finish(void)
+	__releases(&cgroup_threadgroup_rwsem);
 
 void cgroup_lock_and_drain_offline(struct cgroup *cgrp);
 
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 85d75152402d..f13ccab992c7 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -514,10 +514,58 @@ static int cgroup_pidlist_show(struct seq_file *s, void *v)
 	return 0;
 }
 
-static ssize_t cgroup_tasks_write(struct kernfs_open_file *of,
-				  char *buf, size_t nbytes, loff_t off)
+static ssize_t __cgroup1_procs_write(struct kernfs_open_file *of,
+				     char *buf, size_t nbytes, loff_t off,
+				     bool threadgroup)
 {
-	return __cgroup_procs_write(of, buf, nbytes, off, false);
+	struct cgroup *cgrp;
+	struct task_struct *task;
+	const struct cred *cred, *tcred;
+	ssize_t ret;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENODEV;
+
+	task = cgroup_procs_write_start(buf, threadgroup);
+	ret = PTR_ERR_OR_ZERO(task);
+	if (ret)
+		goto out_unlock;
+
+	/*
+	 * Even if we're attaching all tasks in the thread group, we only
+	 * need to check permissions on one of them.
+	 */
+	cred = current_cred();
+	tcred = get_task_cred(task);
+	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+	    !uid_eq(cred->euid, tcred->uid) &&
+	    !uid_eq(cred->euid, tcred->suid))
+		ret = -EACCES;
+	put_cred(tcred);
+	if (ret)
+		goto out_finish;
+
+	ret = cgroup_attach_task(cgrp, task, threadgroup);
+
+out_finish:
+	cgroup_procs_write_finish();
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
+static ssize_t cgroup1_procs_write(struct kernfs_open_file *of,
+				   char *buf, size_t nbytes, loff_t off)
+{
+	return __cgroup1_procs_write(of, buf, nbytes, off, true);
+}
+
+static ssize_t cgroup1_tasks_write(struct kernfs_open_file *of,
+				   char *buf, size_t nbytes, loff_t off)
+{
+	return __cgroup1_procs_write(of, buf, nbytes, off, false);
 }
 
 static ssize_t cgroup_release_agent_write(struct kernfs_open_file *of,
@@ -596,7 +644,7 @@ struct cftype cgroup1_base_files[] = {
 		.seq_stop = cgroup_pidlist_stop,
 		.seq_show = cgroup_pidlist_show,
 		.private = CGROUP_FILE_PROCS,
-		.write = cgroup_procs_write,
+		.write = cgroup1_procs_write,
 	},
 	{
 		.name = "cgroup.clone_children",
@@ -615,7 +663,7 @@ struct cftype cgroup1_base_files[] = {
 		.seq_stop = cgroup_pidlist_stop,
 		.seq_show = cgroup_pidlist_show,
 		.private = CGROUP_FILE_TASKS,
-		.write = cgroup_tasks_write,
+		.write = cgroup1_tasks_write,
 	},
 	{
 		.name = "notify_on_release",
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index dcd120af4084..78a2c9788d40 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1943,6 +1943,23 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 }
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
+static struct cgroup *cgroup_migrate_common_ancestor(struct task_struct *task,
+						     struct cgroup *dst_cgrp)
+{
+	struct cgroup *cgrp;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	spin_lock_irq(&css_set_lock);
+	cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
+	spin_unlock_irq(&css_set_lock);
+
+	while (!cgroup_is_descendant(dst_cgrp, cgrp))
+		cgrp = cgroup_parent(cgrp);
+
+	return cgrp;
+}
+
 /**
  * cgroup_migrate_add_task - add a migration target task to a migration context
  * @task: target task
@@ -2375,76 +2392,23 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
 	return ret;
 }
 
-static int cgroup_procs_write_permission(struct task_struct *task,
-					 struct cgroup *dst_cgrp,
-					 struct kernfs_open_file *of)
-{
-	int ret = 0;
-
-	if (cgroup_on_dfl(dst_cgrp)) {
-		struct super_block *sb = of->file->f_path.dentry->d_sb;
-		struct cgroup *cgrp;
-		struct inode *inode;
-
-		spin_lock_irq(&css_set_lock);
-		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
-		spin_unlock_irq(&css_set_lock);
-
-		while (!cgroup_is_descendant(dst_cgrp, cgrp))
-			cgrp = cgroup_parent(cgrp);
-
-		ret = -ENOMEM;
-		inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
-		if (inode) {
-			ret = inode_permission(inode, MAY_WRITE);
-			iput(inode);
-		}
-	} else {
-		const struct cred *cred = current_cred();
-		const struct cred *tcred = get_task_cred(task);
-
-		/*
-		 * even if we're attaching all tasks in the thread group,
-		 * we only need to check permissions on one of them.
-		 */
-		if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
-		    !uid_eq(cred->euid, tcred->uid) &&
-		    !uid_eq(cred->euid, tcred->suid))
-			ret = -EACCES;
-		put_cred(tcred);
-	}
-
-	return ret;
-}
-
-/*
- * Find the task_struct of the task to attach by vpid and pass it along to the
- * function to attach either it or all tasks in its threadgroup. Will lock
- * cgroup_mutex and threadgroup.
- */
-ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
-			     size_t nbytes, loff_t off, bool threadgroup)
+struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
+	__acquires(&cgroup_threadgroup_rwsem)
 {
 	struct task_struct *tsk;
-	struct cgroup_subsys *ss;
-	struct cgroup *cgrp;
 	pid_t pid;
-	int ssid, ret;
 
 	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
-		return -EINVAL;
-
-	cgrp = cgroup_kn_lock_live(of->kn, false);
-	if (!cgrp)
-		return -ENODEV;
+		return ERR_PTR(-EINVAL);
 
 	percpu_down_write(&cgroup_threadgroup_rwsem);
+
 	rcu_read_lock();
 	if (pid) {
 		tsk = find_task_by_vpid(pid);
 		if (!tsk) {
-			ret = -ESRCH;
-			goto out_unlock_rcu;
+			tsk = ERR_PTR(-ESRCH);
+			goto out_unlock_threadgroup;
 		}
 	} else {
 		tsk = current;
@@ -2460,35 +2424,30 @@ ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 	 * cgroup with no rt_runtime allocated.  Just say no.
 	 */
 	if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
-		ret = -EINVAL;
-		goto out_unlock_rcu;
+		tsk = ERR_PTR(-EINVAL);
+		goto out_unlock_threadgroup;
 	}
 
 	get_task_struct(tsk);
-	rcu_read_unlock();
-
-	ret = cgroup_procs_write_permission(tsk, cgrp, of);
-	if (!ret)
-		ret = cgroup_attach_task(cgrp, tsk, threadgroup);
-
-	put_task_struct(tsk);
-	goto out_unlock_threadgroup;
+	goto out_unlock_rcu;
 
+out_unlock_threadgroup:
+	percpu_up_write(&cgroup_threadgroup_rwsem);
 out_unlock_rcu:
 	rcu_read_unlock();
-out_unlock_threadgroup:
+	return tsk;
+}
+
+void cgroup_procs_write_finish(void)
+	__releases(&cgroup_threadgroup_rwsem)
+{
+	struct cgroup_subsys *ss;
+	int ssid;
+
 	percpu_up_write(&cgroup_threadgroup_rwsem);
 	for_each_subsys(ss, ssid)
 		if (ss->post_attach)
 			ss->post_attach();
-	cgroup_kn_unlock(of->kn);
-	return ret ?: nbytes;
-}
-
-ssize_t cgroup_procs_write(struct kernfs_open_file *of, char *buf, size_t nbytes,
-			   loff_t off)
-{
-	return __cgroup_procs_write(of, buf, nbytes, off, true);
 }
 
 static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
@@ -3793,6 +3752,54 @@ static int cgroup_procs_show(struct seq_file *s, void *v)
 	return 0;
 }
 
+static int cgroup_procs_write_permission(struct cgroup *cgrp,
+					 struct super_block *sb)
+{
+	struct inode *inode;
+	int ret;
+
+	inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
+	if (!inode)
+		return -ENOMEM;
+
+	ret = inode_permission(inode, MAY_WRITE);
+	iput(inode);
+	return ret;
+}
+
+static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
+				  char *buf, size_t nbytes, loff_t off)
+{
+	struct cgroup *cgrp, *common_ancestor;
+	struct task_struct *task;
+	ssize_t ret;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENODEV;
+
+	task = cgroup_procs_write_start(buf, true);
+	ret = PTR_ERR_OR_ZERO(task);
+	if (ret)
+		goto out_unlock;
+
+	common_ancestor = cgroup_migrate_common_ancestor(task, cgrp);
+
+	ret = cgroup_procs_write_permission(common_ancestor,
+					    of->file->f_path.dentry->d_sb);
+	if (ret)
+		goto out_finish;
+
+	ret = cgroup_attach_task(cgrp, task, true);
+
+out_finish:
+	cgroup_procs_write_finish();
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
 /* cgroup core interface files for the default hierarchy */
 static struct cftype cgroup_base_files[] = {
 	{
-- 
2.13.0

* [PATCH 03/10] cgroup: Fix reference counting bug in cgroup_procs_write()
From: Tejun Heo @ 2017-06-10 14:03 UTC
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

From: Waiman Long <longman@redhat.com>

cgroup_procs_write_start() takes a reference to the task structure
which is not released by cgroup_procs_write() and the other callers.
Add a put_task_struct() call to cgroup_procs_write_finish() to match
the get_task_struct() in cgroup_procs_write_start() and fix this
reference counting error.
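
The get / put pairing can be illustrated with a small user-space
model.  It is a sketch only: the toy_* names are hypothetical, the
counter is a plain int rather than the kernel's reference count, and
the @fail argument merely simulates an error such as a failed
permission check.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task {
	int usage;	/* toy reference count */
};

static void toy_get_task(struct toy_task *t)
{
	t->usage++;
}

static void toy_put_task(struct toy_task *t)
{
	assert(t->usage > 0);
	t->usage--;
}

/* Start of a write: hands back @t with an extra reference held. */
struct toy_task *toy_write_start(struct toy_task *t)
{
	toy_get_task(t);
	return t;
}

/* End of a write: drops the reference taken by toy_write_start(). */
void toy_write_finish(struct toy_task *t)
{
	toy_put_task(t);
}

/* A caller in the style of cgroup_procs_write(). */
int toy_procs_write(struct toy_task *t, int fail)
{
	struct toy_task *task = toy_write_start(t);
	int ret = fail ? -1 : 0;

	/* Success and error paths both go through finish. */
	toy_write_finish(task);
	return ret;
}
```

Because the put lives in the finish helper, every caller that pairs
start with finish leaves the count where it found it.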

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup/cgroup-internal.h | 2 +-
 kernel/cgroup/cgroup-v1.c       | 2 +-
 kernel/cgroup/cgroup.c          | 8 +++++---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index f0a0dba97bad..2c8e3a949fc5 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -182,7 +182,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
 		       bool threadgroup);
 struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
 	__acquires(&cgroup_threadgroup_rwsem);
-void cgroup_procs_write_finish(void)
+void cgroup_procs_write_finish(struct task_struct *task)
 	__releases(&cgroup_threadgroup_rwsem);
 
 void cgroup_lock_and_drain_offline(struct cgroup *cgrp);
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index f13ccab992c7..f6dba423e8ff 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -549,7 +549,7 @@ static ssize_t __cgroup1_procs_write(struct kernfs_open_file *of,
 	ret = cgroup_attach_task(cgrp, task, threadgroup);
 
 out_finish:
-	cgroup_procs_write_finish();
+	cgroup_procs_write_finish(task);
 out_unlock:
 	cgroup_kn_unlock(of->kn);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 78a2c9788d40..ddcbfda642cd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2438,12 +2438,15 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
 	return tsk;
 }
 
-void cgroup_procs_write_finish(void)
+void cgroup_procs_write_finish(struct task_struct *task)
 	__releases(&cgroup_threadgroup_rwsem)
 {
 	struct cgroup_subsys *ss;
 	int ssid;
 
+	/* release reference from cgroup_procs_write_start() */
+	put_task_struct(task);
+
 	percpu_up_write(&cgroup_threadgroup_rwsem);
 	for_each_subsys(ss, ssid)
 		if (ss->post_attach)
@@ -3102,7 +3105,6 @@ static int cgroup_addrm_files(struct cgroup_subsys_state *css,
 
 static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
 {
-	LIST_HEAD(pending);
 	struct cgroup_subsys *ss = cfts[0].ss;
 	struct cgroup *root = &ss->root->cgrp;
 	struct cgroup_subsys_state *css;
@@ -3793,7 +3795,7 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
 	ret = cgroup_attach_task(cgrp, task, true);
 
 out_finish:
-	cgroup_procs_write_finish();
+	cgroup_procs_write_finish(task);
 out_unlock:
 	cgroup_kn_unlock(of->kn);
 
-- 
2.13.0

* [PATCH 04/10] cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS
From: Tejun Heo @ 2017-06-10 14:03 UTC
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

css_task_iter currently always walks all tasks.  With the planned
cgroup v2 thread support, the iterator will need to handle multiple
types of iteration.  In preparation, add @flags to
css_task_iter_start() and implement CSS_TASK_ITER_PROCS.  If the flag
is not specified, the iterator walks all tasks as before.  When it is
set, the iterator only walks the group leaders.

For now, the only user of the flag is the cgroup v2 "cgroup.procs"
file, which no longer needs to skip non-leader tasks in
cgroup_procs_next().  Note that cgroup v1 "cgroup.procs" can't use the
group leader walk as v1 "cgroup.procs" doesn't mean "list all thread
group leaders in the cgroup" but "list all thread group IDs with any
threads in the cgroup".

While at it, update cgroup_procs_show() to use task_pid_vnr() instead
of task_tgid_vnr().  As the iteration guarantees that the function
only sees group leaders, this doesn't change the output and will allow
sharing the function for thread iteration.
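
The flag-driven skipping can be modeled with a flat array in user
space.  This is a sketch under simplifying assumptions: the toy_*
names are hypothetical, a task counts as a group leader when its pid
equals its tgid, and the css_set walking and locking of the real
iterator are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit as CSS_TASK_ITER_PROCS; everything else is a toy model. */
#define TOY_ITER_PROCS	(1U << 0)

struct toy_task {
	int pid;
	int tgid;	/* group leader when pid == tgid */
};

struct toy_iter {
	const struct toy_task *tasks;
	size_t nr_tasks;
	size_t pos;
	unsigned int flags;
};

/* Record the flags at start time, as css_task_iter_start() now does. */
void toy_iter_start(const struct toy_task *tasks, size_t nr_tasks,
		    unsigned int flags, struct toy_iter *it)
{
	assert(it != NULL);
	it->tasks = tasks;
	it->nr_tasks = nr_tasks;
	it->pos = 0;
	it->flags = flags;
}

/* Return the next task, or NULL once the walk is over. */
const struct toy_task *toy_iter_next(struct toy_iter *it)
{
	while (it->pos < it->nr_tasks) {
		const struct toy_task *t = &it->tasks[it->pos++];

		/* If PROCS, skip over tasks which aren't group leaders. */
		if ((it->flags & TOY_ITER_PROCS) && t->pid != t->tgid)
			continue;
		return t;
	}
	return NULL;
}
```

With flags set to 0 the walk yields every task; with TOY_ITER_PROCS it
yields only the leaders, so a consumer like cgroup_procs_next() needs
no filtering loop of its own.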

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h       |  6 +++++-
 kernel/cgroup/cgroup-v1.c    |  6 +++---
 kernel/cgroup/cgroup.c       | 24 ++++++++++++++----------
 kernel/cgroup/cpuset.c       |  6 +++---
 kernel/cgroup/freezer.c      |  6 +++---
 mm/memcontrol.c              |  2 +-
 net/core/netclassid_cgroup.c |  2 +-
 7 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 710a005c6b7a..8755eb5ece18 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -36,9 +36,13 @@
 #define CGROUP_WEIGHT_DFL		100
 #define CGROUP_WEIGHT_MAX		10000
 
+/* walk only threadgroup leaders */
+#define CSS_TASK_ITER_PROCS		(1U << 0)
+
 /* a css_task_iter should be treated as an opaque object */
 struct css_task_iter {
 	struct cgroup_subsys		*ss;
+	unsigned int			flags;
 
 	struct list_head		*cset_pos;
 	struct list_head		*cset_head;
@@ -129,7 +133,7 @@ struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset,
 struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset,
 					struct cgroup_subsys_state **dst_cssp);
 
-void css_task_iter_start(struct cgroup_subsys_state *css,
+void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 			 struct css_task_iter *it);
 struct task_struct *css_task_iter_next(struct css_task_iter *it);
 void css_task_iter_end(struct css_task_iter *it);
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index f6dba423e8ff..1e101b94f355 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -121,7 +121,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 	 * ->can_attach() fails.
 	 */
 	do {
-		css_task_iter_start(&from->self, &it);
+		css_task_iter_start(&from->self, 0, &it);
 		task = css_task_iter_next(&it);
 		if (task)
 			get_task_struct(task);
@@ -377,7 +377,7 @@ static int pidlist_array_load(struct cgroup *cgrp, enum cgroup_filetype type,
 	if (!array)
 		return -ENOMEM;
 	/* now, populate the array */
-	css_task_iter_start(&cgrp->self, &it);
+	css_task_iter_start(&cgrp->self, 0, &it);
 	while ((tsk = css_task_iter_next(&it))) {
 		if (unlikely(n == length))
 			break;
@@ -753,7 +753,7 @@ int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry)
 	}
 	rcu_read_unlock();
 
-	css_task_iter_start(&cgrp->self, &it);
+	css_task_iter_start(&cgrp->self, 0, &it);
 	while ((tsk = css_task_iter_next(&it))) {
 		switch (tsk->state) {
 		case TASK_RUNNING:
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index ddcbfda642cd..6efd44cfec22 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3602,6 +3602,7 @@ static void css_task_iter_advance(struct css_task_iter *it)
 	lockdep_assert_held(&css_set_lock);
 	WARN_ON_ONCE(!l);
 
+repeat:
 	/*
 	 * Advance iterator to find next entry.  cset->tasks is consumed
 	 * first and then ->mg_tasks.  After ->mg_tasks, we move onto the
@@ -3616,11 +3617,18 @@ static void css_task_iter_advance(struct css_task_iter *it)
 		css_task_iter_advance_css_set(it);
 	else
 		it->task_pos = l;
+
+	/* if PROCS, skip over tasks which aren't group leaders */
+	if ((it->flags & CSS_TASK_ITER_PROCS) && it->task_pos &&
+	    !thread_group_leader(list_entry(it->task_pos, struct task_struct,
+					    cg_list)))
+		goto repeat;
 }
 
 /**
  * css_task_iter_start - initiate task iteration
  * @css: the css to walk tasks of
+ * @flags: CSS_TASK_ITER_* flags
  * @it: the task iterator to use
  *
  * Initiate iteration through the tasks of @css.  The caller can call
@@ -3628,7 +3636,7 @@ static void css_task_iter_advance(struct css_task_iter *it)
  * returns NULL.  On completion of iteration, css_task_iter_end() must be
  * called.
  */
-void css_task_iter_start(struct cgroup_subsys_state *css,
+void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 			 struct css_task_iter *it)
 {
 	/* no one should try to iterate before mounting cgroups */
@@ -3639,6 +3647,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 	spin_lock_irq(&css_set_lock);
 
 	it->ss = css->ss;
+	it->flags = flags;
 
 	if (it->ss)
 		it->cset_pos = &css->cgroup->e_csets[css->ss->id];
@@ -3712,13 +3721,8 @@ static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos)
 {
 	struct kernfs_open_file *of = s->private;
 	struct css_task_iter *it = of->priv;
-	struct task_struct *task;
-
-	do {
-		task = css_task_iter_next(it);
-	} while (task && !thread_group_leader(task));
 
-	return task;
+	return css_task_iter_next(it);
 }
 
 static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
@@ -3739,10 +3743,10 @@ static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
 		if (!it)
 			return ERR_PTR(-ENOMEM);
 		of->priv = it;
-		css_task_iter_start(&cgrp->self, it);
+		css_task_iter_start(&cgrp->self, CSS_TASK_ITER_PROCS, it);
 	} else if (!(*pos)++) {
 		css_task_iter_end(it);
-		css_task_iter_start(&cgrp->self, it);
+		css_task_iter_start(&cgrp->self, CSS_TASK_ITER_PROCS, it);
 	}
 
 	return cgroup_procs_next(s, NULL, NULL);
@@ -3750,7 +3754,7 @@ static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
 
 static int cgroup_procs_show(struct seq_file *s, void *v)
 {
-	seq_printf(s, "%d\n", task_tgid_vnr(v));
+	seq_printf(s, "%d\n", task_pid_vnr(v));
 	return 0;
 }
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ae643412948a..deca1abba66d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -861,7 +861,7 @@ static void update_tasks_cpumask(struct cpuset *cs)
 	struct css_task_iter it;
 	struct task_struct *task;
 
-	css_task_iter_start(&cs->css, &it);
+	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it)))
 		set_cpus_allowed_ptr(task, cs->effective_cpus);
 	css_task_iter_end(&it);
@@ -1106,7 +1106,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
 	 * It's ok if we rebind the same mm twice; mpol_rebind_mm()
 	 * is idempotent.  Also migrate pages in each mm to new nodes.
 	 */
-	css_task_iter_start(&cs->css, &it);
+	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it))) {
 		struct mm_struct *mm;
 		bool migrate;
@@ -1299,7 +1299,7 @@ static void update_tasks_flags(struct cpuset *cs)
 	struct css_task_iter it;
 	struct task_struct *task;
 
-	css_task_iter_start(&cs->css, &it);
+	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it)))
 		cpuset_update_task_spread_flag(cs, task);
 	css_task_iter_end(&it);
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index 1b72d56edce5..08236798d173 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -268,7 +268,7 @@ static void update_if_frozen(struct cgroup_subsys_state *css)
 	rcu_read_unlock();
 
 	/* are all tasks frozen? */
-	css_task_iter_start(css, &it);
+	css_task_iter_start(css, 0, &it);
 
 	while ((task = css_task_iter_next(&it))) {
 		if (freezing(task)) {
@@ -320,7 +320,7 @@ static void freeze_cgroup(struct freezer *freezer)
 	struct css_task_iter it;
 	struct task_struct *task;
 
-	css_task_iter_start(&freezer->css, &it);
+	css_task_iter_start(&freezer->css, 0, &it);
 	while ((task = css_task_iter_next(&it)))
 		freeze_task(task);
 	css_task_iter_end(&it);
@@ -331,7 +331,7 @@ static void unfreeze_cgroup(struct freezer *freezer)
 	struct css_task_iter it;
 	struct task_struct *task;
 
-	css_task_iter_start(&freezer->css, &it);
+	css_task_iter_start(&freezer->css, 0, &it);
 	while ((task = css_task_iter_next(&it)))
 		__thaw_task(task);
 	css_task_iter_end(&it);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 94172089f52f..50aaf770905b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -917,7 +917,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 		struct css_task_iter it;
 		struct task_struct *task;
 
-		css_task_iter_start(&iter->css, &it);
+		css_task_iter_start(&iter->css, 0, &it);
 		while (!ret && (task = css_task_iter_next(&it)))
 			ret = fn(task, arg);
 		css_task_iter_end(&it);
diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 029a61ac6cdd..5e4f04004a49 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -100,7 +100,7 @@ static int write_classid(struct cgroup_subsys_state *css, struct cftype *cft,
 
 	cs->classid = (u32)value;
 
-	css_task_iter_start(css, &it);
+	css_task_iter_start(css, 0, &it);
 	while ((p = css_task_iter_next(&it))) {
 		task_lock(p);
 		iterate_fd(p->files, 0, update_classid_sock,
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 05/10] cgroup: introduce cgroup->proc_cgrp and threaded css_set handling
  2017-06-10 14:03 ` Tejun Heo
                   ` (4 preceding siblings ...)
  (?)
@ 2017-06-10 14:03 ` Tejun Heo
  -1 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-10 14:03 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

cgroup v2 is in the process of growing thread granularity support.
Once thread mode is enabled, the root cgroup of the subtree serves as
the proc_cgrp to which the processes of the subtree conceptually
belong and domain-level resource consumptions not tied to any specific
task are charged.  In the subtree, threads won't be subject to process
granularity or no-internal-task constraint and can be distributed
arbitrarily across the subtree.

This patch introduces cgroup->proc_cgrp along with threaded css_set
handling.

* cgroup->proc_cgrp is NULL if !threaded.  If threaded, points to the
  proc_cgrp (root of the threaded subtree).

* css_set->proc_cset points to self if !threaded.  If threaded, points
  to the css_set which belongs to the cgrp->proc_cgrp.  The proc_cgrp
  serves as the resource domain and needs the matching csses readily
  available.  The proc_cset holds those csses and makes them easily
  accessible.

* All threaded csets are linked on their proc_csets to enable
  iteration of all threaded tasks.
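
As a rough illustration of the points above, the pointer relationships
can be modelled in plain userspace C.  This is only a sketch under
simplifying assumptions: the model_* names are made up for the example,
locking, RCU and refcounting are left out, and the structures carry
none of the other fields of the real struct cgroup and struct css_set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layout. */
struct model_cgroup {
	/* NULL if !threaded; otherwise the root of the threaded subtree */
	struct model_cgroup *proc_cgrp;
};

struct model_cset {
	struct model_cgroup *dfl_cgrp;	/* cgroup on the default hierarchy */
	struct model_cset *proc_cset;	/* self if !threaded */
};

/* Modelled on cgroup_is_threaded(): threaded iff ->proc_cgrp is set. */
static bool model_cgroup_is_threaded(const struct model_cgroup *cgrp)
{
	return cgrp->proc_cgrp != NULL;
}

/* Modelled on css_set_threaded(): threaded iff ->proc_cset isn't self. */
static bool model_cset_threaded(const struct model_cset *cset)
{
	return cset->proc_cset != cset;
}

/* A freshly set up cset points to itself, i.e. starts out !threaded. */
static void model_cset_init(struct model_cset *cset, struct model_cgroup *cgrp)
{
	cset->dfl_cgrp = cgrp;
	cset->proc_cset = cset;
}

/* Link a threaded cset to the cset which belongs to the proc_cgrp. */
static void model_cset_link(struct model_cset *cset, struct model_cset *pcset)
{
	cset->proc_cset = pcset;
}

/* The cgroup which domain-level consumptions would be charged to. */
static struct model_cgroup *model_charge_target(const struct model_cset *cset)
{
	return cset->proc_cset->dfl_cgrp;
}
```

In this model, a cset linked with model_cset_link() reports the cgroup
of its proc_cset as the charge target, while a cset that was only
initialized reports its own cgroup.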

This patch adds the above but doesn't actually use them yet.  The
following patches will build on top.

v2: Added cgroup_is_threaded() helper.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup-defs.h | 22 +++++++++++
 kernel/cgroup/cgroup.c      | 93 ++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index ec47101cb1bf..471773792557 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -163,6 +163,15 @@ struct css_set {
 	/* reference count */
 	refcount_t refcount;
 
+	/*
+	 * If not threaded, the following points to self.  If threaded, to
+	 * a cset which belongs to the top cgroup of the threaded subtree.
+	 * The proc_cset provides access to the process cgroup and its
+	 * csses to which domain level resource consumptions should be
+	 * charged.
+	 */
+	struct css_set __rcu *proc_cset;
+
 	/* the default cgroup associated with this css_set */
 	struct cgroup *dfl_cgrp;
 
@@ -188,6 +197,10 @@ struct css_set {
 	 */
 	struct list_head e_cset_node[CGROUP_SUBSYS_COUNT];
 
+	/* all csets whose ->proc_cset points to this cset */
+	struct list_head threaded_csets;
+	struct list_head threaded_csets_node;
+
 	/*
 	 * List running through all cgroup groups in the same hash
 	 * slot. Protected by css_set_lock
@@ -294,6 +307,15 @@ struct cgroup {
 	struct list_head e_csets[CGROUP_SUBSYS_COUNT];
 
 	/*
+	 * If !threaded, NULL.  If threaded, it points to the top cgroup of
+	 * the threaded subtree, on which it points to self.  Threaded
+	 * subtree is exempt from process granularity and no-internal-task
+	 * constraint.  Domain level resource consumptions which aren't
+	 * tied to a specific task should be charged to the proc_cgrp.
+	 */
+	struct cgroup *proc_cgrp;
+
+	/*
 	 * list of pidlists, up to two for each namespace (one for procs, one
 	 * for tasks); created on demand.
 	 */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6efd44cfec22..0fa4ffe84933 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -325,6 +325,12 @@ static struct cgroup *cgroup_parent(struct cgroup *cgrp)
 	return NULL;
 }
 
+/* is @cgrp threaded? regardless of mixed / root / member? */
+static bool cgroup_is_threaded(struct cgroup *cgrp)
+{
+	return cgrp->proc_cgrp;
+}
+
 /* subsystems visibly enabled on a cgroup */
 static u16 cgroup_control(struct cgroup *cgrp)
 {
@@ -560,9 +566,11 @@ EXPORT_SYMBOL_GPL(of_css);
  */
 struct css_set init_css_set = {
 	.refcount		= REFCOUNT_INIT(1),
+	.proc_cset		= RCU_INITIALIZER(&init_css_set),
 	.tasks			= LIST_HEAD_INIT(init_css_set.tasks),
 	.mg_tasks		= LIST_HEAD_INIT(init_css_set.mg_tasks),
 	.task_iters		= LIST_HEAD_INIT(init_css_set.task_iters),
+	.threaded_csets		= LIST_HEAD_INIT(init_css_set.threaded_csets),
 	.cgrp_links		= LIST_HEAD_INIT(init_css_set.cgrp_links),
 	.mg_preload_node	= LIST_HEAD_INIT(init_css_set.mg_preload_node),
 	.mg_node		= LIST_HEAD_INIT(init_css_set.mg_node),
@@ -570,6 +578,17 @@ struct css_set init_css_set = {
 
 static int css_set_count	= 1;	/* 1 for init_css_set */
 
+static struct css_set *proc_css_set(struct css_set *cset)
+{
+	return rcu_dereference_protected(cset->proc_cset,
+					 lockdep_is_held(&css_set_lock));
+}
+
+static bool css_set_threaded(struct css_set *cset)
+{
+	return proc_css_set(cset) != cset;
+}
+
 /**
  * css_set_populated - does a css_set contain any tasks?
  * @cset: target css_set
@@ -756,6 +775,8 @@ void put_css_set_locked(struct css_set *cset)
 	if (!refcount_dec_and_test(&cset->refcount))
 		return;
 
+	WARN_ON_ONCE(!list_empty(&cset->threaded_csets));
+
 	/* This css_set is dead. unlink it and release cgroup and css refs */
 	for_each_subsys(ss, ssid) {
 		list_del(&cset->e_cset_node[ssid]);
@@ -772,6 +793,11 @@ void put_css_set_locked(struct css_set *cset)
 		kfree(link);
 	}
 
+	if (css_set_threaded(cset)) {
+		list_del(&cset->threaded_csets_node);
+		put_css_set_locked(proc_css_set(cset));
+	}
+
 	kfree_rcu(cset, rcu_head);
 }
 
@@ -781,6 +807,7 @@ void put_css_set_locked(struct css_set *cset)
  * @old_cset: existing css_set for a task
  * @new_cgrp: cgroup that's being entered by the task
  * @template: desired set of css pointers in css_set (pre-calculated)
+ * @for_pcset: the comparison is for a new proc_cset
  *
  * Returns true if "cset" matches "old_cset" except for the hierarchy
  * which "new_cgrp" belongs to, for which it should match "new_cgrp".
@@ -788,7 +815,8 @@ void put_css_set_locked(struct css_set *cset)
 static bool compare_css_sets(struct css_set *cset,
 			     struct css_set *old_cset,
 			     struct cgroup *new_cgrp,
-			     struct cgroup_subsys_state *template[])
+			     struct cgroup_subsys_state *template[],
+			     bool for_pcset)
 {
 	struct list_head *l1, *l2;
 
@@ -800,6 +828,32 @@ static bool compare_css_sets(struct css_set *cset,
 	if (memcmp(template, cset->subsys, sizeof(cset->subsys)))
 		return false;
 
+	if (for_pcset) {
+		/*
+		 * We're looking for the pcset of @old_cset.  As @old_cset
+		 * doesn't have its ->proc_cset pointer set yet (we're
+		 * trying to find out what to set it to), @old_cset itself
+		 * may seem like a match here.  Explicitly exclude identity
+		 * matching.
+		 */
+		if (css_set_threaded(cset) || cset == old_cset)
+			return false;
+	} else {
+		bool is_threaded;
+
+		/*
+		 * Otherwise, @cset's threaded state should match the
+		 * default cgroup's.
+		 */
+		if (cgroup_on_dfl(new_cgrp))
+			is_threaded = cgroup_is_threaded(new_cgrp);
+		else
+			is_threaded = cgroup_is_threaded(old_cset->dfl_cgrp);
+
+		if (is_threaded != css_set_threaded(cset))
+			return false;
+	}
+
 	/*
 	 * Compare cgroup pointers in order to distinguish between
 	 * different cgroups in hierarchies.  As different cgroups may
@@ -852,10 +906,12 @@ static bool compare_css_sets(struct css_set *cset,
  * @old_cset: the css_set that we're using before the cgroup transition
  * @cgrp: the cgroup that we're moving into
  * @template: out param for the new set of csses, should be clear on entry
+ * @for_pcset: looking for a new proc_cset
  */
 static struct css_set *find_existing_css_set(struct css_set *old_cset,
 					struct cgroup *cgrp,
-					struct cgroup_subsys_state *template[])
+					struct cgroup_subsys_state *template[],
+					bool for_pcset)
 {
 	struct cgroup_root *root = cgrp->root;
 	struct cgroup_subsys *ss;
@@ -886,7 +942,7 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 
 	key = css_set_hash(template);
 	hash_for_each_possible(css_set_table, cset, hlist, key) {
-		if (!compare_css_sets(cset, old_cset, cgrp, template))
+		if (!compare_css_sets(cset, old_cset, cgrp, template, for_pcset))
 			continue;
 
 		/* This css_set matches what we need */
@@ -968,12 +1024,13 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
  * find_css_set - return a new css_set with one cgroup updated
  * @old_cset: the baseline css_set
  * @cgrp: the cgroup to be updated
+ * @for_pcset: looking for a new proc_cset
  *
  * Return a new css_set that's equivalent to @old_cset, but with @cgrp
  * substituted into the appropriate hierarchy.
  */
 static struct css_set *find_css_set(struct css_set *old_cset,
-				    struct cgroup *cgrp)
+				    struct cgroup *cgrp, bool for_pcset)
 {
 	struct cgroup_subsys_state *template[CGROUP_SUBSYS_COUNT] = { };
 	struct css_set *cset;
@@ -988,7 +1045,7 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 	/* First see if we already have a cgroup group that matches
 	 * the desired set */
 	spin_lock_irq(&css_set_lock);
-	cset = find_existing_css_set(old_cset, cgrp, template);
+	cset = find_existing_css_set(old_cset, cgrp, template, for_pcset);
 	if (cset)
 		get_css_set(cset);
 	spin_unlock_irq(&css_set_lock);
@@ -1007,9 +1064,11 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 	}
 
 	refcount_set(&cset->refcount, 1);
+	RCU_INIT_POINTER(cset->proc_cset, cset);
 	INIT_LIST_HEAD(&cset->tasks);
 	INIT_LIST_HEAD(&cset->mg_tasks);
 	INIT_LIST_HEAD(&cset->task_iters);
+	INIT_LIST_HEAD(&cset->threaded_csets);
 	INIT_HLIST_NODE(&cset->hlist);
 	INIT_LIST_HEAD(&cset->cgrp_links);
 	INIT_LIST_HEAD(&cset->mg_preload_node);
@@ -1047,6 +1106,28 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 
 	spin_unlock_irq(&css_set_lock);
 
+	/*
+	 * If @cset should be threaded, look up the matching proc_cset and
+	 * link them up.  We first fully initialize @cset then look for the
+	 * pcset.  It's simpler this way and safe as @cset is guaranteed to
+	 * stay empty until we return.
+	 */
+	if (!for_pcset && cgroup_is_threaded(cset->dfl_cgrp)) {
+		struct css_set *pcset;
+
+		pcset = find_css_set(cset, cset->dfl_cgrp->proc_cgrp, true);
+		if (!pcset) {
+			put_css_set(cset);
+			return NULL;
+		}
+
+		spin_lock_irq(&css_set_lock);
+		rcu_assign_pointer(cset->proc_cset, pcset);
+		list_add_tail(&cset->threaded_csets_node,
+			      &pcset->threaded_csets);
+		spin_unlock_irq(&css_set_lock);
+	}
+
 	return cset;
 }
 
@@ -2268,7 +2349,7 @@ int cgroup_migrate_prepare_dst(struct cgroup_mgctx *mgctx)
 		struct cgroup_subsys *ss;
 		int ssid;
 
-		dst_cset = find_css_set(src_cset, src_cset->mg_dst_cgrp);
+		dst_cset = find_css_set(src_cset, src_cset->mg_dst_cgrp, false);
 		if (!dst_cset)
 			goto err;
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/10] cgroup: implement CSS_TASK_ITER_THREADED
@ 2017-06-10 14:03   ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-10 14:03 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

cgroup v2 is in the process of growing thread granularity support.
Once thread mode is enabled, the root cgroup of the subtree serves as
the proc_cgrp to which the processes of the subtree conceptually
belong and domain-level resource consumptions not tied to any specific
task are charged.  In the subtree, threads won't be subject to process
granularity or no-internal-task constraint and can be distributed
arbitrarily across the subtree.

This patch implements a new task iterator flag CSS_TASK_ITER_THREADED,
which, when used on a proc_cgrp, makes the iteration include the tasks
on all the associated threaded css_sets.  "cgroup.procs" read path is
updated to use it so that reading the file on a proc_cgrp lists all
processes.  This will also be used by controller implementations which
need to walk processes or tasks at the resource domain level.

Task iteration is implemented nested in css_set iteration.  If
CSS_TASK_ITER_THREADED is specified, after walking tasks of each
!threaded css_set, all the associated threaded css_sets are visited
before moving onto the next !threaded css_set.
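
The visiting order can be sketched with a small userspace model.  The
code below is an illustration only, not the kernel implementation: the
model_* names are invented, arrays stand in for the kernel's linked
lists, and the populated check, locking and refcounting are omitted.
Only the bit value of the flag is taken from this patch.

```c
#include <stddef.h>

/* Same bit as CSS_TASK_ITER_THREADED in this patch. */
#define MODEL_ITER_THREADED	(1U << 1)

struct model_cset {
	int id;
	struct model_cset *proc_cset;	/* self if !threaded */
	struct model_cset **threaded;	/* csets whose ->proc_cset is this */
	size_t nr_threaded;
};

/*
 * Walk @csets, the csets linked to one cgroup, and record the ids of
 * the visited csets in @out.  Returns the number of ids recorded.
 */
static size_t model_walk(struct model_cset **csets, size_t nr,
			 unsigned int flags, int *out, size_t out_len)
{
	size_t n = 0, i, j;

	for (i = 0; i < nr; i++) {
		struct model_cset *cset = csets[i];

		/* threaded csets are walked with their proc_csets; skip */
		if ((flags & MODEL_ITER_THREADED) && cset->proc_cset != cset)
			continue;

		if (n < out_len)
			out[n++] = cset->id;

		if (!(flags & MODEL_ITER_THREADED))
			continue;

		/* visit all associated threaded csets before moving on */
		for (j = 0; j < cset->nr_threaded; j++)
			if (n < out_len)
				out[n++] = cset->threaded[j]->id;
	}

	return n;
}
```

Without the flag, the model visits exactly the csets on the list in
list order.  With it, each !threaded cset is followed by its threaded
csets, including ones that are not on the list being walked.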

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup.h |  6 ++++
 kernel/cgroup/cgroup.c | 81 +++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 73 insertions(+), 14 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 8755eb5ece18..05da7cc9ea2d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -38,6 +38,8 @@
 
 /* walk only threadgroup leaders */
 #define CSS_TASK_ITER_PROCS		(1U << 0)
+/* walk threaded css_sets as part of their proc_csets */
+#define CSS_TASK_ITER_THREADED		(1U << 1)
 
 /* a css_task_iter should be treated as an opaque object */
 struct css_task_iter {
@@ -47,11 +49,15 @@ struct css_task_iter {
 	struct list_head		*cset_pos;
 	struct list_head		*cset_head;
 
+	struct list_head		*tcset_pos;
+	struct list_head		*tcset_head;
+
 	struct list_head		*task_pos;
 	struct list_head		*tasks_head;
 	struct list_head		*mg_tasks_head;
 
 	struct css_set			*cur_cset;
+	struct css_set			*cur_pcset;
 	struct task_struct		*cur_task;
 	struct list_head		iters_node;	/* css_set->task_iters */
 };
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 0fa4ffe84933..765c1c27c879 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3610,27 +3610,36 @@ bool css_has_online_children(struct cgroup_subsys_state *css)
 	return ret;
 }
 
-/**
- * css_task_iter_advance_css_set - advance a task itererator to the next css_set
- * @it: the iterator to advance
- *
- * Advance @it to the next css_set to walk.
- */
-static void css_task_iter_advance_css_set(struct css_task_iter *it)
+static struct css_set *css_task_iter_next_css_set(struct css_task_iter *it)
 {
-	struct list_head *l = it->cset_pos;
+	bool threaded = it->flags & CSS_TASK_ITER_THREADED;
+	struct list_head *l;
 	struct cgrp_cset_link *link;
 	struct css_set *cset;
 
 	lockdep_assert_held(&css_set_lock);
 
-	/* Advance to the next non-empty css_set */
+	/* find the next threaded cset */
+	if (it->tcset_pos) {
+		l = it->tcset_pos->next;
+
+		if (l != it->tcset_head) {
+			it->tcset_pos = l;
+			return container_of(l, struct css_set,
+					    threaded_csets_node);
+		}
+
+		it->tcset_pos = NULL;
+	}
+
+	/* find the next cset */
+	l = it->cset_pos;
+
 	do {
 		l = l->next;
 		if (l == it->cset_head) {
 			it->cset_pos = NULL;
-			it->task_pos = NULL;
-			return;
+			return NULL;
 		}
 
 		if (it->ss) {
@@ -3640,10 +3649,50 @@ static void css_task_iter_advance_css_set(struct css_task_iter *it)
 			link = list_entry(l, struct cgrp_cset_link, cset_link);
 			cset = link->cset;
 		}
-	} while (!css_set_populated(cset));
+
+		/*
+		 * For threaded iterations, threaded csets are walked
+		 * together with their proc_csets.  Skip here.
+		 */
+	} while (threaded && css_set_threaded(cset));
 
 	it->cset_pos = l;
 
+	/* initialize threaded cset walking */
+	if (threaded) {
+		if (it->cur_pcset)
+			put_css_set_locked(it->cur_pcset);
+		it->cur_pcset = cset;
+		get_css_set(cset);
+
+		it->tcset_head = &cset->threaded_csets;
+		it->tcset_pos = &cset->threaded_csets;
+	}
+
+	return cset;
+}
+
+/**
+ * css_task_iter_advance_css_set - advance a task iterator to the next css_set
+ * @it: the iterator to advance
+ *
+ * Advance @it to the next css_set to walk.
+ */
+static void css_task_iter_advance_css_set(struct css_task_iter *it)
+{
+	struct css_set *cset;
+
+	lockdep_assert_held(&css_set_lock);
+
+	/* Advance to the next non-empty css_set */
+	do {
+		cset = css_task_iter_next_css_set(it);
+		if (!cset) {
+			it->task_pos = NULL;
+			return;
+		}
+	} while (!css_set_populated(cset));
+
 	if (!list_empty(&cset->tasks))
 		it->task_pos = cset->tasks.next;
 	else
@@ -3786,6 +3835,9 @@ void css_task_iter_end(struct css_task_iter *it)
 		spin_unlock_irq(&css_set_lock);
 	}
 
+	if (it->cur_pcset)
+		put_css_set(it->cur_pcset);
+
 	if (it->cur_task)
 		put_task_struct(it->cur_task);
 }
@@ -3811,6 +3863,7 @@ static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
 	struct kernfs_open_file *of = s->private;
 	struct cgroup *cgrp = seq_css(s)->cgroup;
 	struct css_task_iter *it = of->priv;
+	unsigned iter_flags = CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED;
 
 	/*
 	 * When a seq_file is seeked, it's always traversed sequentially
@@ -3824,10 +3877,10 @@ static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
 		if (!it)
 			return ERR_PTR(-ENOMEM);
 		of->priv = it;
-		css_task_iter_start(&cgrp->self, CSS_TASK_ITER_PROCS, it);
+		css_task_iter_start(&cgrp->self, iter_flags, it);
 	} else if (!(*pos)++) {
 		css_task_iter_end(it);
-		css_task_iter_start(&cgrp->self, CSS_TASK_ITER_PROCS, it);
+		css_task_iter_start(&cgrp->self, iter_flags, it);
 	}
 
 	return cgroup_procs_next(s, NULL, NULL);
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/10] cgroup: implement cgroup v2 thread support
@ 2017-06-10 14:03   ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-10 14:03 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

This patch implements cgroup v2 thread support.  The goal of thread
mode is to support hierarchical accounting and control at thread
granularity while staying inside the resource domain model, which
allows coordination across different resource controllers and
handling of anonymous resource consumptions.

Once thread mode is enabled on a cgroup, the threads of the processes
which are in its subtree can be placed inside the subtree without
being restricted by process granularity or no-internal-process
constraint.  Note that the threads aren't allowed to escape to a
different threaded subtree.  To be used inside a threaded subtree, a
controller should explicitly support threaded mode and be able to
handle internal competition in the way which is appropriate for the
resource.
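
From userspace, thread mode as proposed here is driven by writing a
command such as "enable" or "join" to the "cgroup.threads" file of a
cgroup.  The helper below is a minimal sketch of that write.  The
function name cgroup_threads_write() is hypothetical, and where the
cgroup2 hierarchy is mounted is left to the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write @cmd (e.g. "enable") to "cgroup.threads" in the cgroup
 * directory @cgrp_dir.  Returns 0 on success, -errno on failure.
 */
static int cgroup_threads_write(const char *cgrp_dir, const char *cmd)
{
	char path[4096];
	size_t len = strlen(cmd);
	ssize_t written;
	int fd, n, ret = 0;

	n = snprintf(path, sizeof(path), "%s/cgroup.threads", cgrp_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	/* the file is provided by the kernel; never create it */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, cmd, len);
	if (written < 0)
		ret = -errno;
	else if ((size_t)written != len)
		ret = -EIO;

	close(fd);
	return ret;
}
```

A caller would pass the directory of the cgroup to operate on and check
the return value, as the kernel rejects the write when the conditions
for enabling or joining aren't met.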

The root of a threaded subtree, where thread mode is enabled in the
first place, is called the thread root and serves as the resource
domain for the whole subtree.  This is the last cgroup where
non-threaded controllers are operational and where all the
domain-level resource consumptions in the subtree are accounted.  This
allows threaded controllers to operate at thread granularity when
requested while staying inside the scope of system-level resource
distribution.

As the root cgroup is exempt from the no-internal-process constraint,
it can serve as both a thread root and a parent to normal cgroups.
The root cgroup supports mixed cgroup mode which can be enabled and
disabled anytime as long as there aren't any threaded children.  First
level child cgroups can selectively join the mixed threaded subtree.

Internally, in a threaded subtree, each css_set has its ->proc_cset
pointing to a matching css_set which belongs to the thread root.  This
ensures that thread root level cgroup_subsys_state for all threaded
controllers are readily accessible for domain-level operations.

This patch enables threaded mode for the pids and perf_events
controllers.  Neither has to worry about domain-level resource
consumptions and it's enough to simply set the flag.

For more details on the interface and behavior of thread mode, please
refer to section 2-2-2 in Documentation/cgroup-v2.txt added by this
patch.  Note that the documentation update is not complete as the rest
of the documentation needs to be updated accordingly.  Rolling those
updates into this patch would be confusing, so they will be done in
separate patches.

v2: - After discussions with Waiman, support for mixed thread mode is
      added.  This should address the issue that Peter pointed out
      where any nesting should be avoided for thread subtrees while
      coexisting with other domain cgroups.

    - Enabling / disabling thread mode now piggy backs on the existing
      control mask update mechanism.

    - Bug fixes and cleanup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt |  99 +++++++++++++-
 include/linux/cgroup-defs.h |  12 ++
 kernel/cgroup/cgroup.c      | 323 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/cgroup/pids.c        |   1 +
 kernel/events/core.c        |   1 +
 5 files changed, 420 insertions(+), 16 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dc5e2dcdbef4..96db84005cb2 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -16,7 +16,9 @@ CONTENTS
   1-2. What is cgroup?
 2. Basic Operations
   2-1. Mounting
-  2-2. Organizing Processes
+  2-2. Organizing Processes and Threads
+    2-2-1. Processes
+    2-2-2. Threads
   2-3. [Un]populated Notification
   2-4. Controlling Controllers
     2-4-1. Enabling and Disabling
@@ -150,7 +152,9 @@ and experimenting easier, the kernel parameter cgroup_no_v1= allows
 disabling controllers in v1 and make them always available in v2.
 
 
-2-2. Organizing Processes
+2-2. Organizing Processes and Threads
+
+2-2-1. Processes
 
 Initially, only the root cgroup exists to which all processes belong.
 A child cgroup can be created by creating a sub-directory.
@@ -201,6 +205,97 @@ is removed subsequently, " (deleted)" is appended to the path.
   0::/test-cgroup/test-cgroup-nested (deleted)
 
 
+2-2-2. Threads
+
+cgroup v2 supports thread granularity for a subset of controllers to
+support use cases requiring hierarchical resource distribution across
+the threads of a group of processes.  By default, all threads of a
+process belong to the same cgroup, which also serves as the resource
+domain to host resource consumptions which are not specific to a
+process or thread.  The thread mode allows threads to be spread across
+a subtree while still maintaining the common resource domain for them.
+
+Enabling thread mode on a subtree makes it threaded.  The root of a
+threaded subtree is called thread root and serves as the resource
+domain for the entire subtree.  In a threaded subtree, threads of a
+process can be put in different cgroups and are not subject to the no
+internal process constraint - threaded controllers can be enabled on
+non-leaf cgroups whether they have threads in them or not.
+
+Because the root cgroup is not subject to no internal process
+constraint, it can serve both as a thread root and a parent to normal
+cgroups.  This is called mixed thread mode.
+
+Thread mode can be enabled by writing "enable" to "cgroup.threads"
+file.
+
+  # echo enable > cgroup.threads
+
+On a non-root cgroup, to enable the thread mode, the following
+conditions must be met.
+
+- The thread root doesn't have any child cgroups.
+
+- The thread root doesn't have any controllers enabled.
+
+On the root cgroup, only the mixed thread mode is supported and there
+isn't any restriction on enabling it.  On enable, unlike the normal
+thread mode, the whole subtree is not turned into thread subtree.  The
+first level children have to explicitly join the thread subtree.  A
+cgroup can join the existing mixed threaded subtree by writing "join"
+to "cgroup.threads" file.
+
+  # echo join > cgroup.threads
+
+In addition to the usual enable conditions, the following extra
+condition must be met for joining.
+
+- The first level child doesn't have any tasks.
+
+Inside a threaded subtree, "cgroup.threads" can be read and contains
+the list of the thread IDs of all threads in the cgroup.  Except that
+the operations are per-thread instead of per-process, "cgroup.threads"
+has the same format and behaves the same way as "cgroup.procs".
+
+The thread root serves as the resource domain for the whole subtree,
+and, while the threads can be scattered across the subtree, all the
+processes are considered to be in the thread root.  "cgroup.procs" in
+a thread root contains the PIDs of all processes in the subtree and is
+not readable in the subtree proper.  However, "cgroup.procs" can be
+written to from anywhere in the subtree to migrate all threads of the
+matching process to the cgroup.
+
+Only threaded controllers can be enabled in a threaded subtree.  When
+a threaded controller is enabled inside a threaded subtree, it only
+accounts for and controls resource consumptions associated with the
+threads in the cgroup and its descendants.  All consumptions which
+aren't tied to a specific thread belong to the thread root.
+
+Because a threaded subtree is exempt from no internal process
+constraint, a threaded controller must be able to handle competition
+between threads in a non-leaf cgroup and its child cgroups.  Each
+threaded controller defines how such competitions are handled.
+
+Thread mode can be disabled by writing "disable" to "cgroup.threads"
+file.
+
+  # echo disable > cgroup.threads
+
+Disabling requires the same conditions as enabling and the following
+extra conditions.
+
+For a non-mixed threaded subtree:
+
+- The cgroup must be the thread root.  Thread mode can't be disabled
+  partially in the subtree.
+
+For the root cgroup:
+
+- There can't be any child cgroup which is a part of the mixed thread
+  subtree.  All first level children must be either non-threaded or
+  thread roots.
+
+
 2-3. [Un]populated Notification
 
 Each non-root cgroup has a "cgroup.events" file which contains
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 471773792557..fb694b9fd533 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -502,6 +502,18 @@ struct cgroup_subsys {
 	bool implicit_on_dfl:1;
 
 	/*
+	 * If %true, the controller supports threaded mode on the default
+	 * hierarchy.  In a threaded subtree, both process granularity and
+	 * no-internal-process constraint are ignored and a threaded
+	 * controller should be able to handle that.
+	 *
+	 * Note that as an implicit controller is automatically enabled on
+	 * all cgroups on the default hierarchy, it should also be
+	 * threaded.  implicit && !threaded is not supported.
+	 */
+	bool threaded:1;
+
+	/*
 	 * If %false, this subsystem is properly hierarchical -
 	 * configuration, resource accounting and restriction on a parent
 	 * cgroup cover those of its children.  If %true, hierarchy support
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 765c1c27c879..d319438348c4 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -162,6 +162,9 @@ static u16 cgrp_dfl_inhibit_ss_mask;
 /* some controllers are implicitly enabled on the default hierarchy */
 static u16 cgrp_dfl_implicit_ss_mask;
 
+/* some controllers can be threaded on the default hierarchy */
+static u16 cgrp_dfl_threaded_ss_mask;
+
 /* The list of hierarchy roots */
 LIST_HEAD(cgroup_roots);
 static int cgroup_root_count;
@@ -331,14 +334,60 @@ static bool cgroup_is_threaded(struct cgroup *cgrp)
 	return cgrp->proc_cgrp;
 }
 
+/* is @cgrp root of a threaded subtree? */
+static bool cgroup_is_thread_root(struct cgroup *cgrp)
+{
+	return cgrp->proc_cgrp == cgrp;
+}
+
+/* if threaded, would @cgrp become root of a mixed threaded subtree? */
+static bool cgroup_is_mixable(struct cgroup *cgrp)
+{
+	/*
+	 * Root isn't under domain level resource control exempting it from
+	 * the no-internal-process constraint, so it can serve as a thread
+	 * root and a parent of resource domains at the same time.
+	 */
+	return !cgroup_parent(cgrp);
+}
+
+/* is @cgrp root of a mixed threaded subtree */
+static bool cgroup_is_mixed_root(struct cgroup *cgrp)
+{
+	return cgroup_is_thread_root(cgrp) && cgroup_is_mixable(cgrp);
+}
+
+/* is @cgrp's parent a mixed thread root? */
+static bool cgroup_has_mixed_parent(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	return parent && cgroup_is_mixed_root(parent);
+}
+
+/* is @cgrp the first level child of a mixed threaded subtree */
+static bool cgroup_is_mixed_child(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	return parent && cgrp->proc_cgrp == parent &&
+		cgroup_is_mixed_root(parent);
+}
+
 /* subsystems visibly enabled on a cgroup */
 static u16 cgroup_control(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	u16 root_ss_mask = cgrp->root->subsys_mask;
 
-	if (parent)
-		return parent->subtree_control;
+	if (parent) {
+		u16 ss_mask = parent->subtree_control;
+
+		/* mixed child can only have threaded subset of controllers */
+		if (cgroup_is_mixed_child(cgrp))
+			ss_mask &= cgrp_dfl_threaded_ss_mask;
+		return ss_mask;
+	}
 
 	if (cgroup_on_dfl(cgrp))
 		root_ss_mask &= ~(cgrp_dfl_inhibit_ss_mask |
@@ -351,8 +400,14 @@ static u16 cgroup_ss_mask(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 
-	if (parent)
-		return parent->subtree_ss_mask;
+	if (parent) {
+		u16 ss_mask = parent->subtree_ss_mask;
+
+		/* mixed child can only have threaded subset of controllers */
+		if (cgroup_is_mixed_child(cgrp))
+			ss_mask &= cgrp_dfl_threaded_ss_mask;
+		return ss_mask;
+	}
 
 	return cgrp->root->subsys_mask;
 }
@@ -2233,14 +2288,14 @@ static int cgroup_migrate_execute(struct cgroup_mgctx *mgctx)
  * cgroup_may_migrate_to - verify whether a cgroup can be migration destination
  * @dst_cgrp: destination cgroup to test
  *
- * On the default hierarchy, except for the root, subtree_control must be
- * zero for migration destination cgroups with tasks so that child cgroups
- * don't compete against tasks.
+ * On the default hierarchy, except for the mixable and threaded cgroups,
+ * subtree_control must be zero for migration destination cgroups with
+ * tasks so that child cgroups don't compete against tasks.
  */
 bool cgroup_may_migrate_to(struct cgroup *dst_cgrp)
 {
-	return !cgroup_on_dfl(dst_cgrp) || !cgroup_parent(dst_cgrp) ||
-		!dst_cgrp->subtree_control;
+	return !cgroup_on_dfl(dst_cgrp) || cgroup_is_mixable(dst_cgrp) ||
+		cgroup_is_threaded(dst_cgrp) || !dst_cgrp->subtree_control;
 }
 
 /**
@@ -2949,11 +3004,20 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 		goto out_unlock;
 	}
 
+	/* can't enable !threaded controllers on a threaded cgroup */
+	if (cgroup_is_threaded(cgrp) && !cgroup_is_mixed_root(cgrp) &&
+	    (enable & ~cgrp_dfl_threaded_ss_mask)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
-	 * Except for the root, subtree_control must be zero for a cgroup
-	 * with tasks so that child cgroups don't compete against tasks.
+	 * Except for mixable and threaded cgroups, subtree_control must be
+	 * zero for a cgroup with tasks so that child cgroups don't compete
+	 * against tasks.
 	 */
-	if (enable && cgroup_parent(cgrp) && cgroup_has_tasks(cgrp)) {
+	if (enable && !cgroup_is_mixable(cgrp) && !cgroup_is_threaded(cgrp) &&
+	    cgroup_has_tasks(cgrp)) {
 		ret = -EBUSY;
 		goto out_unlock;
 	}
@@ -2975,6 +3039,130 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+enum thread_mode_op {
+	THREAD_MODE_ENABLE,
+	THREAD_MODE_JOIN,
+	THREAD_MODE_DISABLE,
+};
+
+static int cgroup_vet_thread_mode_op(struct cgroup *cgrp, enum thread_mode_op op)
+{
+	/* verify join conditions first and convert it to ENABLE */
+	if (op == THREAD_MODE_JOIN) {
+		/* can't join if it isn't there */
+		if (!cgroup_has_mixed_parent(cgrp))
+			return -EINVAL;
+		/* avoid needing implicit domain controller migrations */
+		if (cgroup_has_tasks(cgrp))
+			return -EBUSY;
+		/* and follow the same restrictions as enable */
+		op = THREAD_MODE_ENABLE;
+	}
+
+	/*
+	 * The only restriction a mixable root is subject to is that it
+	 * can't end the mixed threaded subtree while there are member
+	 * descendant cgroups in it.
+	 */
+	if (cgroup_is_mixable(cgrp)) {
+		if (op == THREAD_MODE_DISABLE) {
+			struct cgroup *child;
+
+			cgroup_for_each_live_child(child, cgrp)
+				if (cgroup_is_mixed_child(child))
+					return -EBUSY;
+		}
+
+		return 0;
+	}
+
+	/*
+	 * @cgrp is starting or ending a normal threaded subtree.  Make
+	 * sure the subtree is empty and avoid needing implicit domain
+	 * controller migrations.
+	 */
+	if (css_has_online_children(&cgrp->self) || cgrp->subtree_control)
+		return -EBUSY;
+
+	/* no partial disable */
+	if (op == THREAD_MODE_DISABLE && !cgroup_is_thread_root(cgrp))
+		return -EBUSY;
+
+	return 0;
+}
+
+static int cgroup_enable_threaded(struct cgroup *cgrp, bool is_join)
+{
+	struct cgroup *proc_cgrp;
+	int ret;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* noop if already threaded */
+	if (cgroup_is_threaded(cgrp))
+		return 0;
+
+	ret = cgroup_vet_thread_mode_op(cgrp, is_join ? THREAD_MODE_JOIN :
+							THREAD_MODE_ENABLE);
+	if (ret)
+		return ret;
+
+	if (is_join)
+		proc_cgrp = cgroup_parent(cgrp);
+	else
+		proc_cgrp = cgrp;
+
+	cgroup_save_control(cgrp);
+
+	/*
+	 * Mark it threaded.  This makes cgroup_control() and
+	 * cgroup_ss_mask() skip domain controllers.  In turn, the
+	 * following control operations migrate tasks to the matching
+	 * threaded csets.
+	 */
+	cgrp->proc_cgrp = proc_cgrp;
+
+	ret = cgroup_apply_control(cgrp);
+	if (ret)
+		cgrp->proc_cgrp = NULL;
+
+	cgroup_finalize_control(cgrp, ret);
+
+	return ret;
+}
+
+static int cgroup_disable_threaded(struct cgroup *cgrp)
+{
+	struct cgroup *proc_cgrp = cgrp->proc_cgrp;
+	int ret;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* noop if already !threaded */
+	if (!cgroup_is_threaded(cgrp))
+		return 0;
+
+	ret = cgroup_vet_thread_mode_op(cgrp, THREAD_MODE_DISABLE);
+	if (ret)
+		return ret;
+
+	cgroup_save_control(cgrp);
+
+	/*
+	 * Mark it !threaded.  This restores cgroup_control() and
+	 * cgroup_ss_mask() behavior.  See cgroup_enable_threaded().
+	 */
+	cgrp->proc_cgrp = NULL;
+
+	ret = cgroup_apply_control(cgrp);
+	if (ret)
+		cgrp->proc_cgrp = proc_cgrp;
+
+	cgroup_finalize_control(cgrp, ret);
+
+	return ret;
+}
+
 static int cgroup_events_show(struct seq_file *seq, void *v)
 {
 	seq_printf(seq, "populated %d\n",
@@ -3858,12 +4046,12 @@ static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos)
 	return css_task_iter_next(it);
 }
 
-static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos,
+				  unsigned int iter_flags)
 {
 	struct kernfs_open_file *of = s->private;
 	struct cgroup *cgrp = seq_css(s)->cgroup;
 	struct css_task_iter *it = of->priv;
-	unsigned iter_flags = CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED;
 
 	/*
 	 * When a seq_file is seeked, it's always traversed sequentially
@@ -3886,6 +4074,23 @@ static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
 	return cgroup_procs_next(s, NULL, NULL);
 }
 
+static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+{
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+
+	/*
+	 * All processes of a threaded subtree are in the top threaded
+	 * cgroup.  Only threads can be distributed across the subtree.
+	 * Reject reads on cgroup.procs in the subtree proper.  They're
+	 * always empty anyway.
+	 */
+	if (cgroup_is_threaded(cgrp) && !cgroup_is_thread_root(cgrp))
+		return ERR_PTR(-EINVAL);
+
+	return __cgroup_procs_start(s, pos, CSS_TASK_ITER_PROCS |
+					    CSS_TASK_ITER_THREADED);
+}
+
 static int cgroup_procs_show(struct seq_file *s, void *v)
 {
 	seq_printf(s, "%d\n", task_pid_vnr(v));
@@ -3940,6 +4145,79 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static void *cgroup_threads_start(struct seq_file *s, loff_t *pos)
+{
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+
+	if (!cgroup_is_threaded(cgrp))
+		return ERR_PTR(-EINVAL);
+
+	return __cgroup_procs_start(s, pos, 0);
+}
+
+static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes, loff_t off)
+{
+	struct super_block *sb = of->file->f_path.dentry->d_sb;
+	struct cgroup *cgrp, *common_ancestor;
+	struct task_struct *task;
+	ssize_t ret;
+
+	buf = strstrip(buf);
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENODEV;
+
+	/* cgroup.procs determines delegation, require permission on it too */
+	ret = cgroup_procs_write_permission(cgrp, sb);
+	if (ret)
+		goto out_unlock;
+
+	/* enable or disable? */
+	if (!strcmp(buf, "enable")) {
+		ret = cgroup_enable_threaded(cgrp, false);
+		goto out_unlock;
+	} else if (!strcmp(buf, "join")) {
+		ret = cgroup_enable_threaded(cgrp, true);
+		goto out_unlock;
+	} else if (!strcmp(buf, "disable")) {
+		ret = cgroup_disable_threaded(cgrp);
+		goto out_unlock;
+	}
+
+	/* thread migration */
+	ret = -EINVAL;
+	if (!cgroup_is_threaded(cgrp))
+		goto out_unlock;
+
+	task = cgroup_procs_write_start(buf, false);
+	ret = PTR_ERR_OR_ZERO(task);
+	if (ret)
+		goto out_unlock;
+
+	common_ancestor = cgroup_migrate_common_ancestor(task, cgrp);
+
+	/* can't migrate across disjoint threaded subtrees */
+	ret = -EACCES;
+	if (common_ancestor->proc_cgrp != cgrp->proc_cgrp)
+		goto out_finish;
+
+	/* and follow the cgroup.procs delegation rule */
+	ret = cgroup_procs_write_permission(common_ancestor, sb);
+	if (ret)
+		goto out_finish;
+
+	ret = cgroup_attach_task(cgrp, task, false);
+
+out_finish:
+	cgroup_procs_write_finish(task);
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
 /* cgroup core interface files for the default hierarchy */
 static struct cftype cgroup_base_files[] = {
 	{
@@ -3952,6 +4230,14 @@ static struct cftype cgroup_base_files[] = {
 		.write = cgroup_procs_write,
 	},
 	{
+		.name = "cgroup.threads",
+		.release = cgroup_procs_release,
+		.seq_start = cgroup_threads_start,
+		.seq_next = cgroup_procs_next,
+		.seq_show = cgroup_procs_show,
+		.write = cgroup_threads_write,
+	},
+	{
 		.name = "cgroup.controllers",
 		.seq_show = cgroup_controllers_show,
 	},
@@ -4266,6 +4552,9 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	cgrp->root = root;
 	cgrp->level = level;
 
+	if (!cgroup_has_mixed_parent(cgrp))
+		cgrp->proc_cgrp = parent->proc_cgrp;
+
 	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
 		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
 
@@ -4712,11 +5001,17 @@ int __init cgroup_init(void)
 
 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
+		/* implicit controllers must be threaded too */
+		WARN_ON(ss->implicit_on_dfl && !ss->threaded);
+
 		if (ss->implicit_on_dfl)
 			cgrp_dfl_implicit_ss_mask |= 1 << ss->id;
 		else if (!ss->dfl_cftypes)
 			cgrp_dfl_inhibit_ss_mask |= 1 << ss->id;
 
+		if (ss->threaded)
+			cgrp_dfl_threaded_ss_mask |= 1 << ss->id;
+
 		if (ss->dfl_cftypes == ss->legacy_cftypes) {
 			WARN_ON(cgroup_add_cftypes(ss, ss->dfl_cftypes));
 		} else {
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 2237201d66d5..9829c67ebc0a 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -345,4 +345,5 @@ struct cgroup_subsys pids_cgrp_subsys = {
 	.free		= pids_free,
 	.legacy_cftypes	= pids_files,
 	.dfl_cftypes	= pids_files,
+	.threaded	= true,
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6e75a5c9412d..62878f3fa67a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11134,5 +11134,6 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
 	 * controller is not mounted on a legacy hierarchy.
 	 */
 	.implicit_on_dfl = true,
+	.threaded	= true,
 };
 #endif /* CONFIG_CGROUP_PERF */
-- 
2.13.0


* [PATCH 07/10] cgroup: implement cgroup v2 thread support
@ 2017-06-10 14:03   ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-10 14:03 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

This patch implements cgroup v2 thread support.  The goal of the
thread mode is supporting hierarchical accounting and control at
thread granularity while staying inside the resource domain model
which allows coordination across different resource controllers and
handling of anonymous resource consumptions.

Once thread mode is enabled on a cgroup, the threads of the processes
which are in its subtree can be placed inside the subtree without
being restricted by process granularity or no-internal-process
constraint.  Note that the threads aren't allowed to escape to a
different threaded subtree.  To be used inside a threaded subtree, a
controller should explicitly support threaded mode and be able to
handle internal competition in the way which is appropriate for the
resource.
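
The "no escape" rule can be sketched with a toy user-space model.  The
names below (struct toy_cgroup, toy_common_ancestor(),
toy_may_migrate_thread()) are hypothetical and simplified; the sketch
assumes both cgroups live in one well-formed tree.  It only mirrors the
check in cgroup_threads_write() that the common ancestor of source and
destination has the same ->proc_cgrp as the destination.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for struct cgroup. */
struct toy_cgroup {
	struct toy_cgroup *parent;	/* NULL for the root */
	struct toy_cgroup *proc_cgrp;	/* thread root, NULL if not threaded */
	int level;			/* 0 for the root */
};

/* Walk the deeper side up until both pointers meet. */
static struct toy_cgroup *toy_common_ancestor(struct toy_cgroup *a,
					      struct toy_cgroup *b)
{
	while (a != b) {
		assert(a != NULL && b != NULL);
		if (a->level >= b->level)
			a = a->parent;
		else
			b = b->parent;
	}
	return a;
}

/* May a thread in @src be moved to @dst via "cgroup.threads"? */
static int toy_may_migrate_thread(struct toy_cgroup *src,
				  struct toy_cgroup *dst)
{
	struct toy_cgroup *ca;

	/* the destination must be threaded */
	if (!dst->proc_cgrp)
		return -EINVAL;

	/* can't migrate across disjoint threaded subtrees */
	ca = toy_common_ancestor(src, dst);
	if (ca->proc_cgrp != dst->proc_cgrp)
		return -EACCES;

	return 0;
}
```

Moving between two cgroups under the same thread root passes because
their common ancestor is inside that subtree; moving between two
different thread roots fails because the common ancestor lies outside
both.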

The root of a threaded subtree, where thread mode is enabled in the
first place, is called the thread root and serves as the resource
domain for the whole subtree.  This is the last cgroup where
non-threaded controllers are operational and where all the
domain-level resource consumptions in the subtree are accounted.  This
allows threaded controllers to operate at thread granularity when
requested while staying inside the scope of system-level resource
distribution.

As the root cgroup is exempt from the no-internal-process constraint,
it can serve as both a thread root and a parent to normal cgroups.
The root cgroup supports mixed thread mode which can be enabled and
disabled anytime as long as there aren't any threaded children.  First
level child cgroups can selectively join the mixed threaded subtree.
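
How the thread root, mixed root and mixed child cases are told apart
can be sketched as follows.  This is a toy user-space model with
hypothetical names (struct toy_cgroup, toy_is_*()); it keeps only the
two pointers that the predicates added by this patch look at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cgroup. */
struct toy_cgroup {
	struct toy_cgroup *parent;	/* NULL for the root cgroup */
	struct toy_cgroup *proc_cgrp;	/* resource domain, NULL if not threaded */
};

/* A thread root is its own resource domain. */
static bool toy_is_thread_root(const struct toy_cgroup *cgrp)
{
	assert(cgrp != NULL);
	return cgrp->proc_cgrp == cgrp;
}

/* Only the parentless root can be a mixed thread root. */
static bool toy_is_mixed_root(const struct toy_cgroup *cgrp)
{
	return toy_is_thread_root(cgrp) && cgrp->parent == NULL;
}

/* A first level child that joined the root's threaded subtree. */
static bool toy_is_mixed_child(const struct toy_cgroup *cgrp)
{
	const struct toy_cgroup *parent = cgrp->parent;

	return parent && cgrp->proc_cgrp == parent &&
		toy_is_mixed_root(parent);
}
```

A first level child whose ->proc_cgrp is still NULL stays a normal
domain cgroup even after the root has enabled mixed thread mode, which
is what makes joining selective.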

Internally, in a threaded subtree, each css_set has its ->proc_cset
pointing to a matching css_set which belongs to the thread root.  This
ensures that thread root level cgroup_subsys_state for all threaded
controllers are readily accessible for domain-level operations.

This patch enables threaded mode for the pids and perf_events
controllers.  Neither has to worry about domain-level resource
consumptions and it's enough to simply set the flag.
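
What setting the flag buys can be sketched with a toy model of the
mask handling.  struct toy_subsys and both helpers are hypothetical;
the sketch assumes at most 16 subsystem IDs, matching the u16 masks
used in cgroup.c, and mirrors how cgrp_dfl_threaded_ss_mask is built in
cgroup_init() and applied to a mixed child in cgroup_control().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct cgroup_subsys. */
struct toy_subsys {
	int id;		/* bit position in the masks */
	bool threaded;	/* controller declares thread mode support */
};

/* Collect the IDs of all threaded controllers into a bitmask. */
static uint16_t toy_threaded_mask(const struct toy_subsys *ss, size_t n)
{
	uint16_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(ss[i].id >= 0 && ss[i].id < 16);
		if (ss[i].threaded)
			mask |= (uint16_t)(1u << ss[i].id);
	}
	return mask;
}

/* Controllers visible on a child; a mixed child only sees threaded ones. */
static uint16_t toy_child_control(uint16_t parent_subtree_control,
				  uint16_t threaded_mask,
				  bool is_mixed_child)
{
	if (is_mixed_child)
		return parent_subtree_control & threaded_mask;
	return parent_subtree_control;
}
```

With three controllers of which two are threaded, a mixed child of a
parent that enables all three ends up with only the two threaded ones,
while a normal child keeps all three.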

For more details on the interface and behavior of the thread mode,
please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
by this patch.  Note that the documentation update is not complete as
the rest of the documentation needs to be updated accordingly.
Rolling those updates into this patch would be confusing, so they will
follow as separate patches.

v2: - After discussions with Waiman, support for mixed thread mode is
      added.  This should address the issue that Peter pointed out
      where any nesting should be avoided for thread subtrees while
      coexisting with other domain cgroups.

    - Enabling / disabling thread mode now piggy backs on the existing
      control mask update mechanism.

    - Bug fixes and cleanup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt |  99 +++++++++++++-
 include/linux/cgroup-defs.h |  12 ++
 kernel/cgroup/cgroup.c      | 323 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/cgroup/pids.c        |   1 +
 kernel/events/core.c        |   1 +
 5 files changed, 420 insertions(+), 16 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dc5e2dcdbef4..96db84005cb2 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -16,7 +16,9 @@ CONTENTS
   1-2. What is cgroup?
 2. Basic Operations
   2-1. Mounting
-  2-2. Organizing Processes
+  2-2. Organizing Processes and Threads
+    2-2-1. Processes
+    2-2-2. Threads
   2-3. [Un]populated Notification
   2-4. Controlling Controllers
     2-4-1. Enabling and Disabling
@@ -150,7 +152,9 @@ and experimenting easier, the kernel parameter cgroup_no_v1= allows
 disabling controllers in v1 and make them always available in v2.
 
 
-2-2. Organizing Processes
+2-2. Organizing Processes and Threads
+
+2-2-1. Processes
 
 Initially, only the root cgroup exists to which all processes belong.
 A child cgroup can be created by creating a sub-directory.
@@ -201,6 +205,97 @@ is removed subsequently, " (deleted)" is appended to the path.
   0::/test-cgroup/test-cgroup-nested (deleted)
 
 
+2-2-2. Threads
+
+cgroup v2 supports thread granularity for a subset of controllers to
+support use cases requiring hierarchical resource distribution across
+the threads of a group of processes.  By default, all threads of a
+process belong to the same cgroup, which also serves as the resource
+domain to host resource consumptions which are not specific to a
+process or thread.  The thread mode allows threads to be spread across
+a subtree while still maintaining the common resource domain for them.
+
+Enabling thread mode on a subtree makes it threaded.  The root of a
+threaded subtree is called thread root and serves as the resource
+domain for the entire subtree.  In a threaded subtree, threads of a
+process can be put in different cgroups and are not subject to the no
+internal process constraint - threaded controllers can be enabled on
+non-leaf cgroups whether they have threads in them or not.
+
+Because the root cgroup is not subject to no internal process
+constraint, it can serve both as a thread root and a parent to normal
+cgroups.  This is called mixed thread mode.
+
+Thread mode can be enabled by writing "enable" to "cgroup.threads"
+file.
+
+  # echo enable > cgroup.threads
+
+On a non-root cgroup, to enable the thread mode, the following
+conditions must be met.
+
+- The thread root doesn't have any child cgroups.
+
+- The thread root doesn't have any controllers enabled.
+
+On the root cgroup, only the mixed thread mode is supported and there
+isn't any restriction on enabling it.  On enable, unlike the normal
+thread mode, the whole subtree is not turned into thread subtree.  The
+first level children have to explicitly join the thread subtree.  A
+cgroup can join the existing mixed threaded subtree by writing "join"
+to "cgroup.threads" file.
+
+  # echo join > cgroup.threads
+
+In addition to the usual enable conditions, the following extra
+condition must be met for joining.
+
+- The first level child doesn't have any tasks.
+
+Inside a threaded subtree, "cgroup.threads" can be read and contains
+the list of the thread IDs of all threads in the cgroup.  Except that
+the operations are per-thread instead of per-process, "cgroup.threads"
+has the same format and behaves the same way as "cgroup.procs".
+
+The thread root serves as the resource domain for the whole subtree,
+and, while the threads can be scattered across the subtree, all the
+processes are considered to be in the thread root.  "cgroup.procs" in
+a thread root contains the PIDs of all processes in the subtree and is
+not readable in the subtree proper.  However, "cgroup.procs" can be
+written to from anywhere in the subtree to migrate all threads of the
+matching process to the cgroup.
+
+Only threaded controllers can be enabled in a threaded subtree.  When
+a threaded controller is enabled inside a threaded subtree, it only
+accounts for and controls resource consumptions associated with the
+threads in the cgroup and its descendants.  All consumptions which
+aren't tied to a specific thread belong to the thread root.
+
+Because a threaded subtree is exempt from no internal process
+constraint, a threaded controller must be able to handle competition
+between threads in a non-leaf cgroup and its child cgroups.  Each
+threaded controller defines how such competitions are handled.
+
+Thread mode can be disabled by writing "disable" to "cgroup.threads"
+file.
+
+  # echo disable > cgroup.threads
+
+Disabling requires the same conditions as enabling and the following
+extra conditions.
+
+For a non-mixed threaded subtree:
+
+- The cgroup must be the thread root.  Thread mode can't be disabled
+  partially in the subtree.
+
+For the root cgroup:
+
+- There can't be any child cgroup which is a part of the mixed thread
+  subtree.  All first level children must be either non-threaded or
+  thread roots.
+
+
 2-3. [Un]populated Notification
 
 Each non-root cgroup has a "cgroup.events" file which contains
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 471773792557..fb694b9fd533 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -502,6 +502,18 @@ struct cgroup_subsys {
 	bool implicit_on_dfl:1;
 
 	/*
+	 * If %true, the controller supports threaded mode on the default
+	 * hierarchy.  In a threaded subtree, both process granularity and
+	 * no-internal-process constraint are ignored and a threaded
+	 * controller should be able to handle that.
+	 *
+	 * Note that as an implicit controller is automatically enabled on
+	 * all cgroups on the default hierarchy, it should also be
+	 * threaded.  implicit && !threaded is not supported.
+	 */
+	bool threaded:1;
+
+	/*
 	 * If %false, this subsystem is properly hierarchical -
 	 * configuration, resource accounting and restriction on a parent
 	 * cgroup cover those of its children.  If %true, hierarchy support
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 765c1c27c879..d319438348c4 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -162,6 +162,9 @@ static u16 cgrp_dfl_inhibit_ss_mask;
 /* some controllers are implicitly enabled on the default hierarchy */
 static u16 cgrp_dfl_implicit_ss_mask;
 
+/* some controllers can be threaded on the default hierarchy */
+static u16 cgrp_dfl_threaded_ss_mask;
+
 /* The list of hierarchy roots */
 LIST_HEAD(cgroup_roots);
 static int cgroup_root_count;
@@ -331,14 +334,60 @@ static bool cgroup_is_threaded(struct cgroup *cgrp)
 	return cgrp->proc_cgrp;
 }
 
+/* is @cgrp root of a threaded subtree? */
+static bool cgroup_is_thread_root(struct cgroup *cgrp)
+{
+	return cgrp->proc_cgrp == cgrp;
+}
+
+/* if threaded, would @cgrp become root of a mixed threaded subtree? */
+static bool cgroup_is_mixable(struct cgroup *cgrp)
+{
+	/*
+	 * Root isn't under domain level resource control exempting it from
+	 * the no-internal-process constraint, so it can serve as a thread
+	 * root and a parent of resource domains at the same time.
+	 */
+	return !cgroup_parent(cgrp);
+}
+
+/* is @cgrp root of a mixed threaded subtree */
+static bool cgroup_is_mixed_root(struct cgroup *cgrp)
+{
+	return cgroup_is_thread_root(cgrp) && cgroup_is_mixable(cgrp);
+}
+
+/* is @cgrp's parent a mixed thread root? */
+static bool cgroup_has_mixed_parent(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	return parent && cgroup_is_mixed_root(parent);
+}
+
+/* is @cgrp the first level child of a mixed threaded subtree */
+static bool cgroup_is_mixed_child(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	return parent && cgrp->proc_cgrp == parent &&
+		cgroup_is_mixed_root(parent);
+}
+
 /* subsystems visibly enabled on a cgroup */
 static u16 cgroup_control(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	u16 root_ss_mask = cgrp->root->subsys_mask;
 
-	if (parent)
-		return parent->subtree_control;
+	if (parent) {
+		u16 ss_mask = parent->subtree_control;
+
+		/* mixed child can only have threaded subset of controllers */
+		if (cgroup_is_mixed_child(cgrp))
+			ss_mask &= cgrp_dfl_threaded_ss_mask;
+		return ss_mask;
+	}
 
 	if (cgroup_on_dfl(cgrp))
 		root_ss_mask &= ~(cgrp_dfl_inhibit_ss_mask |
@@ -351,8 +400,14 @@ static u16 cgroup_ss_mask(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 
-	if (parent)
-		return parent->subtree_ss_mask;
+	if (parent) {
+		u16 ss_mask = parent->subtree_ss_mask;
+
+		/* mixed child can only have threaded subset of controllers */
+		if (cgroup_is_mixed_child(cgrp))
+			ss_mask &= cgrp_dfl_threaded_ss_mask;
+		return ss_mask;
+	}
 
 	return cgrp->root->subsys_mask;
 }
@@ -2233,14 +2288,14 @@ static int cgroup_migrate_execute(struct cgroup_mgctx *mgctx)
  * cgroup_may_migrate_to - verify whether a cgroup can be migration destination
  * @dst_cgrp: destination cgroup to test
  *
- * On the default hierarchy, except for the root, subtree_control must be
- * zero for migration destination cgroups with tasks so that child cgroups
- * don't compete against tasks.
+ * On the default hierarchy, except for the mixable and threaded cgroups,
+ * subtree_control must be zero for migration destination cgroups with
+ * tasks so that child cgroups don't compete against tasks.
  */
 bool cgroup_may_migrate_to(struct cgroup *dst_cgrp)
 {
-	return !cgroup_on_dfl(dst_cgrp) || !cgroup_parent(dst_cgrp) ||
-		!dst_cgrp->subtree_control;
+	return !cgroup_on_dfl(dst_cgrp) || cgroup_is_mixable(dst_cgrp) ||
+		cgroup_is_threaded(dst_cgrp) || !dst_cgrp->subtree_control;
 }
 
 /**
@@ -2949,11 +3004,20 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 		goto out_unlock;
 	}
 
+	/* can't enable !threaded controllers on a threaded cgroup */
+	if (cgroup_is_threaded(cgrp) && !cgroup_is_mixed_root(cgrp) &&
+	    (enable & ~cgrp_dfl_threaded_ss_mask)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
-	 * Except for the root, subtree_control must be zero for a cgroup
-	 * with tasks so that child cgroups don't compete against tasks.
+	 * Except for mixable and threaded cgroups, subtree_control must be
+	 * zero for a cgroup with tasks so that child cgroups don't compete
+	 * against tasks.
 	 */
-	if (enable && cgroup_parent(cgrp) && cgroup_has_tasks(cgrp)) {
+	if (enable && !cgroup_is_mixable(cgrp) && !cgroup_is_threaded(cgrp) &&
+	    cgroup_has_tasks(cgrp)) {
 		ret = -EBUSY;
 		goto out_unlock;
 	}
@@ -2975,6 +3039,130 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+enum thread_mode_op {
+	THREAD_MODE_ENABLE,
+	THREAD_MODE_JOIN,
+	THREAD_MODE_DISABLE,
+};
+
+static int cgroup_vet_thread_mode_op(struct cgroup *cgrp, enum thread_mode_op op)
+{
+	/* verify join conditions first and convert it to ENABLE */
+	if (op == THREAD_MODE_JOIN) {
+		/* can't join if it isn't there */
+		if (!cgroup_has_mixed_parent(cgrp))
+			return -EINVAL;
+		/* avoid needing implicit domain controller migrations */
+		if (cgroup_has_tasks(cgrp))
+			return -EBUSY;
+		/* and follow the same restrictions as enable */
+		op = THREAD_MODE_ENABLE;
+	}
+
+	/*
+	 * The only restriction a mixable root is subject to is that it
+	 * can't end the mixed threaded subtree while there are member
+	 * descendant cgroups in it.
+	 */
+	if (cgroup_is_mixable(cgrp)) {
+		if (op == THREAD_MODE_DISABLE) {
+			struct cgroup *child;
+
+			cgroup_for_each_live_child(child, cgrp)
+				if (cgroup_is_mixed_child(child))
+					return -EBUSY;
+		}
+
+		return 0;
+	}
+
+	/*
+	 * @cgrp is starting or ending a normal threaded subtree.  Make
+	 * sure the subtree is empty and avoid needing implicit domain
+	 * controller migrations.
+	 */
+	if (css_has_online_children(&cgrp->self) || cgrp->subtree_control)
+		return -EBUSY;
+
+	/* no partial disable */
+	if (op == THREAD_MODE_DISABLE && !cgroup_is_thread_root(cgrp))
+		return -EBUSY;
+
+	return 0;
+}
+
+static int cgroup_enable_threaded(struct cgroup *cgrp, bool is_join)
+{
+	struct cgroup *proc_cgrp;
+	int ret;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* noop if already threaded */
+	if (cgroup_is_threaded(cgrp))
+		return 0;
+
+	ret = cgroup_vet_thread_mode_op(cgrp, is_join ? THREAD_MODE_JOIN :
+							THREAD_MODE_ENABLE);
+	if (ret)
+		return ret;
+
+	if (is_join)
+		proc_cgrp = cgroup_parent(cgrp);
+	else
+		proc_cgrp = cgrp;
+
+	cgroup_save_control(cgrp);
+
+	/*
+	 * Mark it threaded.  This makes cgroup_control() and
+	 * cgroup_ss_mask() skip domain controllers.  In turn, the
+	 * following control operations migrate tasks to the matching
+	 * threaded csets.
+	 */
+	cgrp->proc_cgrp = proc_cgrp;
+
+	ret = cgroup_apply_control(cgrp);
+	if (ret)
+		cgrp->proc_cgrp = NULL;
+
+	cgroup_finalize_control(cgrp, ret);
+
+	return ret;
+}
+
+static int cgroup_disable_threaded(struct cgroup *cgrp)
+{
+	struct cgroup *proc_cgrp = cgrp->proc_cgrp;
+	int ret;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* noop if already !threaded */
+	if (!cgroup_is_threaded(cgrp))
+		return 0;
+
+	ret = cgroup_vet_thread_mode_op(cgrp, THREAD_MODE_DISABLE);
+	if (ret)
+		return ret;
+
+	cgroup_save_control(cgrp);
+
+	/*
+	 * Mark it !threaded.  This restores cgroup_control() and
+	 * cgroup_ss_mask() behavior.  See cgroup_enable_threaded().
+	 */
+	cgrp->proc_cgrp = NULL;
+
+	ret = cgroup_apply_control(cgrp);
+	if (ret)
+		cgrp->proc_cgrp = proc_cgrp;
+
+	cgroup_finalize_control(cgrp, ret);
+
+	return ret;
+}
+
 static int cgroup_events_show(struct seq_file *seq, void *v)
 {
 	seq_printf(seq, "populated %d\n",
@@ -3858,12 +4046,12 @@ static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos)
 	return css_task_iter_next(it);
 }
 
-static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos,
+				  unsigned int iter_flags)
 {
 	struct kernfs_open_file *of = s->private;
 	struct cgroup *cgrp = seq_css(s)->cgroup;
 	struct css_task_iter *it = of->priv;
-	unsigned iter_flags = CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED;
 
 	/*
 	 * When a seq_file is seeked, it's always traversed sequentially
@@ -3886,6 +4074,23 @@ static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
 	return cgroup_procs_next(s, NULL, NULL);
 }
 
+static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+{
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+
+	/*
+	 * All processes of a threaded subtree are in the top threaded
+	 * cgroup.  Only threads can be distributed across the subtree.
+	 * Reject reads on cgroup.procs in the subtree proper.  They're
+	 * always empty anyway.
+	 */
+	if (cgroup_is_threaded(cgrp) && !cgroup_is_thread_root(cgrp))
+		return ERR_PTR(-EINVAL);
+
+	return __cgroup_procs_start(s, pos, CSS_TASK_ITER_PROCS |
+					    CSS_TASK_ITER_THREADED);
+}
+
 static int cgroup_procs_show(struct seq_file *s, void *v)
 {
 	seq_printf(s, "%d\n", task_pid_vnr(v));
@@ -3940,6 +4145,79 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static void *cgroup_threads_start(struct seq_file *s, loff_t *pos)
+{
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+
+	if (!cgroup_is_threaded(cgrp))
+		return ERR_PTR(-EINVAL);
+
+	return __cgroup_procs_start(s, pos, 0);
+}
+
+static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes, loff_t off)
+{
+	struct super_block *sb = of->file->f_path.dentry->d_sb;
+	struct cgroup *cgrp, *common_ancestor;
+	struct task_struct *task;
+	ssize_t ret;
+
+	buf = strstrip(buf);
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENODEV;
+
+	/* cgroup.procs determines delegation, require permission on it too */
+	ret = cgroup_procs_write_permission(cgrp, sb);
+	if (ret)
+		goto out_unlock;
+
+	/* enable or disable? */
+	if (!strcmp(buf, "enable")) {
+		ret = cgroup_enable_threaded(cgrp, false);
+		goto out_unlock;
+	} else if (!strcmp(buf, "join")) {
+		ret = cgroup_enable_threaded(cgrp, true);
+		goto out_unlock;
+	} else if (!strcmp(buf, "disable")) {
+		ret = cgroup_disable_threaded(cgrp);
+		goto out_unlock;
+	}
+
+	/* thread migration */
+	ret = -EINVAL;
+	if (!cgroup_is_threaded(cgrp))
+		goto out_unlock;
+
+	task = cgroup_procs_write_start(buf, false);
+	ret = PTR_ERR_OR_ZERO(task);
+	if (ret)
+		goto out_unlock;
+
+	common_ancestor = cgroup_migrate_common_ancestor(task, cgrp);
+
+	/* can't migrate across disjoint threaded subtrees */
+	ret = -EACCES;
+	if (common_ancestor->proc_cgrp != cgrp->proc_cgrp)
+		goto out_finish;
+
+	/* and follow the cgroup.procs delegation rule */
+	ret = cgroup_procs_write_permission(common_ancestor, sb);
+	if (ret)
+		goto out_finish;
+
+	ret = cgroup_attach_task(cgrp, task, false);
+
+out_finish:
+	cgroup_procs_write_finish(task);
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
 /* cgroup core interface files for the default hierarchy */
 static struct cftype cgroup_base_files[] = {
 	{
@@ -3952,6 +4230,14 @@ static struct cftype cgroup_base_files[] = {
 		.write = cgroup_procs_write,
 	},
 	{
+		.name = "cgroup.threads",
+		.release = cgroup_procs_release,
+		.seq_start = cgroup_threads_start,
+		.seq_next = cgroup_procs_next,
+		.seq_show = cgroup_procs_show,
+		.write = cgroup_threads_write,
+	},
+	{
 		.name = "cgroup.controllers",
 		.seq_show = cgroup_controllers_show,
 	},
@@ -4266,6 +4552,9 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	cgrp->root = root;
 	cgrp->level = level;
 
+	if (!cgroup_has_mixed_parent(cgrp))
+		cgrp->proc_cgrp = parent->proc_cgrp;
+
 	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
 		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
 
@@ -4712,11 +5001,17 @@ int __init cgroup_init(void)
 
 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
+		/* implicit controllers must be threaded too */
+		WARN_ON(ss->implicit_on_dfl && !ss->threaded);
+
 		if (ss->implicit_on_dfl)
 			cgrp_dfl_implicit_ss_mask |= 1 << ss->id;
 		else if (!ss->dfl_cftypes)
 			cgrp_dfl_inhibit_ss_mask |= 1 << ss->id;
 
+		if (ss->threaded)
+			cgrp_dfl_threaded_ss_mask |= 1 << ss->id;
+
 		if (ss->dfl_cftypes == ss->legacy_cftypes) {
 			WARN_ON(cgroup_add_cftypes(ss, ss->dfl_cftypes));
 		} else {
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 2237201d66d5..9829c67ebc0a 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -345,4 +345,5 @@ struct cgroup_subsys pids_cgrp_subsys = {
 	.free		= pids_free,
 	.legacy_cftypes	= pids_files,
 	.dfl_cftypes	= pids_files,
+	.threaded	= true,
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6e75a5c9412d..62878f3fa67a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11134,5 +11134,6 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
 	 * controller is not mounted on a legacy hierarchy.
 	 */
 	.implicit_on_dfl = true,
+	.threaded	= true,
 };
 #endif /* CONFIG_CGROUP_PERF */
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 08/10] sched: Misc preps for cgroup unified hierarchy interface
  2017-06-10 14:03 ` Tejun Heo
                   ` (7 preceding siblings ...)
  (?)
@ 2017-06-10 14:03 ` Tejun Heo
  -1 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-10 14:03 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

Make the following changes in preparation for the cpu controller
interface implementation for the unified hierarchy.  This patch
doesn't cause any functional differences.

* s/cpu_stats_show()/cpu_cfs_stats_show()/

* s/cpu_files/cpu_legacy_files/

* Separate out cpuacct_stats_read() from cpuacct_stats_show().  While
  at it, make the @val array u64 for consistency.
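The split-out reader takes the result buffer as a pointer to the whole
array rather than a pointer to its first element, which is why the diff
below switches to sizeof(*val) and (*val)[...].  A minimal standalone
sketch of that idiom follows; stats_read(), struct cpu_sample and the
STAT_* names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STAT_USER, STAT_SYSTEM, STAT_NSTATS };	/* stand-in for CPUACCT_STAT_* */

/* Hypothetical per-CPU sample; the kernel reads these from per-CPU storage. */
struct cpu_sample {
	uint64_t user, nice, system, irq, softirq;
};

/*
 * @val points to the whole array, so sizeof(*val) is the array size
 * (STAT_NSTATS * sizeof(uint64_t)) and not the size of a pointer.
 */
static void stats_read(const struct cpu_sample *cpus, size_t nr_cpus,
		       uint64_t (*val)[STAT_NSTATS])
{
	size_t i;

	assert(cpus || !nr_cpus);
	memset(val, 0, sizeof(*val));

	for (i = 0; i < nr_cpus; i++) {
		/* nice time counts as user; irq and softirq count as system */
		(*val)[STAT_USER]   += cpus[i].user + cpus[i].nice;
		(*val)[STAT_SYSTEM] += cpus[i].system + cpus[i].irq +
				       cpus[i].softirq;
	}
}
```

A caller declares "uint64_t val[STAT_NSTATS];" and passes "&val", so the
compiler rejects a buffer of the wrong length.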

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 kernel/sched/core.c    |  8 ++++----
 kernel/sched/cpuacct.c | 29 ++++++++++++++++++-----------
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 803c3bc274c4..016c3552a2b4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7246,7 +7246,7 @@ static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)
 	return ret;
 }
 
-static int cpu_stats_show(struct seq_file *sf, void *v)
+static int cpu_cfs_stats_show(struct seq_file *sf, void *v)
 {
 	struct task_group *tg = css_tg(seq_css(sf));
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
@@ -7286,7 +7286,7 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
 }
 #endif /* CONFIG_RT_GROUP_SCHED */
 
-static struct cftype cpu_files[] = {
+static struct cftype cpu_legacy_files[] = {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	{
 		.name = "shares",
@@ -7307,7 +7307,7 @@ static struct cftype cpu_files[] = {
 	},
 	{
 		.name = "stat",
-		.seq_show = cpu_stats_show,
+		.seq_show = cpu_cfs_stats_show,
 	},
 #endif
 #ifdef CONFIG_RT_GROUP_SCHED
@@ -7333,7 +7333,7 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.fork		= cpu_cgroup_fork,
 	.can_attach	= cpu_cgroup_can_attach,
 	.attach		= cpu_cgroup_attach,
-	.legacy_cftypes	= cpu_files,
+	.legacy_cftypes	= cpu_legacy_files,
 	.early_init	= true,
 };
 
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f95ab29a45d0..6151c23f722f 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -276,26 +276,33 @@ static int cpuacct_all_seq_show(struct seq_file *m, void *V)
 	return 0;
 }
 
-static int cpuacct_stats_show(struct seq_file *sf, void *v)
+static void cpuacct_stats_read(struct cpuacct *ca,
+			       u64 (*val)[CPUACCT_STAT_NSTATS])
 {
-	struct cpuacct *ca = css_ca(seq_css(sf));
-	s64 val[CPUACCT_STAT_NSTATS];
 	int cpu;
-	int stat;
 
-	memset(val, 0, sizeof(val));
+	memset(val, 0, sizeof(*val));
+
 	for_each_possible_cpu(cpu) {
 		u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat;
 
-		val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_USER];
-		val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_NICE];
-		val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM];
-		val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ];
-		val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ];
+		(*val)[CPUACCT_STAT_USER]   += cpustat[CPUTIME_USER];
+		(*val)[CPUACCT_STAT_USER]   += cpustat[CPUTIME_NICE];
+		(*val)[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM];
+		(*val)[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ];
+		(*val)[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ];
 	}
+}
+
+static int cpuacct_stats_show(struct seq_file *sf, void *v)
+{
+	u64 val[CPUACCT_STAT_NSTATS];
+	int stat;
+
+	cpuacct_stats_read(css_ca(seq_css(sf)), &val);
 
 	for (stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) {
-		seq_printf(sf, "%s %lld\n",
+		seq_printf(sf, "%s %llu\n",
 			   cpuacct_stat_desc[stat],
 			   (long long)nsec_to_clock_t(val[stat]));
 	}
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 09/10] sched: Implement interface for cgroup unified hierarchy
  2017-06-10 14:03 ` Tejun Heo
                   ` (8 preceding siblings ...)
  (?)
@ 2017-06-10 14:03 ` Tejun Heo
  -1 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-10 14:03 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds,
	Tejun Heo

While the cpu controller doesn't have any functional problems, there
are a couple of interface issues which can be addressed in the v2
interface.

* cpuacct being a separate controller.  This separation is artificial
  and rather pointless as demonstrated by most use cases co-mounting
  the two controllers.  It also forces certain information to be
  accounted twice.

* Use of different time units.  Writable control knobs use
  microseconds, some stat fields use nanoseconds while other cpuacct
  stat fields use centiseconds.

* Control knobs which can't be used in the root cgroup still show up
  in the root.

* Control knob names and semantics aren't consistent with other
  controllers.

This patchset implements cpu controller's interface on the unified
hierarchy which adheres to the controller file conventions described
in Documentation/cgroups/unified-hierarchy.txt.  Overall, the
following changes are made.

* cpuacct is implicitly enabled and disabled by cpu and its information
  is reported through "cpu.stat" which now uses microseconds for all
  time durations.  All time duration fields now have "_usec" appended
  to them for clarity.  While this doesn't solve the double accounting
  immediately, once the majority of users switch to v2, cpu can
  directly account and report the relevant stats and cpuacct can be
  disabled on the unified hierarchy.

  Note that cpuacct.usage_percpu is currently not included in
  "cpu.stat".  If this information is actually called for, it can be
  added later.

* "cpu.shares" is replaced with "cpu.weight" and operates on the
  standard scale defined by CGROUP_WEIGHT_MIN/DFL/MAX (1, 100, 10000).
  The weight is scaled to scheduler weight so that 100 maps to 1024
  and the ratio relationship is preserved - if weight is W and its
  scaled value is S, W / 100 == S / 1024.  While the mapped range is a
  bit smaller than the original scheduler weight range, the dead zones
  on both sides are relatively small and the mapped range still covers
  a wider range than the nice value mappings.  This file doesn't make
  sense in the root cgroup and isn't created on root.

* "cpu.cfs_quota_us" and "cpu.cfs_period_us" are replaced by "cpu.max"
  which contains both quota and period.

* "cpu.rt_runtime_us" and "cpu.rt_period_us" are replaced by
  "cpu.rt.max" which contains both runtime and period.
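The "cpu.weight" scaling described above can be checked with a small
standalone sketch.  It mirrors the rounding used in the diff below
(round-to-closest division); the function and macro names here are
hypothetical, and the kernel returns -ERANGE for out-of-range input
where this sketch simply asserts.

```c
#include <assert.h>
#include <stdint.h>

#define WEIGHT_MIN	1ULL		/* CGROUP_WEIGHT_MIN */
#define WEIGHT_DFL	100ULL		/* CGROUP_WEIGHT_DFL */
#define WEIGHT_MAX	10000ULL	/* CGROUP_WEIGHT_MAX */
#define SHARES_DFL	1024ULL		/* scheduler weight that 100 maps to */

/* Unsigned division rounded to the closest integer. */
static uint64_t div_round_closest(uint64_t x, uint64_t d)
{
	assert(d != 0);
	return (x + d / 2) / d;
}

/* cgroup weight -> scheduler shares, so that W / 100 == S / 1024. */
static uint64_t weight_to_shares(uint64_t weight)
{
	assert(weight >= WEIGHT_MIN && weight <= WEIGHT_MAX);
	return div_round_closest(weight * SHARES_DFL, WEIGHT_DFL);
}

/* Scheduler shares -> cgroup weight, the inverse mapping. */
static uint64_t shares_to_weight(uint64_t shares)
{
	return div_round_closest(shares * WEIGHT_DFL, SHARES_DFL);
}
```

With these definitions the weight range 1..10000 maps onto shares
10..102400.  Because each weight step spans more than ten share steps,
converting a weight to shares and back returns the same weight.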

v2: cpu_stats_show() was incorrectly using CONFIG_FAIR_GROUP_SCHED for
    CFS bandwidth stats and also using raw division for u64.  Use
    CONFIG_CFS_BANDWIDTH and do_div() instead.

    The semantics of "cpu.rt.max" is not fully decided yet.  Dropped
    for now.
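For reference, the "cpu.max" format described above (a quota, or the
word "max", followed by an optional period) can be parsed along these
lines in plain C.  This is a userspace sketch under stated assumptions:
parse_cpu_max() and QUOTA_UNLIMITED are hypothetical names, values stay
in microseconds rather than being converted to nanoseconds, and the
token read is bounded with "%20s" so that it cannot overrun the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define QUOTA_UNLIMITED	UINT64_MAX	/* stand-in for RUNTIME_INF */

/*
 * Parse "<quota|max> [period]", both in microseconds.  *period_us must
 * hold the current period on entry; it is kept when the input omits the
 * period.  Returns 0 on success and -1 on malformed input, in which
 * case neither output is modified.
 */
static int parse_cpu_max(const char *buf, uint64_t *period_us,
			 uint64_t *quota_us)
{
	char tok[21];	/* 20 digits of UINT64_MAX plus the NUL */
	unsigned long long period, quota;

	assert(buf && period_us && quota_us);
	period = *period_us;

	/* sscanf() returns EOF on empty input, so test for "< 1" */
	if (sscanf(buf, "%20s %llu", tok, &period) < 1)
		return -1;

	if (!strcmp(tok, "max"))
		quota = QUOTA_UNLIMITED;
	else if (sscanf(tok, "%llu", &quota) != 1)
		return -1;

	*period_us = period;
	*quota_us = quota;
	return 0;
}
```

Writing "max" alone therefore lifts the quota while leaving the period
as it was, and "50000 100000" requests 50ms of runtime per 100ms period.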

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 kernel/sched/core.c    | 141 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/cpuacct.c |  23 ++++++++
 kernel/sched/cpuacct.h |   5 ++
 3 files changed, 169 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 016c3552a2b4..c86ab2c4ed3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7325,6 +7325,139 @@ static struct cftype cpu_legacy_files[] = {
 	{ }	/* Terminate */
 };
 
+static int cpu_stats_show(struct seq_file *sf, void *v)
+{
+	cpuacct_cpu_stats_show(sf);
+
+#ifdef CONFIG_CFS_BANDWIDTH
+	{
+		struct task_group *tg = css_tg(seq_css(sf));
+		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+		u64 throttled_usec;
+
+		throttled_usec = cfs_b->throttled_time;
+		do_div(throttled_usec, NSEC_PER_USEC);
+
+		seq_printf(sf, "nr_periods %d\n"
+			   "nr_throttled %d\n"
+			   "throttled_usec %llu\n",
+			   cfs_b->nr_periods, cfs_b->nr_throttled,
+			   throttled_usec);
+	}
+#endif
+	return 0;
+}
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static u64 cpu_weight_read_u64(struct cgroup_subsys_state *css,
+			       struct cftype *cft)
+{
+	struct task_group *tg = css_tg(css);
+	u64 weight = scale_load_down(tg->shares);
+
+	return DIV_ROUND_CLOSEST_ULL(weight * CGROUP_WEIGHT_DFL, 1024);
+}
+
+static int cpu_weight_write_u64(struct cgroup_subsys_state *css,
+				struct cftype *cftype, u64 weight)
+{
+	/*
+	 * cgroup weight knobs should use the common MIN, DFL and MAX
+	 * values which are 1, 100 and 10000 respectively.  While it loses
+	 * a bit of range on both ends, it maps pretty well onto the shares
+	 * value used by scheduler and the round-trip conversions preserve
+	 * the original value over the entire range.
+	 */
+	if (weight < CGROUP_WEIGHT_MIN || weight > CGROUP_WEIGHT_MAX)
+		return -ERANGE;
+
+	weight = DIV_ROUND_CLOSEST_ULL(weight * 1024, CGROUP_WEIGHT_DFL);
+
+	return sched_group_set_shares(css_tg(css), scale_load(weight));
+}
+#endif
+
+static void __maybe_unused cpu_period_quota_print(struct seq_file *sf,
+						  long period, long quota)
+{
+	if (quota < 0)
+		seq_puts(sf, "max");
+	else
+		seq_printf(sf, "%ld", quota);
+
+	seq_printf(sf, " %ld\n", period);
+}
+
+/* caller should put the current value in *@periodp before calling */
+static int __maybe_unused cpu_period_quota_parse(char *buf,
+						 u64 *periodp, u64 *quotap)
+{
+	char tok[21];	/* U64_MAX */
+
+	if (sscanf(buf, "%20s %llu", tok, periodp) < 1)
+		return -EINVAL;
+
+	*periodp *= NSEC_PER_USEC;
+
+	if (sscanf(tok, "%llu", quotap))
+		*quotap *= NSEC_PER_USEC;
+	else if (!strcmp(tok, "max"))
+		*quotap = RUNTIME_INF;
+	else
+		return -EINVAL;
+
+	return 0;
+}
+
+#ifdef CONFIG_CFS_BANDWIDTH
+static int cpu_max_show(struct seq_file *sf, void *v)
+{
+	struct task_group *tg = css_tg(seq_css(sf));
+
+	cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg));
+	return 0;
+}
+
+static ssize_t cpu_max_write(struct kernfs_open_file *of,
+			     char *buf, size_t nbytes, loff_t off)
+{
+	struct task_group *tg = css_tg(of_css(of));
+	u64 period = tg_get_cfs_period(tg);
+	u64 quota;
+	int ret;
+
+	ret = cpu_period_quota_parse(buf, &period, &quota);
+	if (!ret)
+		ret = tg_set_cfs_bandwidth(tg, period, quota);
+	return ret ?: nbytes;
+}
+#endif
+
+static struct cftype cpu_files[] = {
+	{
+		.name = "stat",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cpu_stats_show,
+	},
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	{
+		.name = "weight",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_weight_read_u64,
+		.write_u64 = cpu_weight_write_u64,
+	},
+#endif
+#ifdef CONFIG_CFS_BANDWIDTH
+	{
+		.name = "max",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cpu_max_show,
+		.write = cpu_max_write,
+	},
+#endif
+	{ }	/* terminate */
+};
+
 struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_alloc	= cpu_cgroup_css_alloc,
 	.css_online	= cpu_cgroup_css_online,
@@ -7334,7 +7467,15 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.can_attach	= cpu_cgroup_can_attach,
 	.attach		= cpu_cgroup_attach,
 	.legacy_cftypes	= cpu_legacy_files,
+	.dfl_cftypes	= cpu_files,
 	.early_init	= true,
+#ifdef CONFIG_CGROUP_CPUACCT
+	/*
+	 * cpuacct is enabled together with cpu on the unified hierarchy
+	 * and its stats are reported through "cpu.stat".
+	 */
+	.depends_on	= 1 << cpuacct_cgrp_id,
+#endif
 };
 
 #endif	/* CONFIG_CGROUP_SCHED */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 6151c23f722f..07ed36cc2600 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -347,6 +347,29 @@ static struct cftype files[] = {
 	{ }	/* terminate */
 };
 
+/* used to print cpuacct stats in cpu.stat on the unified hierarchy */
+void cpuacct_cpu_stats_show(struct seq_file *sf)
+{
+	struct cgroup_subsys_state *css;
+	u64 usage, val[CPUACCT_STAT_NSTATS];
+
+	css = cgroup_get_e_css(seq_css(sf)->cgroup, &cpuacct_cgrp_subsys);
+
+	usage = cpuusage_read(css, seq_cft(sf));
+	cpuacct_stats_read(css_ca(css), &val);
+
+	do_div(usage, NSEC_PER_USEC);
+	do_div(val[CPUACCT_STAT_USER], NSEC_PER_USEC);
+	do_div(val[CPUACCT_STAT_SYSTEM], NSEC_PER_USEC);
+
+	seq_printf(sf, "usage_usec %llu\n"
+		   "user_usec %llu\n"
+		   "system_usec %llu\n",
+		   usage, val[CPUACCT_STAT_USER], val[CPUACCT_STAT_SYSTEM]);
+
+	css_put(css);
+}
+
 /*
  * charge this task's execution time to its accounting group.
  *
diff --git a/kernel/sched/cpuacct.h b/kernel/sched/cpuacct.h
index ba72807c73d4..ddf7af466d35 100644
--- a/kernel/sched/cpuacct.h
+++ b/kernel/sched/cpuacct.h
@@ -2,6 +2,7 @@
 
 extern void cpuacct_charge(struct task_struct *tsk, u64 cputime);
 extern void cpuacct_account_field(struct task_struct *tsk, int index, u64 val);
+extern void cpuacct_cpu_stats_show(struct seq_file *sf);
 
 #else
 
@@ -14,4 +15,8 @@ cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
 {
 }
 
+static inline void cpuacct_cpu_stats_show(struct seq_file *sf)
+{
+}
+
 #endif
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 10/10] sched: Make cpu/cpuacct threaded controllers
@ 2017-06-10 14:03   ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-10 14:03 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds

From: Waiman Long <longman@redhat.com>

Make cpu and cpuacct cgroup controllers usable within a threaded cgroup.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/core.c    | 1 +
 kernel/sched/cpuacct.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c86ab2c4ed3b..85e5358da560 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7469,6 +7469,7 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.legacy_cftypes	= cpu_legacy_files,
 	.dfl_cftypes	= cpu_files,
 	.early_init	= true,
+	.threaded	= true,
 #ifdef CONFIG_CGROUP_CPUACCT
 	/*
 	 * cpuacct is enabled together with cpu on the unified hierarchy
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 07ed36cc2600..ee4976a5dde0 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -412,4 +412,5 @@ struct cgroup_subsys cpuacct_cgrp_subsys = {
 	.css_free	= cpuacct_css_free,
 	.legacy_cftypes	= files,
 	.early_init	= true,
+	.threaded	= true,
 };
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-06-12 12:31   ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2017-06-12 12:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, hannes, mingo, longman, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds


Please don't rush this; also, I might not be around much the coming
weeks due to taking some leave 'soon' (kid #3 is imminent).

And I really need more time to look at this (and re-read the old
discussions, because I've forgotten most everything again).

On Sat, Jun 10, 2017 at 10:03:41AM -0400, Tejun Heo wrote:

> * Thread mode is explicitly enabled on a cgroup by writing "enable"
>   into "cgroup.threads" file.  The cgroup shouldn't have any child
>   cgroups or enabled controllers.
> 
> * Once enabled, arbitrary sub-hierarchy can be created and threads can
>   be put anywhere in the subtree by writing TIDs into "cgroup.threads"
>   file.  Process granularity and no-internal-process constraint don't
>   apply in a threaded subtree.
> 
> * To be used in a threaded subtree, controllers should explicitly
>   declare thread mode support and should be able to handle internal
>   competition in some way.
> 
> * The root of a threaded subtree serves as the resource domain for the
>   whole subtree.  This is where all the controllers are guaranteed to
>   have a common ground and resource consumptions in the threaded
>   subtree which aren't tied to a specific thread are charged.
>   Non-threaded controllers never see beyond thread root and can assume
>   that all controllers will follow the same rules upto that point.
> 
> * Root cgroup can enable thread mode anytime and a first level child
>   can opt-in to that thread subtree anchored at root by writing "join"
>   to "cgroup.threads" files, start its own thread subtree or just be a
>   normal cgroup.

Yuck... this again is a consequence of tagging the 'wrong' thing. Again,
the primary construct is the resource domain.

If you use that as a tag, you don't need this weird join crap. Because
as soon as you clear the 'resource domain' flag on a group, it instantly
becomes a thread group and 'obviously' connects to the first parent that
is a resource domain.
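To make the alternative concrete: if each cgroup carried a "resource
domain" tag, finding the domain a thread group belongs to would be a
walk up the parents.  The sketch below only illustrates that idea; the
struct node model and resource_domain() are hypothetical and not part of
the posted patchset or of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cgroup node; not the kernel's struct cgroup. */
struct node {
	struct node *parent;	/* NULL for the root */
	bool is_domain;		/* the proposed "resource domain" tag */
};

/*
 * Return the resource domain @n belongs to: @n itself if it is tagged,
 * otherwise the nearest tagged ancestor.  The root is assumed to always
 * be a resource domain.
 */
static struct node *resource_domain(struct node *n)
{
	assert(n);
	while (!n->is_domain) {
		assert(n->parent);	/* the root must be a domain */
		n = n->parent;
	}
	return n;
}
```

In this model, clearing the tag on a group attaches it to the closest
domain above it without a separate "join" step, and a tagged group
nested below untagged ones would simply become the domain for its own
descendants.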

And, as per the last time, this threaded marker isn't uniquely
identifying things, so it hard prohibits from ever extending the model
to allow resource domains nested in a thread subtree. Now I understand
why you don't implement that now -- you were struggling with the views
API, but that is no excuse to create an API that permanently disables
that feature.

I cannot at this time remember if there was a strong use-case for that
scenario -- like said, I really need to re-read the email threads, but I
might not have enough time to do so now.

Again, please don't rush this.

> This allows threaded controllers to implement thread granular resource
> control without getting in the way of system level resource
> partitioning.
> 
> This patchset contains the following ten patches.  For more details on
> the interface and behavior, please refer to 0007.
> 
>  0001-cgroup-separate-out-cgroup_has_tasks.patch
>  0002-cgroup-reorganize-cgroup.procs-task-write-path.patch
>  0003-cgroup-Fix-reference-counting-bug-in-cgroup_procs_wr.patch
>  0004-cgroup-add-flags-to-css_task_iter_start-and-implemen.patch
>  0005-cgroup-introduce-cgroup-proc_cgrp-and-threaded-css_s.patch
>  0006-cgroup-implement-CSS_TASK_ITER_THREADED.patch
>  0007-cgroup-implement-cgroup-v2-thread-support.patch
>  0008-sched-Misc-preps-for-cgroup-unified-hierarchy-interf.patch
>  0009-sched-Implement-interface-for-cgroup-unified-hierarc.patch
>  0010-sched-Make-cpu-cpuacct-threaded-controllers.patch
> 
> 0001-0007 implement cgroup2 thread mode.  0008-0010 enable CPU
> controller on cgroup2 and mark them as supporting thread mode.
> 0008-0010 are included for reference.

So I really regret the 'shares' interface; we really should have done a
nice thing.

  https://lkml.kernel.org/r/20170410073622.2y6tnpcd2ssuoztz@hirez.programming.kicks-ass.net

So I would like to change to that instead of the weird 100 thing.

As for the RT thing, the runtime/period thing is not a MAX but a MIN
limit (conceptually -- in practice it's both).

Also, we need cpuset to be a thread controller.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-06-12 12:31   ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2017-06-12 12:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, hannes-druUgvl0LCNAfugRpC6u6w,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, longman-H+wXaHxf7aLQT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
	efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b


Please don't rush this; also, I might not be around much the coming
weeks due to taking some leave 'soon' (kid #3 is imminent).

And I really need more time to look at this (and re-read the old
discussions, because I've forgot most everything again).

On Sat, Jun 10, 2017 at 10:03:41AM -0400, Tejun Heo wrote:

> * Thread mode is explicitly enabled on a cgroup by writing "enable"
>   into "cgroup.threads" file.  The cgroup shouldn't have any child
>   cgroups or enabled controllers.
> 
> * Once enabled, arbitrary sub-hierarchy can be created and threads can
>   be put anywhere in the subtree by writing TIDs into "cgroup.threads"
>   file.  Process granularity and no-internal-process constraint don't
>   apply in a threaded subtree.
> 
> * To be used in a threaded subtree, controllers should explicitly
>   declare thread mode support and should be able to handle internal
>   competition in some way.
> 
> * The root of a threaded subtree serves as the resource domain for the
>   whole subtree.  This is where all the controllers are guaranteed to
>   have a common ground and resource consumptions in the threaded
>   subtree which aren't tied to a specific thread are charged.
>   Non-threaded controllers never see beyond thread root and can assume
>   that all controllers will follow the same rules upto that point.
> 
> * Root cgroup can enable thread mode anytime and a first level child
>   can opt-in to that thread subtree anchored at root by writing "join"
>   to "cgroup.threads" files, start its own thread subtree or just be a
>   normal cgroup.

Yuck... this again is a consequence of tagging the 'wrong' thing. Again,
the primary construct is the resource domain.

If you use that as a tag, you don't need this weird join crap. Because
as soon as you clear the 'resource domain' flag on a group, it instantly
becomes a thread group and 'obviously' connects to the first parent that
is a resource domain.

And, as per the last time, this threaded marker isn't uniquely
identifying things, so it hard prohibits from ever extending the model
to allow resource domains nested in a thread subtree. Now I understand
why you don't implement that now -- you were struggling with the views
API, but that is no excuse to create an API that permanently disables
that feature.

I cannot at this time remember if there was a strong use-case for that
scenario -- like said, I really need to re-read the email threads, but I
might not have enough time to do so now.

Again, please don't rush this.

> This allows threaded controllers to implement thread granular resource
> control without getting in the way of system level resource
> partitioning.
> 
> This patchset contains the following ten patches.  For more details on
> the interface and behavior, please refer to 0007.
> 
>  0001-cgroup-separate-out-cgroup_has_tasks.patch
>  0002-cgroup-reorganize-cgroup.procs-task-write-path.patch
>  0003-cgroup-Fix-reference-counting-bug-in-cgroup_procs_wr.patch
>  0004-cgroup-add-flags-to-css_task_iter_start-and-implemen.patch
>  0005-cgroup-introduce-cgroup-proc_cgrp-and-threaded-css_s.patch
>  0006-cgroup-implement-CSS_TASK_ITER_THREADED.patch
>  0007-cgroup-implement-cgroup-v2-thread-support.patch
>  0008-sched-Misc-preps-for-cgroup-unified-hierarchy-interf.patch
>  0009-sched-Implement-interface-for-cgroup-unified-hierarc.patch
>  0010-sched-Make-cpu-cpuacct-threaded-controllers.patch
> 
> 0001-0007 implement cgroup2 thread mode.  0008-0010 enable CPU
> controller on cgroup2 and mark them as supporting thread mode.
> 0008-0010 are included for reference.

So I really regret the 'shares' interface; we really should have done a
nice thing.

  https://lkml.kernel.org/r/20170410073622.2y6tnpcd2ssuoztz@hirez.programming.kicks-ass.net

So I would like to change to that instead of the weird 100 thing.

As for the RT thing, the runtime/period thing is not a MAX but a MIN
limit (conceptually -- in practice it's both).

Also, we need cpuset to be a thread controller.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/10] cgroup: implement cgroup v2 thread support
@ 2017-06-12 15:41     ` Waiman Long
  0 siblings, 0 replies; 47+ messages in thread
From: Waiman Long @ 2017-06-12 15:41 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, hannes, peterz, mingo
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds

On 06/10/2017 10:03 AM, Tejun Heo wrote:
> This patch implements cgroup v2 thread support.  The goal of the
> thread mode is supporting hierarchical accounting and control at
> thread granularity while staying inside the resource domain model
> which allows coordination across different resource controllers and
> handling of anonymous resource consumptions.
>
> Once thread mode is enabled on a cgroup, the threads of the processes
> which are in its subtree can be placed inside the subtree without
> being restricted by process granularity or no-internal-process
> constraint.  Note that the threads aren't allowed to escape to a
> different threaded subtree.  To be used inside a threaded subtree, a
> controller should explicitly support threaded mode and be able to
> handle internal competition in the way which is appropriate for the
> resource.
>
> The root of a threaded subtree, where thread mode is enabled in the
> first place, is called the thread root and serves as the resource
> domain for the whole subtree.  This is the last cgroup where
> non-threaded controllers are operational and where all the
> domain-level resource consumptions in the subtree are accounted.  This
> allows threaded controllers to operate at thread granularity when
> requested while staying inside the scope of system-level resource
> distribution.
>
> As the root cgroup is exempt from the no-internal-process constraint,
> it can serve as both a thread root and a parent to normal cgroups.
> The root cgroup supports mixed cgroup mode which can be enabled and
> disabled anytime as long as there aren't any threaded children.  First
> level child cgroups can selectively join the mixed threaded subtree.
>
> Internally, in a threaded subtree, each css_set has its ->proc_cset
> pointing to a matching css_set which belongs to the thread root.  This
> ensures that thread root level cgroup_subsys_state for all threaded
> controllers are readily accessible for domain-level operations.

As far as I understand, the proc_cset thing is just for cgroup.procs
iteration purposes. It is not used for accessing cgroup_subsys_state
for domain-level operations.  In fact, all the relevant CSSes will be
available in the local css_set and there is no need to look elsewhere.

> This patch enables threaded mode for the pids and perf_events
> controllers.  Neither has to worry about domain-level resource
> consumptions and it's enough to simply set the flag.
>
> For more details on the interface and behavior of the thread mode,
> please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
> by this patch.  Note that the documentation update is not complete as
> the rest of the documentation needs to be updated accordingly.
> Rolling those updates into this patch can be confusing so that will be
> separate patches.
>
> v2: - After discussions with Waiman, support for mixed thread mode is
>       added.  This should address the issue that Peter pointed out
>       where any nesting should be avoided for thread subtrees while
>       coexisting with other domain cgroups.
>
>     - Enabling / disabling thread mode now piggy backs on the existing
>       control mask update mechanism.
>
>     - Bug fixes and cleanup.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Waiman Long <longman@redhat.com>
> ---
>  Documentation/cgroup-v2.txt |  99 +++++++++++++-
>  include/linux/cgroup-defs.h |  12 ++
>  kernel/cgroup/cgroup.c      | 323 ++++++++++++++++++++++++++++++++++++++++++--
>  kernel/cgroup/pids.c        |   1 +
>  kernel/events/core.c        |   1 +
>  5 files changed, 420 insertions(+), 16 deletions(-)
>

> +/* is @cgrp root of a threaded subtree? */
> +static bool cgroup_is_thread_root(struct cgroup *cgrp)
> +{
> +	return cgrp->proc_cgrp == cgrp;
> +}
> +
> +/* if threaded, would @cgrp become root of a mixed threaded subtree? */
> +static bool cgroup_is_mixable(struct cgroup *cgrp)
> +{
> +	/*
> +	 * Root isn't under domain level resource control exempting it from
> +	 * the no-internal-process constraint, so it can serve as a thread
> +	 * root and a parent of resource domains at the same time.
> +	 */
> +	return !cgroup_parent(cgrp);
> +}

Eventually, I would like to see a container root regarded as
mixable so that it will look and feel like a real root to the container.
Yes, that will mean having to deal with internal process competition
with resource domain controllers. If we are going to keep the no
internal process constraint, this is the other exception that I would
like to have. We can work around that by having, for example, a special
directory for resource domain controllers to manage their resources like
what I have proposed in the resource domain patch. Or we can just let
those resource domain controllers deal with it.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-06-12 21:27     ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-12 21:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Li Zefan, hannes, mingo, longman, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

Hello, Peter.

On Mon, Jun 12, 2017 at 02:31:50PM +0200, Peter Zijlstra wrote:
> Please don't rush this; also, I might not be around much the coming
> weeks due to taking some leave 'soon' (kid #3 is imminent).

Congrats.  As for this going forward, how can we possibly be slower?

> And I really need more time to look at this (and re-read the old
> discussions, because I've forgotten most everything again).

Can we at least unblock the cpu controller part?  We can hash out the
details of thread support as long as necessary but I'm not sure it's
reasonable to keep blocking the whole cpu controller at this point.

> > * Root cgroup can enable thread mode anytime and a first level child
> >   can opt-in to that thread subtree anchored at root by writing "join"
> >   to "cgroup.threads" files, start its own thread subtree or just be a
> >   normal cgroup.
> 
> Yuck... this again is a consequence of tagging the 'wrong' thing. Again,
> the primary construct is the resource domain.
> 
> If you use that as a tag, you don't need this weird join crap. Because
> as soon as you clear the 'resource domain' flag on a group, it instantly
> becomes a thread group and 'obviously' connects to the first parent that
> is a resource domain.

It has nothing to do with whether we mark domain or threaded subtrees.
It comes solely from whether you wanna express cases where a thread
root is right below another thread root.  In the diagrams below, Tn's
are member cgroups of threaded subtrees, where the same number means
the same threaded subtree, and D's are domain cgroups.

The following is straightforward.

	T0
       /  \
      T0   D

The following is too.

	T0
       /  \
      T0   D
            \
	     T1

The question is whether to allow something like the following.

	T0
       /  \
      T0  T1

That's where the "join" thing comes from because we wanna be able to
tell apart whether a cgroup is gonna be a part of the existing thread
subtree or starting its own thread subtree.  There sure are multiple
ways to express that but one way or the other, if you wanna support
topologies like the last one, you have to distinguish the two.

The previous iteration actually worked that way: the only thread mode
operation was enabling or disabling thread mode, and if the parent was
already in thread mode, the cgroup would always join the existing
threaded subtree.  If you like that better, I can post that version
right away.
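
A minimal sketch of that always-join rule, for illustration only. The proc_cgrp field mirrors the name used by cgroup_is_thread_root() in the patch; struct tcg and tcg_enable_threaded() are hypothetical, and the sketch ignores every precondition the real code checks (child cgroups, enabled controllers, tasks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; only the proc_cgrp name comes from the patch. */
struct tcg {
	struct tcg *parent;	/* NULL for the hierarchy root */
	struct tcg *proc_cgrp;	/* thread root, or NULL for a domain cgroup */
};

/* Is @cg the root of a threaded subtree? */
bool tcg_is_thread_root(const struct tcg *cg)
{
	return cg->proc_cgrp == cg;
}

/*
 * Enable thread mode on @cg.  If the parent is already threaded, @cg
 * joins the parent's threaded subtree; otherwise it becomes the root
 * of a new one.  There is no way to ask for a new thread root directly
 * below an existing one.
 */
void tcg_enable_threaded(struct tcg *cg)
{
	assert(cg != NULL);
	if (cg->parent && cg->parent->proc_cgrp)
		cg->proc_cgrp = cg->parent->proc_cgrp;
	else
		cg->proc_cgrp = cg;
}
```

Under this rule the first two topologies above are expressible, while the third (T1 directly below T0) is not, which is the trade-off being discussed.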

> And, as per the last time, this threaded marker isn't uniquely
> identifying things, so it hard prohibits from ever extending the model
> to allow resource domains nested in a thread subtree. Now I understand
> why you don't implement that now -- you were struggling with the views
> API, but that is no excuse to create an API that permanently disables
> that feature.

Hmmm?  We can just allow disabling thread mode if we ever get to that.
We can't make arbitrary graphs out of these nodes.  Whatever mode we
put them in, they have to fall in with the overall tree structure, so
I don't think the interface is unnecessarily restricting in that
direction.

> I cannot at this time remember if there was a strong use-case for that
> scenario -- like said, I really need to re-read the email threads, but I
> might not have enough time to do so now.
> 
> Again, please don't rush this.

Well, I don't have a way to do that.

> So I really regret the 'shares' interface; we really should have done a
> nice thing.
> 
>   https://lkml.kernel.org/r/20170410073622.2y6tnpcd2ssuoztz@hirez.programming.kicks-ass.net
> 
> So I would like to change to that instead of the weird 100 thing.

Is it?  Relative weights are pretty fundamental and clearly defined in
expressing work-conserving resource distribution.  Do you have more
details on what you have in mind?

> As for the RT thing, the runtime/period thing is not a MAX but a MIN
> limit (conceptually -- in practice it's both).

Yeah, it's a hard allocation.

> Also, we need cpuset to be a thread controller.

Yeah, absolutely.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/10] cgroup: implement cgroup v2 thread support
  2017-06-12 15:41     ` Waiman Long
  (?)
@ 2017-06-13 14:06     ` Tejun Heo
  -1 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-13 14:06 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, hannes, peterz, mingo, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

Hello, Waiman.

On Mon, Jun 12, 2017 at 11:41:09AM -0400, Waiman Long wrote:
> > Internally, in a threaded subtree, each css_set has its ->proc_cset
> > pointing to a matching css_set which belongs to the thread root.  This
> > ensures that thread root level cgroup_subsys_state for all threaded
> > controllers are readily accessible for domain-level operations.
>
> As far as I understand, the proc_cset thing is just for cgroup.procs
> iteration purpose. They are not used for accessing cgroup_subsys_state
> for domain-level operation.  In fact, all the relevant CSSes will be
> available in the local css_set and there is no need to look elsewhere.

That's because no controller implements domain resource accounting
yet.  Once we start doing that, we'll have to charge them against
domain csses not the thread ones.
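
To illustrate the distinction, here is a rough sketch under stated assumptions. struct acct_cg and both charge helpers are hypothetical, the counters are flat rather than hierarchical, and proc_cgrp is assumed to point at the thread root (NULL meaning the cgroup is its own domain). It is not how any existing controller is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accounting state; not a kernel structure. */
struct acct_cg {
	struct acct_cg *proc_cgrp;	/* thread root acting as the domain */
	unsigned long thread_usage;	/* consumption tied to specific threads */
	unsigned long domain_usage;	/* consumption not tied to any thread */
};

/* Thread-specific consumption is charged to the cgroup the thread is in. */
void acct_charge_thread(struct acct_cg *cg, unsigned long amount)
{
	assert(cg != NULL);
	cg->thread_usage += amount;
}

/* Domain-level consumption is charged to the thread root instead. */
void acct_charge_domain(struct acct_cg *cg, unsigned long amount)
{
	struct acct_cg *dom;

	assert(cg != NULL);
	dom = cg->proc_cgrp ? cg->proc_cgrp : cg;
	dom->domain_usage += amount;
}
```

The second helper is where a pointer back to the thread root would be needed: the local cgroup alone does not say which domain should absorb the charge.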

> > +/* if threaded, would @cgrp become root of a mixed threaded subtree? */
> > +static bool cgroup_is_mixable(struct cgroup *cgrp)
> > +{
> > +	/*
> > +	 * Root isn't under domain level resource control exempting it from
> > +	 * the no-internal-process constraint, so it can serve as a thread
> > +	 * root and a parent of resource domains at the same time.
> > +	 */
> > +	return !cgroup_parent(cgrp);
> > +}
> 
> Eventually, I would like to see a container root to be regarded as
> mixable so that it will look and feel like a real root to the container.
> Yes, that will mean having to deal with internal process competition
> with resource domain controllers. If we are going to keep the no
> internal process constraint, this is the other exception that I would
> like to have. We can work around that by having, for example, special
> directory for resource domain controllers to manage their resources like
> what I have proposed in the resource domain patch. Or we can just let
> those resource domain controllers to deal with it.

Yeah, the code and semantics are structured for future expansion.
That said, such expansion has to mean something inherently useful.  Up
until now, everything that has been suggested is either cosmetic or not
clearly defined and IMHO does more to obscure what's going on rather
than enable anything fundamental.  What we do with the interface
should follow the function.  If we can come up with a way to lift the
restriction on an internal node which enables some fundamental
capabilities in a clearly defined manner, let's of course do that, but
we shouldn't do that just because we feel like it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 07/10] cgroup: implement cgroup v2 thread support
@ 2017-06-15 20:14     ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-15 20:14 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds

This patch implements cgroup v2 thread support.  The goal of the
thread mode is supporting hierarchical accounting and control at
thread granularity while staying inside the resource domain model
which allows coordination across different resource controllers and
handling of anonymous resource consumptions.

Once thread mode is enabled on a cgroup, the threads of the processes
which are in its subtree can be placed inside the subtree without
being restricted by process granularity or no-internal-process
constraint.  Note that the threads aren't allowed to escape to a
different threaded subtree.  To be used inside a threaded subtree, a
controller should explicitly support threaded mode and be able to
handle internal competition in the way which is appropriate for the
resource.

The root of a threaded subtree, where thread mode is enabled in the
first place, is called the thread root and serves as the resource
domain for the whole subtree.  This is the last cgroup where
non-threaded controllers are operational and where all the
domain-level resource consumptions in the subtree are accounted.  This
allows threaded controllers to operate at thread granularity when
requested while staying inside the scope of system-level resource
distribution.

As the root cgroup is exempt from the no-internal-process constraint,
it can serve as both a thread root and a parent to normal cgroups.
The root cgroup supports mixed cgroup mode which can be enabled and
disabled anytime as long as there aren't any threaded children.  First
level child cgroups join the mixed threaded subtree when thread mode
is enabled on them.

Internally, in a threaded subtree, each css_set has its ->proc_cset
pointing to a matching css_set which belongs to the thread root.  This
ensures that thread root level cgroup_subsys_state for all threaded
controllers are readily accessible for domain-level operations.

This patch enables threaded mode for the pids and perf_events
controllers.  Neither has to worry about domain-level resource
consumptions and it's enough to simply set the flag.

For more details on the interface and behavior of the thread mode,
please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
by this patch.  Note that the documentation update is not complete as
the rest of the documentation needs to be updated accordingly.
Rolling those updates into this patch can be confusing so that will be
separate patches.

v3: - Dropped "join" and always make mixed children join the parent's
      threaded subtree.

v2: - After discussions with Waiman, support for mixed thread mode is
      added.  This should address the issue that Peter pointed out
      where any nesting should be avoided for thread subtrees while
      coexisting with other domain cgroups.

    - Enabling / disabling thread mode now piggy backs on the existing
      control mask update mechanism.

    - Bug fixes and cleanup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
Hello,

This is the alternative version where we don't support topologies like
the following.

         root (thread mode)
         /              \
        /                \
       /                  \
      A                    B
 (member of root's     (root of its own
  thread subtree)       thread subtree)

Supporting or not supporting the above topology only has user
interface implications.  Supporting it means more complication or
ugliness in the interface.  Not supporting it obviously results in a
simpler interface but at the cost of some of the flexibility of thread
mode.

Choosing one over the other is unlikely to result in substantial
actual differences.

Thanks.

 Documentation/cgroup-v2.txt |  111 +++++++++++++++
 include/linux/cgroup-defs.h |   12 +
 kernel/cgroup/cgroup.c      |  310 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/cgroup/pids.c        |    1 
 kernel/events/core.c        |    1 
 5 files changed, 419 insertions(+), 16 deletions(-)

--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -16,7 +16,9 @@ CONTENTS
   1-2. What is cgroup?
 2. Basic Operations
   2-1. Mounting
-  2-2. Organizing Processes
+  2-2. Organizing Processes and Threads
+    2-2-1. Processes
+    2-2-2. Threads
   2-3. [Un]populated Notification
   2-4. Controlling Controllers
     2-4-1. Enabling and Disabling
@@ -150,7 +152,9 @@ and experimenting easier, the kernel par
 disabling controllers in v1 and make them always available in v2.
 
 
-2-2. Organizing Processes
+2-2. Organizing Processes and Threads
+
+2-2-1. Processes
 
 Initially, only the root cgroup exists to which all processes belong.
 A child cgroup can be created by creating a sub-directory.
@@ -201,6 +205,109 @@ is removed subsequently, " (deleted)" is
   0::/test-cgroup/test-cgroup-nested (deleted)
 
 
+2-2-2. Threads
+
+cgroup v2 supports thread granularity for a subset of controllers to
+support use cases requiring hierarchical resource distribution across
+the threads of a group of processes.  By default, all threads of a
+process belong to the same cgroup, which also serves as the resource
+domain to host resource consumptions which are not specific to a
+process or thread.  The thread mode allows threads to be spread across
+a subtree while still maintaining the common resource domain for them.
+
+Enabling thread mode on a subtree makes it threaded.  The root of a
+threaded subtree is called thread root and serves as the resource
+domain for the entire subtree.  In a threaded subtree, threads of a
+process can be put in different cgroups and are not subject to the no
+internal process constraint - threaded controllers can be enabled on
+non-leaf cgroups whether they have threads in them or not.
+
+Because the root cgroup is not subject to no internal process
+constraint, it can serve both as a thread root and a parent to normal
+cgroups.  This is called mixed thread mode.
+
+Thread mode can be enabled by writing "enable" to "cgroup.threads"
+file.
+
+  # echo enable > cgroup.threads
+
+On a non-root cgroup, to enable the thread mode, the following
+conditions must be met.
+
+- The thread root doesn't have any child cgroups.
+
+- The thread root doesn't have any controllers enabled.
+
+On the root cgroup, only the mixed thread mode is supported.  The
+following condition should be met to enable.
+
+- The root cgroup doesn't have any threaded children.
+
+Unlike the normal thread mode, the whole subtree is not turned into
+thread subtree.  The first level children have to explicitly opt-in to
+join the thread subtree by enabling thread mode.  To join the thread
+subtree, a first level child must meet the following conditions.
+
+- The first level child doesn't have any child cgroups.
+
+- The first level child doesn't have any controllers enabled.
+
+- The first level child doesn't have any tasks.
+
+Inside a threaded subtree, "cgroup.threads" can be read and contains
+the list of the thread IDs of all threads in the cgroup.  Except that
+the operations are per-thread instead of per-process, "cgroup.threads"
+has the same format and behaves the same way as "cgroup.procs".
+
+The thread root serves as the resource domain for the whole subtree,
+and, while the threads can be scattered across the subtree, all the
+processes are considered to be in the thread root.  "cgroup.procs" in
+a thread root contains the PIDs of all processes in the subtree and is
+not readable in the subtree proper.  However, "cgroup.procs" can be
+written to from anywhere in the subtree to migrate all threads of the
+matching process to the cgroup.
+
+Only threaded controllers can be enabled in a threaded subtree.  When
+a threaded controller is enabled inside a threaded subtree, it only
+accounts for and controls resource consumptions associated with the
+threads in the cgroup and its descendants.  All consumptions which
+aren't tied to a specific thread belong to the thread root.
+
+Because a threaded subtree is exempt from the no internal process
+constraint, a threaded controller must be able to handle competition
+between threads in a non-leaf cgroup and its child cgroups.  Each
+threaded controller defines how such competitions are handled.
+
+Thread mode can be disabled by writing "disable" to the "cgroup.threads"
+file.
+
+  # echo disable > cgroup.threads
+
+On a normal thread subtree, to disable the thread mode, the following
+conditions must be met.
+
+- The cgroup is a thread root.  Thread mode can't be disabled
+  partially in the subtree.
+
+- The thread root doesn't have any child cgroups.
+
+- The thread root doesn't have any controllers enabled.
+
+On the root cgroup, to disable the mixed thread mode, the following
+condition should be met.
+
+- The root cgroup doesn't have any threaded children.
+
+On a first level child of the mixed thread subtree, the following
+conditions should be met.
+
+- The first level child doesn't have any child cgroups.
+
+- The first level child doesn't have any controllers enabled.
+
+- The first level child doesn't have any tasks.
+
+
 2-3. [Un]populated Notification
 
 Each non-root cgroup has a "cgroup.events" file which contains
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -505,6 +505,18 @@ struct cgroup_subsys {
 	bool implicit_on_dfl:1;
 
 	/*
+	 * If %true, the controller supports threaded mode on the default
+	 * hierarchy.  In a threaded subtree, both process granularity and
+	 * no-internal-process constraint are ignored and a threaded
+	 * controller should be able to handle that.
+	 *
+	 * Note that as an implicit controller is automatically enabled on
+	 * all cgroups on the default hierarchy, it should also be
+	 * threaded.  implicit && !threaded is not supported.
+	 */
+	bool threaded:1;
+
+	/*
 	 * If %false, this subsystem is properly hierarchical -
 	 * configuration, resource accounting and restriction on a parent
 	 * cgroup cover those of its children.  If %true, hierarchy support
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -162,6 +162,9 @@ static u16 cgrp_dfl_inhibit_ss_mask;
 /* some controllers are implicitly enabled on the default hierarchy */
 static u16 cgrp_dfl_implicit_ss_mask;
 
+/* some controllers can be threaded on the default hierarchy */
+static u16 cgrp_dfl_threaded_ss_mask;
+
 /* The list of hierarchy roots */
 LIST_HEAD(cgroup_roots);
 static int cgroup_root_count;
@@ -331,14 +334,60 @@ static bool cgroup_is_threaded(struct cg
 	return cgrp->proc_cgrp;
 }
 
+/* is @cgrp root of a threaded subtree? */
+static bool cgroup_is_thread_root(struct cgroup *cgrp)
+{
+	return cgrp->proc_cgrp == cgrp;
+}
+
+/* if threaded, would @cgrp become root of a mixed threaded subtree? */
+static bool cgroup_is_mixable(struct cgroup *cgrp)
+{
+	/*
+	 * Root isn't under domain level resource control exempting it from
+	 * the no-internal-process constraint, so it can serve as a thread
+	 * root and a parent of resource domains at the same time.
+	 */
+	return !cgroup_parent(cgrp);
+}
+
+/* is @cgrp root of a mixed threaded subtree */
+static bool cgroup_is_mixed_root(struct cgroup *cgrp)
+{
+	return cgroup_is_thread_root(cgrp) && cgroup_is_mixable(cgrp);
+}
+
+/* is @cgrp's parent a mixed thread root? */
+static bool cgroup_has_mixed_parent(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	return parent && cgroup_is_mixed_root(parent);
+}
+
+/* is @cgrp the first level child of a mixed threaded subtree */
+static bool cgroup_is_mixed_child(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	return parent && cgrp->proc_cgrp == parent &&
+		cgroup_is_mixed_root(parent);
+}
+
 /* subsystems visibly enabled on a cgroup */
 static u16 cgroup_control(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	u16 root_ss_mask = cgrp->root->subsys_mask;
 
-	if (parent)
-		return parent->subtree_control;
+	if (parent) {
+		u16 ss_mask = parent->subtree_control;
+
+		/* mixed child can only have threaded subset of controllers */
+		if (cgroup_is_mixed_child(cgrp))
+			ss_mask &= cgrp_dfl_threaded_ss_mask;
+		return ss_mask;
+	}
 
 	if (cgroup_on_dfl(cgrp))
 		root_ss_mask &= ~(cgrp_dfl_inhibit_ss_mask |
@@ -351,8 +400,14 @@ static u16 cgroup_ss_mask(struct cgroup
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 
-	if (parent)
-		return parent->subtree_ss_mask;
+	if (parent) {
+		u16 ss_mask = parent->subtree_ss_mask;
+
+		/* mixed child can only have threaded subset of controllers */
+		if (cgroup_is_mixed_child(cgrp))
+			ss_mask &= cgrp_dfl_threaded_ss_mask;
+		return ss_mask;
+	}
 
 	return cgrp->root->subsys_mask;
 }
@@ -2241,14 +2296,14 @@ out_release_tset:
  * cgroup_may_migrate_to - verify whether a cgroup can be migration destination
  * @dst_cgrp: destination cgroup to test
  *
- * On the default hierarchy, except for the root, subtree_control must be
- * zero for migration destination cgroups with tasks so that child cgroups
- * don't compete against tasks.
+ * On the default hierarchy, except for the mixable and threaded cgroups,
+ * subtree_control must be zero for migration destination cgroups with
+ * tasks so that child cgroups don't compete against tasks.
  */
 bool cgroup_may_migrate_to(struct cgroup *dst_cgrp)
 {
-	return !cgroup_on_dfl(dst_cgrp) || !cgroup_parent(dst_cgrp) ||
-		!dst_cgrp->subtree_control;
+	return !cgroup_on_dfl(dst_cgrp) || cgroup_is_mixable(dst_cgrp) ||
+		cgroup_is_threaded(dst_cgrp) || !dst_cgrp->subtree_control;
 }
 
 /**
@@ -2957,11 +3012,20 @@ static ssize_t cgroup_subtree_control_wr
 		goto out_unlock;
 	}
 
+	/* can't enable !threaded controllers on a threaded cgroup */
+	if (cgroup_is_threaded(cgrp) && !cgroup_is_mixed_root(cgrp) &&
+	    (enable & ~cgrp_dfl_threaded_ss_mask)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
-	 * Except for the root, subtree_control must be zero for a cgroup
-	 * with tasks so that child cgroups don't compete against tasks.
+	 * Except for mixable and threaded cgroups, subtree_control must be
+	 * zero for a cgroup with tasks so that child cgroups don't compete
+	 * against tasks.
 	 */
-	if (enable && cgroup_parent(cgrp) && cgroup_has_tasks(cgrp)) {
+	if (enable && !cgroup_is_mixable(cgrp) && !cgroup_is_threaded(cgrp) &&
+	    cgroup_has_tasks(cgrp)) {
 		ret = -EBUSY;
 		goto out_unlock;
 	}
@@ -2983,6 +3047,120 @@ out_unlock:
 	return ret ?: nbytes;
 }
 
+static int cgroup_vet_threaded_switch(struct cgroup *cgrp, bool is_enable)
+{
+	if (cgroup_has_mixed_parent(cgrp)) {
+		/*
+		 * @cgrp is joining or leaving an existing mixed threaded
+		 * root.  Avoid needing recursive operations, implicit
+		 * subtree_control changes, or migrations.
+		 */
+		if (css_has_online_children(&cgrp->self) ||
+		    cgrp->subtree_control || cgroup_has_tasks(cgrp))
+			return -EBUSY;
+	} else if (cgroup_is_mixable(cgrp)) {
+		struct cgroup *child;
+
+		/*
+		 * @cgrp is starting or ending a mixed threaded subtree.
+		 * It's allowed to have domain children and enabled
+		 * controllers but we can't change ->proc_cgrp of existing
+		 * threaded children.  Make sure there aren't already
+		 * threaded children.
+		 */
+		cgroup_for_each_live_child(child, cgrp)
+			if (cgroup_is_threaded(child))
+				return -EBUSY;
+	} else {
+		/*
+		 * @cgrp is starting or ending a normal threaded subtree.
+		 * Avoid needing recursive operations, or implicit
+		 * subtree_control changes.
+		 */
+		if (css_has_online_children(&cgrp->self) ||
+		    cgrp->subtree_control)
+			return -EBUSY;
+
+		/* no partial disable */
+		if (!is_enable && !cgroup_is_thread_root(cgrp))
+			return -EBUSY;
+	}
+
+	return 0;
+}
+
+static int cgroup_enable_threaded(struct cgroup *cgrp)
+{
+	struct cgroup *proc_cgrp;
+	int ret;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* noop if already threaded */
+	if (cgroup_is_threaded(cgrp))
+		return 0;
+
+	ret = cgroup_vet_threaded_switch(cgrp, true);
+	if (ret)
+		return ret;
+
+	/* if the parent is mixed threaded root, join the subtree */
+	if (cgroup_has_mixed_parent(cgrp))
+		proc_cgrp = cgroup_parent(cgrp);
+	else
+		proc_cgrp = cgrp;
+
+	cgroup_save_control(cgrp);
+
+	/*
+	 * Mark it threaded.  This makes cgroup_control() and
+	 * cgroup_ss_mask() skip domain controllers.  In turn, the
+	 * following control operations migrate tasks to the matching
+	 * threaded csets.
+	 */
+	cgrp->proc_cgrp = proc_cgrp;
+
+	ret = cgroup_apply_control(cgrp);
+	if (ret)
+		cgrp->proc_cgrp = NULL;
+
+	cgroup_finalize_control(cgrp, ret);
+
+	return ret;
+}
+
+static int cgroup_disable_threaded(struct cgroup *cgrp)
+{
+	struct cgroup *proc_cgrp = cgrp->proc_cgrp;
+	int ret;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* noop if already !threaded */
+	if (!cgroup_is_threaded(cgrp))
+		return 0;
+
+	ret = cgroup_vet_threaded_switch(cgrp, false);
+	if (ret)
+		return ret;
+
+	cgroup_save_control(cgrp);
+
+	/*
+	 * Mark it !threaded.  This restores cgroup_control() and
+	 * cgroup_ss_mask() behavior.  See cgroup_enable_threaded().
+	 */
+	cgrp->proc_cgrp = NULL;
+
+	ret = cgroup_apply_control(cgrp);
+	if (ret)
+		cgrp->proc_cgrp = proc_cgrp;
+
+	cgroup_finalize_control(cgrp, ret);
+
+	return ret;
+}
+
 static int cgroup_events_show(struct seq_file *seq, void *v)
 {
 	seq_printf(seq, "populated %d\n",
@@ -3866,12 +4044,12 @@ static void *cgroup_procs_next(struct se
 	return css_task_iter_next(it);
 }
 
-static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos,
+				  unsigned int iter_flags)
 {
 	struct kernfs_open_file *of = s->private;
 	struct cgroup *cgrp = seq_css(s)->cgroup;
 	struct css_task_iter *it = of->priv;
-	unsigned iter_flags = CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED;
 
 	/*
 	 * When a seq_file is seeked, it's always traversed sequentially
@@ -3894,6 +4072,23 @@ static void *cgroup_procs_start(struct s
 	return cgroup_procs_next(s, NULL, NULL);
 }
 
+static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+{
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+
+	/*
+	 * All processes of a threaded subtree are in the top threaded
+	 * cgroup.  Only threads can be distributed across the subtree.
+	 * Reject reads on cgroup.procs in the subtree proper.  They're
+	 * always empty anyway.
+	 */
+	if (cgroup_is_threaded(cgrp) && !cgroup_is_thread_root(cgrp))
+		return ERR_PTR(-EINVAL);
+
+	return __cgroup_procs_start(s, pos, CSS_TASK_ITER_PROCS |
+					    CSS_TASK_ITER_THREADED);
+}
+
 static int cgroup_procs_show(struct seq_file *s, void *v)
 {
 	seq_printf(s, "%d\n", task_pid_vnr(v));
@@ -3948,6 +4143,76 @@ out_unlock:
 	return ret ?: nbytes;
 }
 
+static void *cgroup_threads_start(struct seq_file *s, loff_t *pos)
+{
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+
+	if (!cgroup_is_threaded(cgrp))
+		return ERR_PTR(-EINVAL);
+
+	return __cgroup_procs_start(s, pos, 0);
+}
+
+static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes, loff_t off)
+{
+	struct super_block *sb = of->file->f_path.dentry->d_sb;
+	struct cgroup *cgrp, *common_ancestor;
+	struct task_struct *task;
+	ssize_t ret;
+
+	buf = strstrip(buf);
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENODEV;
+
+	/* cgroup.procs determines delegation, require permission on it too */
+	ret = cgroup_procs_write_permission(cgrp, sb);
+	if (ret)
+		goto out_unlock;
+
+	/* enable or disable? */
+	if (!strcmp(buf, "enable")) {
+		ret = cgroup_enable_threaded(cgrp);
+		goto out_unlock;
+	} else if (!strcmp(buf, "disable")) {
+		ret = cgroup_disable_threaded(cgrp);
+		goto out_unlock;
+	}
+
+	/* thread migration */
+	ret = -EINVAL;
+	if (!cgroup_is_threaded(cgrp))
+		goto out_unlock;
+
+	task = cgroup_procs_write_start(buf, false);
+	ret = PTR_ERR_OR_ZERO(task);
+	if (ret)
+		goto out_unlock;
+
+	common_ancestor = cgroup_migrate_common_ancestor(task, cgrp);
+
+	/* can't migrate across disjoint threaded subtrees */
+	ret = -EACCES;
+	if (common_ancestor->proc_cgrp != cgrp->proc_cgrp)
+		goto out_finish;
+
+	/* and follow the cgroup.procs delegation rule */
+	ret = cgroup_procs_write_permission(common_ancestor, sb);
+	if (ret)
+		goto out_finish;
+
+	ret = cgroup_attach_task(cgrp, task, false);
+
+out_finish:
+	cgroup_procs_write_finish(task);
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
 /* cgroup core interface files for the default hierarchy */
 static struct cftype cgroup_base_files[] = {
 	{
@@ -3960,6 +4225,14 @@ static struct cftype cgroup_base_files[]
 		.write = cgroup_procs_write,
 	},
 	{
+		.name = "cgroup.threads",
+		.release = cgroup_procs_release,
+		.seq_start = cgroup_threads_start,
+		.seq_next = cgroup_procs_next,
+		.seq_show = cgroup_procs_show,
+		.write = cgroup_threads_write,
+	},
+	{
 		.name = "cgroup.controllers",
 		.seq_show = cgroup_controllers_show,
 	},
@@ -4274,6 +4547,9 @@ static struct cgroup *cgroup_create(stru
 	cgrp->root = root;
 	cgrp->level = level;
 
+	if (!cgroup_has_mixed_parent(cgrp))
+		cgrp->proc_cgrp = parent->proc_cgrp;
+
 	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
 		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
 
@@ -4720,11 +4996,17 @@ int __init cgroup_init(void)
 
 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
+		/* implicit controllers must be threaded too */
+		WARN_ON(ss->implicit_on_dfl && !ss->threaded);
+
 		if (ss->implicit_on_dfl)
 			cgrp_dfl_implicit_ss_mask |= 1 << ss->id;
 		else if (!ss->dfl_cftypes)
 			cgrp_dfl_inhibit_ss_mask |= 1 << ss->id;
 
+		if (ss->threaded)
+			cgrp_dfl_threaded_ss_mask |= 1 << ss->id;
+
 		if (ss->dfl_cftypes == ss->legacy_cftypes) {
 			WARN_ON(cgroup_add_cftypes(ss, ss->dfl_cftypes));
 		} else {
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -345,4 +345,5 @@ struct cgroup_subsys pids_cgrp_subsys =
 	.free		= pids_free,
 	.legacy_cftypes	= pids_files,
 	.dfl_cftypes	= pids_files,
+	.threaded	= true,
 };
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11155,5 +11155,6 @@ struct cgroup_subsys perf_event_cgrp_sub
 	 * controller is not mounted on a legacy hierarchy.
 	 */
 	.implicit_on_dfl = true,
+	.threaded	= true,
 };
 #endif /* CONFIG_CGROUP_PERF */


* [PATCH v3 07/10] cgroup: implement cgroup v2 thread support
@ 2017-06-15 20:14     ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-15 20:14 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds

This patch implements cgroup v2 thread support.  The goal of thread
mode is to support hierarchical accounting and control at thread
granularity while staying inside the resource domain model, which
allows coordination across different resource controllers and
handling of anonymous resource consumptions.

Once thread mode is enabled on a cgroup, the threads of the processes
which are in its subtree can be placed anywhere inside the subtree
without being restricted by process granularity or the
no-internal-process constraint.  Note that the threads aren't allowed
to escape to a different threaded subtree.  To be used inside a
threaded subtree, a controller should explicitly support threaded mode
and be able to handle internal competition in a way which is
appropriate for the resource.
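
The rule that threads may not escape to a different threaded subtree
can be modelled in userspace C.  The sketch below is only an
illustration, not the kernel code: struct toy_cgroup and the toy_*
helpers are hypothetical names, and it assumes both cgroups live in
the same hierarchy.  It mirrors the ->proc_cgrp comparison that
cgroup_threads_write() performs in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cgroup; only the fields this sketch needs. */
struct toy_cgroup {
	struct toy_cgroup *parent;	/* NULL for the hierarchy root */
	struct toy_cgroup *proc_cgrp;	/* thread root; NULL if not threaded */
};

/* Number of ancestors between @cgrp and the hierarchy root. */
static int toy_depth(const struct toy_cgroup *cgrp)
{
	int depth = 0;

	for (; cgrp->parent; cgrp = cgrp->parent)
		depth++;
	return depth;
}

/* Nearest cgroup that is an ancestor of (or equal to) both @a and @b. */
static struct toy_cgroup *toy_common_ancestor(struct toy_cgroup *a,
					      struct toy_cgroup *b)
{
	int da = toy_depth(a), db = toy_depth(b);

	/* Bring the deeper one up to the same level, then walk together. */
	for (; da > db; da--)
		a = a->parent;
	for (; db > da; db--)
		b = b->parent;
	while (a != b) {
		a = a->parent;
		b = b->parent;
	}
	return a;
}

/* May a thread currently in @src be moved to @dst? */
static bool toy_may_move_thread(struct toy_cgroup *src,
				struct toy_cgroup *dst)
{
	assert(src && dst);

	/* thread migration only targets threaded cgroups */
	if (!dst->proc_cgrp)
		return false;

	/* can't migrate across disjoint threaded subtrees */
	return toy_common_ancestor(src, dst)->proc_cgrp == dst->proc_cgrp;
}
```

With a thread root T that has children a and b, and a second thread
root U, moving a thread from T/a to T/b passes the check while moving
it to U is refused; the patch reports the latter case as -EACCES.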

The root of a threaded subtree, where thread mode is enabled in the
first place, is called the thread root and serves as the resource
domain for the whole subtree.  This is the last cgroup where
non-threaded controllers are operational and where all the
domain-level resource consumptions in the subtree are accounted.  This
allows threaded controllers to operate at thread granularity when
requested while staying inside the scope of system-level resource
distribution.
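
One visible consequence of the thread root acting as the resource
domain is which interface file can be read where.  The following
userspace sketch models the two read-side checks from this patch
(cgroup_procs_start() and cgroup_threads_start()); struct toy_cgroup
and the toy_* functions are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cgroup (hypothetical, userspace only). */
struct toy_cgroup {
	struct toy_cgroup *proc_cgrp;	/* thread root; NULL if not threaded */
};

static bool toy_is_threaded(const struct toy_cgroup *cgrp)
{
	assert(cgrp);
	return cgrp->proc_cgrp != NULL;
}

static bool toy_is_thread_root(const struct toy_cgroup *cgrp)
{
	assert(cgrp);
	return cgrp->proc_cgrp == cgrp;
}

/*
 * cgroup.procs: all processes of a threaded subtree are accounted to
 * the thread root, so reads below the thread root are rejected.
 */
static int toy_procs_read(const struct toy_cgroup *cgrp)
{
	if (toy_is_threaded(cgrp) && !toy_is_thread_root(cgrp))
		return -EINVAL;
	return 0;
}

/* cgroup.threads: readable only inside a threaded subtree. */
static int toy_threads_read(const struct toy_cgroup *cgrp)
{
	return toy_is_threaded(cgrp) ? 0 : -EINVAL;
}
```

A non-threaded cgroup can therefore read cgroup.procs but not
cgroup.threads, a thread root can read both, and a cgroup below a
thread root can only read cgroup.threads.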

As the root cgroup is exempt from the no-internal-process constraint,
it can serve as both a thread root and a parent to normal cgroups.
The root cgroup supports mixed thread mode, which can be enabled and
disabled anytime as long as there aren't any threaded children.  First
level child cgroups join the mixed threaded subtree when thread mode
is enabled on them.
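
The three situations in which thread mode can be switched (a first
level child of a mixed root, the root itself, and any other cgroup)
can be summarised with a small userspace model of
cgroup_vet_threaded_switch() from this patch.  This is a sketch under
simplifying assumptions: struct toy_cgroup, its fields and the toy_*
names are hypothetical, and children are kept on a plain singly linked
list instead of the kernel's css lists.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cgroup (hypothetical, userspace only). */
struct toy_cgroup {
	struct toy_cgroup *parent;	 /* NULL for the root cgroup */
	struct toy_cgroup *first_child;	 /* head of the child list */
	struct toy_cgroup *next_sibling; /* next entry in parent's list */
	struct toy_cgroup *proc_cgrp;	 /* thread root; NULL if not threaded */
	unsigned int subtree_control;	 /* enabled controller mask */
	int nr_tasks;			 /* tasks directly in this cgroup */
};

/* Only the root can be a mixed thread root. */
static bool toy_is_mixed_root(const struct toy_cgroup *cgrp)
{
	return !cgrp->parent && cgrp->proc_cgrp == cgrp;
}

/* Returns 0 if thread mode may be switched on @cgrp, -EBUSY otherwise. */
static int toy_vet_threaded_switch(const struct toy_cgroup *cgrp,
				   bool is_enable)
{
	const struct toy_cgroup *child;

	assert(cgrp);

	if (cgrp->parent && toy_is_mixed_root(cgrp->parent)) {
		/* joining or leaving a mixed root: must be empty */
		if (cgrp->first_child || cgrp->subtree_control ||
		    cgrp->nr_tasks)
			return -EBUSY;
	} else if (!cgrp->parent) {
		/* root starting or ending mixed mode: no threaded children */
		for (child = cgrp->first_child; child;
		     child = child->next_sibling)
			if (child->proc_cgrp)
				return -EBUSY;
	} else {
		/* normal threaded subtree: no children, no controllers */
		if (cgrp->first_child || cgrp->subtree_control)
			return -EBUSY;
		/* no partial disable */
		if (!is_enable && cgrp->proc_cgrp != cgrp)
			return -EBUSY;
	}
	return 0;
}
```

In this model a first level child that still holds tasks cannot join
a mixed root, and the root cannot leave mixed mode while any of its
children is threaded.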

Internally, in a threaded subtree, each css_set has its ->proc_cset
pointing to a matching css_set which belongs to the thread root.  This
ensures that the thread root level cgroup_subsys_states of all
threaded controllers are readily accessible for domain-level operations.

This patch enables threaded mode for the pids and perf_event
controllers.  Neither has to worry about domain-level resource
consumptions, so it's enough to simply set the flag.
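
How the per-controller flag turns into a mask, and how that mask
filters subtree_control writes inside a threaded subtree, can be
sketched in userspace C as below.  This is an illustration only:
struct toy_subsys, the toy_* functions and the controller ids are
hypothetical; only the bit arithmetic follows cgroup_init() and
cgroup_subtree_control_write() in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct cgroup_subsys (hypothetical). */
struct toy_subsys {
	const char *name;
	int id;		/* bit position in the controller masks */
	bool threaded;	/* controller declares thread mode support */
};

/* Collect the ids of all threaded controllers into a bitmask. */
static uint16_t toy_threaded_ss_mask(const struct toy_subsys *ss, size_t nr)
{
	uint16_t mask = 0;
	size_t i;

	assert(ss || !nr);
	for (i = 0; i < nr; i++)
		if (ss[i].threaded)
			mask |= (uint16_t)(1u << ss[i].id);
	return mask;
}

/*
 * Inside a threaded subtree, a request to enable controllers is
 * acceptable only if every requested bit is in the threaded mask.
 */
static bool toy_enable_allowed(uint16_t enable, uint16_t threaded_mask)
{
	return !(enable & (uint16_t)~threaded_mask);
}
```

With pids and perf_event flagged as threaded and a third controller
left unflagged, a request for pids alone is accepted while any
request that includes the unflagged controller is rejected, which the
patch reports as -EBUSY.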

For more details on the interface and behavior of the thread mode,
please refer to section 2-2-2 in Documentation/cgroup-v2.txt added
by this patch.  Note that the documentation update is not complete as
the rest of the documentation needs to be updated accordingly.
Rolling those updates into this patch could be confusing, so they will
be done in separate patches.

v3: - Dropped "join" and always make mixed children join the parent's
      threaded subtree.

v2: - After discussions with Waiman, support for mixed thread mode is
      added.  This should address the issue that Peter pointed out
      where any nesting should be avoided for thread subtrees while
      coexisting with other domain cgroups.

    - Enabling / disabling thread mode now piggybacks on the existing
      control mask update mechanism.

    - Bug fixes and cleanup.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
---
Hello,

This is the alternative version where we don't support topologies like
the following.

         root (thread mode)
         /              \
        /                \
       /                  \
      A                    B
 (member of root's     (root of its own
  thread subtree)       thread subtree)

Supporting or not supporting the above topology only has user
interface implications.  Supporting it means more complication or
ugliness in the interface.  Not supporting it obviously results in a
simpler interface but at the cost of some of the flexibility of thread
mode.

Choosing one over the other is unlikely to result in substantial
actual differences.

Thanks.

 Documentation/cgroup-v2.txt |  111 +++++++++++++++
 include/linux/cgroup-defs.h |   12 +
 kernel/cgroup/cgroup.c      |  310 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/cgroup/pids.c        |    1 
 kernel/events/core.c        |    1 
 5 files changed, 419 insertions(+), 16 deletions(-)

--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -16,7 +16,9 @@ CONTENTS
   1-2. What is cgroup?
 2. Basic Operations
   2-1. Mounting
-  2-2. Organizing Processes
+  2-2. Organizing Processes and Threads
+    2-2-1. Processes
+    2-2-2. Threads
   2-3. [Un]populated Notification
   2-4. Controlling Controllers
     2-4-1. Enabling and Disabling
@@ -150,7 +152,9 @@ and experimenting easier, the kernel par
 disabling controllers in v1 and make them always available in v2.
 
 
-2-2. Organizing Processes
+2-2. Organizing Processes and Threads
+
+2-2-1. Processes
 
 Initially, only the root cgroup exists to which all processes belong.
 A child cgroup can be created by creating a sub-directory.
@@ -201,6 +205,109 @@ is removed subsequently, " (deleted)" is
   0::/test-cgroup/test-cgroup-nested (deleted)
 
 
+2-2-2. Threads
+
+cgroup v2 supports thread granularity for a subset of controllers to
+support use cases requiring hierarchical resource distribution across
+the threads of a group of processes.  By default, all threads of a
+process belong to the same cgroup, which also serves as the resource
+domain to host resource consumptions which are not specific to a
+process or thread.  The thread mode allows threads to be spread across
+a subtree while still maintaining the common resource domain for them.
+
+Enabling thread mode on a subtree makes it threaded.  The root of a
+threaded subtree is called the thread root and serves as the resource
+domain for the entire subtree.  In a threaded subtree, threads of a
+process can be put in different cgroups and are not subject to the no
+internal process constraint - threaded controllers can be enabled on
+non-leaf cgroups whether they have threads in them or not.
+
+Because the root cgroup is not subject to the no internal process
+constraint, it can serve both as a thread root and a parent to normal
+cgroups.  This is called mixed thread mode.
+
+Thread mode can be enabled by writing "enable" to the "cgroup.threads"
+file.
+
+  # echo enable > cgroup.threads
+
+On a non-root cgroup, to enable the thread mode, the following
+conditions must be met.
+
+- The thread root doesn't have any child cgroups.
+
+- The thread root doesn't have any controllers enabled.
+
+On the root cgroup, only the mixed thread mode is supported.  The
+following condition should be met to enable.
+
+- The root cgroup doesn't have any threaded children.
+
+Unlike the normal thread mode, the whole subtree is not turned into
+thread subtree.  The first level children have to explicitly opt-in to
+join the thread subtree by enabling thread mode.  To join the thread
+subtree, a first level child must meet the following conditions.
+
+- The first level child doesn't have any child cgroups.
+
+- The first level child doesn't have any controllers enabled.
+
+- The first level child doesn't have any tasks.
+
+Inside a threaded subtree, "cgroup.threads" can be read and contains
+the list of the thread IDs of all threads in the cgroup.  Except that
+the operations are per-thread instead of per-process, "cgroup.threads"
+has the same format and behaves the same way as "cgroup.procs".
+
+The thread root serves as the resource domain for the whole subtree,
+and, while the threads can be scattered across the subtree, all the
+processes are considered to be in the thread root.  "cgroup.procs" in
+a thread root contains the PIDs of all processes in the subtree and is
+not readable in the subtree proper.  However, "cgroup.procs" can be
+written to from anywhere in the subtree to migrate all threads of the
+matching process to the cgroup.
+
+Only threaded controllers can be enabled in a threaded subtree.  When
+a threaded controller is enabled inside a threaded subtree, it only
+accounts for and controls resource consumptions associated with the
+threads in the cgroup and its descendants.  All consumptions which
+aren't tied to a specific thread belong to the thread root.
+
+Because a threaded subtree is exempt from the no internal process
+constraint, a threaded controller must be able to handle competition
+between threads in a non-leaf cgroup and its child cgroups.  Each
+threaded controller defines how such competitions are handled.
+
+Thread mode can be disabled by writing "disable" to the "cgroup.threads"
+file.
+
+  # echo disable > cgroup.threads
+
+On a normal thread subtree, to disable the thread mode, the following
+conditions must be met.
+
+- The cgroup is a thread root.  Thread mode can't be disabled
+  partially in the subtree.
+
+- The thread root doesn't have any child cgroups.
+
+- The thread root doesn't have any controllers enabled.
+
+On the root cgroup, to disable the mixed thread mode, the following
+condition should be met.
+
+- The root cgroup doesn't have any threaded children.
+
+On a first level child of the mixed thread subtree, the following
+conditions should be met.
+
+- The first level child doesn't have any child cgroups.
+
+- The first level child doesn't have any controllers enabled.
+
+- The first level child doesn't have any tasks.
+
+
 2-3. [Un]populated Notification
 
 Each non-root cgroup has a "cgroup.events" file which contains
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -505,6 +505,18 @@ struct cgroup_subsys {
 	bool implicit_on_dfl:1;
 
 	/*
+	 * If %true, the controller supports threaded mode on the default
+	 * hierarchy.  In a threaded subtree, both process granularity and
+	 * no-internal-process constraint are ignored and a threaded
+	 * controller should be able to handle that.
+	 *
+	 * Note that as an implicit controller is automatically enabled on
+	 * all cgroups on the default hierarchy, it should also be
+	 * threaded.  implicit && !threaded is not supported.
+	 */
+	bool threaded:1;
+
+	/*
 	 * If %false, this subsystem is properly hierarchical -
 	 * configuration, resource accounting and restriction on a parent
 	 * cgroup cover those of its children.  If %true, hierarchy support
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -162,6 +162,9 @@ static u16 cgrp_dfl_inhibit_ss_mask;
 /* some controllers are implicitly enabled on the default hierarchy */
 static u16 cgrp_dfl_implicit_ss_mask;
 
+/* some controllers can be threaded on the default hierarchy */
+static u16 cgrp_dfl_threaded_ss_mask;
+
 /* The list of hierarchy roots */
 LIST_HEAD(cgroup_roots);
 static int cgroup_root_count;
@@ -331,14 +334,60 @@ static bool cgroup_is_threaded(struct cg
 	return cgrp->proc_cgrp;
 }
 
+/* is @cgrp root of a threaded subtree? */
+static bool cgroup_is_thread_root(struct cgroup *cgrp)
+{
+	return cgrp->proc_cgrp == cgrp;
+}
+
+/* if threaded, would @cgrp become root of a mixed threaded subtree? */
+static bool cgroup_is_mixable(struct cgroup *cgrp)
+{
+	/*
+	 * Root isn't under domain level resource control exempting it from
+	 * the no-internal-process constraint, so it can serve as a thread
+	 * root and a parent of resource domains at the same time.
+	 */
+	return !cgroup_parent(cgrp);
+}
+
+/* is @cgrp root of a mixed threaded subtree */
+static bool cgroup_is_mixed_root(struct cgroup *cgrp)
+{
+	return cgroup_is_thread_root(cgrp) && cgroup_is_mixable(cgrp);
+}
+
+/* is @cgrp's parent a mixed thread root? */
+static bool cgroup_has_mixed_parent(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	return parent && cgroup_is_mixed_root(parent);
+}
+
+/* is @cgrp the first level child of a mixed threaded subtree */
+static bool cgroup_is_mixed_child(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	return parent && cgrp->proc_cgrp == parent &&
+		cgroup_is_mixed_root(parent);
+}
+
 /* subsystems visibly enabled on a cgroup */
 static u16 cgroup_control(struct cgroup *cgrp)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	u16 root_ss_mask = cgrp->root->subsys_mask;
 
-	if (parent)
-		return parent->subtree_control;
+	if (parent) {
+		u16 ss_mask = parent->subtree_control;
+
+		/* mixed child can only have threaded subset of controllers */
+		if (cgroup_is_mixed_child(cgrp))
+			ss_mask &= cgrp_dfl_threaded_ss_mask;
+		return ss_mask;
+	}
 
 	if (cgroup_on_dfl(cgrp))
 		root_ss_mask &= ~(cgrp_dfl_inhibit_ss_mask |
@@ -351,8 +400,14 @@ static u16 cgroup_ss_mask(struct cgroup
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 
-	if (parent)
-		return parent->subtree_ss_mask;
+	if (parent) {
+		u16 ss_mask = parent->subtree_ss_mask;
+
+		/* mixed child can only have threaded subset of controllers */
+		if (cgroup_is_mixed_child(cgrp))
+			ss_mask &= cgrp_dfl_threaded_ss_mask;
+		return ss_mask;
+	}
 
 	return cgrp->root->subsys_mask;
 }
@@ -2241,14 +2296,14 @@ out_release_tset:
  * cgroup_may_migrate_to - verify whether a cgroup can be migration destination
  * @dst_cgrp: destination cgroup to test
  *
- * On the default hierarchy, except for the root, subtree_control must be
- * zero for migration destination cgroups with tasks so that child cgroups
- * don't compete against tasks.
+ * On the default hierarchy, except for the mixable and threaded cgroups,
+ * subtree_control must be zero for migration destination cgroups with
+ * tasks so that child cgroups don't compete against tasks.
  */
 bool cgroup_may_migrate_to(struct cgroup *dst_cgrp)
 {
-	return !cgroup_on_dfl(dst_cgrp) || !cgroup_parent(dst_cgrp) ||
-		!dst_cgrp->subtree_control;
+	return !cgroup_on_dfl(dst_cgrp) || cgroup_is_mixable(dst_cgrp) ||
+		cgroup_is_threaded(dst_cgrp) || !dst_cgrp->subtree_control;
 }
 
 /**
@@ -2957,11 +3012,20 @@ static ssize_t cgroup_subtree_control_wr
 		goto out_unlock;
 	}
 
+	/* can't enable !threaded controllers on a threaded cgroup */
+	if (cgroup_is_threaded(cgrp) && !cgroup_is_mixed_root(cgrp) &&
+	    (enable & ~cgrp_dfl_threaded_ss_mask)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
-	 * Except for the root, subtree_control must be zero for a cgroup
-	 * with tasks so that child cgroups don't compete against tasks.
+	 * Except for mixable and threaded cgroups, subtree_control must be
+	 * zero for a cgroup with tasks so that child cgroups don't compete
+	 * against tasks.
 	 */
-	if (enable && cgroup_parent(cgrp) && cgroup_has_tasks(cgrp)) {
+	if (enable && !cgroup_is_mixable(cgrp) && !cgroup_is_threaded(cgrp) &&
+	    cgroup_has_tasks(cgrp)) {
 		ret = -EBUSY;
 		goto out_unlock;
 	}
@@ -2983,6 +3047,120 @@ out_unlock:
 	return ret ?: nbytes;
 }
 
+static int cgroup_vet_threaded_switch(struct cgroup *cgrp, bool is_enable)
+{
+	if (cgroup_has_mixed_parent(cgrp)) {
+		/*
+		 * @cgrp is joining or leaving an existing mixed threaded
+		 * root.  Avoid needing recursive operations, implicit
+		 * subtree_control changes, or migrations.
+		 */
+		if (css_has_online_children(&cgrp->self) ||
+		    cgrp->subtree_control || cgroup_has_tasks(cgrp))
+			return -EBUSY;
+	} else if (cgroup_is_mixable(cgrp)) {
+		struct cgroup *child;
+
+		/*
+		 * @cgrp is starting or ending a mixed threaded subtree.
+		 * It's allowed to have domain children and enabled
+		 * controllers but we can't change ->proc_cgrp of existing
+		 * threaded children.  Make sure there aren't already
+		 * threaded children.
+		 */
+		cgroup_for_each_live_child(child, cgrp)
+			if (cgroup_is_threaded(child))
+				return -EBUSY;
+	} else {
+		/*
+		 * @cgrp is starting or ending a normal threaded subtree.
+		 * Avoid needing recursive operations, or implicit
+		 * subtree_control changes.
+		 */
+		if (css_has_online_children(&cgrp->self) ||
+		    cgrp->subtree_control)
+			return -EBUSY;
+
+		/* no partial disable */
+		if (!is_enable && !cgroup_is_thread_root(cgrp))
+			return -EBUSY;
+	}
+
+	return 0;
+}
+
+static int cgroup_enable_threaded(struct cgroup *cgrp)
+{
+	struct cgroup *proc_cgrp;
+	int ret;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* noop if already threaded */
+	if (cgroup_is_threaded(cgrp))
+		return 0;
+
+	ret = cgroup_vet_threaded_switch(cgrp, true);
+	if (ret)
+		return ret;
+
+	/* if the parent is mixed threaded root, join the subtree */
+	if (cgroup_has_mixed_parent(cgrp))
+		proc_cgrp = cgroup_parent(cgrp);
+	else
+		proc_cgrp = cgrp;
+
+	cgroup_save_control(cgrp);
+
+	/*
+	 * Mark it threaded.  This makes cgroup_control() and
+	 * cgroup_ss_mask() skip domain controllers.  In turn, the
+	 * following control operations migrate tasks to the matching
+	 * threaded csets.
+	 */
+	cgrp->proc_cgrp = proc_cgrp;
+
+	ret = cgroup_apply_control(cgrp);
+	if (ret)
+		cgrp->proc_cgrp = NULL;
+
+	cgroup_finalize_control(cgrp, ret);
+
+	return ret;
+}
+
+static int cgroup_disable_threaded(struct cgroup *cgrp)
+{
+	struct cgroup *proc_cgrp = cgrp->proc_cgrp;
+	int ret;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/* noop if already !threaded */
+	if (!cgroup_is_threaded(cgrp))
+		return 0;
+
+	ret = cgroup_vet_threaded_switch(cgrp, false);
+	if (ret)
+		return ret;
+
+	cgroup_save_control(cgrp);
+
+	/*
+	 * Mark it !threaded.  This restores cgroup_control() and
+	 * cgroup_ss_mask() behavior.  See cgroup_enable_threaded().
+	 */
+	cgrp->proc_cgrp = NULL;
+
+	ret = cgroup_apply_control(cgrp);
+	if (ret)
+		cgrp->proc_cgrp = proc_cgrp;
+
+	cgroup_finalize_control(cgrp, ret);
+
+	return ret;
+}
+
 static int cgroup_events_show(struct seq_file *seq, void *v)
 {
 	seq_printf(seq, "populated %d\n",
@@ -3866,12 +4044,12 @@ static void *cgroup_procs_next(struct se
 	return css_task_iter_next(it);
 }
 
-static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos,
+				  unsigned int iter_flags)
 {
 	struct kernfs_open_file *of = s->private;
 	struct cgroup *cgrp = seq_css(s)->cgroup;
 	struct css_task_iter *it = of->priv;
-	unsigned iter_flags = CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED;
 
 	/*
 	 * When a seq_file is seeked, it's always traversed sequentially
@@ -3894,6 +4072,23 @@ static void *cgroup_procs_start(struct s
 	return cgroup_procs_next(s, NULL, NULL);
 }
 
+static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+{
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+
+	/*
+	 * All processes of a threaded subtree are in the top threaded
+	 * cgroup.  Only threads can be distributed across the subtree.
+	 * Reject reads on cgroup.procs in the subtree proper.  They're
+	 * always empty anyway.
+	 */
+	if (cgroup_is_threaded(cgrp) && !cgroup_is_thread_root(cgrp))
+		return ERR_PTR(-EINVAL);
+
+	return __cgroup_procs_start(s, pos, CSS_TASK_ITER_PROCS |
+					    CSS_TASK_ITER_THREADED);
+}
+
 static int cgroup_procs_show(struct seq_file *s, void *v)
 {
 	seq_printf(s, "%d\n", task_pid_vnr(v));
@@ -3948,6 +4143,76 @@ out_unlock:
 	return ret ?: nbytes;
 }
 
+static void *cgroup_threads_start(struct seq_file *s, loff_t *pos)
+{
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+
+	if (!cgroup_is_threaded(cgrp))
+		return ERR_PTR(-EINVAL);
+
+	return __cgroup_procs_start(s, pos, 0);
+}
+
+static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes, loff_t off)
+{
+	struct super_block *sb = of->file->f_path.dentry->d_sb;
+	struct cgroup *cgrp, *common_ancestor;
+	struct task_struct *task;
+	ssize_t ret;
+
+	buf = strstrip(buf);
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENODEV;
+
+	/* cgroup.procs determines delegation, require permission on it too */
+	ret = cgroup_procs_write_permission(cgrp, sb);
+	if (ret)
+		goto out_unlock;
+
+	/* enable or disable? */
+	if (!strcmp(buf, "enable")) {
+		ret = cgroup_enable_threaded(cgrp);
+		goto out_unlock;
+	} else if (!strcmp(buf, "disable")) {
+		ret = cgroup_disable_threaded(cgrp);
+		goto out_unlock;
+	}
+
+	/* thread migration */
+	ret = -EINVAL;
+	if (!cgroup_is_threaded(cgrp))
+		goto out_unlock;
+
+	task = cgroup_procs_write_start(buf, false);
+	ret = PTR_ERR_OR_ZERO(task);
+	if (ret)
+		goto out_unlock;
+
+	common_ancestor = cgroup_migrate_common_ancestor(task, cgrp);
+
+	/* can't migrate across disjoint threaded subtrees */
+	ret = -EACCES;
+	if (common_ancestor->proc_cgrp != cgrp->proc_cgrp)
+		goto out_finish;
+
+	/* and follow the cgroup.procs delegation rule */
+	ret = cgroup_procs_write_permission(common_ancestor, sb);
+	if (ret)
+		goto out_finish;
+
+	ret = cgroup_attach_task(cgrp, task, false);
+
+out_finish:
+	cgroup_procs_write_finish(task);
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
 /* cgroup core interface files for the default hierarchy */
 static struct cftype cgroup_base_files[] = {
 	{
@@ -3960,6 +4225,14 @@ static struct cftype cgroup_base_files[]
 		.write = cgroup_procs_write,
 	},
 	{
+		.name = "cgroup.threads",
+		.release = cgroup_procs_release,
+		.seq_start = cgroup_threads_start,
+		.seq_next = cgroup_procs_next,
+		.seq_show = cgroup_procs_show,
+		.write = cgroup_threads_write,
+	},
+	{
 		.name = "cgroup.controllers",
 		.seq_show = cgroup_controllers_show,
 	},
@@ -4274,6 +4547,9 @@ static struct cgroup *cgroup_create(stru
 	cgrp->root = root;
 	cgrp->level = level;
 
+	if (!cgroup_has_mixed_parent(cgrp))
+		cgrp->proc_cgrp = parent->proc_cgrp;
+
 	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
 		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
 
@@ -4720,11 +4996,17 @@ int __init cgroup_init(void)
 
 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
+		/* implicit controllers must be threaded too */
+		WARN_ON(ss->implicit_on_dfl && !ss->threaded);
+
 		if (ss->implicit_on_dfl)
 			cgrp_dfl_implicit_ss_mask |= 1 << ss->id;
 		else if (!ss->dfl_cftypes)
 			cgrp_dfl_inhibit_ss_mask |= 1 << ss->id;
 
+		if (ss->threaded)
+			cgrp_dfl_threaded_ss_mask |= 1 << ss->id;
+
 		if (ss->dfl_cftypes == ss->legacy_cftypes) {
 			WARN_ON(cgroup_add_cftypes(ss, ss->dfl_cftypes));
 		} else {
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -345,4 +345,5 @@ struct cgroup_subsys pids_cgrp_subsys =
 	.free		= pids_free,
 	.legacy_cftypes	= pids_files,
 	.dfl_cftypes	= pids_files,
+	.threaded	= true,
 };
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11155,5 +11155,6 @@ struct cgroup_subsys perf_event_cgrp_sub
 	 * controller is not mounted on a legacy hierarchy.
 	 */
 	.implicit_on_dfl = true,
+	.threaded	= true,
 };
 #endif /* CONFIG_CGROUP_PERF */

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-06-15 20:16       ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-15 20:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Li Zefan, hannes, mingo, longman, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

Hello,

On Mon, Jun 12, 2017 at 05:27:53PM -0400, Tejun Heo wrote:
> The previous iteration actually was that way, so the only thread mode
> operation was setting whether to enable thread or not as before and if
> the parent is already thread mode, it'd always join the existing
> threaded subtree.  If you like that better, I can post that version
> right away.

Just posted v3 of the patch which drops the "join" operation, but do
note that this inherently comes at the cost of losing the ability to
express certain topologies.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH] cgroup: update debug controller to print out thread mode information
@ 2017-06-15 20:17   ` Tejun Heo
  0 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-15 20:17 UTC (permalink / raw)
  To: Li Zefan, hannes, peterz, mingo, longman
  Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds

From: Waiman Long <longman@redhat.com>

Update debug controller so that it prints out debug info about thread
mode.

 1) The relationship between proc_cset and threaded_csets is displayed.
 2) The status of being a thread root or threaded cgroup is displayed.

This patch is extracted from Waiman's larger patch.

Patch-originally-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Hello,

Waiman, this is the thread mode support part of your debug patch.
I'll include this in the thread mode patchset and apply it together.
It seems to work fine but if you spot anything silly, please let me
know.

Thanks.

 kernel/cgroup/debug.c |   57 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 43 insertions(+), 14 deletions(-)

--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -114,27 +114,53 @@ static int cgroup_css_links_read(struct
 {
 	struct cgroup_subsys_state *css = seq_css(seq);
 	struct cgrp_cset_link *link;
-	int dead_cnt = 0, extra_refs = 0;
+	int dead_cnt = 0, extra_refs = 0, threaded_csets = 0;
 
 	spin_lock_irq(&css_set_lock);
+
+	if (css->cgroup->proc_cgrp)
+		seq_puts(seq, (css->cgroup->proc_cgrp == css->cgroup) ?
+			 "[thread root]\n" : "[threaded]\n");
+
 	list_for_each_entry(link, &css->cgroup->cset_links, cset_link) {
 		struct css_set *cset = link->cset;
 		struct task_struct *task;
 		int count = 0;
 		int refcnt = refcount_read(&cset->refcount);
 
-		seq_printf(seq, " %d", refcnt);
-		if (refcnt - cset->nr_tasks > 0) {
-			int extra = refcnt - cset->nr_tasks;
-
-			seq_printf(seq, " +%d", extra);
-			/*
-			 * Take out the one additional reference in
-			 * init_css_set.
-			 */
-			if (cset == &init_css_set)
-				extra--;
-			extra_refs += extra;
+		/*
+		 * Print out the proc_cset and threaded_cset relationship
+		 * and highlight difference between refcount and task_count.
+		 */
+		seq_printf(seq, "css_set %pK", cset);
+		if (rcu_dereference_protected(cset->proc_cset, 1) != cset) {
+			threaded_csets++;
+			seq_printf(seq, "=>%pK", cset->proc_cset);
+		}
+		if (!list_empty(&cset->threaded_csets)) {
+			struct css_set *tcset;
+			int idx = 0;
+
+			list_for_each_entry(tcset, &cset->threaded_csets,
+					    threaded_csets_node) {
+				seq_puts(seq, idx ? "," : "<=");
+				seq_printf(seq, "%pK", tcset);
+				idx++;
+			}
+		} else {
+			seq_printf(seq, " %d", refcnt);
+			if (refcnt - cset->nr_tasks > 0) {
+				int extra = refcnt - cset->nr_tasks;
+
+				seq_printf(seq, " +%d", extra);
+				/*
+				 * Take out the one additional reference in
+				 * init_css_set.
+				 */
+				if (cset == &init_css_set)
+					extra--;
+				extra_refs += extra;
+			}
 		}
 		seq_puts(seq, "\n");
 
@@ -163,10 +189,12 @@ static int cgroup_css_links_read(struct
 	}
 	spin_unlock_irq(&css_set_lock);
 
-	if (!dead_cnt && !extra_refs)
+	if (!dead_cnt && !extra_refs && !threaded_csets)
 		return 0;
 
 	seq_puts(seq, "\n");
+	if (threaded_csets)
+		seq_printf(seq, "threaded css_sets = %d\n", threaded_csets);
 	if (extra_refs)
 		seq_printf(seq, "extra references = %d\n", extra_refs);
 	if (dead_cnt)
@@ -342,6 +370,7 @@ struct cgroup_subsys debug_cgrp_subsys =
 	.css_alloc	= debug_css_alloc,
 	.css_free	= debug_css_free,
 	.legacy_cftypes	= debug_legacy_files,
+	.threaded	= true,
 };
 
 /*

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-06-27  7:01       ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2017-06-27  7:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, hannes, mingo, longman, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds


I'm slowly getting back to things...

On Mon, Jun 12, 2017 at 05:27:53PM -0400, Tejun Heo wrote:

> > > * Root cgroup can enable thread mode anytime and a first level child
> > >   can opt-in to that thread subtree anchored at root by writing "join"
> > >   to "cgroup.threads" files, start its own thread subtree or just be a
> > >   normal cgroup.
> > 
> > Yuck... this again is a consequence of tagging the 'wrong' thing. Again,
> > the primary construct is the resource domain.
> > 
> > If you use that as a tag, you don't need this weird join crap. Because
> > as soon as you clear the 'resource domain' flag on a group, it instantly
> > becomes a thread group and 'obviously' connects to the first parent that
> > is a resource domain.
> 
> It has nothing to do with whether we mark domain or threaded subtrees.
> It is solely from whether you wanna express cases where a thread root
> is right below another thread root.  Tn's are member cgroup of thread
> subtrees where the same number means the same threaded subtree, D's
> are of domain cgroups.
> 
> The following is straightforward.
> 
> 	  T0
>        /  \
>       T0   D
> 
> The following is too.
> 
> 	  T0
>        /  \
>       T0   D
>             \
> 	       T1
> 
> The question is whether to allow something like the following.
> 
> 	  T0
>        /  \
>       T0  T1
> 
> That's where the "join" thing comes from because we wanna be able to
> tell apart whether a cgroup is gonna be a part of the existing thread
> subtree or starting its own thread subtree.  There sure are multiple
> ways to express that but one way or the other, if you wanna support
> topologies like the last one, you have to distinguish the two.

Hmm,.. I had not considered that. I was strictly considering the root to
always be a resource domain. What use-case or scenario did you have
to want to do this?

That is, what is the meaning of having T1 be a separate 'root' if it's
not also a resource domain?

> > And, as per the last time, this threaded marker isn't uniquely
> > identifying things, so it hard prohibits from ever extending the model
> > to allow resource domains nested in a thread subtree. Now I understand
> > why you don't implement that now -- you were struggling with the views
> > API, but that is no excuse to create an API that permanently disables
> > that feature.
> 
> Hmmm?  We can just allow disabling thread mode if we ever get to that.
> We can't make arbitrary graphs out of these nodes.  Whatever mode we
> put them in, they have to fall in with the overall tree structure, so
> I don't think the interface is unnecessarily restricting in that
> direction.

IIRC the problem with the 'threaded' marker is that it doesn't clearly
capture what a resource domain is.

That is, assuming that a thread root is always a resource domain, we get
the following problem:

Suppose we set 'threaded' on the root group in order to create a thread
(sub)group. If we then want to create another domain group, we'd have to
clear 'threaded' on that.

	R (t=1)
       / \
(t=1) T   D (t=0)

So far so good. However, now we want to create another thread group
under our domain group D, so we have to set its 'threaded' marker again:

	R (t=1)
       / \
(t=1) T   D (t=1)
         /
	T (t=1)

And we can no longer identify D as a resource domain. If OTOH we mark
'domain' we get:

	R (d=1)
       / \
(d=0) T   D (d=1)
         /
	T (d=0)

Which clearly identifies the domains and the thread only groups.
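
To make the lookup concrete, here is a minimal user-space sketch of the
'domain' marking, not kernel code. The struct grp type and the
resource_domain() helper are made up for illustration; the only idea
taken from the diagrams above is that a thread-only group (d=0) belongs
to the nearest ancestor with d=1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cgroup; not the kernel's struct cgroup. */
struct grp {
	struct grp *parent;	/* NULL for the root */
	bool domain;		/* the 'd' marker from the diagrams */
};

/*
 * Walk up from @g to the nearest group marked as a resource domain.
 * A domain group is its own resource domain.  Returns NULL only if
 * nothing on the path is marked, which cannot happen when the root
 * has d=1.
 */
static struct grp *resource_domain(struct grp *g)
{
	while (g && !g->domain)
		g = g->parent;
	return g;
}
```

With the last diagram, the T under D resolves to D and the T under R
resolves to R, without any extra state.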


Your objections to doing this were representing the resource controllers
in the intermediate thread-only groups like:

   R
    \
     T  -- what to do with eg. memcg here?
      \
       D
        \
	 T

I suggested having all resource controllers represented with a soft-link
back into the (thread-root) resource domain. But you were not convinced
and worried people were going to be confused.


Your current proposal treats the resource controllers as disabled in
thread subgroups -- which is perfectly fine given the constraint that we
cannot have new domains in a thread sub-tree. My worry is that we don't
paint ourselves in a corner and create an interface where we can never
extend to allow this.

The immediate use-case for wanting this would be allowing tasks in a
thread-group to start a VM or container (which would then want a
(sub) resource domain).


> > So I really regret the 'shares' interface; we really should have done a
> > nice thing.
> > 
> >   https://lkml.kernel.org/r/20170410073622.2y6tnpcd2ssuoztz@hirez.programming.kicks-ass.net
> > 
> > So I would like to change to that instead of the weird 100 thing.
> 
> Is it?  Relative weights are pretty fundamental and clearly defined in
> expressing work-conserving resource distribution.  Do you have more
> details on what you have in mind?

I'm not wanting to change the relative weight thing. I merely wish to
change the interface for setting the group weight to match that of the
task weight, since both directly compete against one another.

That is, a group weight is the same kind of weight as a task weight.
Therefore it would make much more sense to set them in equal units too.
Not having done that from the beginning was rather silly.

By having a 100 based weight it becomes very hard to match the weight of
nice()'ed tasks (something that was already tricky with the current
shares interface because we'd need to share the nice weight table with
userspace).

By having both in nice units, it's both conceptually clear that they're
the same kind of weight and easier to match weights.
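
For reference, a small sketch of the mismatch. The table is the
nice-to-weight table from kernel/sched/core.c (sched_prio_to_weight);
the helper names are invented here, and group_weight_to_load() assumes
that a 100-based weight scales linearly, with 100 equal to the nice-0
load of 1024.

```c
#include <limits.h>
#include <stdlib.h>

/* Copy of sched_prio_to_weight[]; index 0 is nice -20, index 39 is nice 19. */
static const int prio_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/* Load weight of a task at @nice, or -1 if @nice is out of range. */
static int nice_to_weight(int nice)
{
	if (nice < -20 || nice > 19)
		return -1;
	return prio_to_weight[nice + 20];
}

/* Assumed linear mapping: a group weight of 100 equals a load of 1024. */
static long group_weight_to_load(long weight)
{
	return weight * 1024 / 100;
}

/* Nice level whose table entry is closest to @load. */
static int closest_nice(long load)
{
	long best_diff = LONG_MAX;
	int best = 0;
	int i;

	for (i = 0; i < 40; i++) {
		long diff = labs(load - prio_to_weight[i]);

		if (diff < best_diff) {
			best_diff = diff;
			best = i - 20;
		}
	}
	return best;
}
```

Under that assumption no integer weight reproduces nice -5 exactly: 304
gives a load of 3112 and 305 gives 3123, against 3121 in the table.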

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
  2017-06-27  7:01       ` Peter Zijlstra
@ 2017-06-30 13:23         ` Tejun Heo
  -1 siblings, 0 replies; 47+ messages in thread
From: Tejun Heo @ 2017-06-30 13:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Li Zefan, hannes, mingo, longman, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

Hello, Peter.

On Tue, Jun 27, 2017 at 09:01:43AM +0200, Peter Zijlstra wrote:
> 
> I'm slowly getting back to things...

Welcome back.

> On Mon, Jun 12, 2017 at 05:27:53PM -0400, Tejun Heo wrote:
> > That's where the "join" thing comes from because we wanna be able to
> > tell apart whether a cgroup is gonna be a part of the existing thread
> > subtree or starting its own thread subtree.  There sure are multiple
> > ways to express that but one way or the other, if you wanna support
> > topologies like the last one, you have to distinguish the two.
> 
> Hmm,.. I had not considered that. I was strictly considering the root to
> always be a resource domain. What use-case or scenario did you have
> to want to do this?

I don't have any really.  It's mostly just for interface
completeness.

> That is, what is the meaning of having T1 be a separate 'root' if it's
> not also a resource domain?

T1 is a resource domain.  Hmm... I think I see what you mean.  Let's
continue below.

> IIRC the problem with the 'threaded' marker is that it doesn't clearly
> capture what a resource domain is.
> 
> That is, assuming that a thread root is always a resource domain, we get
> the following problem:
> 
> Suppose we set 'threaded' on the root group in order to create a thread
> (sub)group. If we then want to create another domain group, we'd have to
> clear 'threaded' on that.
> 
> 	R (t=1)
>        / \
> (t=1) T   D (t=0)
> 
> So far so good. However, now we want to create another thread group
> under our domain group D, so we have to set its 'threaded' marker again:
> 
> 	R (t=1)
>        / \
> (t=1) T   D (t=1)
>          /
> 	T (t=1)
> 
> And we can no longer identify D as a resource domain. If OTOH we mark
> 'domain' we get:
> 
> 	R (d=1)
>        / \
> (d=0) T   D (d=1)
>          /
> 	T (d=0)
> 
> Which clearly identifies the domains and the thread only groups.

So, the difference between the two interfaces is that the one I
proposed marks the thread root, which makes all its descendants
threaded, while the above marks each individual cgroup as either a
resource domain or threaded.
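
As a rough illustration of the root-marking scheme, the sketch below
models it in user space. The names (struct cg, cg_enable_threaded() and
friends) are hypothetical; it assumes, as the posted patch does for the
non-mixed case, that a thread root points proc_cgrp at itself and that
children inherit the parent's proc_cgrp at creation. The vetting and the
mixed-root handling are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup, reduced to what matters here. */
struct cg {
	struct cg *parent;	/* NULL for the root */
	struct cg *proc_cgrp;	/* thread root of the subtree, NULL if !threaded */
};

static bool cg_is_threaded(const struct cg *c)
{
	return c->proc_cgrp != NULL;
}

static bool cg_is_thread_root(const struct cg *c)
{
	return c->proc_cgrp == c;
}

/* A new child joins whatever threaded subtree its parent is in, if any. */
static void cg_init_child(struct cg *child, struct cg *parent)
{
	child->parent = parent;
	child->proc_cgrp = parent ? parent->proc_cgrp : NULL;
}

/* Marking a domain cgroup makes it the root of a new threaded subtree. */
static void cg_enable_threaded(struct cg *c)
{
	if (!cg_is_threaded(c))
		c->proc_cgrp = c;
}
```

One write on the subtree root is enough here; every cgroup created
underneath afterwards is threaded without further configuration.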

> Your objections to doing this were representing the resource controllers
> in the intermediate thread-only groups like:
> 
>    R
>     \
>      T  -- what to do with eg. memcg here?
>       \
>        D
>         \
> 	 T

And that's a perfectly valid point and as you pointed out the downside
of marking each node separately is that the interface would allow
configurations which aren't supported (at least for now) and that
there's just more to configure - the user has to set the mode on each
node after creation which is just the natural cost of being able to
express more.

> I suggested having all resource controllers represented with a soft-link
> back into the (thread-root) resource domain. But you were not convinced
> and worried people were going to be confused.

There's more to it than just confusion because resource interface
files belong to the parent cgroup rather than the cgroup which hosts
the files.  This becomes clear when thinking about which files a
container should be granted write access to when delegating a cgroup
subtree to it.  I'm sure we can get around it some way but we need to
be careful here.

> Your current proposal treats the resource controllers as disabled in
> thread subgroups -- which is perfectly fine given the constraint that we
> cannot have new domains in a thread sub-tree. My worry is that we don't
> paint ourselves in a corner and create an interface where we can never
> extend to allow this.

Understood.  We can err out on unsupported configurations and expand
in the future as necessary.

> I'm not wanting to change the relative weight thing. I merely wish to
> change the interface for setting the group weight to match that of the
> task weight, since both directly compete against one another.
> 
> That is, a group weight is the same kind of weight as a task weight.
> Therefore it would make much more sense to set then in equal units too.
> Not having done that from the beginning was rather silly.
> 
> By having a 100 based weight it becomes very hard to match the weight of
> nice()'ed tasks (something that was already tricky with the current
> shares interface because we'd need to share the nice weight table with
> userspace).
> 
> By having both in nice units, its both conceptually clear that they're
> the same kind of weight and easier to match weights.

I think there is another side to it.  If you just think about threads,
nice levels are the only thing we have.  It's a relative interface
which isn't defined rigidly and that can be a plus in that it allows
the kernel the room to maneuver with evolving implementations,
heuristics and so on.

Unfortunately, viewed from the containerized resource control side,
this is very difficult to deal with.  The scale isn't strictly defined,
is rather coarse and is in general difficult to work with - e.g. what
do you do when the system is split 20%, 80% and then you wanna add
another workload so that it becomes 20%, 50%, 30%?  And then there is
the problem with consistency with other resource types which may also
use proportional distribution.
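
To make the arithmetic concrete, here is a minimal sketch (illustrative
Python, not part of the patchset).  It assumes a linear weight scale
where a group's share is its weight over the sum of its siblings'
weights, and it approximates nice levels as a factor of roughly 1.25
per step.  The function names are made up for the example.

```python
def shares(weights):
    """Fraction of the resource each sibling gets under linear weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def nice_split(delta):
    """Share of the favoured side when two entities differ by `delta`
    nice levels, assuming each level scales the weight by about 1.25."""
    ratio = 1.25 ** delta
    return ratio / (1.0 + ratio)


# With linear weights the target percentages can be written down directly.
print(shares({"a": 20, "b": 80}))
print(shares({"a": 20, "b": 50, "c": 30}))

# With nice steps an exact 80/20 split falls between two adjacent levels.
print(nice_split(6), nice_split(7))
```

Under these assumptions a difference of 6 nice levels gives a bit less
than 80% and 7 levels a bit more, so the exact split is not expressible
in nice units alone.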

But, it doesn't have to be this or that.  We can easily support both
units by simply allowing, say, "-5n" to be written to cpu.weight file,
interpreting that as the nice value and exposing the closest nice
value in the cpu.stat file.
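
A rough sketch of what such parsing could look like, in illustrative
Python rather than kernel code.  The table holds the scheduler's
nice-to-weight values (sched_prio_to_weight in kernel/sched/core.c).
The linear mapping "weight 100 == nice 0 == 1024" and the names
parse_cpu_weight() and closest_nice() are assumptions made for the
example, not an existing interface.

```python
# Scheduler load weights for nice -20 .. 19 (nice 0 == 1024).
SCHED_PRIO_TO_WEIGHT = [
    88761, 71755, 56483, 46273, 36291,
    29154, 23254, 18705, 14949, 11916,
    9548, 7620, 6100, 4904, 3906,
    3121, 2501, 1991, 1586, 1277,
    1024, 820, 655, 526, 423,
    335, 272, 215, 172, 137,
    110, 87, 70, 56, 45,
    36, 29, 23, 18, 15,
]


def parse_cpu_weight(text):
    """Accept "150" (100-based weight) or "-5n" (nice value).

    Returns the weight on the 100-based scale as a float.
    """
    text = text.strip()
    if text.endswith("n"):
        nice = int(text[:-1])
        if not -20 <= nice <= 19:
            raise ValueError("nice value out of range")
        # Assumed mapping: weight 100 corresponds to nice 0 (1024).
        return SCHED_PRIO_TO_WEIGHT[nice + 20] * 100 / 1024
    weight = int(text)
    if weight < 1:
        raise ValueError("weight must be positive")
    return float(weight)


def closest_nice(weight):
    """Nice value whose scheduler weight is nearest to `weight`."""
    target = weight * 1024 / 100
    return min(range(-20, 20),
               key=lambda n: abs(SCHED_PRIO_TO_WEIGHT[n + 20] - target))


print(parse_cpu_weight("-5n"), closest_nice(parse_cpu_weight("-5n")))
print(parse_cpu_weight("300"), closest_nice(300))
```

The read side is lossy by construction: a weight written in 100-based
units only reports the nearest nice level.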

Does that sound workable?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
  2017-06-30 13:23         ` Tejun Heo
  (?)
@ 2017-07-10  8:32         ` Peter Zijlstra
  2017-07-10 21:01             ` Waiman Long
  -1 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2017-07-10  8:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, hannes, mingo, longman, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

On Fri, Jun 30, 2017 at 09:23:24AM -0400, Tejun Heo wrote:
> On Tue, Jun 27, 2017 at 09:01:43AM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 12, 2017 at 05:27:53PM -0400, Tejun Heo wrote:

> > IIRC the problem with the 'threaded' marker is that it doesn't clearly
> > capture what a resource domain is.
> > 
> > That is, assuming that a thread root is always a resource domain, we get
> > the following problem:
> > 
> > If we set 'threaded' on the root group in order to create a thread
> > (sub)group. If we then want to create another domain group, we'd have to
> > clear 'threaded' on that.
> > 
> > 	R (t=1)
> >        / \
> > (t=1) T   D (t=0)
> > 
> > So far so good. However, now we want to create another thread group
> > under our domain group D, so we have to set its 'threaded' marker again:
> > 
> > 	R (t=1)
> >        / \
> > (t=1) T   D (t=1)
> >          /
> > 	T (t=1)
> > 
> > And we can no longer identify D as a resource domain. If OTOH we mark
> > 'domain' we get:
> > 
> > 	R (d=1)
> >        / \
> > (d=0) T   D (d=1)
> >          /
> > 	T (d=0)
> > 
> > Which clearly identifies the domains and the thread only groups.
> 
> So, the difference between the two interfaces is that the one I
> proposed is marking the thread root which makes all its descendants
> threaded while the above is marking each individual cgroup as being
> whether a resource domain or threaded.

You start by marking the thread root, but then continue to mark all
'threaded' (including root). This then leads to the problem described
above where you cannot (easily) (re)discover what the actual root is.

My proposal differs in that we retain a clear difference between
resource domain / root and threaded (sub)trees.
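
As a sketch of the distinction (illustrative Python, with a made-up
Cgroup class standing in for the real thing, and the root assumed to
always be a domain): with a per-group 'domain' marker, the resource
domain of any group is simply its nearest ancestor-or-self with d=1.

```python
class Cgroup:
    """Toy model of a cgroup with a per-group 'domain' marker."""

    def __init__(self, name, parent=None, domain=True):
        self.name = name
        self.parent = parent
        self.domain = domain


def resource_domain(cg):
    """Walk up until a group marked as a domain is found.

    The root is treated as a domain regardless of its marker.
    """
    while not cg.domain and cg.parent is not None:
        cg = cg.parent
    return cg


# The tree from the example above:
#       R (d=1)
#      / \
# (d=0) T1  D (d=1)
#          /
#         T2 (d=0)
r = Cgroup("R")
t1 = Cgroup("T1", r, domain=False)
d = Cgroup("D", r)
t2 = Cgroup("T2", d, domain=False)

for cg in (r, t1, d, t2):
    print(cg.name, "->", resource_domain(cg).name)
```

With the 'threaded' marker all four groups carry t=1, so the same walk
has nothing to stop on.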

> > Your objections to doing this were representing the resource controllers
> > in the intermediate thread-only groups like:
> > 
> >    R
> >     \
> >      T  -- what to do with eg. memcg here?
> >       \
> >        D
> >         \
> > 	     T
> 
> And that's a perfectly valid point and as you pointed out the downside
> of marking each node separately is that the interface would allow
> configurations which aren't supported (at least for now) and that
> there's just more to configure - the user has to set the mode on each
> node after creation which is just the natural cost of being able to
> express more.

I'm not sure my proposal results in _more_ configuration per se. Yes,
there are some differences, but they go both ways: with the threaded tag
it's easier to create multiple threaded subgroups, but with the domain
tag it's easier to create multiple domain subgroups.

And I think the bias for the domain tag -- easier to create more domains
-- is the right one. The whole threaded thing is fairly special purpose.

In any case, I'm fine with initially not supporting domains nested under
thread groups, although I do think there are valid use cases for doing
so.

> > I suggested having all resource controllers represented with a soft-link
> > back into the (thread-root) resource domain. But you were not convinced
> > and worried people were going to be confused.
> 
> There's more to it than just confusion because resource interface
> files belong to the parent cgroup rather than the cgroup which hosts
> the files.  This becomes clear when thinking about which files a
> container should be granted write access to when delegating a cgroup
> subtree to it.  I'm sure we can get around it some way but we need to
> be careful here.

I would suggest having all the back-links be RO. That way you can never
grant write permission, can never actually change things that don't make
'sense', but get a really good clue where you need to go.

> > By having both in nice units, its both conceptually clear that they're
> > the same kind of weight and easier to match weights.

> But, it doesn't have to be this or that.  We can easily support both
> units by simply allowing, say, "-5n" to be written to cpu.weight file
> and interpret that as the nice value and exposing the closest nice
> value in the cpu.stat file.
> 
> Does that sound workable?

Not sure; reading the value would become somewhat awkward I suppose. But
maybe we can simply do two files, whichever is written to last takes
precedence. And when a !nice weight is written, the nice file returns
-EINVAL or something.

Not particularly pretty though...
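
One possible reading of the two-file idea, as a toy model in
illustrative Python.  The class and method names are hypothetical, the
1.25-per-level conversion is an approximation, and "any write to the
weight file invalidates the nice file" is only one interpretation of
the above.

```python
import errno


class CpuWeightKnobs:
    """Toy model: a weight file and a nice file, last writer wins."""

    def __init__(self):
        self.weight = 100   # assumed default on the 100-based scale
        self.nice = 0       # None once a plain weight has been written

    def write_weight(self, weight):
        self.weight = weight
        self.nice = None    # the value no longer came from nice units

    def write_nice(self, nice):
        self.nice = nice
        # Approximate conversion: about 1.25x per nice level.
        self.weight = round(100 * 1.25 ** -nice)

    def read_weight(self):
        return self.weight

    def read_nice(self):
        if self.nice is None:
            raise OSError(errno.EINVAL, "weight was not set in nice units")
        return self.nice


knobs = CpuWeightKnobs()
knobs.write_nice(-5)
print(knobs.read_nice(), knobs.read_weight())
knobs.write_weight(150)
print(knobs.read_weight())
```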

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-07-10 21:01             ` Waiman Long
  0 siblings, 0 replies; 47+ messages in thread
From: Waiman Long @ 2017-07-10 21:01 UTC (permalink / raw)
  To: Peter Zijlstra, Tejun Heo
  Cc: Li Zefan, hannes, mingo, cgroups, linux-kernel, kernel-team, pjt,
	luto, efault, torvalds

On 07/10/2017 04:32 AM, Peter Zijlstra wrote:
> On Fri, Jun 30, 2017 at 09:23:24AM -0400, Tejun Heo wrote:
>> On Tue, Jun 27, 2017 at 09:01:43AM +0200, Peter Zijlstra wrote:
>>> On Mon, Jun 12, 2017 at 05:27:53PM -0400, Tejun Heo wrote:
>>> IIRC the problem with the 'threaded' marker is that it doesn't clearly
>>> capture what a resource domain is.
>>>
>>> That is, assuming that a thread root is always a resource domain, we get
>>> the following problem:
>>>
>>> If we set 'threaded' on the root group in order to create a thread
>>> (sub)group. If we then want to create another domain group, we'd have to
>>> clear 'threaded' on that.
>>>
>>> 	R (t=1)
>>>        / \
>>> (t=1) T   D (t=0)
>>>
>>> So far so good. However, now we want to create another thread group
>>> under our domain group D, so we have to set its 'threaded' marker again:
>>>
>>> 	R (t=1)
>>>        / \
>>> (t=1) T   D (t=1)
>>>          /
>>> 	T (t=1)

This configuration is actually not possible with Tejun's latest v3 patch
which took out the "join" operation. Maybe we should keep the "join"
operation if this configuration is likely to happen.

>>> And we can no longer identify D as a resource domain. If OTOH we mark
>>> 'domain' we get:
>>>
>>> 	R (d=1)
>>>        / \
>>> (d=0) T   D (d=1)
>>>          /
>>> 	T (d=0)
>>>
>>> Which clearly identifies the domains and the thread only groups.
>> So, the difference between the two interfaces is that the one I
>> proposed is marking the thread root which makes all its descendants
>> threaded while the above is marking each individual cgroup as being
>> whether a resource domain or threaded.
> You start by marking the thread root, but then continue to mark all
> 'threaded' (including root). This then leads to the problem described
> above where you cannot (easily) (re)discover what the actual root is.

I don't think that is true. Internally, we can always find out if a
cgroup is a thread root. Externally, the presence of resource domain
control knobs in a threaded cgroup will indicate that it is a thread root.

> My proposal differs in that we retain a clear difference between
> resource domain / root and threaded (sub)trees.

For me, I have no preference between the threaded and the domain
marker as long as some kind of join operation that allows the
configuration above is present in the thread mode. They both look good
to me. It is just a matter of which aspect of the cgroup we want to
emphasize. I would suggest we reach a consensus ASAP and move forward to
other more substantial issues in cgroup v2.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-07-11 12:15               ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2017-07-11 12:15 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, hannes, mingo, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

On Mon, Jul 10, 2017 at 05:01:19PM -0400, Waiman Long wrote:
> On 07/10/2017 04:32 AM, Peter Zijlstra wrote:
> > On Fri, Jun 30, 2017 at 09:23:24AM -0400, Tejun Heo wrote:
> >> On Tue, Jun 27, 2017 at 09:01:43AM +0200, Peter Zijlstra wrote:
> >>> On Mon, Jun 12, 2017 at 05:27:53PM -0400, Tejun Heo wrote:
> >>> IIRC the problem with the 'threaded' marker is that it doesn't clearly
> >>> capture what a resource domain is.
> >>>
> >>> That is, assuming that a thread root is always a resource domain, we get
> >>> the following problem:
> >>>
> >>> If we set 'threaded' on the root group in order to create a thread
> >>> (sub)group. If we then want to create another domain group, we'd have to
> >>> clear 'threaded' on that.
> >>>
> >>> 	R (t=1)
> >>>        / \
> >>> (t=1) T   D (t=0)
> >>>
> >>> So far so good. However, now we want to create another thread group
> >>> under our domain group D, so we have to set its 'threaded' marker again:
> >>>
> >>> 	R (t=1)
> >>>        / \
> >>> (t=1) T   D (t=1)
> >>>          /
> >>> 	T (t=1)
> 
> This configuration is actually not possible with Tejun's latest v3 patch
> which took out the "join" operation. Maybe we should keep the "join"
> operation if this configuration is likely to happen.

Wait what? Why not? That's a fairly fundamental setup that needs to be
possible. I understood the 'join' thing was for something else entirely.
TJ said the 'join' was to allow thread-roots that were not domain
controllers -- which I didn't get the point of.

> >>> And we can no longer identify D as a resource domain. If OTOH we mark
> >>> 'domain' we get:
> >>>
> >>> 	R (d=1)
> >>>        / \
> >>> (d=0) T   D (d=1)
> >>>          /
> >>> 	T (d=0)
> >>>
> >>> Which clearly identifies the domains and the thread only groups.
> >> So, the difference between the two interfaces is that the one I
> >> proposed is marking the thread root which makes all its descendants
> >> threaded while the above is marking each individual cgroup as being
> >> whether a resource domain or threaded.
> > You start by marking the thread root, but then continue to mark all
> > 'threaded' (including root). This then leads to the problem described
> > above where you cannot (easily) (re)discover what the actual root is.
> 
> I don't think that is true. Internally, we can always find out if a
> cgroup is a thread root. Externally, the presence of resource domain
> control knobs in a threaded cgroup will indicate that it is a thread root.

You're confusing thread root with resource domain. While a resource
domain must be a thread root the reverse is not necessarily so (this is
what I understood the 'join' thing to be for).

And this is detection by inference, which breaks the moment you disable
all resource domain controllers, because at that point those files will
not be present.
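
A small sketch of why the inference is fragile (illustrative Python;
the helper name and the file lists are made up for the example, and
which controllers count as domain-only is an assumption):

```python
# Controllers assumed to be domain-only for the purpose of this sketch.
DOMAIN_PREFIXES = ("memory.", "io.")


def looks_like_thread_root(files):
    """Guess thread-root status from the presence of domain knobs."""
    return any(name.startswith(DOMAIN_PREFIXES) for name in files)


# A thread root with the memory controller enabled is detected ...
with_memcg = {"cgroup.threads", "cpu.weight", "memory.max"}
print(looks_like_thread_root(with_memcg))

# ... but the same thread root with all domain controllers disabled
# exposes no such files, so the guess silently flips.
without_domain_ctrls = {"cgroup.threads", "cpu.weight"}
print(looks_like_thread_root(without_domain_ctrls))
```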

> > My proposal differs in that we retain a clear difference between
> > resource domain / root and threaded (sub)trees.
> 
> For me, I have no preference of using either the threaded or the domain
> marker as long as some kind of join operation that allows the
> configuration above is present in the thread mode. They both looks good
> to me. It is just a matter of which aspect of the cgroup we want to
> emphasize. I would suggest we reach a consensus ASAP and move forward to
> other more substantial issues in cgroup v2.

I think you're confused on join. Join should not be needed.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-07-11 14:14                 ` Waiman Long
  0 siblings, 0 replies; 47+ messages in thread
From: Waiman Long @ 2017-07-11 14:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, hannes, mingo, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

On 07/11/2017 08:15 AM, Peter Zijlstra wrote:
> On Mon, Jul 10, 2017 at 05:01:19PM -0400, Waiman Long wrote:
>> On 07/10/2017 04:32 AM, Peter Zijlstra wrote:
>>> On Fri, Jun 30, 2017 at 09:23:24AM -0400, Tejun Heo wrote:
>>>> On Tue, Jun 27, 2017 at 09:01:43AM +0200, Peter Zijlstra wrote:
>>>>> On Mon, Jun 12, 2017 at 05:27:53PM -0400, Tejun Heo wrote:
>>>>> IIRC the problem with the 'threaded' marker is that it doesn't clearly
>>>>> capture what a resource domain is.
>>>>>
>>>>> That is, assuming that a thread root is always a resource domain, we get
>>>>> the following problem:
>>>>>
>>>>> If we set 'threaded' on the root group in order to create a thread
>>>>> (sub)group. If we then want to create another domain group, we'd have to
>>>>> clear 'threaded' on that.
>>>>>
>>>>> 	R (t=1)
>>>>>        / \
>>>>> (t=1) T   D (t=0)
>>>>>
>>>>> So far so good. However, now we want to create another thread group
>>>>> under our domain group D, so we have to set its 'threaded' marker again:
>>>>>
>>>>> 	R (t=1)
>>>>>        / \
>>>>> (t=1) T   D (t=1)
>>>>>          /
>>>>> 	T (t=1)
>> This configuration is actually not possible with Tejun's latest v3 patch
>> which took out the "join" operation. Maybe we should keep the "join"
>> operation if this configuration is likely to happen.
> Wait what? Why not? That's a fairly fundamental setup that needs to be
> possible. I understood the 'join' thing was for something else entirely.
> TJ said the 'join' was to allow thread-roots that were not domain
> controllers -- which I didn't get the point of.

The "join" was a special op for the children of cgroup root to join the
root as part of a threaded subtree. The children can instead use the
"enable" option to become a thread root which was the configuration
shown above.  This behavior applied only to children of root. Down the
hierarchy, you can't have a configuration like:

     R (t=0)
    / \
       D (t=1)
      / \
     T   D (t=1)
       
Instead, you can have

     R (t=0)
    / \
       D (t=0)
      / \
(t=1)D   D(t=1)

With Tejun's v3 patch, the "join" operation was removed and "enable"
behaved like "join" in joining the threaded subtree of the root. I was
wrong in saying that the configuration listed in your example was not
possible. It was, but it depends on the order of activating the thread
mode. If we enable thread mode on a child of root first, followed by the
root itself, we can have your configuration, but not in the reverse
order. It was possible in the reverse order in the previous patch.

>>>>> And we can no longer identify D as a resource domain. If OTOH we mark
>>>>> 'domain' we get:
>>>>>
>>>>> 	R (d=1)
>>>>>        / \
>>>>> (d=0) T   D (d=1)
>>>>>          /
>>>>> 	T (d=0)
>>>>>
>>>>> Which clearly identifies the domains and the thread only groups.
>>>> So, the difference between the two interfaces is that the one I
>>>> proposed is marking the thread root which makes all its descendants
>>>> threaded while the above is marking each individual cgroup as being
>>>> whether a resource domain or threaded.
>>> You start by marking the thread root, but then continue to mark all
>>> 'threaded' (including root). This then leads to the problem described
>>> above where you cannot (easily) (re)discover what the actual root is.
>> I don't think that is true. Internally, we can always find out if a
>> cgroup is a thread root. Externally, the presence of resource domain
>> control knobs in a threaded cgroup will indicate that it is a thread root.
> You're confusing thread root with resource domain. While a resource
> domain must be a thread root the reverse is not necessarily so (this is
> what I understood the 'join' thing to be for).

I know the difference between thread root and resource domain. In the
current scheme, all the cgroups which are not threaded under a thread
root are resource domains.

> And this is detection by inference, which breaks the moment you disable
> all resource domain controllers, because at that point those files will
> not be present.

It is true that, when the domain controller files are not present,
there is no external marker to tell whether a threaded cgroup is a
thread root if its parent is also the thread root of a separate
threaded subtree. However, we can always add a status file to
indicate the state of threaded-ness of a cgroup if we want to.

>>> My proposal differs in that we retain a clear difference between
>>> resource domain / root and threaded (sub)trees.
>> For me, I have no preference of using either the threaded or the domain
>> marker as long as some kind of join operation that allows the
>> configuration above is present in the thread mode. They both looks good
>> to me. It is just a matter of which aspect of the cgroup we want to
>> emphasize. I would suggest we reach a consensus ASAP and move forward to
>> other more substantial issues in cgroup v2.
> I think you're confused on join. Join should not be needed.

Tejun's patch makes resource domain the default and threaded-ness an
additional attribute that needs to be specified. Your proposal makes
non-resource domain where threads can exist the default and resource
domain something that needs to be explicitly specified. They are just
different ways of partitioning a cgroup hierarchy into different
domains. Tejun's patch has a well defined boundary for threaded subtree
where threads can be migrated from one part of a subtree to another.
Your proposal is less clear-cut on how to handle thread migration.

Yes, the "join" operation may not be needed. It is just a matter of how
much flexibility we want to specify the desirable cgroup configuration.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-07-11 16:52                   ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2017-07-11 16:52 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, hannes, mingo, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

On Tue, Jul 11, 2017 at 10:14:42AM -0400, Waiman Long wrote:

> The "join" was a special op for the children of cgroup root to join the
> root as part of a threaded subtree. The children can instead use the
> "enable" option to become a thread root which was the configuration
> shown above.  This behavior applied only to children of root. Down the
> hierarchy, you can't have configuration like:
> 
>      R (t=0)
>     / \
>        D (t=1)
>       / \
>      T   D (t=1)

Why not?

First you create:

      R (t=0)
     / \
        D (t=1)
       / \
      T   T (t=1)

Then you flip t=0 like:

      R (t=0)
     / \
        D (t=1)
       / \
      T   D (t=0)

And then you flip t=1 again:

      R (t=0)
     / \
        D (t=1)
       / \
      T   D (t=1)

> With Tejun's v3 patch, the "join" operation was removed and "enable"

I've no clue what 'enable' is... :-(

> behaved like "join" in joining the threaded subtree of the root. I was
> wrong in saying that the configuration listed in your example was not
> possible. It was, but it depends on the order of activating the thread
> mode. If we enables thread mode on a child of root first followed by the
> root itself, we can have your configuration, but not in the reverse
> order. It was possible in the reverse order in the previous patch.

Just create a T child, then flip t=0 to convert it to D, then flip it to
1 again to create a new thread-root, no?

> > And this is detection by inference, which breaks the moment you disable
> > all resource domain controllers, because at that point those files will
> > not be present.
> 
> It is true that there is no external marker to find out if a threaded
> cgroup is a root or not when the parent of a thread root is also a
> thread root of a separate threaded subtree if the domain controller
> files are not present. However, we can always add a status file to
> indicate the state of threaded-ness of a cgroup if we want to.

Why add status files when a simple change in marker can readily provide
this information?

> Tejun's patch makes resource domain the default and threaded-ness as an
> additional attribute that needs to be specified. Your proposal make
> non-resource domain where threads can exist as the default and resource
> domain as something that needs to be explicitly specified.

Not so. My proposal has resource domains as the default (remember, root
_MUST_ be a resource domain, therefore we must start with
root.d=1). Therefore, any new subgroup will be a resource domain by
default and if you ignore the new attribute it will work exactly like
cgroup-v2 does today.

Only if you clear the new attribute do you get a thread subgroup.

> They are just
> different ways of partitioning a cgroup hierarchy into different
> domains. Tejun's patch has a well defined boundary for threaded subtree
> where threads can be migrated from one part of a subtree to another.
> Your proposal is less clear-cut on how to handle thread migration. 

Disagree again. We have the exact same boundaries. Just ensure the
migration doesn't escape the resource domain.
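
As a rough illustration of the 'domain' marker scheme and the migration
rule just stated, here is a toy Python model. It is only a sketch, not
kernel code: the names Cgroup, domain() and can_migrate() are made up
for this example, and the R/T/D layout is the one drawn earlier in the
thread with d= markers.

```python
# Toy model of the 'domain' marker proposal; not kernel code.
# Cgroup, domain() and can_migrate() are hypothetical names.

class Cgroup:
    def __init__(self, name, parent=None, d=1):
        # New groups default to d=1, i.e. they are resource domains.
        if parent is None and d != 1:
            raise ValueError("root must be a resource domain")
        self.name = name
        self.parent = parent
        self.d = d

    def domain(self):
        # Walk up to the nearest group marked d=1. The root always has
        # d=1, so the walk terminates.
        cg = self
        while cg.d == 0:
            cg = cg.parent
        return cg


def can_migrate(src, dst):
    # A thread may move only if it stays inside its resource domain.
    return src.domain() is dst.domain()


# Layout from the thread: R(d=1) with children T(d=0) and D(d=1),
# and a thread group T2(d=0) below D.
R = Cgroup("R")
T = Cgroup("T", R, d=0)
D = Cgroup("D", R)
T2 = Cgroup("T2", D, d=0)
```

In this model a thread can move between D and T2, or between R and T,
but not from T to T2, because those two groups resolve to different
resource domains.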

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-07-11 21:12                     ` Waiman Long
  0 siblings, 0 replies; 47+ messages in thread
From: Waiman Long @ 2017-07-11 21:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, hannes, mingo, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

On 07/11/2017 12:52 PM, Peter Zijlstra wrote:
> On Tue, Jul 11, 2017 at 10:14:42AM -0400, Waiman Long wrote:
>
>> The "join" was a special op for the children of cgroup root to join the
>> root as part of a threaded subtree. The children can instead use the
>> "enable" option to become a thread root which was the configuration
>> shown above.  This behavior applied only to children of root. Down the
>> hierarchy, you can't have configuration like:
>>
>>      R (t=0)
>>     / \
>>        D (t=1)
>>       / \
>>      T   D (t=1)
> Why not?
>
> First you create:
>
>       R (t=0)
>      / \
>         D (t=1)
>        / \
>       T   T (t=1)
>
> Then you flip t=0 like:
>
>       R (t=0)
>      / \
>         D (t=1)
>        / \
>       T   D (t=0)
>
> And then you flip t=1 again:
>
>       R (t=0)
>      / \
>         D (t=1)
>        / \
>       T   D (t=1)

Tejun's thread mode patch has constraints on what operations are allowed
and what aren't. For a threaded subtree, thread mode cannot be disabled
in the middle of the tree. You have to remove all the child cgroups in
the subtree before you can disable thread mode at the thread root level.
So the second step will not be allowed. We can certainly argue if it is
a good thing or not. What I am talking about is the current behavior of
the patch.
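
To make the constraint concrete, here is a small Python model of the
behavior described above. It is a sketch under assumptions, not the
kernel implementation: the class and method names are invented, and the
enable rule (no child cgroups) is taken from the cover letter's outline.

```python
# Toy model of the enable/disable constraints; not kernel code.
# NotAllowed, Cgroup and try_disable() are hypothetical names.

class NotAllowed(Exception):
    """Raised when the modelled patch would reject the operation."""


class Cgroup:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        # A child created inside a threaded subtree is threaded too.
        self.threaded = parent.threaded if parent is not None else False
        if parent is not None:
            parent.children.append(self)

    def is_thread_root(self):
        parent_threaded = self.parent is not None and self.parent.threaded
        return self.threaded and not parent_threaded

    def enable(self):
        # Cover letter: the cgroup shouldn't have any child cgroups.
        if self.children:
            raise NotAllowed("cannot enable with child cgroups")
        self.threaded = True

    def disable(self):
        # Only the thread root may leave thread mode, and only once
        # all of its child cgroups have been removed.
        if not self.is_thread_root():
            raise NotAllowed("not a thread root")
        if self.children:
            raise NotAllowed("thread root still has children")
        self.threaded = False


def try_disable(cg):
    try:
        cg.disable()
        return True
    except NotAllowed:
        return False


# The scenario from the diagrams: R(t=0), D(t=1) with two T children.
R = Cgroup("R")
D = Cgroup("D", R)
D.enable()
T1 = Cgroup("T1", D)
T2 = Cgroup("T2", D)
```

Here try_disable(T2) fails because T2 is not the thread root, and
try_disable(D) fails because D still has children, which is why the
"flip t=0" step is rejected.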

>> With Tejun's v3 patch, the "join" operation was removed and "enable"
> I've no clue what 'enable' is... :-(

The keywords to turn on and off thread mode are:

enable: t=1
disable: t=0

As discussed above, there are constraints on when that transition is
allowed to happen.

>> behaved like "join" in joining the threaded subtree of the root. I was
>> wrong in saying that the configuration listed in your example was not
>> possible. It was, but it depends on the order of activating the thread
>> mode. If we enables thread mode on a child of root first followed by the
>> root itself, we can have your configuration, but not in the reverse
>> order. It was possible in the reverse order in the previous patch.
> Just create a T child, then flip t=0 to convert it to D, then flip it to
> 1 again to create a new thread-root, no?

The constraints are there to make it easier to code and to observe
guidelines like the no-internal-process constraint. So what you said
above is not currently allowed.

>>> And this is detection by inference, which breaks the moment you disable
>>> all resource domain controllers, because at that point those files will
>>> not be present.
>> It is true that there is no external marker to find out if a threaded
>> cgroup is a root or not when the parent of a thread root is also a
>> thread root of a separate threaded subtree if the domain controller
>> files are not present. However, we can always add a status file to
>> indicate the state of threaded-ness of a cgroup if we want to.
> Why add status files when a simple change in marker can readily provide
> this information?

For thread mode, the only way to find out if a cgroup is in that mode is
to read the cgroup.procs and cgroup.threads files. Reading
cgroup.threads returns an error if thread mode is not enabled. Reading
cgroup.procs is allowed in the thread root, but not in the rest of the
threaded subtree. So there is a way to find out, but it is kind of
indirect.
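
The indirect check can be sketched in Python as follows. This assumes
the file behavior exactly as described in this mail for this patch
revision; describe() and classify() are hypothetical helper names, and
the result strings are only labels for this example.

```python
import os


def describe(threads_ok, procs_ok):
    # threads_ok: reading cgroup.threads succeeded
    # procs_ok:   reading cgroup.procs succeeded
    if not threads_ok:
        return "not threaded"
    if procs_ok:
        return "thread root"
    return "threaded, non-root"


def _readable(path):
    # True if the file can be opened and read without an error.
    try:
        with open(path) as f:
            f.read()
        return True
    except OSError:
        return False


def classify(cgroup_dir):
    # Infer the state of a cgroup directory from which reads succeed.
    threads_ok = _readable(os.path.join(cgroup_dir, "cgroup.threads"))
    procs_ok = _readable(os.path.join(cgroup_dir, "cgroup.procs"))
    return describe(threads_ok, procs_ok)
```

classify() would be pointed at a cgroup directory in a mounted cgroup2
hierarchy; describe() holds the inference itself, so it can be checked
without a live hierarchy.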


>> Tejun's patch makes resource domain the default and threaded-ness as an
>> additional attribute that needs to be specified. Your proposal make
>> non-resource domain where threads can exist as the default and resource
>> domain as something that needs to be explicitly specified.
> Not so. My proposal has resource domains as the default (remember, root
> _MUST_ be a resource domain, therefore we must start start with
> root.d=1). Therefore, any new subgroup will be a resource domain by
> default and if you ignore the new attribute it will work exactly like
> cgroup-v2 does today.
>
> Only if you clear the new attribute do you get a thread subgroup.

OK.

>> They are just
>> different ways of partitioning a cgroup hierarchy into different
>> domains. Tejun's patch has a well defined boundary for threaded subtree
>> where threads can be migrated from one part of a subtree to another.
>> Your proposal is less clear-cut on how to handle thread migration. 
> Disagree again. We have the exact same boundaries. Just ensure the
> migration doesn't escape the resource domain.
>
Agreed.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-07-12  7:45                       ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2017-07-12  7:45 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, hannes, mingo, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

On Tue, Jul 11, 2017 at 05:12:39PM -0400, Waiman Long wrote:
> On 07/11/2017 12:52 PM, Peter Zijlstra wrote:
> > On Tue, Jul 11, 2017 at 10:14:42AM -0400, Waiman Long wrote:
> >
> >> The "join" was a special op for the children of cgroup root to join the
> >> root as part of a threaded subtree. The children can instead use the
> >> "enable" option to become a thread root which was the configuration
> >> shown above.  This behavior applied only to children of root. Down the
> >> hierarchy, you can't have configuration like:
> >>
> >>      R (t=0)
> >>     / \
> >>        D (t=1)
> >>       / \
> >>      T   D (t=1)
> > Why not?
> >
> > First you create:
> >
> >       R (t=0)
> >      / \
> >         D (t=1)
> >        / \
> >       T   T (t=1)
> >
> > Then you flip t=0 like:
> >
> >       R (t=0)
> >      / \
> >         D (t=1)
> >        / \
> >       T   D (t=0)
> >
> > And then you flip t=1 again:
> >
> >       R (t=0)
> >      / \
> >         D (t=1)
> >        / \
> >       T   D (t=1)
> 
> Tejun's thread mode patch has constraints on what operations are allowed
> and what aren't. For a threaded subtree, thread mode cannot be disabled
> in the middle of the tree.

Where in that scenario did I change anything in the middle? All
operations were on a leaf group.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2
@ 2017-07-12 14:00                         ` Waiman Long
  0 siblings, 0 replies; 47+ messages in thread
From: Waiman Long @ 2017-07-12 14:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, hannes, mingo, cgroups, linux-kernel,
	kernel-team, pjt, luto, efault, torvalds

On 07/12/2017 03:45 AM, Peter Zijlstra wrote:
> On Tue, Jul 11, 2017 at 05:12:39PM -0400, Waiman Long wrote:
>> On 07/11/2017 12:52 PM, Peter Zijlstra wrote:
>>> On Tue, Jul 11, 2017 at 10:14:42AM -0400, Waiman Long wrote:
>>>
>>>> The "join" was a special op for the children of cgroup root to join the
>>>> root as part of a threaded subtree. The children can instead use the
>>>> "enable" option to become a thread root which was the configuration
>>>> shown above.  This behavior applied only to children of root. Down the
>>>> hierarchy, you can't have configuration like:
>>>>
>>>>      R (t=0)
>>>>     / \
>>>>        D (t=1)
>>>>       / \
>>>>      T   D (t=1)
>>> Why not?
>>>
>>> First you create:
>>>
>>>       R (t=0)
>>>      / \
>>>         D (t=1)
>>>        / \
>>>       T   T (t=1)
>>>
>>> Then you flip t=0 like:
>>>
>>>       R (t=0)
>>>      / \
>>>         D (t=1)
>>>        / \
>>>       T   D (t=0)
>>>
>>> And then you flip t=1 again:
>>>
>>>       R (t=0)
>>>      / \
>>>         D (t=1)
>>>        / \
>>>       T   D (t=1)
>> Tejun's thread mode patch has constraints on what operations are allowed
>> and what aren't. For a threaded subtree, thread mode cannot be disabled
>> in the middle of the tree.
> Where in that scenario did I change anything in the middle? All
> operations were on a leaf group.

What I mean is that you can only disable thread mode at the thread
root, and only once it has no children left.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2017-07-12 14:00 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
2017-06-10 14:03 [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2 Tejun Heo
2017-06-10 14:03 ` Tejun Heo
2017-06-10 14:03 ` [PATCH 01/10] cgroup: separate out cgroup_has_tasks() Tejun Heo
2017-06-10 14:03 ` [PATCH 02/10] cgroup: reorganize cgroup.procs / task write path Tejun Heo
2017-06-10 14:03 ` [PATCH 03/10] cgroup: Fix reference counting bug in cgroup_procs_write() Tejun Heo
2017-06-10 14:03 ` [PATCH 04/10] cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS Tejun Heo
2017-06-10 14:03 ` [PATCH 05/10] cgroup: introduce cgroup->proc_cgrp and threaded css_set handling Tejun Heo
2017-06-10 14:03 ` [PATCH 06/10] cgroup: implement CSS_TASK_ITER_THREADED Tejun Heo
2017-06-10 14:03 ` [PATCH 07/10] cgroup: implement cgroup v2 thread support Tejun Heo
2017-06-12 15:41   ` Waiman Long
2017-06-13 14:06     ` Tejun Heo
2017-06-15 20:14   ` [PATCH v3 " Tejun Heo
2017-06-10 14:03 ` [PATCH 08/10] sched: Misc preps for cgroup unified hierarchy interface Tejun Heo
2017-06-10 14:03 ` [PATCH 09/10] sched: Implement interface for cgroup unified hierarchy Tejun Heo
2017-06-10 14:03 ` [PATCH 10/10] sched: Make cpu/cpuacct threaded controllers Tejun Heo
2017-06-12 12:31 ` [PATCHSET for-4.13] cgroup: implement cgroup2 thread mode, v2 Peter Zijlstra
2017-06-12 21:27   ` Tejun Heo
2017-06-15 20:16     ` Tejun Heo
2017-06-27  7:01     ` Peter Zijlstra
2017-06-30 13:23       ` Tejun Heo
2017-07-10  8:32         ` Peter Zijlstra
2017-07-10 21:01           ` Waiman Long
2017-07-11 12:15             ` Peter Zijlstra
2017-07-11 14:14               ` Waiman Long
2017-07-11 16:52                 ` Peter Zijlstra
2017-07-11 21:12                   ` Waiman Long
2017-07-12  7:45                     ` Peter Zijlstra
2017-07-12 14:00                       ` Waiman Long
2017-06-15 20:17 ` [PATCH] cgroup: update debug controller to print out thread mode information Tejun Heo