All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET cgroup/for-4.2] cgroup: require write perm on common ancestor for migration
@ 2015-06-16 19:10 ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team

Hello,

On traditional hierarchies, if a task has write access to "tasks" or
"cgroup.procs" file of a cgroup and its euid agrees with the target,
it can move the target to the cgroup; however, this allows a delegatee
to smuggle processes across disjoint sub-hierarchies violating the
organizational structure and resource restrictions imposed from higher
up.

To prevent these breaches, this patchset makes unified hierarchy
require write access to cgroup.procs of the common ancestor of the
source and destination cgroups.  It also adds documentation on how
delegation of sub-hierarchies should be done on unified hierarchy.

This patchset contains the following four patches.

 0001-kernfs-make-kernfs_get_inode-public.patch
 0002-cgroup-separate-out-cgroup_procs_write_permission-fr.patch
 0003-cgroup-require-write-perm-on-common-ancestor-when-mo.patch
 0004-cgroup-add-delegation-section-to-unified-hierarchy-d.patch

0001-0002 are prep patches.  0003 implements the common ancestor rule
and 0004 documents delegation on unified hierarchy.

This patchset is on top of cgroup/for-4.2 4d205676c102 ("MAINTAINERS:
add a cgroup core co-maintainer") and available in the following git
branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-delegation

diffstat follows.  Thanks.

 Documentation/cgroups/unified-hierarchy.txt |  102 +++++++++++++++++++++++-----
 fs/kernfs/kernfs-internal.h                 |    1 
 include/linux/cgroup-defs.h                 |    1 
 include/linux/kernfs.h                      |    5 +
 kernel/cgroup.c                             |   64 +++++++++++++----
 5 files changed, 139 insertions(+), 34 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCHSET cgroup/for-4.2] cgroup: require write perm on common ancestor for migration
@ 2015-06-16 19:10 ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

Hello,

On traditional hierarchies, if a task has write access to "tasks" or
"cgroup.procs" file of a cgroup and its euid agrees with the target,
it can move the target to the cgroup; however, this allows a delegatee
to smuggle processes across disjoint sub-hierarchies violating the
organizational structure and resource restrictions imposed from higher
up.

To prevent these breaches, this patchset makes unified hierarchy
require write access to cgroup.procs of the common ancestor of the
source and destination cgroups.  It also adds documentation on how
delegation of sub-hierarchies should be done on unified hierarchy.

This patchset contains the following four patches.

 0001-kernfs-make-kernfs_get_inode-public.patch
 0002-cgroup-separate-out-cgroup_procs_write_permission-fr.patch
 0003-cgroup-require-write-perm-on-common-ancestor-when-mo.patch
 0004-cgroup-add-delegation-section-to-unified-hierarchy-d.patch

0001-0002 are prep patches.  0003 implements the common ancestor rule
and 0004 documents delegation on unified hierarchy.

This patchset is on top of cgroup/for-4.2 4d205676c102 ("MAINTAINERS:
add a cgroup core co-maintainer") and available in the following git
branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-delegation

diffstat follows.  Thanks.

 Documentation/cgroups/unified-hierarchy.txt |  102 +++++++++++++++++++++++-----
 fs/kernfs/kernfs-internal.h                 |    1 
 include/linux/cgroup-defs.h                 |    1 
 include/linux/kernfs.h                      |    5 +
 kernel/cgroup.c                             |   64 +++++++++++++----
 5 files changed, 139 insertions(+), 34 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/4] kernfs: make kernfs_get_inode() public
@ 2015-06-16 19:10   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan, hannes
  Cc: cgroups, linux-kernel, kernel-team, Tejun Heo, Greg Kroah-Hartman

Move kernfs_get_inode() prototype from fs/kernfs/kernfs-internal.h to
include/linux/kernfs.h.  It obtains the matching inode for a
kernfs_node.

It will be used by cgroup for inode based permission checks for now
but is generally useful.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Hello, Greg.

Given that cgroup will be the first and only, for now, user, I think
it'd be the eaiest to route this with the related cgroup patches
through the cgroup tree.  What do you think?

Thanks.

 fs/kernfs/kernfs-internal.h | 1 -
 include/linux/kernfs.h      | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index af9fa74..6762bfb 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -76,7 +76,6 @@ extern struct kmem_cache *kernfs_node_cache;
 /*
  * inode.c
  */
-struct inode *kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn);
 void kernfs_evict_inode(struct inode *inode);
 int kernfs_iop_permission(struct inode *inode, int mask);
 int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr);
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 71ecdab..e6b2f7d 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -277,6 +277,7 @@ void kernfs_put(struct kernfs_node *kn);
 
 struct kernfs_node *kernfs_node_from_dentry(struct dentry *dentry);
 struct kernfs_root *kernfs_root_from_sb(struct super_block *sb);
+struct inode *kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn);
 
 struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
 				       unsigned int flags, void *priv);
@@ -352,6 +353,10 @@ static inline struct kernfs_node *kernfs_node_from_dentry(struct dentry *dentry)
 static inline struct kernfs_root *kernfs_root_from_sb(struct super_block *sb)
 { return NULL; }
 
+static inline struct inode *
+kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn)
+{ return NULL; }
+
 static inline struct kernfs_root *
 kernfs_create_root(struct kernfs_syscall_ops *scops, unsigned int flags,
 		   void *priv)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 1/4] kernfs: make kernfs_get_inode() public
@ 2015-06-16 19:10   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo, Greg Kroah-Hartman

Move kernfs_get_inode() prototype from fs/kernfs/kernfs-internal.h to
include/linux/kernfs.h.  It obtains the matching inode for a
kernfs_node.

It will be used by cgroup for inode based permission checks for now
but is generally useful.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
---
Hello, Greg.

Given that cgroup will be the first and only, for now, user, I think
it'd be the eaiest to route this with the related cgroup patches
through the cgroup tree.  What do you think?

Thanks.

 fs/kernfs/kernfs-internal.h | 1 -
 include/linux/kernfs.h      | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index af9fa74..6762bfb 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -76,7 +76,6 @@ extern struct kmem_cache *kernfs_node_cache;
 /*
  * inode.c
  */
-struct inode *kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn);
 void kernfs_evict_inode(struct inode *inode);
 int kernfs_iop_permission(struct inode *inode, int mask);
 int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr);
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 71ecdab..e6b2f7d 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -277,6 +277,7 @@ void kernfs_put(struct kernfs_node *kn);
 
 struct kernfs_node *kernfs_node_from_dentry(struct dentry *dentry);
 struct kernfs_root *kernfs_root_from_sb(struct super_block *sb);
+struct inode *kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn);
 
 struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
 				       unsigned int flags, void *priv);
@@ -352,6 +353,10 @@ static inline struct kernfs_node *kernfs_node_from_dentry(struct dentry *dentry)
 static inline struct kernfs_root *kernfs_root_from_sb(struct super_block *sb)
 { return NULL; }
 
+static inline struct inode *
+kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn)
+{ return NULL; }
+
 static inline struct kernfs_root *
 kernfs_create_root(struct kernfs_syscall_ops *scops, unsigned int flags,
 		   void *priv)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/4] cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
@ 2015-06-16 19:10   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

Separate out task / process migration permission check from
__cgroup_procs_write() into cgroup_procs_write_permission().

* Permission check is moved right above the actual migration and no
  longer performed while holding rcu_read_lock().
  cgroup_procs_write_permission() uses get_task_cred() / put_cred()
  instead of __task_cred().  Also, !root trying to migrate kthreadd or
  PF_NO_SETAFFINITY tasks will now fail with -EINVAL rather than
  -EACCES which should be fine.

* The same permission check is now performed even when moving self by
  specifying 0 as pid.  This always succeeds so there's no functional
  difference.  We'll add more permission checks later and the benefits
  of keeping both cases consistent outweigh the minute overhead of
  doing perm checks on pid 0 case.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 12b580f..4504d64 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2392,6 +2392,25 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp,
 	return ret;
 }
 
+static int cgroup_procs_write_permission(struct task_struct *task)
+{
+	const struct cred *cred = current_cred();
+	const struct cred *tcred = get_task_cred(task);
+	int ret = 0;
+
+	/*
+	 * even if we're attaching all tasks in the thread group, we only
+	 * need to check permissions on one of them.
+	 */
+	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+	    !uid_eq(cred->euid, tcred->uid) &&
+	    !uid_eq(cred->euid, tcred->suid))
+		ret = -EACCES;
+
+	put_cred(tcred);
+	return ret;
+}
+
 /*
  * Find the task_struct of the task to attach by vpid and pass it along to the
  * function to attach either it or all tasks in its threadgroup. Will lock
@@ -2401,7 +2420,6 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 				    size_t nbytes, loff_t off, bool threadgroup)
 {
 	struct task_struct *tsk;
-	const struct cred *cred = current_cred(), *tcred;
 	struct cgroup *cgrp;
 	pid_t pid;
 	int ret;
@@ -2421,19 +2439,9 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 			ret = -ESRCH;
 			goto out_unlock_rcu;
 		}
-		/*
-		 * even if we're attaching all tasks in the thread group, we
-		 * only need to check permissions on one of them.
-		 */
-		tcred = __task_cred(tsk);
-		if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
-		    !uid_eq(cred->euid, tcred->uid) &&
-		    !uid_eq(cred->euid, tcred->suid)) {
-			ret = -EACCES;
-			goto out_unlock_rcu;
-		}
-	} else
+	} else {
 		tsk = current;
+	}
 
 	if (threadgroup)
 		tsk = tsk->group_leader;
@@ -2451,7 +2459,9 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 	get_task_struct(tsk);
 	rcu_read_unlock();
 
-	ret = cgroup_attach_task(cgrp, tsk, threadgroup);
+	ret = cgroup_procs_write_permission(tsk);
+	if (!ret)
+		ret = cgroup_attach_task(cgrp, tsk, threadgroup);
 
 	put_task_struct(tsk);
 	goto out_unlock_threadgroup;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/4] cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
@ 2015-06-16 19:10   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

Separate out task / process migration permission check from
__cgroup_procs_write() into cgroup_procs_write_permission().

* Permission check is moved right above the actual migration and no
  longer performed while holding rcu_read_lock().
  cgroup_procs_write_permission() uses get_task_cred() / put_cred()
  instead of __task_cred().  Also, !root trying to migrate kthreadd or
  PF_NO_SETAFFINITY tasks will now fail with -EINVAL rather than
  -EACCES which should be fine.

* The same permission check is now performed even when moving self by
  specifying 0 as pid.  This always succeeds so there's no functional
  difference.  We'll add more permission checks later and the benefits
  of keeping both cases consistent outweigh the minute overhead of
  doing perm checks on pid 0 case.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 12b580f..4504d64 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2392,6 +2392,25 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp,
 	return ret;
 }
 
+static int cgroup_procs_write_permission(struct task_struct *task)
+{
+	const struct cred *cred = current_cred();
+	const struct cred *tcred = get_task_cred(task);
+	int ret = 0;
+
+	/*
+	 * even if we're attaching all tasks in the thread group, we only
+	 * need to check permissions on one of them.
+	 */
+	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+	    !uid_eq(cred->euid, tcred->uid) &&
+	    !uid_eq(cred->euid, tcred->suid))
+		ret = -EACCES;
+
+	put_cred(tcred);
+	return ret;
+}
+
 /*
  * Find the task_struct of the task to attach by vpid and pass it along to the
  * function to attach either it or all tasks in its threadgroup. Will lock
@@ -2401,7 +2420,6 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 				    size_t nbytes, loff_t off, bool threadgroup)
 {
 	struct task_struct *tsk;
-	const struct cred *cred = current_cred(), *tcred;
 	struct cgroup *cgrp;
 	pid_t pid;
 	int ret;
@@ -2421,19 +2439,9 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 			ret = -ESRCH;
 			goto out_unlock_rcu;
 		}
-		/*
-		 * even if we're attaching all tasks in the thread group, we
-		 * only need to check permissions on one of them.
-		 */
-		tcred = __task_cred(tsk);
-		if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
-		    !uid_eq(cred->euid, tcred->uid) &&
-		    !uid_eq(cred->euid, tcred->suid)) {
-			ret = -EACCES;
-			goto out_unlock_rcu;
-		}
-	} else
+	} else {
 		tsk = current;
+	}
 
 	if (threadgroup)
 		tsk = tsk->group_leader;
@@ -2451,7 +2459,9 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 	get_task_struct(tsk);
 	rcu_read_unlock();
 
-	ret = cgroup_attach_task(cgrp, tsk, threadgroup);
+	ret = cgroup_procs_write_permission(tsk);
+	if (!ret)
+		ret = cgroup_attach_task(cgrp, tsk, threadgroup);
 
 	put_task_struct(tsk);
 	goto out_unlock_threadgroup;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy
@ 2015-06-16 19:10   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

On traditional hierarchies, if a task has write access to "tasks" or
"cgroup.procs" file of a cgroup and its euid agrees with the target,
it can move the target to the cgroup; however, consider the following
scenario.  The owner of each cgroup is in the parentheses.

 R (root) - 0 (root) - 00 (user1) - 000 (user1)
          |                       \ 001 (user1)
          \ 1 (root) - 10 (user1)

The subtrees of 00 and 10 are delegated to user1; however, while both
subtrees may belong to the same user, it is clear that the two
subtrees are to be isolated - they're under completely separate
resource limits imposed by 0 and 1, respectively.  Note that 0 and 1
aren't strictly necessary but added to ease illustrating the issue.

If user1 is allowed to move processes between the two subtrees, the
intention of the hierarchy - keeping a given group of processes under
a subtree with certain resource restrictions while delegating
management of the subtree - can be circumvented by user1.

This happens because migration permission check doesn't consider the
hierarchical nature of cgroups.  To fix the issue, this patch adds an
extra permission requirement when userland tries to migrate a process
in the default hierarchy - the issuing task must have write access to
the common ancestor of "cgroup.procs" file of the ancestor in addition
to the destination's.

Conceptually, the issuer must be able to move the target process from
the source cgroup to the common ancestor of source and destination
cgroups and then to the destination.  As long as delegation is done in
a proper top-down way, this guarantees that a delegatee can't smuggle
processes across disjoint delegation domains.

The next patch will add documentation on the delegation model on the
default hierarchy.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup-defs.h |  1 +
 kernel/cgroup.c             | 30 +++++++++++++++++++++++++++---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index c5588c4..93755a6 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -220,6 +220,7 @@ struct cgroup {
 	int populated_cnt;
 
 	struct kernfs_node *kn;		/* cgroup kernfs entry */
+	struct kernfs_node *procs_kn;	/* kn for "cgroup.procs" */
 	struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
 
 	/*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4504d64..d3ca3fe 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2392,7 +2392,9 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp,
 	return ret;
 }
 
-static int cgroup_procs_write_permission(struct task_struct *task)
+static int cgroup_procs_write_permission(struct task_struct *task,
+					 struct cgroup *dst_cgrp,
+					 struct kernfs_open_file *of)
 {
 	const struct cred *cred = current_cred();
 	const struct cred *tcred = get_task_cred(task);
@@ -2407,6 +2409,26 @@ static int cgroup_procs_write_permission(struct task_struct *task)
 	    !uid_eq(cred->euid, tcred->suid))
 		ret = -EACCES;
 
+	if (cgroup_on_dfl(dst_cgrp)) {
+		struct super_block *sb = of->file->f_path.dentry->d_sb;
+		struct cgroup *cgrp;
+		struct inode *inode;
+
+		down_read(&css_set_rwsem);
+		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
+		up_read(&css_set_rwsem);
+
+		while (!cgroup_is_descendant(dst_cgrp, cgrp))
+			cgrp = cgroup_parent(cgrp);
+
+		ret = -ENOMEM;
+		inode = kernfs_get_inode(sb, cgrp->procs_kn);
+		if (inode) {
+			ret = inode_permission(inode, MAY_WRITE);
+			iput(inode);
+		}
+	}
+
 	put_cred(tcred);
 	return ret;
 }
@@ -2459,7 +2481,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 	get_task_struct(tsk);
 	rcu_read_unlock();
 
-	ret = cgroup_procs_write_permission(tsk);
+	ret = cgroup_procs_write_permission(tsk, cgrp, of);
 	if (!ret)
 		ret = cgroup_attach_task(cgrp, tsk, threadgroup);
 
@@ -3087,7 +3109,9 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
 		return ret;
 	}
 
-	if (cft->seq_show == cgroup_populated_show)
+	if (cft->write == cgroup_procs_write)
+		cgrp->procs_kn = kn;
+	else if (cft->seq_show == cgroup_populated_show)
 		cgrp->populated_kn = kn;
 	return 0;
 }
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy
@ 2015-06-16 19:10   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

On traditional hierarchies, if a task has write access to "tasks" or
"cgroup.procs" file of a cgroup and its euid agrees with the target,
it can move the target to the cgroup; however, consider the following
scenario.  The owner of each cgroup is in the parentheses.

 R (root) - 0 (root) - 00 (user1) - 000 (user1)
          |                       \ 001 (user1)
          \ 1 (root) - 10 (user1)

The subtrees of 00 and 10 are delegated to user1; however, while both
subtrees may belong to the same user, it is clear that the two
subtrees are to be isolated - they're under completely separate
resource limits imposed by 0 and 1, respectively.  Note that 0 and 1
aren't strictly necessary but added to ease illustrating the issue.

If user1 is allowed to move processes between the two subtrees, the
intention of the hierarchy - keeping a given group of processes under
a subtree with certain resource restrictions while delegating
management of the subtree - can be circumvented by user1.

This happens because migration permission check doesn't consider the
hierarchical nature of cgroups.  To fix the issue, this patch adds an
extra permission requirement when userland tries to migrate a process
in the default hierarchy - the issuing task must have write access to
the common ancestor of "cgroup.procs" file of the ancestor in addition
to the destination's.

Conceptually, the issuer must be able to move the target process from
the source cgroup to the common ancestor of source and destination
cgroups and then to the destination.  As long as delegation is done in
a proper top-down way, this guarantees that a delegatee can't smuggle
processes across disjoint delegation domains.

The next patch will add documentation on the delegation model on the
default hierarchy.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup-defs.h |  1 +
 kernel/cgroup.c             | 30 +++++++++++++++++++++++++++---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index c5588c4..93755a6 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -220,6 +220,7 @@ struct cgroup {
 	int populated_cnt;
 
 	struct kernfs_node *kn;		/* cgroup kernfs entry */
+	struct kernfs_node *procs_kn;	/* kn for "cgroup.procs" */
 	struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
 
 	/*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4504d64..d3ca3fe 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2392,7 +2392,9 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp,
 	return ret;
 }
 
-static int cgroup_procs_write_permission(struct task_struct *task)
+static int cgroup_procs_write_permission(struct task_struct *task,
+					 struct cgroup *dst_cgrp,
+					 struct kernfs_open_file *of)
 {
 	const struct cred *cred = current_cred();
 	const struct cred *tcred = get_task_cred(task);
@@ -2407,6 +2409,26 @@ static int cgroup_procs_write_permission(struct task_struct *task)
 	    !uid_eq(cred->euid, tcred->suid))
 		ret = -EACCES;
 
+	if (cgroup_on_dfl(dst_cgrp)) {
+		struct super_block *sb = of->file->f_path.dentry->d_sb;
+		struct cgroup *cgrp;
+		struct inode *inode;
+
+		down_read(&css_set_rwsem);
+		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
+		up_read(&css_set_rwsem);
+
+		while (!cgroup_is_descendant(dst_cgrp, cgrp))
+			cgrp = cgroup_parent(cgrp);
+
+		ret = -ENOMEM;
+		inode = kernfs_get_inode(sb, cgrp->procs_kn);
+		if (inode) {
+			ret = inode_permission(inode, MAY_WRITE);
+			iput(inode);
+		}
+	}
+
 	put_cred(tcred);
 	return ret;
 }
@@ -2459,7 +2481,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 	get_task_struct(tsk);
 	rcu_read_unlock();
 
-	ret = cgroup_procs_write_permission(tsk);
+	ret = cgroup_procs_write_permission(tsk, cgrp, of);
 	if (!ret)
 		ret = cgroup_attach_task(cgrp, tsk, threadgroup);
 
@@ -3087,7 +3109,9 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
 		return ret;
 	}
 
-	if (cft->seq_show == cgroup_populated_show)
+	if (cft->write == cgroup_procs_write)
+		cgrp->procs_kn = kn;
+	else if (cft->seq_show == cgroup_populated_show)
 		cgrp->populated_kn = kn;
 	return 0;
 }
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] cgroup: add delegation section to unified hierarchy documentation
@ 2015-06-16 19:10   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 Documentation/cgroups/unified-hierarchy.txt | 102 +++++++++++++++++++++++-----
 1 file changed, 84 insertions(+), 18 deletions(-)

diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index eb102fb..fef5f5d 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -17,15 +17,18 @@ CONTENTS
 3. Structural Constraints
   3-1. Top-down
   3-2. No internal tasks
-4. Other Changes
-  4-1. [Un]populated Notification
-  4-2. Other Core Changes
-  4-3. Per-Controller Changes
-    4-3-1. blkio
-    4-3-2. cpuset
-    4-3-3. memory
-5. Planned Changes
-  5-1. CAP for resource control
+4. Delegation
+  4-1. Model of delegation
+  4-2. Common ancestor rule
+5. Other Changes
+  5-1. [Un]populated Notification
+  5-2. Other Core Changes
+  5-3. Per-Controller Changes
+    5-3-1. blkio
+    5-3-2. cpuset
+    5-3-3. memory
+6. Planned Changes
+  6-1. CAP for resource control
 
 
 1. Background
@@ -245,9 +248,72 @@ cgroup must create children and transfer all its tasks to the children
 before enabling controllers in its "cgroup.subtree_control" file.
 
 
-4. Other Changes
+4. Delegation
 
-4-1. [Un]populated Notification
+4-1. Model of delegation
+
+A cgroup can be delegated to a less privileged user by granting write
+access of the directory and its "cgroup.procs" file to the user.  Note
+that the resource control knobs in a given directory concern the
+resources of the parent and thus must not be delegated along with the
+directory.
+
+Once delegated, the user can build sub-hierarchy under the directory,
+organize processes as it sees fit and further distribute the resources
+it got from the parent.  The limits and other settings of all resource
+controllers are hierarchical and regardless of what happens in the
+delegated sub-hierarchy, nothing can escape the resource restrictions
+imposed by the parent.
+
+Currently, cgroup doesn't impose any restrictions on the number of
+cgroups in or nesting depth of a delegated sub-hierarchy; however,
+this may in the future be limited explicitly.
+
+
+4-2. Common ancestor rule
+
+Let's say cgroups C0 and C1 have been delegated to user U0 who created
+C00, C01 under C0 and C10 under C1 as follows.
+
+ ~~~~~~~~~~~~~ - C0 - C00
+ ~ cgroup    ~      \ C01
+ ~ hierarchy ~
+ ~~~~~~~~~~~~~ - C1 - C10
+
+C0 and C1 are separate entities in terms of resource distribution
+regardless of their relative positions in the hierarchy.  The
+resources the processes under C0 are entitled to are controlled by
+C0's ancestors and may be completely different from C1.  It's clear
+that the intention of delegating C0 to U0 is allowing U0 to organize
+the processes under C0 and further control the distribution of C0's
+resources.
+
+On traditional hierarchies, if a task has write access to "tasks" or
+"cgroup.procs" file of a cgroup and its uid agrees with the target, it
+can move the target to the cgroup.  In the above example, U0 will not
+only be able to move processes in each sub-hierarchy but also across
+the two sub-hierarchies, effectively allowing it to violate the
+organizational and resource restrictions implied by the hierarchical
+structure above C0 and C1.
+
+On the unified hierarchy, to write to a "cgroup.procs" file, in
+addition to the usual write permission to the file and uid match, the
+writer must also have write acess to the "cgroup.procs" file of the
+common ancestor of the source and destination cgroups.  This prevents
+delegatees from smuggling processes across disjoint sub-hierarchies.
+
+For example, in the above scenario, let's say U0 wants to write the
+pid of a process which has a matching uid and is currently in C10 into
+"C00/cgroup.procs".  U0 obviously has write access to the file and
+migration permission on the process; however, the common ancestor of
+the source cgroup C10 and the destination cgroup C00 is above the
+points of delegation and U0 would not have write access to its
+"cgroup.procs" and thus be denied with -EACCES.
+
+
+5. Other Changes
+
+5-1. [Un]populated Notification
 
 cgroup users often need a way to determine when a cgroup's
 subhierarchy becomes empty so that it can be cleaned up.  cgroup
@@ -289,7 +355,7 @@ supported and the interface files "release_agent" and
 "notify_on_release" do not exist.
 
 
-4-2. Other Core Changes
+5-2. Other Core Changes
 
 - None of the mount options is allowed.
 
@@ -306,14 +372,14 @@ supported and the interface files "release_agent" and
 - The "cgroup.clone_children" file is removed.
 
 
-4-3. Per-Controller Changes
+5-3. Per-Controller Changes
 
-4-3-1. blkio
+5-3-1. blkio
 
 - blk-throttle becomes properly hierarchical.
 
 
-4-3-2. cpuset
+5-3-2. cpuset
 
 - Tasks are kept in empty cpusets after hotplug and take on the masks
   of the nearest non-empty ancestor, instead of being moved to it.
@@ -322,7 +388,7 @@ supported and the interface files "release_agent" and
   masks of the nearest non-empty ancestor.
 
 
-4-3-3. memory
+5-3-3. memory
 
 - use_hierarchy is on by default and the cgroup file for the flag is
   not created.
@@ -407,9 +473,9 @@ supported and the interface files "release_agent" and
   memory.low, memory.high, and memory.max will use the string "max" to
   indicate and set the highest possible value.
 
-5. Planned Changes
+6. Planned Changes
 
-5-1. CAP for resource control
+6-1. CAP for resource control
 
 Unified hierarchy will require one of the capabilities(7), which is
 yet to be decided, for all resource control related knobs.  Process
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] cgroup: add delegation section to unified hierarchy documentation
@ 2015-06-16 19:10   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-16 19:10 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 Documentation/cgroups/unified-hierarchy.txt | 102 +++++++++++++++++++++++-----
 1 file changed, 84 insertions(+), 18 deletions(-)

diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index eb102fb..fef5f5d 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -17,15 +17,18 @@ CONTENTS
 3. Structural Constraints
   3-1. Top-down
   3-2. No internal tasks
-4. Other Changes
-  4-1. [Un]populated Notification
-  4-2. Other Core Changes
-  4-3. Per-Controller Changes
-    4-3-1. blkio
-    4-3-2. cpuset
-    4-3-3. memory
-5. Planned Changes
-  5-1. CAP for resource control
+4. Delegation
+  4-1. Model of delegation
+  4-2. Common ancestor rule
+5. Other Changes
+  5-1. [Un]populated Notification
+  5-2. Other Core Changes
+  5-3. Per-Controller Changes
+    5-3-1. blkio
+    5-3-2. cpuset
+    5-3-3. memory
+6. Planned Changes
+  6-1. CAP for resource control
 
 
 1. Background
@@ -245,9 +248,72 @@ cgroup must create children and transfer all its tasks to the children
 before enabling controllers in its "cgroup.subtree_control" file.
 
 
-4. Other Changes
+4. Delegation
 
-4-1. [Un]populated Notification
+4-1. Model of delegation
+
+A cgroup can be delegated to a less privileged user by granting write
+access of the directory and its "cgroup.procs" file to the user.  Note
+that the resource control knobs in a given directory concern the
+resources of the parent and thus must not be delegated along with the
+directory.
+
+Once delegated, the user can build sub-hierarchy under the directory,
+organize processes as it sees fit and further distribute the resources
+it got from the parent.  The limits and other settings of all resource
+controllers are hierarchical and regardless of what happens in the
+delegated sub-hierarchy, nothing can escape the resource restrictions
+imposed by the parent.
+
+Currently, cgroup doesn't impose any restrictions on the number of
+cgroups in or nesting depth of a delegated sub-hierarchy; however,
+this may in the future be limited explicitly.
+
+
+4-2. Common ancestor rule
+
+Let's say cgroups C0 and C1 have been delegated to user U0 who created
+C00, C01 under C0 and C10 under C1 as follows.
+
+ ~~~~~~~~~~~~~ - C0 - C00
+ ~ cgroup    ~      \ C01
+ ~ hierarchy ~
+ ~~~~~~~~~~~~~ - C1 - C10
+
+C0 and C1 are separate entities in terms of resource distribution
+regardless of their relative positions in the hierarchy.  The
+resources the processes under C0 are entitled to are controlled by
+C0's ancestors and may be completely different from C1.  It's clear
+that the intention of delegating C0 to U0 is allowing U0 to organize
+the processes under C0 and further control the distribution of C0's
+resources.
+
+On traditional hierarchies, if a task has write access to "tasks" or
+"cgroup.procs" file of a cgroup and its uid agrees with the target, it
+can move the target to the cgroup.  In the above example, U0 will not
+only be able to move processes in each sub-hierarchy but also across
+the two sub-hierarchies, effectively allowing it to violate the
+organizational and resource restrictions implied by the hierarchical
+structure above C0 and C1.
+
+On the unified hierarchy, to write to a "cgroup.procs" file, in
+addition to the usual write permission to the file and uid match, the
+writer must also have write acess to the "cgroup.procs" file of the
+common ancestor of the source and destination cgroups.  This prevents
+delegatees from smuggling processes across disjoint sub-hierarchies.
+
+For example, in the above scenario, let's say U0 wants to write the
+pid of a process which has a matching uid and is currently in C10 into
+"C00/cgroup.procs".  U0 obviously has write access to the file and
+migration permission on the process; however, the common ancestor of
+the source cgroup C10 and the destination cgroup C00 is above the
+points of delegation and U0 would not have write access to its
+"cgroup.procs" and thus be denied with -EACCES.
+
+
+5. Other Changes
+
+5-1. [Un]populated Notification
 
 cgroup users often need a way to determine when a cgroup's
 subhierarchy becomes empty so that it can be cleaned up.  cgroup
@@ -289,7 +355,7 @@ supported and the interface files "release_agent" and
 "notify_on_release" do not exist.
 
 
-4-2. Other Core Changes
+5-2. Other Core Changes
 
 - None of the mount options is allowed.
 
@@ -306,14 +372,14 @@ supported and the interface files "release_agent" and
 - The "cgroup.clone_children" file is removed.
 
 
-4-3. Per-Controller Changes
+5-3. Per-Controller Changes
 
-4-3-1. blkio
+5-3-1. blkio
 
 - blk-throttle becomes properly hierarchical.
 
 
-4-3-2. cpuset
+5-3-2. cpuset
 
 - Tasks are kept in empty cpusets after hotplug and take on the masks
   of the nearest non-empty ancestor, instead of being moved to it.
@@ -322,7 +388,7 @@ supported and the interface files "release_agent" and
   masks of the nearest non-empty ancestor.
 
 
-4-3-3. memory
+5-3-3. memory
 
 - use_hierarchy is on by default and the cgroup file for the flag is
   not created.
@@ -407,9 +473,9 @@ supported and the interface files "release_agent" and
   memory.low, memory.high, and memory.max will use the string "max" to
   indicate and set the highest possible value.
 
-5. Planned Changes
+6. Planned Changes
 
-5-1. CAP for resource control
+6-1. CAP for resource control
 
 Unified hierarchy will require one of the capabilities(7), which is
 yet to be decided, for all resource control related knobs.  Process
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] kernfs: make kernfs_get_inode() public
@ 2015-06-16 20:58     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-06-16 20:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lizefan, hannes, cgroups, linux-kernel, kernel-team

On Tue, Jun 16, 2015 at 03:10:14PM -0400, Tejun Heo wrote:
> Move kernfs_get_inode() prototype from fs/kernfs/kernfs-internal.h to
> include/linux/kernfs.h.  It obtains the matching inode for a
> kernfs_node.
> 
> It will be used by cgroup for inode based permission checks for now
> but is generally useful.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> Hello, Greg.
> 
> Given that cgroup will be the first and only, for now, user, I think
> it'd be the eaiest to route this with the related cgroup patches
> through the cgroup tree.  What do you think?

That's fine with me, feel free to add:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

to it.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] kernfs: make kernfs_get_inode() public
@ 2015-06-16 20:58     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2015-06-16 20:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

On Tue, Jun 16, 2015 at 03:10:14PM -0400, Tejun Heo wrote:
> Move kernfs_get_inode() prototype from fs/kernfs/kernfs-internal.h to
> include/linux/kernfs.h.  It obtains the matching inode for a
> kernfs_node.
> 
> It will be used by cgroup for inode based permission checks for now
> but is generally useful.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
> ---
> Hello, Greg.
> 
> Given that cgroup will be the first and only, for now, user, I think
> it'd be the eaiest to route this with the related cgroup patches
> through the cgroup tree.  What do you think?

That's fine with me, feel free to add:

Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

to it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy
@ 2015-06-18  3:14     ` Zefan Li
  0 siblings, 0 replies; 26+ messages in thread
From: Zefan Li @ 2015-06-18  3:14 UTC (permalink / raw)
  To: Tejun Heo; +Cc: hannes, cgroups, linux-kernel, kernel-team

Hi Tejun,

> -static int cgroup_procs_write_permission(struct task_struct *task)
> +static int cgroup_procs_write_permission(struct task_struct *task,
> +					 struct cgroup *dst_cgrp,
> +					 struct kernfs_open_file *of)
>  {
>  	const struct cred *cred = current_cred();
>  	const struct cred *tcred = get_task_cred(task);
> @@ -2407,6 +2409,26 @@ static int cgroup_procs_write_permission(struct task_struct *task)
>  	    !uid_eq(cred->euid, tcred->suid))
>  		ret = -EACCES;
>  
> +	if (cgroup_on_dfl(dst_cgrp)) {

if (!ret && cgroup_on_dfl(dst_cgrp))

> +		struct super_block *sb = of->file->f_path.dentry->d_sb;
> +		struct cgroup *cgrp;
> +		struct inode *inode;
> +
> +		down_read(&css_set_rwsem);
> +		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
> +		up_read(&css_set_rwsem);
> +
> +		while (!cgroup_is_descendant(dst_cgrp, cgrp))
> +			cgrp = cgroup_parent(cgrp);
> +
> +		ret = -ENOMEM;
> +		inode = kernfs_get_inode(sb, cgrp->procs_kn);
> +		if (inode) {
> +			ret = inode_permission(inode, MAY_WRITE);
> +			iput(inode);
> +		}
> +	}
> +
>  	put_cred(tcred);
>  	return ret;
>  }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy
@ 2015-06-18  3:14     ` Zefan Li
  0 siblings, 0 replies; 26+ messages in thread
From: Zefan Li @ 2015-06-18  3:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

Hi Tejun,

> -static int cgroup_procs_write_permission(struct task_struct *task)
> +static int cgroup_procs_write_permission(struct task_struct *task,
> +					 struct cgroup *dst_cgrp,
> +					 struct kernfs_open_file *of)
>  {
>  	const struct cred *cred = current_cred();
>  	const struct cred *tcred = get_task_cred(task);
> @@ -2407,6 +2409,26 @@ static int cgroup_procs_write_permission(struct task_struct *task)
>  	    !uid_eq(cred->euid, tcred->suid))
>  		ret = -EACCES;
>  
> +	if (cgroup_on_dfl(dst_cgrp)) {

if (!ret && cgroup_on_dfl(dst_cgrp))

> +		struct super_block *sb = of->file->f_path.dentry->d_sb;
> +		struct cgroup *cgrp;
> +		struct inode *inode;
> +
> +		down_read(&css_set_rwsem);
> +		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
> +		up_read(&css_set_rwsem);
> +
> +		while (!cgroup_is_descendant(dst_cgrp, cgrp))
> +			cgrp = cgroup_parent(cgrp);
> +
> +		ret = -ENOMEM;
> +		inode = kernfs_get_inode(sb, cgrp->procs_kn);
> +		if (inode) {
> +			ret = inode_permission(inode, MAY_WRITE);
> +			iput(inode);
> +		}
> +	}
> +
>  	put_cred(tcred);
>  	return ret;
>  }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy
  2015-06-16 19:10   ` Tejun Heo
  (?)
  (?)
@ 2015-06-18 17:59   ` Tejun Heo
  2015-06-18 19:04       ` Johannes Weiner
  -1 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2015-06-18 17:59 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team

On traditional hierarchies, if a task has write access to "tasks" or
"cgroup.procs" file of a cgroup and its euid agrees with the target,
it can move the target to the cgroup; however, consider the following
scenario.  The owner of each cgroup is in the parentheses.

 R (root) - 0 (root) - 00 (user1) - 000 (user1)
          |                       \ 001 (user1)
          \ 1 (root) - 10 (user1)

The subtrees of 00 and 10 are delegated to user1; however, while both
subtrees may belong to the same user, it is clear that the two
subtrees are to be isolated - they're under completely separate
resource limits imposed by 0 and 1, respectively.  Note that 0 and 1
aren't strictly necessary but added to ease illustrating the issue.

If user1 is allowed to move processes between the two subtrees, the
intention of the hierarchy - keeping a given group of processes under
a subtree with certain resource restrictions while delegating
management of the subtree - can be circumvented by user1.

This happens because migration permission check doesn't consider the
hierarchical nature of cgroups.  To fix the issue, this patch adds an
extra permission requirement when userland tries to migrate a process
in the default hierarchy - the issuing task must have write access to
the common ancestor of "cgroup.procs" file of the ancestor in addition
to the destination's.

Conceptually, the issuer must be able to move the target process from
the source cgroup to the common ancestor of source and destination
cgroups and then to the destination.  As long as delegation is done in
a proper top-down way, this guarantees that a delegatee can't smuggle
processes across disjoint delegation domains.

The next patch will add documentation on the delegation model on the
default hierarchy.

v2: Fixed missing !ret test.  Spotted by Li Zefan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
---
 include/linux/cgroup-defs.h |    1 +
 kernel/cgroup.c             |   30 +++++++++++++++++++++++++++---
 2 files changed, 28 insertions(+), 3 deletions(-)

--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -220,6 +220,7 @@ struct cgroup {
 	int populated_cnt;
 
 	struct kernfs_node *kn;		/* cgroup kernfs entry */
+	struct kernfs_node *procs_kn;	/* kn for "cgroup.procs" */
 	struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
 
 	/*
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2392,7 +2392,9 @@ static int cgroup_attach_task(struct cgr
 	return ret;
 }
 
-static int cgroup_procs_write_permission(struct task_struct *task)
+static int cgroup_procs_write_permission(struct task_struct *task,
+					 struct cgroup *dst_cgrp,
+					 struct kernfs_open_file *of)
 {
 	const struct cred *cred = current_cred();
 	const struct cred *tcred = get_task_cred(task);
@@ -2407,6 +2409,26 @@ static int cgroup_procs_write_permission
 	    !uid_eq(cred->euid, tcred->suid))
 		ret = -EACCES;
 
+	if (!ret && cgroup_on_dfl(dst_cgrp)) {
+		struct super_block *sb = of->file->f_path.dentry->d_sb;
+		struct cgroup *cgrp;
+		struct inode *inode;
+
+		down_read(&css_set_rwsem);
+		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
+		up_read(&css_set_rwsem);
+
+		while (!cgroup_is_descendant(dst_cgrp, cgrp))
+			cgrp = cgroup_parent(cgrp);
+
+		ret = -ENOMEM;
+		inode = kernfs_get_inode(sb, cgrp->procs_kn);
+		if (inode) {
+			ret = inode_permission(inode, MAY_WRITE);
+			iput(inode);
+		}
+	}
+
 	put_cred(tcred);
 	return ret;
 }
@@ -2459,7 +2481,7 @@ static ssize_t __cgroup_procs_write(stru
 	get_task_struct(tsk);
 	rcu_read_unlock();
 
-	ret = cgroup_procs_write_permission(tsk);
+	ret = cgroup_procs_write_permission(tsk, cgrp, of);
 	if (!ret)
 		ret = cgroup_attach_task(cgrp, tsk, threadgroup);
 
@@ -3087,7 +3109,9 @@ static int cgroup_add_file(struct cgroup
 		return ret;
 	}
 
-	if (cft->seq_show == cgroup_populated_show)
+	if (cft->write == cgroup_procs_write)
+		cgrp->procs_kn = kn;
+	else if (cft->seq_show == cgroup_populated_show)
 		cgrp->populated_kn = kn;
 	return 0;
 }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy
  2015-06-18  3:14     ` Zefan Li
  (?)
@ 2015-06-18 17:59     ` Tejun Heo
  -1 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-18 17:59 UTC (permalink / raw)
  To: Zefan Li; +Cc: hannes, cgroups, linux-kernel, kernel-team

On Thu, Jun 18, 2015 at 11:14:38AM +0800, Zefan Li wrote:
> > +	if (cgroup_on_dfl(dst_cgrp)) {
> 
> if (!ret && cgroup_on_dfl(dst_cgrp))

Oops, thanks a lot for spotting it. :)

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] cgroup: add delegation section to unified hierarchy documentation
@ 2015-06-18 19:01     ` Johannes Weiner
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2015-06-18 19:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lizefan, cgroups, linux-kernel, kernel-team

On Tue, Jun 16, 2015 at 03:10:17PM -0400, Tejun Heo wrote:
> +4-2. Common ancestor rule
> +
> +Let's say cgroups C0 and C1 have been delegated to user U0 who created
> +C00, C01 under C0 and C10 under C1 as follows.
> +
> + ~~~~~~~~~~~~~ - C0 - C00
> + ~ cgroup    ~      \ C01
> + ~ hierarchy ~
> + ~~~~~~~~~~~~~ - C1 - C10
> +
> +C0 and C1 are separate entities in terms of resource distribution
> +regardless of their relative positions in the hierarchy.  The
> +resources the processes under C0 are entitled to are controlled by
> +C0's ancestors and may be completely different from C1.  It's clear
> +that the intention of delegating C0 to U0 is allowing U0 to organize
> +the processes under C0 and further control the distribution of C0's
> +resources.
> +
> +On traditional hierarchies, if a task has write access to "tasks" or
> +"cgroup.procs" file of a cgroup and its uid agrees with the target, it
> +can move the target to the cgroup.  In the above example, U0 will not
> +only be able to move processes in each sub-hierarchy but also across
> +the two sub-hierarchies, effectively allowing it to violate the
> +organizational and resource restrictions implied by the hierarchical
> +structure above C0 and C1.
> +
> +On the unified hierarchy, to write to a "cgroup.procs" file, in
> +addition to the usual write permission to the file and uid match, the
> +writer must also have write acess to the "cgroup.procs" file of the
> +common ancestor of the source and destination cgroups.  This prevents
> +delegatees from smuggling processes across disjoint sub-hierarchies.

I think this would be better as the first paragraph in 4.2, because it
nicely explains and summarizes the rule and its reasoning; then follow
up with detailed explanations and examples that corroborate the design
choice.

Otherwise, the patch looks good to me.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] cgroup: add delegation section to unified hierarchy documentation
@ 2015-06-18 19:01     ` Johannes Weiner
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2015-06-18 19:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

On Tue, Jun 16, 2015 at 03:10:17PM -0400, Tejun Heo wrote:
> +4-2. Common ancestor rule
> +
> +Let's say cgroups C0 and C1 have been delegated to user U0 who created
> +C00, C01 under C0 and C10 under C1 as follows.
> +
> + ~~~~~~~~~~~~~ - C0 - C00
> + ~ cgroup    ~      \ C01
> + ~ hierarchy ~
> + ~~~~~~~~~~~~~ - C1 - C10
> +
> +C0 and C1 are separate entities in terms of resource distribution
> +regardless of their relative positions in the hierarchy.  The
> +resources the processes under C0 are entitled to are controlled by
> +C0's ancestors and may be completely different from C1.  It's clear
> +that the intention of delegating C0 to U0 is allowing U0 to organize
> +the processes under C0 and further control the distribution of C0's
> +resources.
> +
> +On traditional hierarchies, if a task has write access to "tasks" or
> +"cgroup.procs" file of a cgroup and its uid agrees with the target, it
> +can move the target to the cgroup.  In the above example, U0 will not
> +only be able to move processes in each sub-hierarchy but also across
> +the two sub-hierarchies, effectively allowing it to violate the
> +organizational and resource restrictions implied by the hierarchical
> +structure above C0 and C1.
> +
> +On the unified hierarchy, to write to a "cgroup.procs" file, in
> +addition to the usual write permission to the file and uid match, the
> +writer must also have write acess to the "cgroup.procs" file of the
> +common ancestor of the source and destination cgroups.  This prevents
> +delegatees from smuggling processes across disjoint sub-hierarchies.

I think this would be better as the first paragraph in 4.2, because it
nicely explains and summarizes the rule and its reasoning; then follow
up with detailed explanations and examples that corroborate the design
choice.

Otherwise, the patch looks good to me.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy
@ 2015-06-18 19:04       ` Johannes Weiner
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2015-06-18 19:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lizefan, cgroups, linux-kernel, kernel-team

On Thu, Jun 18, 2015 at 01:59:27PM -0400, Tejun Heo wrote:
> On traditional hierarchies, if a task has write access to "tasks" or
> "cgroup.procs" file of a cgroup and its euid agrees with the target,
> it can move the target to the cgroup; however, consider the following
> scenario.  The owner of each cgroup is in the parentheses.
> 
>  R (root) - 0 (root) - 00 (user1) - 000 (user1)
>           |                       \ 001 (user1)
>           \ 1 (root) - 10 (user1)
> 
> The subtrees of 00 and 10 are delegated to user1; however, while both
> subtrees may belong to the same user, it is clear that the two
> subtrees are to be isolated - they're under completely separate
> resource limits imposed by 0 and 1, respectively.  Note that 0 and 1
> aren't strictly necessary but added to ease illustrating the issue.
> 
> If user1 is allowed to move processes between the two subtrees, the
> intention of the hierarchy - keeping a given group of processes under
> a subtree with certain resource restrictions while delegating
> management of the subtree - can be circumvented by user1.
> 
> This happens because migration permission check doesn't consider the
> hierarchical nature of cgroups.  To fix the issue, this patch adds an
> extra permission requirement when userland tries to migrate a process
> in the default hierarchy - the issuing task must have write access to
> the common ancestor of "cgroup.procs" file of the ancestor in addition
> to the destination's.
> 
> Conceptually, the issuer must be able to move the target process from
> the source cgroup to the common ancestor of source and destination
> cgroups and then to the destination.  As long as delegation is done in
> a proper top-down way, this guarantees that a delegatee can't smuggle
> processes across disjoint delegation domains.
> 
> The next patch will add documentation on the delegation model on the
> default hierarchy.
> 
> v2: Fixed missing !ret test.  Spotted by Li Zefan.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Li Zefan <lizefan@huawei.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy
@ 2015-06-18 19:04       ` Johannes Weiner
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2015-06-18 19:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

On Thu, Jun 18, 2015 at 01:59:27PM -0400, Tejun Heo wrote:
> On traditional hierarchies, if a task has write access to "tasks" or
> "cgroup.procs" file of a cgroup and its euid agrees with the target,
> it can move the target to the cgroup; however, consider the following
> scenario.  The owner of each cgroup is in the parentheses.
> 
>  R (root) - 0 (root) - 00 (user1) - 000 (user1)
>           |                       \ 001 (user1)
>           \ 1 (root) - 10 (user1)
> 
> The subtrees of 00 and 10 are delegated to user1; however, while both
> subtrees may belong to the same user, it is clear that the two
> subtrees are to be isolated - they're under completely separate
> resource limits imposed by 0 and 1, respectively.  Note that 0 and 1
> aren't strictly necessary but added to ease illustrating the issue.
> 
> If user1 is allowed to move processes between the two subtrees, the
> intention of the hierarchy - keeping a given group of processes under
> a subtree with certain resource restrictions while delegating
> management of the subtree - can be circumvented by user1.
> 
> This happens because migration permission check doesn't consider the
> hierarchical nature of cgroups.  To fix the issue, this patch adds an
> extra permission requirement when userland tries to migrate a process
> in the default hierarchy - the issuing task must have write access to
> the common ancestor of "cgroup.procs" file of the ancestor in addition
> to the destination's.
> 
> Conceptually, the issuer must be able to move the target process from
> the source cgroup to the common ancestor of source and destination
> cgroups and then to the destination.  As long as delegation is done in
> a proper top-down way, this guarantees that a delegatee can't smuggle
> processes across disjoint delegation domains.
> 
> The next patch will add documentation on the delegation model on the
> default hierarchy.
> 
> v2: Fixed missing !ret test.  Spotted by Li Zefan.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Acked-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/4] cgroup: add delegation section to unified hierarchy documentation
@ 2015-06-18 20:23     ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-18 20:23 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team

v2: Rearranged paragraphs as suggested by Johannes Weiner.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 Documentation/cgroups/unified-hierarchy.txt |  102 +++++++++++++++++++++++-----
 1 file changed, 84 insertions(+), 18 deletions(-)

--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -17,15 +17,18 @@ CONTENTS
 3. Structural Constraints
   3-1. Top-down
   3-2. No internal tasks
-4. Other Changes
-  4-1. [Un]populated Notification
-  4-2. Other Core Changes
-  4-3. Per-Controller Changes
-    4-3-1. blkio
-    4-3-2. cpuset
-    4-3-3. memory
-5. Planned Changes
-  5-1. CAP for resource control
+4. Delegation
+  4-1. Model of delegation
+  4-2. Common ancestor rule
+5. Other Changes
+  5-1. [Un]populated Notification
+  5-2. Other Core Changes
+  5-3. Per-Controller Changes
+    5-3-1. blkio
+    5-3-2. cpuset
+    5-3-3. memory
+6. Planned Changes
+  6-1. CAP for resource control
 
 
 1. Background
@@ -245,9 +248,72 @@ cgroup must create children and transfer
 before enabling controllers in its "cgroup.subtree_control" file.
 
 
-4. Other Changes
+4. Delegation
 
-4-1. [Un]populated Notification
+4-1. Model of delegation
+
+A cgroup can be delegated to a less privileged user by granting write
+access of the directory and its "cgroup.procs" file to the user.  Note
+that the resource control knobs in a given directory concern the
+resources of the parent and thus must not be delegated along with the
+directory.
+
+Once delegated, the user can build sub-hierarchy under the directory,
+organize processes as it sees fit and further distribute the resources
+it got from the parent.  The limits and other settings of all resource
+controllers are hierarchical and regardless of what happens in the
+delegated sub-hierarchy, nothing can escape the resource restrictions
+imposed by the parent.
+
+Currently, cgroup doesn't impose any restrictions on the number of
+cgroups in or nesting depth of a delegated sub-hierarchy; however,
+this may in the future be limited explicitly.
+
+
+4-2. Common ancestor rule
+
+On the unified hierarchy, to write to a "cgroup.procs" file, in
+addition to the usual write permission to the file and uid match, the
+writer must also have write access to the "cgroup.procs" file of the
+common ancestor of the source and destination cgroups.  This prevents
+delegatees from smuggling processes across disjoint sub-hierarchies.
+
+Let's say cgroups C0 and C1 have been delegated to user U0 who created
+C00, C01 under C0 and C10 under C1 as follows.
+
+ ~~~~~~~~~~~~~ - C0 - C00
+ ~ cgroup    ~      \ C01
+ ~ hierarchy ~
+ ~~~~~~~~~~~~~ - C1 - C10
+
+C0 and C1 are separate entities in terms of resource distribution
+regardless of their relative positions in the hierarchy.  The
+resources the processes under C0 are entitled to are controlled by
+C0's ancestors and may be completely different from C1.  It's clear
+that the intention of delegating C0 to U0 is allowing U0 to organize
+the processes under C0 and further control the distribution of C0's
+resources.
+
+On traditional hierarchies, if a task has write access to "tasks" or
+"cgroup.procs" file of a cgroup and its uid agrees with the target, it
+can move the target to the cgroup.  In the above example, U0 will not
+only be able to move processes in each sub-hierarchy but also across
+the two sub-hierarchies, effectively allowing it to violate the
+organizational and resource restrictions implied by the hierarchical
+structure above C0 and C1.
+
+On the unified hierarchy, let's say U0 wants to write the pid of a
+process which has a matching uid and is currently in C10 into
+"C00/cgroup.procs".  U0 obviously has write access to the file and
+migration permission on the process; however, the common ancestor of
+the source cgroup C10 and the destination cgroup C00 is above the
+points of delegation and U0 would not have write access to its
+"cgroup.procs" and thus be denied with -EACCES.
+
+
+5. Other Changes
+
+5-1. [Un]populated Notification
 
 cgroup users often need a way to determine when a cgroup's
 subhierarchy becomes empty so that it can be cleaned up.  cgroup
@@ -289,7 +355,7 @@ supported and the interface files "relea
 "notify_on_release" do not exist.
 
 
-4-2. Other Core Changes
+5-2. Other Core Changes
 
 - None of the mount options is allowed.
 
@@ -306,14 +372,14 @@ supported and the interface files "relea
 - The "cgroup.clone_children" file is removed.
 
 
-4-3. Per-Controller Changes
+5-3. Per-Controller Changes
 
-4-3-1. blkio
+5-3-1. blkio
 
 - blk-throttle becomes properly hierarchical.
 
 
-4-3-2. cpuset
+5-3-2. cpuset
 
 - Tasks are kept in empty cpusets after hotplug and take on the masks
   of the nearest non-empty ancestor, instead of being moved to it.
@@ -322,7 +388,7 @@ supported and the interface files "relea
   masks of the nearest non-empty ancestor.
 
 
-4-3-3. memory
+5-3-3. memory
 
 - use_hierarchy is on by default and the cgroup file for the flag is
   not created.
@@ -407,9 +473,9 @@ supported and the interface files "relea
   memory.low, memory.high, and memory.max will use the string "max" to
   indicate and set the highest possible value.
 
-5. Planned Changes
+6. Planned Changes
 
-5-1. CAP for resource control
+6-1. CAP for resource control
 
 Unified hierarchy will require one of the capabilities(7), which is
 yet to be decided, for all resource control related knobs.  Process

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/4] cgroup: add delegation section to unified hierarchy documentation
@ 2015-06-18 20:23     ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-18 20:23 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

v2: Rearranged paragraphs as suggested by Johannes Weiner.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
---
 Documentation/cgroups/unified-hierarchy.txt |  102 +++++++++++++++++++++++-----
 1 file changed, 84 insertions(+), 18 deletions(-)

--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -17,15 +17,18 @@ CONTENTS
 3. Structural Constraints
   3-1. Top-down
   3-2. No internal tasks
-4. Other Changes
-  4-1. [Un]populated Notification
-  4-2. Other Core Changes
-  4-3. Per-Controller Changes
-    4-3-1. blkio
-    4-3-2. cpuset
-    4-3-3. memory
-5. Planned Changes
-  5-1. CAP for resource control
+4. Delegation
+  4-1. Model of delegation
+  4-2. Common ancestor rule
+5. Other Changes
+  5-1. [Un]populated Notification
+  5-2. Other Core Changes
+  5-3. Per-Controller Changes
+    5-3-1. blkio
+    5-3-2. cpuset
+    5-3-3. memory
+6. Planned Changes
+  6-1. CAP for resource control
 
 
 1. Background
@@ -245,9 +248,72 @@ cgroup must create children and transfer
 before enabling controllers in its "cgroup.subtree_control" file.
 
 
-4. Other Changes
+4. Delegation
 
-4-1. [Un]populated Notification
+4-1. Model of delegation
+
+A cgroup can be delegated to a less privileged user by granting write
+access of the directory and its "cgroup.procs" file to the user.  Note
+that the resource control knobs in a given directory concern the
+resources of the parent and thus must not be delegated along with the
+directory.
+
+Once delegated, the user can build sub-hierarchy under the directory,
+organize processes as it sees fit and further distribute the resources
+it got from the parent.  The limits and other settings of all resource
+controllers are hierarchical and regardless of what happens in the
+delegated sub-hierarchy, nothing can escape the resource restrictions
+imposed by the parent.
+
+Currently, cgroup doesn't impose any restrictions on the number of
+cgroups in or nesting depth of a delegated sub-hierarchy; however,
+this may in the future be limited explicitly.
+
+
+4-2. Common ancestor rule
+
+On the unified hierarchy, to write to a "cgroup.procs" file, in
+addition to the usual write permission to the file and uid match, the
+writer must also have write access to the "cgroup.procs" file of the
+common ancestor of the source and destination cgroups.  This prevents
+delegatees from smuggling processes across disjoint sub-hierarchies.
+
+Let's say cgroups C0 and C1 have been delegated to user U0 who created
+C00, C01 under C0 and C10 under C1 as follows.
+
+ ~~~~~~~~~~~~~ - C0 - C00
+ ~ cgroup    ~      \ C01
+ ~ hierarchy ~
+ ~~~~~~~~~~~~~ - C1 - C10
+
+C0 and C1 are separate entities in terms of resource distribution
+regardless of their relative positions in the hierarchy.  The
+resources the processes under C0 are entitled to are controlled by
+C0's ancestors and may be completely different from C1.  It's clear
+that the intention of delegating C0 to U0 is allowing U0 to organize
+the processes under C0 and further control the distribution of C0's
+resources.
+
+On traditional hierarchies, if a task has write access to "tasks" or
+"cgroup.procs" file of a cgroup and its uid agrees with the target, it
+can move the target to the cgroup.  In the above example, U0 will not
+only be able to move processes in each sub-hierarchy but also across
+the two sub-hierarchies, effectively allowing it to violate the
+organizational and resource restrictions implied by the hierarchical
+structure above C0 and C1.
+
+On the unified hierarchy, let's say U0 wants to write the pid of a
+process which has a matching uid and is currently in C10 into
+"C00/cgroup.procs".  U0 obviously has write access to the file and
+migration permission on the process; however, the common ancestor of
+the source cgroup C10 and the destination cgroup C00 is above the
+points of delegation and U0 would not have write access to its
+"cgroup.procs" and thus be denied with -EACCES.
+
+
+5. Other Changes
+
+5-1. [Un]populated Notification
 
 cgroup users often need a way to determine when a cgroup's
 subhierarchy becomes empty so that it can be cleaned up.  cgroup
@@ -289,7 +355,7 @@ supported and the interface files "relea
 "notify_on_release" do not exist.
 
 
-4-2. Other Core Changes
+5-2. Other Core Changes
 
 - None of the mount options is allowed.
 
@@ -306,14 +372,14 @@ supported and the interface files "relea
 - The "cgroup.clone_children" file is removed.
 
 
-4-3. Per-Controller Changes
+5-3. Per-Controller Changes
 
-4-3-1. blkio
+5-3-1. blkio
 
 - blk-throttle becomes properly hierarchical.
 
 
-4-3-2. cpuset
+5-3-2. cpuset
 
 - Tasks are kept in empty cpusets after hotplug and take on the masks
   of the nearest non-empty ancestor, instead of being moved to it.
@@ -322,7 +388,7 @@ supported and the interface files "relea
   masks of the nearest non-empty ancestor.
 
 
-4-3-3. memory
+5-3-3. memory
 
 - use_hierarchy is on by default and the cgroup file for the flag is
   not created.
@@ -407,9 +473,9 @@ supported and the interface files "relea
   memory.low, memory.high, and memory.max will use the string "max" to
   indicate and set the highest possible value.
 
-5. Planned Changes
+6. Planned Changes
 
-5-1. CAP for resource control
+6-1. CAP for resource control
 
 Unified hierarchy will require one of the capabilities(7), which is
 yet to be decided, for all resource control related knobs.  Process

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/4] cgroup: add delegation section to unified hierarchy documentation
@ 2015-06-18 20:46       ` Johannes Weiner
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2015-06-18 20:46 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lizefan, cgroups, linux-kernel, kernel-team

On Thu, Jun 18, 2015 at 04:23:27PM -0400, Tejun Heo wrote:
> v2: Rearranged paragraphs as suggested by Johannes Weiner.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/4] cgroup: add delegation section to unified hierarchy documentation
@ 2015-06-18 20:46       ` Johannes Weiner
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2015-06-18 20:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

On Thu, Jun 18, 2015 at 04:23:27PM -0400, Tejun Heo wrote:
> v2: Rearranged paragraphs as suggested by Johannes Weiner.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>

Acked-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHSET cgroup/for-4.2] cgroup: require write perm on common ancestor for migration
@ 2015-06-18 20:55   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-18 20:55 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team

Applied to cgroup/for-4.2.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHSET cgroup/for-4.2] cgroup: require write perm on common ancestor for migration
@ 2015-06-18 20:55   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2015-06-18 20:55 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

Applied to cgroup/for-4.2.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-06-18 20:55 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-16 19:10 [PATCHSET cgroup/for-4.2] cgroup: require write perm on common ancestor for migration Tejun Heo
2015-06-16 19:10 ` Tejun Heo
2015-06-16 19:10 ` [PATCH 1/4] kernfs: make kernfs_get_inode() public Tejun Heo
2015-06-16 19:10   ` Tejun Heo
2015-06-16 20:58   ` Greg Kroah-Hartman
2015-06-16 20:58     ` Greg Kroah-Hartman
2015-06-16 19:10 ` [PATCH 2/4] cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write() Tejun Heo
2015-06-16 19:10   ` Tejun Heo
2015-06-16 19:10 ` [PATCH 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy Tejun Heo
2015-06-16 19:10   ` Tejun Heo
2015-06-18  3:14   ` Zefan Li
2015-06-18  3:14     ` Zefan Li
2015-06-18 17:59     ` Tejun Heo
2015-06-18 17:59   ` [PATCH v2 " Tejun Heo
2015-06-18 19:04     ` Johannes Weiner
2015-06-18 19:04       ` Johannes Weiner
2015-06-16 19:10 ` [PATCH 4/4] cgroup: add delegation section to unified hierarchy documentation Tejun Heo
2015-06-16 19:10   ` Tejun Heo
2015-06-18 19:01   ` Johannes Weiner
2015-06-18 19:01     ` Johannes Weiner
2015-06-18 20:23   ` [PATCH v2 " Tejun Heo
2015-06-18 20:23     ` Tejun Heo
2015-06-18 20:46     ` Johannes Weiner
2015-06-18 20:46       ` Johannes Weiner
2015-06-18 20:55 ` [PATCHSET cgroup/for-4.2] cgroup: require write perm on common ancestor for migration Tejun Heo
2015-06-18 20:55   ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.