[PATCH v3 0/2] cgroup: allow management of subtrees by new cgroup namespaces

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 0/2] cgroup: allow management of subtrees by new cgroup namespaces
@ 2016-05-02 14:01 Aleksa Sarai
  2016-05-02 14:01 ` [PATCH v3 1/2] cgroup: apply common ancestor cgroup.procs restriction in cgroupv1 Aleksa Sarai
  2016-05-02 14:01 ` [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces Aleksa Sarai
  0 siblings, 2 replies; 10+ messages in thread
From: Aleksa Sarai @ 2016-05-02 14:01 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner
  Cc: cgroups, linux-kernel, dev, Aleksa Sarai, Aleksa Sarai

This is an updated version of v2 of this patchset[1]. It includes an
improvement to cgroup core to correctly apply the common ancestor
cgroup.procs restriction on cgroupv1 hierarchies. This fixes
187fe84067bd ("cgroup: require write perm on common ancestor when moving
processes on the default hierarchy"), ensuring that the three guarantees
described in the second patch are held for both cgroupv1 and cgroupv2.

In addition, this patchset now includes a way to disable the auto-mode
changing functionality. An administrator may disable it on a
cgroup-by-cgroup basis by setting the cgroups to have the permissions
a-rx. This update also includes an updated version of the comment
describing the guarantees given by Unix directory permissions and cgroup
core.

[1]: https://lkml.org/lkml/2016/5/1/87

Aleksa Sarai (2):
  cgroup: apply common ancestor cgroup.procs restriction in cgroupv1
  cgroup: allow management of subtrees by new cgroup namespaces

 kernel/cgroup.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 92 insertions(+), 5 deletions(-)

-- 
2.8.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/2] cgroup: apply common ancestor cgroup.procs restriction in cgroupv1
  2016-05-02 14:01 [PATCH v3 0/2] cgroup: allow management of subtrees by new cgroup namespaces Aleksa Sarai
@ 2016-05-02 14:01 ` Aleksa Sarai
  2016-05-02 16:03   ` Tejun Heo
  2016-05-02 14:01 ` [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces Aleksa Sarai
  1 sibling, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2016-05-02 14:01 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner
  Cc: cgroups, linux-kernel, dev, Aleksa Sarai, Aleksa Sarai

The common ancestor restriction for moving tasks between cgroups (the
process moving the task must have the right to write to cgroup.procs in
both the destination and common ancestor cgroup) only applied for
cgroupv2 (cgroups in the default hierarchy). This meant that there was a
different policy for unprivileged users in the different cgroup
hierarchies.

Update cgroup_procs_write_permission() to apply the cgroup.procs
restriction regardless of the cgroup root of the destination cgroup.
However, if the task doesn't have any association with the destination
hierarchy, there's no permission check to be done. In addition, if the
destination cgroup is a descendant of the task's current cgroup then
there are no further permission checks. This is important for
unprivileged processes creating subtrees.

---
% sudo mount -t cgroup -o pids cgroup /sys/fs/cgroup/pids
% sudo mkdir /sys/fs/cgroup/pids/parent
% sudo chmod a+w /sys/fs/cgroup/pids/parent
% echo $$ | sudo tee /sys/fs/cgroup/pids/parent/cgroup.procs
1337
% mkdir /sys/fs/cgroup/pids/parent/{a,b}
% echo $$ | tee /sys/fs/cgroup/pids/parent/a/cgroup.procs
1337
% echo $$ | tee /sys/fs/cgroup/pids/parent/b/cgroup.procs
tee: /sys/fs/cgroup/pids/parent/b/cgroup.procs: Permission denied
---

Fixes: 187fe84067bd ("cgroup: require write perm on common ancestor when
moving processes on the default hierarchy")
Cc: dev@opencontainers.org
Signed-off-by: Aleksa Sarai <asarai@suse.de>

Signed-off-by: Aleksa Sarai <asarai@suse.de>
---
 kernel/cgroup.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 909a7d31ffd3..149217681054 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2784,7 +2784,7 @@ static int cgroup_procs_write_permission(struct task_struct *task,
 	int ret = 0;
 
 	/*
-	 * even if we're attaching all tasks in the thread group, we only
+	 * Even if we're attaching all tasks in the thread group, we only
 	 * need to check permissions on one of them.
 	 */
 	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
@@ -2792,15 +2792,22 @@ static int cgroup_procs_write_permission(struct task_struct *task,
 	    !uid_eq(cred->euid, tcred->suid))
 		ret = -EACCES;
 
-	if (!ret && cgroup_on_dfl(dst_cgrp)) {
+	if (!ret) {
 		struct super_block *sb = of->file->f_path.dentry->d_sb;
 		struct cgroup *cgrp;
 		struct inode *inode;
 
 		spin_lock_bh(&css_set_lock);
-		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
+		cgrp = task_cgroup_from_root(task, dst_cgrp->root);
 		spin_unlock_bh(&css_set_lock);
 
+		/*
+		 * We don't do any cgroup.proc permission checks if the user is trying
+		 * to mode the process to subtree of the current cgroup it is in.
+		 */
+		if (cgroup_is_descendant(dst_cgrp, cgrp))
+			goto out_put_cred;
+
 		while (!cgroup_is_descendant(dst_cgrp, cgrp))
 			cgrp = cgroup_parent(cgrp);
 
@@ -2812,6 +2819,7 @@ static int cgroup_procs_write_permission(struct task_struct *task,
 		}
 	}
 
+out_put_cred:
 	put_cred(tcred);
 	return ret;
 }
@@ -4824,6 +4832,7 @@ static struct cftype cgroup_dfl_base_files[] = {
 static struct cftype cgroup_legacy_base_files[] = {
 	{
 		.name = "cgroup.procs",
+		.file_offset = offsetof(struct cgroup, procs_file),
 		.seq_start = cgroup_pidlist_start,
 		.seq_next = cgroup_pidlist_next,
 		.seq_stop = cgroup_pidlist_stop,
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces
  2016-05-02 14:01 [PATCH v3 0/2] cgroup: allow management of subtrees by new cgroup namespaces Aleksa Sarai
  2016-05-02 14:01 ` [PATCH v3 1/2] cgroup: apply common ancestor cgroup.procs restriction in cgroupv1 Aleksa Sarai
@ 2016-05-02 14:01 ` Aleksa Sarai
  2016-05-02 16:06   ` Tejun Heo
  1 sibling, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2016-05-02 14:01 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner
  Cc: cgroups, linux-kernel, dev, Aleksa Sarai, Aleksa Sarai

Allow an unprivileged processes to control subtrees of their associated
cgroup, a necessary feature if an unprivileged container (set up with an
unprivileged user namespace) wishes to take advantage of cgroups for its
own subprocesses.

Change the mode of the cgroup directory for each cgroup association,
allowing the process to create subtrees and modify the limits of the
subtrees *without* allowing the process to modify its own limits. Due to
the cgroup core restrictions and unix permission model, this allows for
processes to create new subtrees without breaking the cgroup limits for
the process.

In addition, this change doesn't add any odd or new functionality (it
essentially emulates a privileged user allowing a process to create
subtrees of its current cgroup association). This means that client code
can take advantage of this without being aware of the kernel change.

It should be noted that the mode changing isn't done when a process
attaches to an existing cgroup namespace, because the process which
created the cgroup namespace may have decided to disallow other
processes from adding subtrees to its namespace. The mode changing is
also not done if the original mode of the cgroup didn't have a+rx, so
administrators can opt-out of this feature on a cgroup-by-cgroup basis.

Cc: dev@opencontainers.org
Signed-off-by: Aleksa Sarai <asarai@suse.de>
---
 kernel/cgroup.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 80 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 149217681054..2b44b6e4fa60 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3562,6 +3562,14 @@ static int cgroup_kn_set_ugid(struct kernfs_node *kn)
 	return kernfs_setattr(kn, &iattr);
 }
 
+static int cgroup_kn_set_mode(struct kernfs_node *kn, umode_t mode)
+{
+	struct iattr iattr = { .ia_valid = ATTR_MODE,
+			       .ia_mode = mode, };
+
+	return kernfs_setattr(kn, &iattr);
+}
+
 static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp,
 			   struct cftype *cft)
 {
@@ -6233,12 +6241,18 @@ void free_cgroup_ns(struct cgroup_namespace *ns)
 }
 EXPORT_SYMBOL(free_cgroup_ns);
 
+#define S_IRXO (S_IROTH | S_IXOTH)
+
 struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 					struct user_namespace *user_ns,
 					struct cgroup_namespace *old_ns)
 {
+	struct cgroup_subsys *ss;
 	struct cgroup_namespace *new_ns;
 	struct css_set *cset;
+	int ssid, err;
+	umode_t mode[CGROUP_SUBSYS_COUNT];
+	u16 updated_mask = 0;
 
 	BUG_ON(!old_ns);
 
@@ -6253,11 +6267,60 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 
 	mutex_lock(&cgroup_mutex);
 	spin_lock_bh(&css_set_lock);
-
 	cset = task_css_set(current);
 	get_css_set(cset);
-
 	spin_unlock_bh(&css_set_lock);
+
+	/*
+	 * When creating a new cgroup namespace, we change the permissions of
+	 * the cgroup's directory to be a+w. This is necessary in order to
+	 * allow new cgroup namespaces to manage their own subtrees. This does
+	 * not allow for an escape from cgroup policy for three reasons:
+	 *
+	 * 1. cgroups are hierarchical, so any subtree must (at the very least)
+	 *    obey the original cgroup's restrictions.
+	 *
+	 * 2. The unix permission model for directories does not allow a user
+	 *    with write access to a directory to directly modify the dentries.
+	 *    While a user can unlink such files in a normal directory, in
+	 *    cgroupfs this is not allowed.
+	 *
+	 * 3. cgroup core doesn't allow tasks to be migrated by users that have
+	 *    write access to two subtrees' cgroup.procs files unless they also
+	 *    have write access to the common ancestor of the two subtrees'
+	 *    cgroup.procs file. Thus you cannot use a complicit process in
+	 *    less restrictive cgroup to overcome your own cgroup restriction.
+	 *
+	 * Therefore, we can safely change the mode of the cgroup without any
+	 * ill effects. We don't do this on cgroupns_install(), because an
+	 * process might not want to allow other processes to create subtrees in
+	 * a cgroup namespace they've created.
+	 *
+	 * In addition, we only change the permission if the cgroup directory
+	 * already has a+rx. This means that an administrator can disable this
+	 * functionality for a cgroup by making it no longer world rx.
+	 */
+	rcu_read_lock();
+	for_each_subsys(ss, ssid) {
+		struct kernfs_node *kn = cset->subsys[ssid]->cgroup->kn;
+
+		kernfs_get(kn);
+		kernfs_break_active_protection(kn);
+
+		mode[ssid] = kn->mode;
+		if (mode & S_IRXO == S_IRXO)
+			err = cgroup_kn_set_mode(kn, mode[ssid] | S_IWOTH);
+
+		kernfs_unbreak_active_protection(kn);
+		kernfs_put(kn);
+
+		if (err)
+			goto err_unset_mode;
+
+		updated_mask |= 1 << ssid;
+	}
+	rcu_read_unlock();
+
 	mutex_unlock(&cgroup_mutex);
 
 	new_ns = alloc_cgroup_ns();
@@ -6270,6 +6333,21 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 	new_ns->root_cset = cset;
 
 	return new_ns;
+
+err_unset_mode:
+	/* Clean up the mode changes. */
+	do_each_subsys_mask(ss, ssid, updated_mask) {
+		struct kernfs_node *kn = cset->subsys[ssid]->cgroup->kn;
+
+		kernfs_break_active_protection(kn);
+		cgroup_kn_set_mode(kn, mode[ssid]);
+		kernfs_unbreak_active_protection(kn);
+	} while_each_subsys_mask();
+
+	rcu_read_unlock();
+	mutex_unlock(&cgroup_mutex);
+
+	return ERR_PTR(err);
 }
 
 static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns)
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] cgroup: apply common ancestor cgroup.procs restriction in cgroupv1
  2016-05-02 14:01 ` [PATCH v3 1/2] cgroup: apply common ancestor cgroup.procs restriction in cgroupv1 Aleksa Sarai
@ 2016-05-02 16:03   ` Tejun Heo
  2016-05-03  1:44     ` Aleksa Sarai
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2016-05-02 16:03 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, dev, Aleksa Sarai

On Tue, May 03, 2016 at 12:01:20AM +1000, Aleksa Sarai wrote:
> The common ancestor restriction for moving tasks between cgroups (the
> process moving the task must have the right to write to cgroup.procs in
> both the destination and common ancestor cgroup) only applied for
> cgroupv2 (cgroups in the default hierarchy). This meant that there was a
> different policy for unprivileged users in the different cgroup
> hierarchies.
> 
> Update cgroup_procs_write_permission() to apply the cgroup.procs
> restriction regardless of the cgroup root of the destination cgroup.
> However, if the task doesn't have any association with the destination
> hierarchy, there's no permission check to be done. In addition, if the
> destination cgroup is a descendant of the task's current cgroup then
> there are no further permission checks. This is important for
> unprivileged processes creating subtrees.

So, you can't apply a new restriction like this retro-actively to
cgroup v1 hierarchies.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces
  2016-05-02 14:01 ` [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces Aleksa Sarai
@ 2016-05-02 16:06   ` Tejun Heo
  2016-05-03  1:52     ` Aleksa Sarai
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2016-05-02 16:06 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, dev, Aleksa Sarai

Hello,

On Tue, May 03, 2016 at 12:01:21AM +1000, Aleksa Sarai wrote:
> Allow an unprivileged processes to control subtrees of their associated
> cgroup, a necessary feature if an unprivileged container (set up with an
> unprivileged user namespace) wishes to take advantage of cgroups for its
> own subprocesses.
> 
> Change the mode of the cgroup directory for each cgroup association,
> allowing the process to create subtrees and modify the limits of the
> subtrees *without* allowing the process to modify its own limits. Due to
> the cgroup core restrictions and unix permission model, this allows for
> processes to create new subtrees without breaking the cgroup limits for
> the process.

I don't get why this is necessary.  What's wrong with the parent
setting up permission correctly for the namespace?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/2] cgroup: apply common ancestor cgroup.procs restriction in cgroupv1
  2016-05-02 16:03   ` Tejun Heo
@ 2016-05-03  1:44     ` Aleksa Sarai
  0 siblings, 0 replies; 10+ messages in thread
From: Aleksa Sarai @ 2016-05-03  1:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, dev, Aleksa Sarai

> So, you can't apply a new restriction like this retro-actively to
> cgroup v1 hierarchies.

Okay, understood. I'll see if it's possible to make this change safe 
without adding restrictions to cgroupv1 hierarchies.

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces
  2016-05-02 16:06   ` Tejun Heo
@ 2016-05-03  1:52     ` Aleksa Sarai
  2016-05-03 15:55       ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2016-05-03  1:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, dev,
	Aleksa Sarai, James Bottomley

>> Change the mode of the cgroup directory for each cgroup association,
>> allowing the process to create subtrees and modify the limits of the
>> subtrees *without* allowing the process to modify its own limits. Due to
>> the cgroup core restrictions and unix permission model, this allows for
>> processes to create new subtrees without breaking the cgroup limits for
>> the process.
>
> I don't get why this is necessary.  What's wrong with the parent
> setting up permission correctly for the namespace?

The parent setting this up requires either:

1. A privileged process giving the process write access to the cgroup 
directory it is currently in. Since no software does this by default, 
and in addition it might not always make sense (systemd doesn't like 
processes messing around in their respective cgroups), this has to be 
dealt with better.

2. The process itself is a privileged process, which is not the usecase 
I'm going for with rootless containers. If you have root, you can do 
whatever you want in this regard and this feature doesn't affect you.

The main reason for this patchset is because I would like to make sure 
that unprivileged processes can take advantage of cgroup features (such 
as the freezer cgroup, and to just do regular resource limiting). Since 
cgroups are a hierarchy, I can see no fundamental reason why this is not 
possible. And the cgroup namespace appears to be the perfect way of 
doing it. I firmly believe there is a simple and safe way of allowing 
unprivileged processes to create subtrees of their current cgroup.

However, I agree with James that this patchset isn't ideal (it was my 
first rough attempt). I think I'll get to work on properly virtualising 
/sys/fs/cgroup, which will allow for a new cgroup namespace to modify 
subtrees (but without allowing for cgroup escape) -- by pinning what pid 
namespace the cgroup was created under. We can use the same type of 
virtualization that /proc does (except instead of selectively showing 
the dentries, we selectively show different owners of the dentries).

Would that be acceptable?

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces
  2016-05-03  1:52     ` Aleksa Sarai
@ 2016-05-03 15:55       ` Tejun Heo
  2016-05-04  9:58         ` Aleksa Sarai
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2016-05-03 15:55 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, dev,
	Aleksa Sarai, James Bottomley

Hello, Aleksa.

On Tue, May 03, 2016 at 11:52:22AM +1000, Aleksa Sarai wrote:
> However, I agree with James that this patchset isn't ideal (it was my first
> rough attempt). I think I'll get to work on properly virtualising
> /sys/fs/cgroup, which will allow for a new cgroup namespace to modify
> subtrees (but without allowing for cgroup escape) -- by pinning what pid
> namespace the cgroup was created under. We can use the same type of
> virtualization that /proc does (except instead of selectively showing the
> dentries, we selectively show different owners of the dentries).
> 
> Would that be acceptable?

I'm still not sold on the idea.  For better or worse, the permission
model is mostly based on vfs and I don't want to deviate too much as
that's likely to become confusing pretty quickly.  If a sub-hierarchy
is to be delegated, that's upto whomever is controlling cgroup
hierarchy in the sub-domain.  We can expand the perm checks to
consider user namespaces but I'd like to avoid going beyond that.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces
  2016-05-03 15:55       ` Tejun Heo
@ 2016-05-04  9:58         ` Aleksa Sarai
  2016-05-09 14:04           ` Aleksa Sarai
  0 siblings, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2016-05-04  9:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, dev,
	Aleksa Sarai, James Bottomley

>> However, I agree with James that this patchset isn't ideal (it was my first
>> rough attempt). I think I'll get to work on properly virtualising
>> /sys/fs/cgroup, which will allow for a new cgroup namespace to modify
>> subtrees (but without allowing for cgroup escape) -- by pinning what pid
>> namespace the cgroup was created under. We can use the same type of
>> virtualization that /proc does (except instead of selectively showing the
>> dentries, we selectively show different owners of the dentries).
>>
>> Would that be acceptable?
>
> I'm still not sold on the idea.  For better or worse, the permission
> model is mostly based on vfs and I don't want to deviate too much as
> that's likely to become confusing pretty quickly.  If a sub-hierarchy
> is to be delegated, that's upto whomever is controlling cgroup
> hierarchy in the sub-domain.  We can expand the perm checks to
> consider user namespaces but I'd like to avoid going beyond that.

As I mentioned in the other thread, I had another idea for a way to do 
this (that was more complicated to implement, so I went with this 
simpler patch first):

On unshare(), we create a new cgroup that is a child of the calling 
process's current cgroup association (in all of the hierarchies, 
obviously). The new cgroup directory (and contained files) are owned by 
current_fs_{u,g}id(). The process is then moved into the cgroup, and the 
root of the cgroup namespace is changed to be that cgroup. This way, 
there would be no disparity between the VFS and cgroup permission model 
-- there'll be a global view of the cgroup hierarchy that everyone 
agrees on.

I had three concerns with this patch:

1. It would cause issues with the no internal process constraint of 
cgroupv2. I spent some time trying to figure out how cgroupv2 would act 
in this case (do all of the processes automatically get moved into new 
subdirectories?), but couldn't figure it out. If it does move all of the 
processes into the subdirectory, we'd have to make a sink cgroup as well 
as the one for the namespace -- which then just becomes inefficient (you 
have a cgroup that has no purpose from an administration perspective).

2. We'd have to come up with a way to make the name of the new cgroup 
resistent to clashes (especially with cgroups already created by other 
processes), which smacks of a suboptimal solution to the problem.

3. We'd be creating cgroups and attaching processes to the cgroups 
without explicitly going through the VFS layer. This presumably means 
that other parts of userspace might not get alerted properly to the 
changes. I'm not really sure how we should deal with that, but it sounds 
like it could cause problems for someone.

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces
  2016-05-04  9:58         ` Aleksa Sarai
@ 2016-05-09 14:04           ` Aleksa Sarai
  0 siblings, 0 replies; 10+ messages in thread
From: Aleksa Sarai @ 2016-05-09 14:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, dev,
	Aleksa Sarai, James Bottomley

>>> However, I agree with James that this patchset isn't ideal (it was my
>>> first
>>> rough attempt). I think I'll get to work on properly virtualising
>>> /sys/fs/cgroup, which will allow for a new cgroup namespace to modify
>>> subtrees (but without allowing for cgroup escape) -- by pinning what pid
>>> namespace the cgroup was created under. We can use the same type of
>>> virtualization that /proc does (except instead of selectively showing
>>> the
>>> dentries, we selectively show different owners of the dentries).
>>>
>>> Would that be acceptable?
>>
>> I'm still not sold on the idea.  For better or worse, the permission
>> model is mostly based on vfs and I don't want to deviate too much as
>> that's likely to become confusing pretty quickly.  If a sub-hierarchy
>> is to be delegated, that's upto whomever is controlling cgroup
>> hierarchy in the sub-domain.  We can expand the perm checks to
>> consider user namespaces but I'd like to avoid going beyond that.
>
> As I mentioned in the other thread, I had another idea for a way to do
> this (that was more complicated to implement, so I went with this
> simpler patch first):
>
> On unshare(), we create a new cgroup that is a child of the calling
> process's current cgroup association (in all of the hierarchies,
> obviously). The new cgroup directory (and contained files) are owned by
> current_fs_{u,g}id(). The process is then moved into the cgroup, and the
> root of the cgroup namespace is changed to be that cgroup. This way,
> there would be no disparity between the VFS and cgroup permission model
> -- there'll be a global view of the cgroup hierarchy that everyone
> agrees on.
>
> I had three concerns with this patch:
>
> 1. It would cause issues with the no internal process constraint of
> cgroupv2. I spent some time trying to figure out how cgroupv2 would act
> in this case (do all of the processes automatically get moved into new
> subdirectories?), but couldn't figure it out. If it does move all of the
> processes into the subdirectory, we'd have to make a sink cgroup as well
> as the one for the namespace -- which then just becomes inefficient (you
> have a cgroup that has no purpose from an administration perspective).
>
> 2. We'd have to come up with a way to make the name of the new cgroup
> resistent to clashes (especially with cgroups already created by other
> processes), which smacks of a suboptimal solution to the problem.
>
> 3. We'd be creating cgroups and attaching processes to the cgroups
> without explicitly going through the VFS layer. This presumably means
> that other parts of userspace might not get alerted properly to the
> changes. I'm not really sure how we should deal with that, but it sounds
> like it could cause problems for someone.

Does anyone have any opinions on this idea?

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-05-09 14:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-02 14:01 [PATCH v3 0/2] cgroup: allow management of subtrees by new cgroup namespaces Aleksa Sarai
2016-05-02 14:01 ` [PATCH v3 1/2] cgroup: apply common ancestor cgroup.procs restriction in cgroupv1 Aleksa Sarai
2016-05-02 16:03   ` Tejun Heo
2016-05-03  1:44     ` Aleksa Sarai
2016-05-02 14:01 ` [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces Aleksa Sarai
2016-05-02 16:06   ` Tejun Heo
2016-05-03  1:52     ` Aleksa Sarai
2016-05-03 15:55       ` Tejun Heo
2016-05-04  9:58         ` Aleksa Sarai
2016-05-09 14:04           ` Aleksa Sarai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).