From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751032AbcETR2w (ORCPT ); Fri, 20 May 2016 13:28:52 -0400
Received: from bedivere.hansenpartnership.com ([66.63.167.143]:46096
	"EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752235AbcETR2u (ORCPT );
	Fri, 20 May 2016 13:28:50 -0400
Message-ID: <1463765326.8091.42.camel@HansenPartnership.com>
Subject: Re: [PATCH v4 0/2] cgroup: allow management of subtrees by new
	cgroup namespaces
From: James Bottomley
To: Tejun Heo
Cc: Aleksa Sarai, Li Zefan, Johannes Weiner, Aleksa Sarai,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	dev@opencontainers.org
Date: Fri, 20 May 2016 13:28:46 -0400
In-Reply-To: <20160520165326.GE5632@htj.duckdns.org>
References: <1463196000-13900-1-git-send-email-asarai@suse.de>
	<573F23D0.2030500@suse.de>
	<20160520152244.GB5632@htj.duckdns.org>
	<1463758258.8091.3.camel@HansenPartnership.com>
	<20160520160352.GC5632@htj.duckdns.org>
	<1463760550.8091.13.camel@HansenPartnership.com>
	<20160520161759.GD5632@htj.duckdns.org>
	<1463761509.8091.19.camel@HansenPartnership.com>
	<20160520165326.GE5632@htj.duckdns.org>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.16.5
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2016-05-20 at 12:53 -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, May 20, 2016 at 12:25:09PM -0400, James Bottomley wrote:
> > OK, so is the only problem cleanup? If so, what if I proposed that
> > a
>
> For generic cases, it's a much larger problem. We'd have to change
> delegation model completely so that delegations are allowed by
> default, which btw can't be allowed on v1 hierarchies as some
> controllers don't behave properly hierarchically in v1 and would
> allow unpriv users to escape the constraints of its ancestors.

Just so I'm clear: by delegation you mean create a subdirectory in the
cgroup hierarchy with a non-root owner? We may have a solution for the
escape constraints problem: see below.

> > cgroup directory could only be created by the owner of the userns
> > (which would be any old unprivileged user) iff they create a cgroup
> > ns and the cgroup ns would be responsible for removing it again, so
> > the cgroup subdirectory would be tied to the cgroup namespace as
> > its holder and we'd use release of the cgroup to remove all the
> > directories?
>
> Unfortunately, cgroup hierarchy isn't designed to support this sort
> of automatic delegation. Unpriv processes would be able to escape
> constraints on v1 with some controllers and on v2 controllers have to
> be explicitly enabled by root for delegated scope to have access to
> them.

Not necessarily. We also talked about pinning the cgroup tree so that
once you enter the cgroup namespace, your current cgroup directory
becomes your root. That means you can't cd back into the ancestors and
thus can't write their tasks file, so, I think, it should be
impossible to escape ancestor constraints.

> We can try to isolate these delegated subtrees and make them
> work transparently, which rgroup tried to do, but that collides
> directly with the vfs conventions (rgroups don't show up in cgroup
> hierarchy at all so avoid this problem).

Well, let's see if we can solve it within the current framework first.

> Why does an unpriv NS need to have cgroup delegated to it without
> cooperation from cgroup manager?

There are actually many answers to this. The one I'm interested in is
the ability for applications to make use of container features without
having to ask permission from some orchestration engine. The problem
most people are looking at is how do I prevent the cgroup manager from
running as root, because that's a security problem waiting to happen.

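To make the pinning argument above concrete, here is a minimal user-space
model of it in Python. It is only a sketch of the intended semantics, not
kernel code: the function names is_reachable() and ns_view() and the
example paths are made up for illustration, and cgroup paths are assumed
to be absolute. The idea is that a cgroup is usable from inside the
namespace only if it sits at or below the directory that became the
namespace root.

```python
import posixpath


def is_reachable(ns_root: str, cgroup: str) -> bool:
    """Return True if `cgroup` is at or below the pinned namespace root.

    Hypothetical helper: models the rule that a task inside the cgroup
    namespace cannot name (and so cannot write to) an ancestor cgroup.
    Both arguments are assumed to be absolute cgroup paths.
    """
    root = posixpath.normpath(ns_root)
    target = posixpath.normpath(cgroup)
    # commonpath compares whole path components, so "/a/bc" is not
    # treated as being inside "/a/b".
    return posixpath.commonpath([root, target]) == root


def ns_view(ns_root: str, cgroup: str) -> str:
    """Return `cgroup` as it would be named relative to the pinned root."""
    rel = posixpath.relpath(posixpath.normpath(cgroup),
                            start=posixpath.normpath(ns_root))
    return "/" if rel == "." else "/" + rel


if __name__ == "__main__":
    root = "/user/1000/app"  # illustrative path only
    for path in (root, root + "/worker", "/user/1000", root + "/../.."):
        print(path, "->", ns_view(root, path), is_reachable(root, path))
```

In this model the ancestors still exist, but every name for them resolves
outside the pinned root, so a policy built on is_reachable() would refuse
any write to their tasks file.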
> If for resource control, I'm pretty sure we don't want to allow
> that without explicit cooperation from the enclosing scope.

The enclosing scope should be allowed to define the parameters (happens
today with namespaces) but there shouldn't be an active "thing" which
is the permission gateway.

> Overall, it feels like this is trying to work around an issue which
> should be solved from userland.

So it's not impossible to have some setuid (or CAP_ scoped) universal
binary do this. We do this today for the user namespace range of uids
problem. However, it would have to be something that operated
independently of the cgroup manager, since every container
orchestration system wants to be its own cgroup manager, so there's no
one true one.

James
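
For reference, the delegation step discussed in this message (create a
subdirectory in the cgroup hierarchy and give it a non-root owner) could be
sketched as below. This is an illustration under stated assumptions, not a
proposed helper: delegate_subtree() is a hypothetical name, the list of
interface files follows the delegation section of the kernel's cgroup-v2
documentation and is not something stated in this thread, and a real
setuid or capability-scoped binary would need policy checks that are
omitted here.

```python
import os

# Files the delegatee needs to own on a v2 hierarchy, per the kernel's
# cgroup-v2 documentation (assumption; not taken from this thread).
DELEGATED_FILES = ("cgroup.procs", "cgroup.threads", "cgroup.subtree_control")


def delegate_subtree(parent: str, name: str, uid: int, gid: int) -> str:
    """Create `name` under `parent` and hand ownership to uid:gid.

    Hypothetical helper. On a real cgroup mount this needs privilege;
    on an ordinary directory it only works when chown to uid:gid is
    permitted for the caller.
    """
    path = os.path.join(parent, name)
    os.mkdir(path, 0o755)
    os.chown(path, uid, gid)
    # On cgroupfs the kernel populates the interface files at mkdir
    # time; chown only the ones that actually exist.
    for fname in DELEGATED_FILES:
        fpath = os.path.join(path, fname)
        if os.path.exists(fpath):
            os.chown(fpath, uid, gid)
    return path


if __name__ == "__main__":
    import tempfile

    # Dry run against a scratch directory instead of /sys/fs/cgroup.
    scratch = tempfile.mkdtemp()
    print(delegate_subtree(scratch, "app", os.getuid(), os.getgid()))
```

Cleanup (removing the directory when its holder goes away) is the part
this sketch leaves out.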