From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752628AbcFUPzv (ORCPT ); Tue, 21 Jun 2016 11:55:51 -0400
Received: from h2.hallyn.com ([78.46.35.8]:48632 "EHLO h2.hallyn.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751757AbcFUPzp (ORCPT ); Tue, 21 Jun 2016 11:55:45 -0400
Date: Tue, 21 Jun 2016 10:45:27 -0500
From: "Serge E. Hallyn"
To: Topi Miettinen
Cc: serge@hallyn.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC] capabilities: add capability cgroup controller
Message-ID: <20160621154527.GA10565@mail.hallyn.com>
References: <1466278320-17024-1-git-send-email-toiwoton@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Quoting Topi Miettinen (toiwoton@gmail.com):
> On 06/19/16 20:01, serge@hallyn.com wrote:
> > apologies for top posting, this phone doesn't support inline)
> >
> > Where are you preventing less privileged tasks from limiting the caps of a more privileged task? It looks like you are relying on the cgroupfs for that?
>
> I didn't think that aspect. Some of that could be dealt with by
> preventing tasks which don't have CAP_SETPCAP to make other tasks join
> or set the bounding set. One problem is that the privileges would not be
> checked at cgroup.procs open(2) time but only when writing. In general,
> less privileged tasks should not be able to gain new capabilities even
> if they were somehow able to join the cgroup and also your case must be
> addressed in full.
>
> >
> > Overall I'm not a fan of this for several reasons. Can you tell us precisely what your use case is?
>
> There are two.
>
> 1. Capability use tracking at cgroup level. There is no way to know
> which capabilities have been used and which could be trimmed.
> With cgroup approach, we can also keep track of how subprocesses use
> capabilities. Thus the administrator can quickly get a reasonable
> estimate of a bounding set just by reading the capability.used file.

So to estimate the privileges needed by an application? Note this could
also be done with something like systemtap, but that's not as friendly
of course.

Keeping the tracking part separate from enforcement might be worthwhile.
If you wanted to push that part of the patchset, we could keep
discussing the enforcement aspect separately.

> 2. cgroup approach to capability management. Currently the capabilities
> are inherited with bounding set and ambient capabilities taking their
> part. With cgroups, additional limits can be set which apply to the
> whole group. I admit that the difference to the current model is small.
>
> Could you list the several reasons you mentioned?

Should have done it Sunday while my mind was clear on it.

The first is that while we normally think of preventing a less
privileged task from becoming more privileged, it can be just as
dangerous to allow a less privileged task to rob a more privileged
task of some capability. See in particular the sendmail capability
story. By running a setuid-root task in an unexpected configuration,
namely with the ability to setuid() denied, an unprivileged task could
get a root-owned task doing its bidding. So that's why I'm particularly
concerned about allowing cgroupfs DAC permissions to dictate who gets
to say what privileges other tasks on the system can get.

Another reason is simply that the capability calculation scheme is, for
historical reasons, already quite complicated. So if there is something
worthwhile to add we can discuss it, but it'll take a compelling,
otherwise-unsolvable use case to convince me we should complicate it
further. In general, capabilities can be very cleanly predicted by
looking at the parent task and the file being executed.
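To make "predicted by looking at the parent task and the file" concrete,
here is a rough sketch of the execve() transformation as described in
capabilities(7). It is a userspace model, not kernel code. The names
exec_transform, task_caps, file_caps and has_cap are made up for this
illustration, and the cgroup_bound argument is a hypothetical stand-in
for an extra per-cgroup mask; how the RFC would actually apply such a
mask is an assumption here. The model leaves out the setuid-root fixups
and the EPERM check for files with the effective bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability sets modelled as 64-bit masks, one bit per capability. */
typedef uint64_t capset_t;

#define CAP_BIT(n) ((capset_t)1 << (n))

/* Capability numbers as defined in <linux/capability.h>. */
enum { CAP_SETUID_NR = 7, CAP_NET_BIND_SERVICE_NR = 10 };

/* Hypothetical model of a task's capability sets. */
struct task_caps {
	capset_t permitted, inheritable, effective, bounding, ambient;
};

/* Hypothetical model of a file's capabilities (effective is one bit). */
struct file_caps {
	capset_t permitted, inheritable;
	bool effective;
};

bool has_cap(capset_t set, int nr)
{
	return (set & CAP_BIT(nr)) != 0;
}

/*
 * Compute the post-execve sets from the parent task (p) and the file (f),
 * following the formulas in capabilities(7). cgroup_bound is NOT part of
 * the real algorithm: it is an assumed extra mask ANDed into the bounding
 * set to show what a third input would do. Pass all-ones to get the
 * existing behaviour.
 */
struct task_caps exec_transform(const struct task_caps *p,
				const struct file_caps *f,
				capset_t cgroup_bound)
{
	struct task_caps n;
	capset_t bound;
	bool privileged_file;

	assert(p && f);

	bound = p->bounding & cgroup_bound;
	/* Simplification: set-user-ID/set-group-ID bits are not modelled. */
	privileged_file = f->permitted != 0 || f->inheritable != 0 ||
			  f->effective;

	n.ambient = privileged_file ? 0 : p->ambient;
	n.permitted = (p->inheritable & f->inheritable) |
		      (f->permitted & bound) | n.ambient;
	n.effective = f->effective ? n.permitted : n.ambient;
	n.inheritable = p->inheritable;
	n.bounding = p->bounding;
	return n;
}
```

With cgroup_bound set to all-ones the result depends only on p and f.
Clearing a bit in it changes the outcome of the same execve of the same
file by the same parent, so a third party becomes an input to the
calculation.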
Adding a cgroup into the mix allows basically any random task to sneak
in, change the setting, and make a process unexpectedly not get a
privilege on a new execve when it did get it on the previous execve.

As amorgan will point out, posix caps are meant to be purely orthogonal
to DAC. We have hooks in place to make setuid work, but those can be
shut off to get a system where uid root is no one special (other than
owning system files). So again, allowing a root user through cgroupfs
access to change the bounding set for other tasks flies in the face of
that. (We're already smudging that picture with the user-namespaced
filecaps, though trying not to.)

-serge