Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751710AbcETXcW (ORCPT <rfc822;w@1wt.eu>);
	Fri, 20 May 2016 19:32:22 -0400
Received: from h2.hallyn.com ([78.46.35.8]:51164 "EHLO h2.hallyn.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751661AbcETXcT (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 20 May 2016 19:32:19 -0400
Date: Fri, 20 May 2016 18:32:16 -0500
From: "Serge E. Hallyn" <serge@hallyn.com>
To: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        LKML <linux-kernel@vger.kernel.org>, Jann Horn <jann@thejh.net>,
        Seth Forshee <seth.forshee@canonical.com>,
        LSM <linux-security-module@vger.kernel.org>,
        "Andrew G. Morgan" <morgan@kernel.org>,
        Kees Cook <keescook@chromium.org>,
        Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
        "Serge E. Hallyn" <serge.hallyn@ubuntu.com>,
        Linux API <linux-api@vger.kernel.org>,
        Andy Lutomirski <luto@amacapital.net>,
        Linux Containers <containers@lists.linux-foundation.org>
Subject: Re: [PATCH RFC] user-namespaced file capabilities - now with more
 magic
Message-ID: <20160520233216.GA14872@mail.hallyn.com>
References: <20160516214804.GA5926@mail.hallyn.com>
 <20160518215752.GA9187@mail.hallyn.com>
 <1463691236.2465.74.camel@linux.vnet.ibm.com>
 <20160520034048.GA31216@mail.hallyn.com>
 <1463743150.2465.100.camel@linux.vnet.ibm.com>
 <87mvnklh20.fsf@x220.int.ebiederm.org>
 <20160520192607.GA11601@mail.hallyn.com>
 <87iny8h5yv.fsf@x220.int.ebiederm.org>
 <20160520195902.GB12101@mail.hallyn.com>
 <1463786592.2763.74.camel@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1463786592.2763.74.camel@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Quoting Mimi Zohar (zohar@linux.vnet.ibm.com):
> On Fri, 2016-05-20 at 14:59 -0500, Serge E. Hallyn wrote:
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> > > "Serge E. Hallyn" <serge@hallyn.com> writes:
> > > 
> > > > Quoting Eric W. Biederman (ebiederm@xmission.com):
> > > >> Mimi Zohar <zohar@linux.vnet.ibm.com> writes:
> > > >> 
> > > >> > On Thu, 2016-05-19 at 22:40 -0500, Serge E. Hallyn wrote:
> > > >> >> Quoting Mimi Zohar (zohar@linux.vnet.ibm.com):
> > > >> >> > On Wed, 2016-05-18 at 16:57 -0500, Serge E. Hallyn wrote:
> > > >> >
> > > >> >> > > diff --git a/fs/xattr.c b/fs/xattr.c
> > > >> >> > > index 4861322..5c0e7ae 100644
> > > >> >> > > --- a/fs/xattr.c
> > > >> >> > > +++ b/fs/xattr.c
> > > >> >> > > @@ -94,11 +94,26 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const char *name,
> > > >> >> > >  {
> > > >> >> > >  	struct inode *inode = dentry->d_inode;
> > > >> >> > >  	int error = -EOPNOTSUPP;
> > > >> >> > > +	void *wvalue = NULL;
> > > >> >> > > +	size_t wsize = 0;
> > > >> >> > >  	int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
> > > >> >> > >  				   XATTR_SECURITY_PREFIX_LEN);
> > > >> >> > > 
> > > >> >> > > -	if (issec)
> > > >> >> > > +	if (issec) {
> > > >> >> > >  		inode->i_flags &= ~S_NOSEC;
> > > >> >> > > +		/* if root in a non-init user_ns tries to set
> > > >> >> > > +		 * security.capability, write a security.nscapability
> > > >> >> > > +		 * in its place */
> > > >> >> > > +		if (!strcmp(name, "security.capability") &&
> > > >> >> > > +				current_user_ns() != &init_user_ns) {
> > > >> >> > > +			cap_setxattr_make_nscap(dentry, value, size, &wvalue, &wsize);
> > > >> >> > > +			if (!wvalue)
> > > >> >> > > +				return -EPERM;
> > > >> >> > > +			value = wvalue;
> > > >> >> > > +			size = wsize;
> > > >> >> > > +			name = "security.nscapability";
> > > >> >> > > +		}
> > > >> >> > 
> > > >> >> > The call to capable_wrt_inode_uidgid() is hidden behind
> > > >> >> > cap_setxattr_make_nscap().  Does it make sense to call it here instead,
> > > >> >> > before the security.capability test?  This would lay the foundation for
> > > >> >> > doing something similar for IMA.
> > > >> >> 
> > > >> >> Might make sense to move that.  Though looking at it with fresh eyes I wonder
> > > >> >> whether adding less code here at __vfs_setxattr_noperm(), i.e.
> > > >> >> 
> > > >> >> 		if (!cap_setxattr_makenscap(dentry, &value, &size, &name))
> > > >> >> 			return -EPERM;
> > > >> >> 
> > > >> >> would be cleaner.
> > > >> >
> > > >> > Yes, it would be cleaner, but I'm suggesting you do all the hard work
> > > >> > making it generic.  Then the rest of us can follow your lead.  It's more
> > > >> > likely that you'll get it right.  At a high level, it might look like:
> > > >> >
> > > >> >                /* Permit root in a non-init user_ns to modify the security
> > > >> >                  * namespace xattr equivalents (eg. nscapability, ns_ima, etc). 
> > > >> >                  */
> > > >> >                 if ((current_user_ns() != &init_user_ns) &&
> > > >> >                         capable_wrt_inode_uidgid(inode, CAP_SETFCAP)) {
> > > >> >
> > > >> > 			if security.capability
> > > >> > 				call capability  /* set nscapability? */
> > > >> >
> > > >> > 			else if security.ima 
> > > >> > 				call ima 	/* set ns_ima? */
> > > >> > 		}
> > > >> 
> > > >> Hmm.  I am confused about this part of the strategy.
> > > >> 
> > > >> I don't understand the capability vs nscapability distinction.  It seems
> > > >> to add complexity without benefit.
> > > >
> > > > ...  Well, yes, we could simply make a new version of security.capability
> > > > xattr, and make rootid == 0 mean it was written by the init_user_ns.  Is
> > > > that what you mean?
> > > 
> > > Yes.
> > > 
> > > That would seem to simplify the logic to ensure the policy we enforce is
> > > consistent with what is on disk.
> > 
> > I'll give that a shot.  I think the reason I did it this way was that I'm
> > still kind of stuck in the not-magic way of thinking about it.  But yeah
> > with the kernel magically writing in the kuid there's probably no reason not
> > to.
> 
> Totally confused.  Will this method allow multiple instances of the
> xattr on disk? 

No, but we don't actually want that anyway.  Today a security.capability
xattr is honored in all user namespaces, and we want to keep that: a
capability set by root in init_user_ns works in every namespace.
Letting root in other namespaces set security.capability as well would
only be confusing.
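For illustration only, here is a minimal user-space model (not kernel code) of that rule.  Root in init_user_ns writes security.capability itself.  Root in any other namespace has the write redirected to security.nscapability, as in the quoted __vfs_setxattr_noperm() hunk.  An existing security.capability is honored from every namespace.  The names struct model_ns, model_init_ns, enum fscap_target, fscap_write_target() and fscap_honored() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a user namespace (not kernel code). */
struct model_ns {
	const struct model_ns *parent;	/* NULL only for the initial namespace */
};

/* Stand-in for init_user_ns. */
static const struct model_ns model_init_ns = { .parent = NULL };

/* Which xattr a root write of security.capability ends up as. */
enum fscap_target {
	TARGET_CAPABILITY,	/* security.capability */
	TARGET_NSCAPABILITY,	/* security.nscapability */
};

/*
 * Mirrors the redirect in the quoted __vfs_setxattr_noperm() hunk:
 * only the initial namespace writes security.capability itself;
 * root in any other namespace is redirected to security.nscapability.
 */
static enum fscap_target fscap_write_target(const struct model_ns *writer)
{
	assert(writer != NULL);
	return writer == &model_init_ns ? TARGET_CAPABILITY
					: TARGET_NSCAPABILITY;
}

/*
 * A security.capability xattr (only ever written from the initial
 * namespace) is honored no matter which namespace execs the file.
 */
static bool fscap_honored(bool has_fscap, const struct model_ns *reader)
{
	assert(reader != NULL);
	return has_fscap;
}
```

For example, after declaring a child namespace as struct model_ns child = { .parent = &model_init_ns };, fscap_write_target(&child) yields TARGET_NSCAPABILITY, while fscap_write_target(&model_init_ns) yields TARGET_CAPABILITY.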

So in the patchset I had, security.capability can only be set from
init_user_ns but works in all namespaces.  security.nscapability
cannot be set if security.capability is already set.  And
security.nscapability works in the user namespace whose root uid set
the cap and in all of that namespace's child namespaces.
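A similar sketch of the nscapability rules follows.  Like the previous model, it is hypothetical user-space code and not taken from the patchset.  Writing security.nscapability is refused when security.capability is already present.  A written nscap is honored only from the writer's namespace or one of its descendants, which the model checks by walking parent links.  One simplifying assumption: the patchset records the root uid of the writing namespace, whereas this model stores a pointer to the namespace itself.  All names here (struct model_ns, struct model_file, nscap_write(), ns_within(), nscap_honored()) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace; only the parent link matters. */
struct model_ns {
	const struct model_ns *parent;	/* NULL for the initial namespace */
};

/* Hypothetical per-file xattr state; not the patchset's on-disk format. */
struct model_file {
	bool has_cap;			/* security.capability present */
	bool has_nscap;			/* security.nscapability present */
	const struct model_ns *nscap_owner; /* namespace whose root wrote it */
};

/* security.nscapability may not be added once security.capability exists. */
static bool nscap_write(struct model_file *f, const struct model_ns *writer)
{
	assert(f != NULL && writer != NULL);
	if (f->has_cap)
		return false;		/* refused; exact error code not modeled */
	f->has_nscap = true;
	f->nscap_owner = writer;
	return true;
}

/* True if ns is 'ancestor' itself or one of its descendants. */
static bool ns_within(const struct model_ns *ns, const struct model_ns *ancestor)
{
	for (; ns != NULL; ns = ns->parent)
		if (ns == ancestor)
			return true;
	return false;
}

/*
 * Honor security.nscapability only in the namespace whose root set it
 * and in that namespace's descendants.
 */
static bool nscap_honored(const struct model_file *f, const struct model_ns *reader)
{
	assert(f != NULL && reader != NULL);
	return f->has_nscap && ns_within(reader, f->nscap_owner);
}
```

Walking up the parent chain from the reader means that a sibling of the writer's namespace, or any of its ancestors, never reaches the owner and so does not honor the cap.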

-serge