From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755354AbbG3RZk (ORCPT <rfc822;w@1wt.eu>);
	Thu, 30 Jul 2015 13:25:40 -0400
Received: from mail-ob0-f172.google.com ([209.85.214.172]:34256 "EHLO
	mail-ob0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751477AbbG3RZh (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 30 Jul 2015 13:25:37 -0400
Date: Thu, 30 Jul 2015 12:25:17 -0500
From: Seth Forshee <seth.forshee@canonical.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>,
        Stephen Smalley <sds@tycho.nsa.gov>,
        Andy Lutomirski <luto@amacapital.net>,
        Alexander Viro <viro@zeniv.linux.org.uk>,
        Linux FS Devel <linux-fsdevel@vger.kernel.org>,
        LSM List <linux-security-module@vger.kernel.org>,
        SELinux-NSA <selinux@tycho.nsa.gov>,
        Serge Hallyn <serge.hallyn@canonical.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/7] Initial support for user namespace owned mounts
Message-ID: <20150730172517.GB131344@ubuntu-hedt>
References: <CALCETrUTgfRzzdk3T0wZASSE+KC9S+kmyZbD6-xStS2RRaGiBw@mail.gmail.com>
 <20150721203550.GA80838@ubuntu-hedt>
 <55AEF75F.9010703@schaufler-ca.com>
 <20150722155634.GB124342@ubuntu-hedt>
 <55AFDCA6.10201@schaufler-ca.com>
 <20150722193223.GD124342@ubuntu-hedt>
 <55B02FBD.4040606@schaufler-ca.com>
 <20150728204009.GF83521@ubuntu-hedt>
 <55BA4E48.50109@schaufler-ca.com>
 <878u9xlgo8.fsf@x220.int.ebiederm.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <878u9xlgo8.fsf@x220.int.ebiederm.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 30, 2015 at 12:05:27PM -0500, Eric W. Biederman wrote:
> Casey Schaufler <casey@schaufler-ca.com> writes:
> 
> > On 7/28/2015 1:40 PM, Seth Forshee wrote:
> >> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
> >>>> This is what I currently think you want for user ns mounts:
> >>>>
> >>>>  1. smk_root and smk_default are assigned the label of the backing
> >>>>     device.
> >>>>  2. s_root is assigned the transmute property.
> >>>>  3. For existing files:
> >>>>     a. Files with the same label as the backing device are accessible.
> >>>>     b. Files with any other label are not accessible.
> >>> That's right. Accept correct data, reject anything that's not right.
> >>>
> >>>> If this is right, there are a couple lingering questions in my mind.
> >>>>
> >>>> First, what happens with files created in directories with the same
> >>>> label as the backing device but without the transmute property set? The
> >>>> inode for the new file will initially be labeled with smk_of_current(),
> >>>> but then during d_instantiate it will get smk_default and thus end up
> >>>> with the label we want. So that seems okay.
> >>> Yes.
> >>>
> >>>> The second is whether files with the SMACK64EXEC attribute are still a
> >>>> problem. It seems they are, for files with the same label as the backing
> >>>> store at least. I think we can simply skip the code that reads out this
> >>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
> >>>> label to the new task in bprm_set_creds. The latter seems more
> >>>> consistent with the approach you've suggested for dealing with labels
> >>>> from disk.
> >>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
> >>> smack_d_instantiate for unprivileged mounts would do the trick.
> >>>
> >>>> So I guess all of that seems okay, though perhaps a bit restrictive
> >>>> given that the user who mounted the filesystem already has full access
> >>>> to the backing store.
> >>> In truth, there is no reason to expect that the "user" who did the
> >>> mount will ever have a Smack label that differs from the label of
> >>> the backing store. If what we've got here seems restrictive, it's
> >>> because you've got access from someone other than the "user".
> >>>
> >>>> Please let me know whether or not this matches up with what you are
> >>>> thinking, then I can proceed with the implementation.
> >>> My current mindset is that, if you're going to allow unprivileged
> >>> mounts of user defined backing stores, this is as safe as we can
> >>> make it.
> >> All right, I've got a patch which I think does this, and I've managed to
> >> do some testing to confirm that it behaves like I expect. How does this
> >> look?
> >>
> >> What's missing is getting the label from the block device inode; as
> >> Stephen discovered the inode that I thought we could get the label from
> >> turned out to be the wrong one. Afaict we would need a new hook in order
> >> to do that, so for now I'm using the label of the process calling
> >> mount.
> >
> > That will be OK if the mount processing checks for write access to
> > the backing store. I haven't looked to see if it does. If it doesn't
> > the problems should be pretty obvious.
> 
> 
> do_new_mount
>   vfs_kern_mount
>     mount_fs
>       ...
>         mount_bdev
>           blkdev_get_by_path(..., FMODE_READ | FMODE_WRITE | FMODE_EXCL, ...)
>             lookup_bdev
>               kern_path
>                 filename_lookup
>                   path_lookupat
>                     lookup_last
>                       walk_component
>             blkdev_get(...,mode,...)
>               __blkdev_get(...,mode,...)
>                 devcgroup_inode_permission(bdev->bd_inode, perm)
> 
> *scratches my head*
> 
> It looks like we don't actually check the permissions on the block
> device.  Tomoyo has a hack for it.  nfsd does something.  There is
> devcgroup silliness.
> 
> But overall it looks like we depend on capable(CAP_SYS_ADMIN).
> 
> Seth I do believe we have found another area of the vfs we will need to
> shore up before allowing unprivileged mounts of block device based
> filesystems.
> 
> There are enough hacks here that having someone with a clue come through
> and make the code make more sense seems like a good idea anyway.

Yep, I just came to the same conclusion myself, and I also verified the
behavior empirically. That's definitely a problem. I'll get to work on
fixing that.
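
To make the gap concrete, here is a minimal userspace sketch of the kind
of check that appears to be missing. It is only an analogy and not the
kernel fix: the helper names required_access() and check_backing_dev()
are made up for illustration, and it assumes that a read-only mount
should need read access to the device node while a read-write mount
should need both read and write. faccessat() with AT_EACCESS tests the
caller's effective ids against the node's permissions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: the access mask a mount of the backing device
 * would need under the assumption stated above.
 */
static int required_access(bool read_only)
{
	return read_only ? R_OK : (R_OK | W_OK);
}

/*
 * Hypothetical helper: returns 0 if the caller's effective uid/gid are
 * allowed the required access to the node at path, or -1 with errno
 * set (for example EACCES or ENOENT) otherwise.
 */
static int check_backing_dev(const char *path, bool read_only)
{
	return faccessat(AT_FDCWD, path, required_access(read_only),
			 AT_EACCESS);
}
```

A caller would run check_backing_dev(dev_path, read_only) before
attempting the mount and bail out on a nonzero return. In the kernel
the equivalent would presumably be a permission check against the
device inode somewhere on the mount_bdev path, but where exactly that
belongs is still open.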

Seth
