From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from youngberry.canonical.com ([91.189.89.112]:55733 "EHLO
	youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756127AbeFOUsA (ORCPT ); Fri, 15 Jun 2018 16:48:00 -0400
Received: from mail-it0-f70.google.com ([209.85.214.70])
	by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.76) (envelope-from ) id 1fTvdX-0000UA-Ea
	for linux-fsdevel@vger.kernel.org; Fri, 15 Jun 2018 20:47:59 +0000
Received: by mail-it0-f70.google.com with SMTP id 13-v6so2790057itl.7
	for ; Fri, 15 Jun 2018 13:47:59 -0700 (PDT)
Date: Fri, 15 Jun 2018 15:47:56 -0500
From: Seth Forshee
To: James Bottomley
Cc: Aleksa Sarai, containers@lists.linux-foundation.org,
	Matthew Wilcox, Christian Brauner, Tyler Hicks,
	linux-fsdevel@vger.kernel.org
Subject: Re: shiftfs status and future development
Message-ID: <20180615204756.GN30028@ubuntu-xps13>
References: <20180614184448.GC30028@ubuntu-xps13>
	<20180615135638.GA29299@mail.hallyn.com>
	<20180615145917.GF30028@ubuntu-xps13>
	<20180615152529.GA23527@bombadil.infradead.org>
	<1529078955.4048.12.camel@HansenPartnership.com>
	<20180615170435.pt2qtnr762z7w634@gordon>
	<1529083329.4048.19.camel@HansenPartnership.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1529083329.4048.19.camel@HansenPartnership.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Fri, Jun 15, 2018 at 10:22:09AM -0700, James Bottomley wrote:
> On Sat, 2018-06-16 at 03:04 +1000, Aleksa Sarai wrote:
> > On 2018-06-15, James Bottomley
> > wrote:
> > > > >  - Supports any id maps possible for a user namespace
> > > >
> > > > Have we already ruled out storing the container's UID/GID/perms
> > > > in an extended attribute, and having all the files owned by the
> > > > owner of the container from the perspective of the unshifted
> > > > fs.
> > > > Then shiftfs reads the xattr and presents the files with the
> > > > container's idea of what the UID is?
> > >
> > > I've got an experimental patch set that does the *mark* as an
> > > xattr.
> >
> > I forgot to ask you about this when we all met face-to-face -- can
> > you go over what the purpose of marking the mounts before being able
> > to shifts is? When I saw your demo at LPC I was quite confused about
> > what it was doing (I think you mentioned it was a security feature,
> > but I must admit I didn't follow the explanation).
>
> OK, so the basic security problem is that an unprivileged tenant cannot
> be allowed arbitrary access to both the shifted and underlying
> unshifted locations because they can do writes to the shifted mount
> that appear at real uid/gid 0 in the underlying unshifted location,
> setting up all sorts of unpleasant threats of which suid execution is
> just the most obvious one.
>
> My mount marking solution, which the v2 (and forthcoming v3) has is the
> idea that the admin buries the real underlying location deep in a path
> inaccessible (to the tenant) part of the filesystem and then exposes a
> marked mount point to the tenant by doing
>
> mount -t shiftfs -o mark
>
> Then in the location we can block the potential
> exploits. When the tenant is building an unprivileged container, it
> can do
>
> mount -t shiftfs
>
> And the will now have the shifting in place.

More generally, we can't allow an unprivileged user ns to mount any
subtree with an id shift unless the context that controls that subtree
(i.e. CAP_SYS_ADMIN in sb->s_user_ns) allows it. Otherwise it would be a
simple matter for any user to create a user ns and make an id shifted
mount of /. The marking in shiftfs is one way of solving this problem.

I don't know if you saw my comments about marking earlier in the thread.
Tl;dr, I think that the new mount apis in the filesystem context patches
could allow an alternative to marking.
I think we should be able to arrange it so that the "host" context sets
up a mount fd for shiftfs mounting a specific subtree, then passes that
fd into the container. The container can then use the fd to attach the
mount to its filesystem tree. This will provide all the benefits of
marking without that awkward intermediate mount point.

Of course those patches haven't been merged yet, but based on the
discussion I've seen their prospects look good.

> This scheme is ephemeral (the marked mount has to be recreated on every
> boot) and rather complex, so the alternative is to add a permanent mark
> to so that regular tenant access can be secured
> (or even prohibited) but the tenant can still do
>
> mount -t shiftfs

This of course would not be possible in my proposed mount fd scheme.

> To get the shifting properties in the container. In this version of
> the scheme, the shift mountable directory is marked with a security
> xattr that is permanent (survives reboot) but requires that the
> filesystem support xattrs, of course.
>
> The down side of the xattr scheme is that the securing against the
> tenant part becomes an xattr enforced thing rather than a shiftfs
> enforced thing, so it has to be an additional patch to the kernel
> itself rather than being inside a self contained module.

Would this work for nested containers? I guess it should be fine to
allow setting that xattr for CAP_SYS_ADMIN in sb->s_user_ns, so probably
so.

Thanks,
Seth